On Continuous-Time Gaussian Channels

A continuous-time white Gaussian channel can be formulated using white Gaussian noise, and a conventional way of examining such a channel is the sampling approach based on the Shannon–Nyquist sampling theorem: the original continuous-time channel is converted to an equivalent discrete-time channel, to which a great variety of established tools and methodologies can be applied. However, one key issue with this scheme is that continuous-time feedback and memory cannot be incorporated into the channel model. It turns out that this issue can be circumvented by considering the Brownian motion formulation of a continuous-time white Gaussian channel. Nevertheless, as opposed to the white Gaussian noise formulation, a link establishing the information-theoretic connection between a continuous-time channel under the Brownian motion formulation and its discrete-time counterparts has long been missing. This paper fills this gap by establishing causality-preserving connections between continuous-time Gaussian feedback/memory channels and their associated discrete-time versions, in the forms of sampling and approximation theorems, which we believe will play important roles in the long run for further developing continuous-time information theory. As an immediate application of the approximation theorem, we propose the so-called approximation approach to examine continuous-time white Gaussian channels in the point-to-point or multi-user setting. It turns out that the approximation approach, complemented by relevant tools from stochastic calculus, can enhance our understanding of continuous-time Gaussian channels: it gives alternative and strengthened interpretations of some long-held folklore, recovers "long-known" results from new perspectives, and rigorously establishes new results predicted by the intuition that the approximation approach carries.
More specifically, using the approximation approach complemented by relevant tools from stochastic calculus, we first derive the capacity regions of continuous-time white Gaussian multiple access channels and broadcast channels, and we then analyze how feedback affects their capacity regions: feedback will increase the capacity regions of some continuous-time white Gaussian broadcast channels and interference channels, while it will not increase the capacity regions of continuous-time white Gaussian multiple access channels.


Introduction
Continuous-time Gaussian channels were considered at the very inception of information theory. In his celebrated paper [83] birthing information theory, Shannon studied the following point-to-point continuous-time white Gaussian channel:

Y(t) = X(t) + Z(t),    (1)

where X(t) is the channel input with average power limit P, Z(t) is the white Gaussian noise with flat power spectral density 1, and Y(t) is the channel output. Shannon actually only considered the case where the channel has bandwidth limit ω, namely, the channel input X and the noise Z, and therefore the output Y, all have bandwidth limit ω (alternatively, as in (9.54) of [17], this can be interpreted as the original channel (1) concatenated with an ideal bandpass filter with bandwidth limit ω). Using the celebrated Shannon–Nyquist sampling theorem [67, 84], the continuous-time channel (1) can be equivalently represented by a parallel Gaussian channel:

Y_n^{(ω)} = X_n^{(ω)} + Z_n^{(ω)},  n = 1, 2, …,    (2)

where the noise process {Z_n^{(ω)}} is i.i.d. with variance 1 [17]. Regarding the "space" index n as time, the above parallel channel can be interpreted as a discrete-time Gaussian channel associated with the continuous-time channel (1). It is well known from the theory of discrete-time Gaussian channels that the capacity of the channel (2) can be computed as

C(ω) = ω log(1 + P/(2ω)).    (3)

Then, the capacity C of the channel (1) can be computed by taking the limit of the above expression as ω tends to infinity:

C = lim_{ω→∞} ω log(1 + P/(2ω)) = P/2.    (4)

The sampling approach consisting of (1)–(4) above, which serves as a link between the continuous-time channel (1) and the discrete-time channel (2), typifies a conventional way to examine continuous-time Gaussian channels: convert them into associated discrete-time Gaussian channels, for which we have ample ammunition at hand.
Note that when P tends to 0, using the fact that when P is "close" to 0, ω log(1 + P/(2ω)) is "close" to P/2, one also reaches (4), which roughly explains the following long-held folklore within the information theory community:

a continuous-time infinite-bandwidth Gaussian channel without feedback or memory is "equivalent" to a discrete-time Gaussian channel without feedback or memory at low signal-to-noise ratio (SNR). (A)
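The two limiting regimes above, the infinite-bandwidth limit in (4) and the low-SNR approximation log(1 + x) ≈ x, can be checked with a short numerical sketch; the function name and the sample value P = 4 below are illustrative choices of ours, not from the text.

```python
import math

def bandlimited_capacity(P, w):
    """Capacity (3) of the bandlimited channel: C(w) = w * log(1 + P/(2w)),
    in nats per second (natural logarithm throughout)."""
    return w * math.log(1.0 + P / (2.0 * w))

P = 4.0  # hypothetical average power limit; the limit below is P/2 = 2.0
for w in [1.0, 10.0, 100.0, 1e4, 1e6]:
    print(f"w = {w:>9}: C(w) = {bandlimited_capacity(P, w):.6f}")

# C(w) increases monotonically toward P/2 as w grows, mirroring (4).
# The same expansion log(1 + x) ~ x for small x explains the low-SNR
# folklore in (A): at fixed w, C(w) is close to P/2 when P is close to 0.
```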
A moment of reflection, however, reveals that the sampling approach to the channel capacity (with bandwidth limit or not) is heuristic in nature. For one thing, a bandwidth-limited signal cannot be time-limited, which renders it infeasible to define the data transmission rate if a channel is assumed to have a bandwidth limit. In this regard, rigorous treatments coping with this issue and other technicalities can be found in [94, 26]; see also [86] for a relevant in-depth discussion. Another issue is that, even disregarding the above technical nuisance arising from the bandwidth limit assumption, the sampling approach only gives a lower bound on the capacity of (1): it shows that P/2 is achievable via a class of special coding schemes, but it is not clear why transmission rates higher than P/2 cannot be achieved by other coding schemes. The capacity of (1) was rigorously studied in [25, 10], and a complete proof establishing P/2 as its de facto capacity can be found in [4, 5].

Alternatively, the continuous-time white Gaussian channel (1) can be examined [45] under the Brownian motion formulation:

Y(t) = ∫_0^t X(s) ds + B(t),    (5)

where, slightly abusing the notation, we still use Y(t) to denote the output corresponding to the input X(s), and B(t) denotes the standard Brownian motion (Z(t) can be viewed as a generalized derivative of B(t)); equivalently, the channel (5) can be seen as the original channel (1) concatenated with an integrator circuit. As opposed to white Gaussian noise, which only exists as a generalized function [75], Brownian motions are well-defined stochastic processes and have been extensively studied in probability theory. Here we remark that, via a routine orthonormal decomposition argument, both channels are equivalent to a parallel channel consisting of infinitely many Gaussian sub-channels [6].
An immediate and convenient consequence of such a formulation is that many notions in discrete time, including mutual information and typical sets, carry over to the continuous-time setting, which rids us of the nuisances arising from the bandwidth limit assumption. Indeed, such a framework yields a fundamental formula for the mutual information of the channel (5) [18, 49] and a clean and direct proof [49] that the capacity of (5) is P/2; moreover, as evidenced by numerous results collected in [45] and some recent representative work [90, 91] on point-to-point Gaussian channels, the use of Brownian motions elevates the level of rigor of our treatment and equips us with a wide range of established techniques and tools from stochastic calculus. Here we remark that Girsanov's theorem, one of the most important theorems in stochastic calculus, lays the foundation of our rigorous treatment; for those who are interested in the technical details of our proofs, we refer to [61, 45], where Girsanov's theorem (and its numerous variants) and its wide range of applications in information theory are discussed in great detail.
Furthermore, as elaborated in Remark 3.8, the Brownian motion formulation is also versatile enough to accommodate feedback and memory; in particular, the point-to-point continuous-time white Gaussian memory/feedback channel can be characterized by the following stochastic differential equation:

Y(t) = ∫_0^t g(s, W_0^s, Y_0^s) ds + B(t),  t ∈ [0, T],    (6)

where g is a function from [0, T] × C[0, T] × C[0, T] to ℝ. Note that (6) can be interpreted

1) either as a feedback channel, where W_0^s can be rewritten as M, interpreted as the message to be transmitted through the channel, and g(s, W_0^s, Y_0^s) can be rewritten as X(s), interpreted as the channel input, which depends on M and Y_0^s, the channel output up to time s that is fed back to the sender,

2) or as a memory channel, where W_0^s can be rewritten as X_0^s, interpreted as the channel input, g is "part" of the channel, and Y(t), the channel output at time t, depends on X_0^t and Y_0^t, the channel input and output up to time t that are present in the channel as memory, respectively.
Note that, strictly speaking, the third parameter of g in (6) should be Y_0^{s−}, which, however, can be equivalently replaced by Y_0^s due to the continuity of the sample paths of {Y(t)}. Note also that, in the presence of feedback/memory, the existence and uniqueness of Y is in fact a tricky mathematical problem; in this paper, however, we simply assume that the input X is appropriately chosen so that Y uniquely exists. For a more detailed discussion of the Brownian motion formulation, we refer the reader to [45].
As opposed to the white Gaussian noise formulation, under the Brownian motion formulation, memory and feedback can be naturally translated to the discrete-time setting: the pathwise continuity of a Brownian motion allows the inheritance of temporal causality when the channel is sampled (see Section 2) or approximated (see Section 3). On the other hand, the white Gaussian noise formulation faces an inherent difficulty as far as inheriting temporal causality is concerned: in converting (1) to (2), while the X_n^{(ω)} are obtained as "time" samples of X(t), the Z_n^{(ω)} are in fact "space" samples of Z(t), as they are merely the coefficients of the (extended) Karhunen–Loève decomposition of Z(t) [27, 39, 40]; see also [51] for an in-depth discussion of this point.
On the other hand, as opposed to the white Gaussian noise formulation, a link that establishes the information-theoretic connection between the continuous-time channel (6) and its discrete-time counterparts has long been missing, which may explain why discrete-time and continuous-time information theory (under the Brownian motion formulation) have largely gone separate ways with little interaction over the past several decades. In this paper, we fill this gap by establishing causality-preserving connections between the channel (6) and its associated discrete-time versions, in the forms of sampling and approximation theorems, which we believe will serve as the above-mentioned missing links and play important roles in the long run for further developing continuous-time information theory, particularly for communication scenarios where feedback/memory is present.
As an immediate application of the approximation theorem, we propose the approximation approach to examine continuous-time Gaussian feedback channels with the average power constraint and infinite bandwidth (again, by comparison, the conventional sampling approach cannot handle feedback). It turns out that this approach, when complemented by relevant tools from stochastic calculus, can greatly enhance our understanding of continuous-time Gaussian channels in terms of giving alternative and strengthened interpretations of the low-SNR equivalence in (A), recovering "long-known" results (Theorems 5.1 (for the non-feedback case), 5.4 and 5.7) from new and rigorous perspectives, and deriving new results (Theorems 5.1 (for the feedback case), 5.6, 5.9 and 5.10) inspired by the intuition that the approximation approach carries.
Below, we summarize the contributions of this paper in greater detail.
In Section 2, we prove Theorems 2.1 and 2.3, sampling theorems for a continuous-time Gaussian feedback/memory channel, which naturally connect such a channel with its sampled discrete-time versions. In Section 3, we prove Theorems 3.1 and 3.2, the so-called approximation theorems, which connect a continuous-time Gaussian feedback/memory channel with its approximated discrete-time versions (in the sense of the Euler–Maruyama approximation [41]). Roughly speaking, a sampling theorem says that a time-sampled channel is "close" to the original channel if the sampling is fine enough, and an approximation theorem says that an approximated channel is "close" to the original channel if the approximation is fine enough, both in an information-theoretic sense. Note that, as elaborated in Remark 3.7, a certain version of the approximation theorem boils down to the sampling theorem when there is no memory or feedback in the channel.
A sampling theorem, whose spirit is in line with the Shannon–Nyquist sampling theorem, is evidently of practical and theoretical value, as it deals with the "real" values of the channel output. As will be elaborated later, approximation theorems, despite only dealing with "approximated" values of the channel output, turn out to be surprisingly useful in a number of respects: they certainly provide alternative rigorous tools for translating results from discrete time to continuous time; more importantly, as elaborated in Section 4, they lay the foundation for the approximation approach, which gives us intuition in the point-to-point continuous-time setting and further helps us deliver rigorous treatments of multi-user continuous-time Gaussian channels in Section 5.
More specifically, in Section 5, we derive the capacity regions of a continuous-time white Gaussian multiple access channel (Theorem 5.1), a continuous-time white Gaussian interference channel (Theorem 5.4), and a continuous-time white Gaussian broadcast channel (Theorem 5.7). Here, we note that when there is no feedback, as discussed in Remark 5.2, the results above are "long known" in the sense that they are roughly suggested by the conventional sampling approach or, alternatively, the low-SNR equivalence in (A). However, to the best of our knowledge, explicit formulations and statements of such results are missing in the literature, and their rigorous proofs are non-trivial (for instance, when establishing Theorem 5.7, we have to resort to the continuous-time I-MMSE relationship [28], which has been established only recently). By comparison, the presence of feedback necessitates the use of the approximation approach, which helps us connect relevant results and proofs in discrete time to analyze how feedback affects the capacity regions of families of continuous-time multi-user one-hop Gaussian channels: feedback will increase the capacity regions of some continuous-time Gaussian broadcast channels (Theorem 5.10) and interference channels (Theorem 5.6), while it will not increase the capacity regions of a continuous-time physically degraded Gaussian broadcast channel (Theorem 5.9) or continuous-time Gaussian multiple access channels (Theorem 5.1).

Sampling Theorems
A very natural question is whether, as for the white Gaussian noise formulation, sampling theorems hold for continuous-time white Gaussian channels under the Brownian motion formulation. In this section, we establish sampling theorems for the channel (6), which naturally connect such channels with their discrete-time versions obtained by sampling.
Consider the following regularity conditions for the channel (6):

(a) The solution {Y(t)} to the stochastic differential equation (6) uniquely exists;

Note that all three conditions above are rather weak: Condition (a) is necessary for the channel to be meaningful, and Conditions (b) and (c) are very mild integrability assumptions.

Now, for any n ∈ ℕ, choose time points t_{n,0}, t_{n,1}, …, t_{n,n} ∈ ℝ such that

0 = t_{n,0} < t_{n,1} < ⋯ < t_{n,n−1} < t_{n,n} = T,

and let ∆_n ≜ {t_{n,0}, t_{n,1}, …, t_{n,n}}. Sampling the channel (6) over the time interval [0, T] with respect to ∆_n, we obtain its sampled discrete-time version as follows:

Y(t_{n,i}) = ∫_0^{t_{n,i}} g(s, W_0^s, Y_0^s) ds + B(t_{n,i}),  i = 0, 1, …, n.    (7)

For any time point sequence ∆_n, we will use δ_{∆_n} to denote its maximal stepsize, namely, δ_{∆_n} ≜ max_{1≤i≤n} (t_{n,i} − t_{n,i−1}). ∆_n is said to be evenly spaced if t_{n,i} − t_{n,i−1} = T/n for all feasible i, and we will use the shorthand notation δ_n to denote its stepsize, i.e., δ_n ≜ t_{n,1} − t_{n,0} = T/n. Apparently, evenly spaced time point sequences are natural candidates with respect to which a continuous-time Gaussian channel can be sampled.

We are primarily concerned with the mutual information of the channel (6), whose standard definition (see, e.g., [73, 45]) is given below:

I(W_0^T; Y_0^T) = E[ log ( dμ_{W,Y} / d(μ_W × μ_Y) ) (W_0^T, Y_0^T) ],

where the subscripted μ denotes the measure induced on C[0, T] or C[0, T] × C[0, T] by the corresponding stochastic process, and μ_W × μ_Y denotes the product measure. Roughly speaking, the following sampling theorem states that for any sequence of "increasingly refined" samplings, the mutual information of the sampled discrete-time channel (7) will converge to that of the original channel (6).
Theorem 2.1. Assume Conditions (a)-(c). Suppose that ∆_n ⊂ ∆_{n+1} for all n and that δ_{∆_n} → 0 as n tends to infinity. Then, we have

lim_{n→∞} I(W_0^T; Y(t_{n,0}), Y(t_{n,1}), …, Y(t_{n,n})) = I(W_0^T; Y_0^T).

Proof. The proof is rather technical and thereby postponed to Appendix A.
Regarding the assumptions of Theorem 2.1, as mentioned before, Conditions (a)-(c) are very weak, but the condition that "∆_n ⊂ ∆_{n+1} for all n" is somewhat restrictive; in particular, it is not satisfied by the set {∆_n} of all evenly spaced time point sequences. We next show that this condition can be replaced by some extra regularity conditions: the same theorem holds as long as the stepsize of the sampling tends to 0, which, in particular, is satisfied by the set of all evenly spaced sampling sequences.
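The nestedness issue can be made concrete with a toy computation; the helper names below are ours. Dyadic partitions are nested and hence fall under Theorem 2.1, while the evenly spaced partitions with stepsize T/n are generally not nested, which is exactly the case the extra regularity conditions are designed to cover.

```python
def evenly_spaced(n, T=1.0):
    """The evenly spaced partition with stepsize T/n: {i*T/n : i = 0, ..., n}."""
    return {i * T / n for i in range(n + 1)}

def dyadic(k, T=1.0):
    """The dyadic partition with 2**k intervals; here Delta_k ⊂ Delta_{k+1}."""
    return {i * T / 2 ** k for i in range(2 ** k + 1)}

print(dyadic(2) <= dyadic(3))                # True: dyadic partitions are nested
print(evenly_spaced(2) <= evenly_spaced(3))  # False: 1/2 is not a multiple of 1/3
```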
Below and hereafter, defining the distance ∥u − v∥_s ≜ sup_{0≤r≤s} |u(r) − v(r)| for u, v ∈ C[0, T] and 0 ≤ s ≤ T, we may assume the following three regularity conditions for the channel (6):

(d) Uniform Lipschitz condition: There exists a constant L > 0 such that for any u, v, y, z ∈ C[0, T] and any 0 ≤ s ≤ T,

|g(s, u, y) − g(s, v, z)| ≤ L(∥u − v∥_s + ∥y − z∥_s);

(e) Uniform linear growth condition: There exists a constant L > 0 such that for any w, y ∈ C[0, T] and any 0 ≤ s ≤ T,

|g(s, w, y)| ≤ L(1 + ∥w∥_s + ∥y∥_s),

where ∥y∥_s ≜ sup_{0≤r≤s} |y(r)|;

(f) Regularity conditions on W: There exists ε > 0 such that …; for any K > 0, there exists ε > 0 such that …; and there exists a constant L > 0 such that for any ε > 0, ….

The following lemma, whose proof is postponed to Appendix B, says that Conditions (d)-(f) are stronger than Conditions (a)-(c). We remark, however, that Conditions (d)-(f) are still rather mild assumptions: the uniform Lipschitz condition, the uniform linear growth condition and their numerous variants are typical assumptions that guarantee the existence and uniqueness of the solution to a given stochastic differential equation. In theory, these two conditions are considered mild in the sense that there are examples where the corresponding stochastic differential equation may not have a solution at all if they are not satisfied (see, e.g., [63]). The third condition is a mild integrability condition; as a matter of fact, for a feedback channel where W is interpreted as the message, this condition is trivially satisfied. All three conditions can be taken for granted in most practical communication situations: as might be expected, the signals employed in practice are much better behaved.

Lemma 2.2. Assume Conditions (d)-(f). Then, there exists a unique strong solution of (6) with initial value Y(0) = 0. Moreover, there exists ε > 0 such that …, which immediately implies Conditions (b) and (c).
Roughly speaking, the following sampling theorem states that if the stepsizes of the samplings tend to 0, the mutual information of the channel (7) will converge to that of the channel (6). Note that in this theorem, we do not need the assumption that "∆_n ⊂ ∆_{n+1} for all n", which is required in Theorem 2.1.
Theorem 2.3. Assume Conditions (d)-(f). Then, for any sequence {∆_n} with δ_{∆_n} → 0 as n tends to infinity, we have

lim_{n→∞} I(W_0^T; Y(t_{n,0}), Y(t_{n,1}), …, Y(t_{n,n})) = I(W_0^T; Y_0^T).

Proof. The proof is rather technical and lengthy, and thereby postponed to Appendix C. We note that, as detailed in Remark C.1, the arguments in the proof can be adapted to yield a sampling theorem in estimation theory.

Approximation Theorems
In this section, we establish approximation theorems for the channel (6), which naturally connect such channels with their discrete-time versions obtained by approximation. As elaborated in later sections, the approximation theorems will underpin the approximation approach introduced in Section 4.

An application of the Euler–Maruyama approximation [41] with respect to ∆_n to (6) yields a discrete-time sequence {Y^{(n)}(t_{n,i}) : i = 0, 1, …, n} and a continuous-time process {Y^{(n)}(t) : t ∈ [0, T]}, a linear interpolation of {Y^{(n)}(t_{n,i})}, as follows: initializing with Y^{(n)}(0) = 0, we recursively compute, for each i = 0, 1, …, n − 1,

Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + g(t_{n,i}, W_0^{t_{n,i}}, (Y^{(n)})_0^{t_{n,i}})(t_{n,i+1} − t_{n,i}) + B(t_{n,i+1}) − B(t_{n,i}).

We are now ready to prove the following theorem:

Theorem 3.1. Assume Conditions (d)-(f). Then, we have

lim_{n→∞} I(W_0^T; Y^{(n)}(t_{n,0}), Y^{(n)}(t_{n,1}), …, Y^{(n)}(t_{n,n})) = I(W_0^T; Y_0^T).

Proof. The proof is rather technical and lengthy, and thereby postponed to Appendix D. We note that, as detailed in Remark D.3, the arguments in the proof can be adapted to yield an approximation theorem in estimation theory.
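As a concrete illustration of the recursion above, the following sketch simulates the Euler–Maruyama approximation of (6) on an evenly spaced grid. The linear feedback rule g(s, w, y) = w(s) − y(s) used at the bottom is a hypothetical example of ours, chosen only because it makes the causality of the scheme visible: each step uses only past samples.

```python
import math
import random

def euler_maruyama(g, W, T=1.0, n=1000, seed=0):
    """Euler-Maruyama approximation of the channel
    Y(t) = int_0^t g(s, W_0^s, Y_0^s) ds + B(t), with Y^(n)(0) = 0.
    `W` is the input/message path sampled on the same grid; at step i the
    drift g sees only W and Y up to time t_{n,i}, so causality is preserved."""
    rng = random.Random(seed)
    dt = T / n
    Y = [0.0]
    for i in range(n):
        t = i * dt
        drift = g(t, W[: i + 1], Y[:])      # past samples only
        dB = rng.gauss(0.0, math.sqrt(dt))  # increment B(t_{n,i+1}) - B(t_{n,i})
        Y.append(Y[-1] + drift * dt + dB)
    return Y

# Hypothetical linear feedback rule: drift depends on the latest past samples.
n = 1000
W = [1.0] * (n + 1)  # constant "message" path, for illustration only
Y = euler_maruyama(lambda t, w, y: w[-1] - y[-1], W, n=n)
```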
When the channel (6) is interpreted as a feedback channel, both W^{(n)} and W are precisely M. When the channel (6) is interpreted as a memory channel, Theorem 3.2 states that the mutual information between its input and output is the limit of that between its approximated input and output (in the sense of the above-mentioned modified Euler–Maruyama approximation).
Other variants of the Euler–Maruyama approximation can also be applied to the channel to yield variants of the approximation theorem. For instance, under Conditions (d)-(f), a parallel argument to the proof of Theorem 3.1, applied to a variant of the Euler–Maruyama approximation, gives a corresponding variant of Theorem 3.1 (Theorem 3.4). Moreover, for the corresponding variant of the approximation, we have the following variant of Theorem 3.2:

Theorem 3.5. Assume Conditions (d)-(f). Then, we have the analogous convergence of mutual information for the variant approximation.

Regarding the approximation theorem and its variants, we make the following remarks.
Remark 3.6. Continuous-time directed information has been defined in [91] for continuous-time white Gaussian channels with positively delayed feedback. In this remark, we show that our approximation theorem can be used to give an alternative definition of continuous-time directed information, even for the case where the feedback is instantaneous.
Consider the following continuous-time Gaussian feedback channel:

Y(t) = ∫_0^t X(s, M, Y_0^s) ds + B(t),  t ∈ [0, T].    (19)

For any ∆_n, we define X̃^{(n)}(·) as a piecewise "frozen" version of the input: for any t with t_{n,i} ≤ t < t_{n,i+1}, the input is evaluated using only the samples available at time t_{n,i}; writing this as X̃^{(n)}(t) for simplicity, (19) can be rewritten as a channel of the form (6) driven by X̃^{(n)}, for which it can be readily checked that Theorem 3.1 applies. Theorem 3.1 and the above observation can then be used to define continuous-time directed information.
To be more precise, the continuous-time directed information from X_0^T to Y_0^T of the channel (19) can be defined as

I(X_0^T → Y_0^T) ≜ lim_{n→∞} I(M; Y^{(n)}(t_{n,0}), Y^{(n)}(t_{n,1}), …, Y^{(n)}(t_{n,n})).    (21)

Consider the following continuous-time Gaussian channel with possibly delayed feedback:

Y(t) = ∫_0^t X(s, M, Y_0^{s−D}) ds + B(t),  t ∈ [0, T],    (22)

where D ≥ 0 denotes the delay of the feedback. In [91], the notion of continuous-time directed information from X_0^T to Y_0^T is defined as follows:

I(X_0^T → Y_0^T) ≜ lim_{n→∞} Σ_{i=0}^{n−1} I(X_0^{t_{n,i+1}}; Y_{t_{n,i}}^{t_{n,i+1}} | Y_0^{t_{n,i}}).    (23)

It is proven that for the case D > 0, using this notion, a connection between information theory and estimation theory can be established as follows:

I(X_0^T → Y_0^T) = (1/2) ∫_0^T E[(X(t) − E[X(t) | Y_0^t])^2] dt.    (24)

On the other hand, it is easy to see that for the case D = 0, i.e., when there is no delay in the feedback as in (19), the definition in (23) and the equality in (24) may run into problems. Consider the extreme scenario where we choose X(t) = −Y(t) for any feasible t; then clearly the right-hand side of (24) should be equal to 0. On the other hand, each small interval in (23) will yield an infinite contribution (given Y_0^{t_{n,i}}, the segment Y_{t_{n,i}}^{t_{n,i+1}} is completely determined by X_0^{t_{n,i+1}}), which further implies that I_{D=0}(X_0^T → Y_0^T), the left-hand side of (24) at D = 0, is infinite, a contradiction. By contrast, be it the case D > 0 or D = 0, with the definition in (21), Theorem 3.1 promises that the limit exists, so the directed information is well defined.

Remark 3.7. When there is no feedback or memory, Theorem 3.1 boils down to Theorem 2.3: obviously we will have Y^{(n)}(t_{n,i}) = Y(t_{n,i}) for any feasible i, which means that Theorem 3.1 actually states

lim_{n→∞} I(W_0^T; Y(t_{n,0}), Y(t_{n,1}), …, Y(t_{n,n})) = I(W_0^T; Y_0^T),

which is precisely the conclusion of Theorem 2.3. Moreover, by Remark 3.6, the same limit also recovers the directed information, i.e., in the absence of feedback, I(X_0^T → Y_0^T) = I(X_0^T; Y_0^T).

Remark 3.8. In this remark, we briefly discuss possible applications of our sampling and approximation theorems, both of which we believe will play important roles in the long run for further developing continuous-time information theory, particularly for scenarios where feedback and memory are present.
Taking advantage of the pathwise continuity of a Brownian motion, our sampling theorems, Theorems 2.1 and 2.3, naturally connect continuous-time Gaussian memory/feedback channels with their discrete-time counterparts, whose outputs are precisely sampled outputs of the original continuous-time Gaussian channel. In discrete time, the Shannon-McMillan-Breiman theorem provides an effective way to approximate the entropy rate of a stationary ergodic process, and the numerical computation and optimization of the mutual information of discrete-time channels using the Shannon-McMillan-Breiman theorem and its extensions have been extensively studied (see, e.g., [32, 33] and references therein). This suggests that our sampling theorems may well serve as a bridge for capitalizing on relevant results in discrete time to numerically compute and optimize the mutual information of continuous-time Gaussian channels. In short, despite the numerous technical barriers that one needs to overcome, we believe that in the long run the sampling theorems can help us numerically compute the mutual information and capacity of continuous-time Gaussian channels.
By comparison, our approximation theorems, Theorems 3.1 and 3.2, are somewhat "artificial" in the sense that the outputs of the associated discrete-time channels are only approximated outputs of the original continuous-time channels. Nonetheless, as the Euler–Maruyama approximation of a continuous-time channel yields the form that a discrete-time channel typically takes, our approximation theorems allow a smooth translation of results and ideas from the discrete-time setting to the continuous-time setting. As a result, the approximation theorems underpin the so-called approximation approach (to be introduced in Section 4) and readily yield results for continuous-time Gaussian channels in the multi-user setting, which will be elaborated in the following sections.

The Approximation Approach
Consider the following continuous-time white Gaussian channel with feedback,

Y(t) = ∫_0^t X(s, M, Y_0^s) ds + B(t),  t ∈ [0, T],    (25)

satisfying the following power constraint: there exists P > 0 such that for any T, with probability 1,

(1/T) ∫_0^T X^2(s, M, Y_0^s) ds ≤ P.    (26)

As mentioned in Section 1, it is well known that the capacity of the above channel is P/2 (the same result can be established under alternative power constraints; see, e.g., [45]). As elaborated in Section 1, when there is no feedback in the channel, the channel (25) is actually equivalent to (1), and one can "derive" the non-feedback capacity heuristically using the conventional sampling approach as in (1)-(4). But this approach is unable to tackle feedback, since an application of the Shannon–Nyquist sampling theorem would destroy the temporal causality.
In this section, we use our approximation theorems to give an alternative way to "derive" the capacity of (25), which will be referred to as the approximation approach in the remainder of the paper. Compared to the sampling approach, the approximation approach can handle feedback, owing to the fact that the Euler–Maruyama approximation preserves temporal causality. Below we briefly explain this new approach, which will be further developed and used, either heuristically or rigorously, in Section 5, where multiple users may be involved in a communication system.
For fixed T > 0, consider the evenly spaced sequence ∆_n with stepsize δ_n = T/n. Applying the Euler–Maruyama approximation (15) to the channel (25) over the time window [0, T], we obtain

Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + X^{(n)}(t_{n,i}) δ_n + B(t_{n,i+1}) − B(t_{n,i}),  i = 0, 1, …, n − 1.    (27)

By Theorem 3.4, we have

lim_{n→∞} I(M; Y^{(n)}(t_{n,0}), Y^{(n)}(t_{n,1}), …, Y^{(n)}(t_{n,n})) = I(M; Y_0^T).    (28)

Our strategy is to "establish" the capacity of the discrete-time channel (27) first, and then the capacity of the continuous-time channel (25) using the "closeness" between the two channels, as claimed by the approximation theorems. For the converse part, we first note that

I(M; Y^{(n)}(t_{n,0}), …, Y^{(n)}(t_{n,n})) ≤ Σ_{i=1}^n (1/2) log(1 + E[(X^{(n)}(t_{n,i−1}))^2] δ_n).    (29)

It then follows from the fact that log(1 + x) ≤ x that

Σ_{i=1}^n (1/2) log(1 + E[(X^{(n)}(t_{n,i−1}))^2] δ_n) ≤ (δ_n/2) Σ_{i=1}^n E[(X^{(n)}(t_{n,i−1}))^2] ≤ PT/2,    (30)

which, by (28), immediately yields I(M; Y_0^T) ≤ PT/2, which establishes the converse part. For the achievability part, note that if we assume all X^{(n)}(t_{n,i−1}) are independent of the Brownian motion B with E[(X^{(n)}(t_{n,i−1}))^2] = P, then the inequality in (29) becomes an equality and the inequalities in (30) become tight as n tends to infinity. Achievability then follows from a usual random coding argument with codes generated by the distribution of X^{(n)} (or, more precisely, a linear interpolation of X^{(n)}). It is clear that as n tends to infinity, the process X^{(n)} behaves more and more like a white Gaussian process. This observation echoes Theorem 6.4.1 in [45], whose proof rigorously shows that an Ornstein-Uhlenbeck process that oscillates "extremely" fast achieves the capacity of (25).
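The converse chain above can be checked numerically. Under the assumption E[(X^{(n)}(t_{n,i−1}))^2] = P for every i, the bound reads (n/2) log(1 + P δ_n), and the sketch below (with illustrative values P = 2, T = 1 of our own choosing) shows it increasing to PT/2 as n grows.

```python
import math

def converse_bound(P, T, n):
    """Upper bound on the mutual information of the approximated channel,
    with per-sample power E[(X^(n))^2] = P: n channel uses, each
    contributing (1/2) * log(1 + P * delta_n), where delta_n = T/n."""
    delta = T / n
    return n * 0.5 * math.log(1.0 + P * delta)

P, T = 2.0, 1.0
for n in [10, 100, 10_000, 1_000_000]:
    print(f"n = {n:>9}: bound = {converse_bound(P, T, n):.6f}")

# Since log(1 + x) <= x, every bound is at most P*T/2 = 1.0, and the
# bounds increase to P*T/2 as n grows: the discrete-time converse
# collapses exactly to the continuous-time capacity P/2 per unit time.
```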
Roughly speaking, similar to the conventional sampling approach, the above approximation approach establishes a continuous-time Gaussian feedback channel as the limit of the associated discrete-time channels as the SNR of each channel use shrinks to zero proportionately (note that in the above arguments, the SNR of each channel use is P δ_n). In other words, we have strengthened the low-SNR equivalence in (A) as follows:

a continuous-time infinite-bandwidth Gaussian channel with feedback is "equivalent" to a discrete-time Gaussian channel with feedback at low SNR. (B)

We remark, however, that for the purpose of deriving the capacity of (25), the approximation approach, like the conventional sampling approach, is heuristic in nature: Theorem 3.1 does require Conditions (d)-(f), which are much stronger than the power constraint (26). Nevertheless, this approach is of fundamental importance to our treatment of continuous-time Gaussian channels: as elaborated in Section 5, not only can it channel the ideas and techniques of discrete time to rigorously establish new results in continuous time, but, more importantly, it can also provide insights and intuition for our rigorous treatments, where we employ established tools and develop new tools in stochastic calculus.

Continuous-Time Multi-User Gaussian Channels
Extending Shannon's fundamental theorems on point-to-point communication channels to general networks with multiple sources and destinations, network information theory aims to establish the fundamental limits on information flow in networks and the optimal coding schemes that achieve these limits. The vast majority of research on network information theory to date has focused on networks in discrete time. In a way, this phenomenon can trace its source to Shannon's original treatment of continuous-time point-to-point channels, where such channels were examined through their associated discrete-time versions. This insightful viewpoint has exerted a major influence on the bulk of the related literature on continuous-time Gaussian channels, oftentimes prompting a model shift from the continuous-time setting to the discrete-time one right at the beginning of a research attempt.
The primary focus of this section is to illustrate possible applications of the approximation approach: 1) Guided by this approach, we will rigorously derive the capacity regions of families of continuous-time multi-user one-hop white Gaussian channels, including continuous-time white Gaussian multiple access channels (MACs) and broadcast channels (BCs). To deliver rigorous proofs of our results, we will work directly within the continuous-time setting, employing established tools and developing new tools (see Theorems E.1 and G.1) in stochastic calculus to complement the approximation approach. 2) We will also rigorously apply this approach to examine how feedback affects the capacity regions of the above-mentioned channels via translations of results and techniques in discrete time. It turns out that some results can be translated from the discrete-time setting to the continuous-time setting, such as the facts that feedback increases the capacity regions of some Gaussian BCs and that feedback does not increase the capacity region of a physically degraded BC. Nevertheless, there is a seeming "exception": as opposed to discrete-time Gaussian MACs, feedback does not increase the capacity regions of some continuous-time Gaussian MACs, which, somewhat surprisingly, can also be explained by the approximation approach.
Below, we summarize the results in this section. To put our results into a relevant context, we first list some related results in discrete time; for reasons of space, we list only those most relevant to ours.
Gaussian MACs. When there is no feedback, the capacity region of a discrete-time memoryless MAC is relatively well understood: a single-letter characterization was established by Ahlswede [2], and the capacity region of a Gaussian MAC was explicitly derived in Wyner [95] and Cover [15]. On the other hand, the capacity region of MACs with feedback still demands a more complete understanding, despite several decades of great effort by many authors: Cover and Leung [16] derived an achievable region for a memoryless MAC with feedback. In [92], Willems showed that Cover and Leung's region is optimal for a class of memoryless MACs with feedback where one of the inputs is a deterministic function of the output and the other input. More recently, Bross and Lapidoth [11] improved Cover and Leung's region, and Wu et al. [93] extended Cover and Leung's region to the case where non-causal state information is available at both senders. An interesting result was obtained by Ozarow [69], who derived the capacity region of a two-user memoryless Gaussian MAC with feedback via a modification of the Schalkwijk-Kailath scheme [79]; moreover, Ozarow's result showed that the capacity region of a discrete memoryless MAC is, in general, increased by feedback.
In Section 5.1, guided by the approximation approach, we first establish Lemma E.1, a key lemma which roughly says that "other users can simply be treated as noise", and we then employ established tools from stochastic calculus to derive the capacity region of a continuous-time white Gaussian MAC with m senders, with or without feedback. It turns out that for such a channel, feedback does not increase the capacity region, which, at first sight, may seem at odds with the aforementioned result of Ozarow and with the conclusion of our approximation theorems. This, however, can be roughly explained by the well-known fact that ">" may become "=" when taking a limit (indeed, $a_n > b_n$ does not necessarily imply $\lim_{n \to \infty} a_n > \lim_{n \to \infty} b_n$); see Remark 5.3 for a more detailed explanation.
Gaussian ICs. The capacity regions of discrete-time Gaussian ICs are largely unknown except in certain special scenarios: the capacity region of Gaussian ICs with strong interference was established in Sato [78] and Han and Kobayashi [31]. The sum-capacity of Gaussian ICs with weak interference was simultaneously derived in [82,3,66]. The half-bit theorem on the tightness of the Han-Kobayashi bound [31] was proven in [23]. The approximation of the Gaussian IC by the q-ary expansion deterministic channel was first proposed by Avestimehr, Diggavi, and Tse [7]. Outer and inner bounds on the feedback capacity region of Gaussian ICs were established by Suh and Tse [88]. Note that all the above-mentioned works deal with ICs with two pairs of senders and receivers. For more than two user pairs, special classes of Gaussian ICs have been examined using the scheme of interference alignment; see the extensive list of references in [22].
In Section 5.2, using a similar approach to the one we developed for continuous-time Gaussian MACs, we derive the capacity region of a continuous-time white Gaussian IC with m pairs of senders and receivers and without feedback. We also use a translated version of the argument in [88] and the approximation approach to show that feedback does increase the capacity regions of certain continuous-time white Gaussian ICs.
Gaussian BCs. The capacity regions of discrete-time Gaussian BCs without feedback are well known [14,9]. It has been shown by El Gamal [21] that feedback cannot increase the capacity region of a physically degraded Gaussian BC. On the other hand, it was shown by Ozarow and Leung [70] that feedback can increase the capacity regions of stochastically degraded Gaussian BCs, whose capacity regions are far less understood.
In Section 5.3, we first establish a continuous-time version of the entropy power inequality (Theorem G.1) and then derive the capacity region of a continuous-time Gaussian non-feedback BC with m receivers. Employing the approximation approach, we use a modified version of the argument in [21] to show that feedback does not increase the capacity region of a physically degraded continuous-time Gaussian BC, and, on the other hand, a translated version of the argument in [70] to show that feedback does increase the capacity regions of certain continuous-time Gaussian BCs.
Here we remark that the above-mentioned capacity results for the non-feedback case (Theorem 5.1 restricted to the non-feedback case, and Theorem 5.7) are "long known" in the sense that they are "predicted" by the conventional sampling approach and their proofs follow from the usual framework. On the other hand, explicit formulations and statements of these results and their rigorous and complete proofs do not, to the best of our knowledge, exist in the literature. The reason, we believe, is that there are a number of technical difficulties that one has to overcome to prove such results: Lemma E.1 and Lemma G.1 (the latter based on the I-MMSE relationship only recently established in [28]) are newly developed in this work, and their proofs are non-trivial.
In contrast, the approximation approach can be applied to continuous-time Gaussian feedback channels, either heuristically or rigorously. More specifically, it can be heuristically applied to "explain" Theorems 5.1 (in the feedback case) and 5.7 and give us intuition (as elaborated in Remark 5.3, it helps "predict" the optimal channel input distribution, which has been made rigorous in Lemma E.1), and it can also be applied to establish Theorems 5.9 and 5.10.

Gaussian MACs
Consider a continuous-time white Gaussian MAC with m users, which can be characterized by
$$Y(t) = \int_0^t \sum_{i=1}^m X_i(s)\,ds + B(t), \quad 0 \le t \le T, \qquad (32)$$
where $X_i$ is the channel input from sender $i$, which depends on $M_i$, the message sent from sender $i$ (independent of the messages from all other senders), and possibly on the feedback $Y_0^s$, the channel output up to time $s$. For $T, R_1, \dots, R_m, P_1, \dots, P_m > 0$, a $(T, (e^{TR_1}, \dots, e^{TR_m}), (P_1, \dots, P_m))$-code for the MAC (32) consists of m sets of integers $\mathcal{M}_i = \{1, 2, \dots, e^{TR_i}\}$, the message alphabet for user $i$, $i = 1, 2, \dots, m$, and m encoding functions $X_i: \mathcal{M}_i \to C[0, T]$, which satisfy the following power constraint: for any $i = 1, 2, \dots, m$,
$$\frac{1}{T}\int_0^T X_i^2(s)\,ds \le P_i$$
with probability 1, and a decoding function $g: C[0, T] \to \mathcal{M}_1 \times \cdots \times \mathcal{M}_m$. The average probability of error for the above code is defined as
$$P_e^{(T)} = \mathbb{P}\bigl(g(Y_0^T) \ne (M_1, \dots, M_m)\bigr),$$
where the messages are independent and uniformly distributed over their respective alphabets. A rate tuple $(R_1, R_2, \dots, R_m)$ is said to be achievable for the MAC if there exists a sequence of $(T, (e^{TR_1}, \dots, e^{TR_m}), (P_1, \dots, P_m))$-codes with $P_e^{(T)} \to 0$ as $T \to \infty$. The capacity region of the MAC is the closure of the set of all achievable rate tuples $(R_1, R_2, \dots, R_m)$.
The following theorem, whose proof is postponed to Appendix E, gives an explicit characterization of the capacity region of (32).
Theorem 5.1. Whether there is feedback or not, the capacity region of the continuous-time white Gaussian MAC (32) is
$$\left\{(R_1, \dots, R_m): R_i \le \frac{P_i}{2},\ i = 1, 2, \dots, m\right\}.$$

Remark 5.2. When there is no feedback, Theorem 5.1 can be heuristically explained using the sampling approach as in (2)-(4) (this heuristic approach for this example should be well known; see, e.g., Exercise 15.26 in [17]). For simplicity only, we consider the following continuous-time white Gaussian multiple access channel with two senders:
$$Y(t) = \int_0^t \bigl(X_1(s) + X_2(s)\bigr)\,ds + B(t), \qquad (34)$$
where $X_i$, $i = 1, 2$, is the input from the $i$-th user with average power limit $P_i$. Similarly as before, consider its associated discrete-time version corresponding to bandwidth limit $\omega$. Then, it is well known [22] that an outer bound on the capacity region can be computed as
$$R_1 \le \omega \log\Bigl(1 + \frac{P_1}{2\omega}\Bigr), \quad R_2 \le \omega \log\Bigl(1 + \frac{P_2}{2\omega}\Bigr), \quad R_1 + R_2 \le \omega \log\Bigl(1 + \frac{P_1 + P_2}{2\omega}\Bigr),$$
and an inner bound, obtained by treating the other user's signal as noise, as
$$R_1 \le \omega \log\Bigl(1 + \frac{P_1}{2\omega + P_2}\Bigr), \quad R_2 \le \omega \log\Bigl(1 + \frac{P_2}{2\omega + P_1}\Bigr).$$
(Here, it is known [95,15] that the outer bound can be tightened to coincide with the inner bound, which, however, is not needed for this example.) It is easy to verify that the two bounds collapse into the same region as $\omega$ tends to infinity:
$$\left\{(R_1, R_2): R_1 \le \frac{P_1}{2},\ R_2 \le \frac{P_2}{2}\right\},$$
which is "expected" to be the capacity region of (34); alternatively, one can apply the low-SNR equivalence in (A) and take the limit as $P$ tends to 0, reaching the same conclusion. Note that similar arguments hold for more than two senders through a parallel extension.
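As a quick numerical illustration of the sampling heuristic in Remark 5.2, the following sketch evaluates the standard (natural-log) outer and inner bounds at bandwidth $\omega$, under the usual assumptions that bandwidth $\omega$ yields $2\omega$ samples per unit time with unit per-sample noise variance and per-sample powers $P_i/(2\omega)$; the function names are ours. As $\omega$ grows, both bounds collapse onto the rectangle $R_i \le P_i/2$:

```python
import math

def outer_bound(P1, P2, w):
    # Outer bound of the sampled two-user MAC at bandwidth w (nats per unit time):
    # 2w samples per unit time, per-sample powers P_i/(2w), unit noise variance.
    r1 = w * math.log(1 + P1 / (2 * w))
    r2 = w * math.log(1 + P2 / (2 * w))
    rsum = w * math.log(1 + (P1 + P2) / (2 * w))
    return r1, r2, rsum

def inner_bound(P1, P2, w):
    # Inner bound: each receiver treats the other user's signal as noise.
    r1 = w * math.log(1 + P1 / (2 * w + P2))
    r2 = w * math.log(1 + P2 / (2 * w + P1))
    return r1, r2

P1, P2 = 2.0, 3.0
for w in [1.0, 10.0, 100.0, 10000.0]:
    o1, o2, osum = outer_bound(P1, P2, w)
    i1, i2 = inner_bound(P1, P2, w)
    print(w, round(o1, 4), round(i1, 4), round(osum, 4), round(i1 + i2, 4))
# As w grows, all bounds approach the rectangle R_i <= P_i/2,
# i.e., the individual bounds tend to 1.0 and 1.5, and the sum to 2.5.
```

Note that for each finite $\omega$ the inner bound is strictly smaller than the outer bound, yet the two coincide in the limit, which is exactly the ">" becoming "=" phenomenon discussed in Section 5.1.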
Remark 5.3. When feedback is present in the channel, the approximation approach, rather than the conventional sampling approach, is needed to explain Theorem 5.1.
Again, for simplicity only, we consider the following continuous-time Gaussian MAC with two senders, with the power constraints: there exist $P_1, P_2 > 0$ such that for all $T$,
$$\frac{1}{T}\int_0^T X_i^2(s)\,ds \le P_i, \quad i = 1, 2,$$
with probability 1. Applying the Euler-Maruyama approximation to the above channel over the time window $[0, T]$ with respect to the evenly spaced $\Delta_n$ with $\delta_n = T/n$, we obtain
$$Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + \bigl(X_1(t_{n,i}) + X_2(t_{n,i})\bigr)\delta_n + B(t_{n,i+1}) - B(t_{n,i}). \qquad (35)$$
Now, straightforward computations and a usual concavity argument yield the rate bound (38) for user 1 for large $n$, and a completely parallel argument yields the corresponding bound (39) for user 2. It then follows from Theorem 3.1 that the region
$$\left\{(R_1, R_2): R_1 \le \frac{P_1}{2},\ R_2 \le \frac{P_2}{2}\right\} \qquad (37)$$
gives an outer bound on the capacity region. To see that this outer bound can be achieved, set $X_1(s), X_2(s)$, $t_{n,i} \le s \le t_{n,i+1}$, in (35) to be independent Gaussian random variables with variances $P_1, P_2$, respectively. Then, one verifies the desired rates for large $n$, where we have used the fact that $\delta_n$ is "close" to 0 for large enough $n$ in (40); in parallel with Remark 5.2, this can alternatively be explained by taking $P_1$ to 0 and then applying the low-SNR equivalence in (B). A similar argument proves the corresponding statement for user 2. It then follows that the outer bound in (37) can be achieved.
Here we remark that, similarly as in Section 4, for $n$ large enough, the constructed processes $X_1$ and $X_2$ behave like "fast-oscillating" Ornstein-Uhlenbeck processes; moreover, from (38) and (39), one can tell that for one user to achieve the maximum transmission rate, the other user can simply be ignored. Predicting the optimal channel input, these facts echo Remark E.2 and give another explanation of Lemma E.1, a key lemma in our rigorous proof of Theorem 5.1 in Appendix E.
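The Euler-Maruyama discretization used in Remark 5.3 can be made concrete with a few lines of simulation. The sketch below (time horizon, step count, and powers are illustrative choices of ours) draws independent Gaussian inputs with variances $P_1, P_2$ on each subinterval, as in the achievability argument above, and runs the discretized recursion for $Y(t) = \int_0^t (X_1(s) + X_2(s))\,ds + B(t)$:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 10.0, 1000
dt = T / n                      # the stepsize delta_n
P1, P2 = 2.0, 3.0

# Channel inputs: independent Gaussians with variances P1, P2 on each
# subinterval [t_{n,i}, t_{n,i+1}), the capacity-achieving choice
# suggested in Remark 5.3.
X1 = rng.normal(0.0, np.sqrt(P1), n)
X2 = rng.normal(0.0, np.sqrt(P2), n)
dB = rng.normal(0.0, np.sqrt(dt), n)    # Brownian increments

# Euler-Maruyama recursion for Y(t) = int_0^t (X1 + X2) ds + B(t)
dY = (X1 + X2) * dt + dB
Y = np.concatenate(([0.0], np.cumsum(dY)))

# Empirical average powers should be close to P1 and P2.
print(np.mean(X1**2), np.mean(X2**2))
```

For large $n$ the piecewise-constant inputs oscillate rapidly relative to the channel time scale, which is the "fast-oscillating" behavior remarked on above.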

Gaussian ICs
Consider the following continuous-time white Gaussian interference channel with m pairs of senders and receivers and no feedback: for $i = 1, 2, \dots, m$,
$$Y_i(t) = \int_0^t \sum_{j=1}^m a_{ij} X_j(s)\,ds + B_i(t), \qquad (41)$$
where $X_i$ is the channel input from sender $i$, which depends on $M_i$, the message sent from sender $i$ (independent of the messages from all other senders), $a_{ij} \in \mathbb{R}$, $i, j = 1, 2, \dots, m$, is the channel gain from sender $j$ to receiver $i$, and the $B_i(t)$ are (possibly correlated) standard Brownian motions.
The following theorem (Theorem 5.4), whose proof is postponed to Appendix F, explicitly characterizes the capacity region of the IC (41).

Remark 5.5. Theorem 5.4 can be heuristically derived using a similar argument employing the approximation approach as in Remark 5.3.
With the explicit non-feedback capacity region stated in Theorem 5.4, we are now ready to use the approximation approach to analyze the effects of feedback on continuous-time Gaussian ICs.
The following theorem says that feedback does help continuous-time Gaussian ICs. Its proof uses a translated version of the argument in [88] coupled with the approximation approach as in Section 4, so we provide only a sketch.
Theorem 5.6. Feedback strictly increases the capacity regions of certain continuous-time Gaussian interference channels.
Proof. Consider the following symmetric continuous-time Gaussian interference channel with two pairs of senders and receivers, where snr and inr denote the signal-to-noise and interference-to-noise ratios, respectively, $B_1(t), B_2(t)$ are independent standard Brownian motions, and the average powers of $X_1, X_2$ are assumed to be 1. Following [88], we consider a coding scheme over two stages, each of length $T_0$. In the first stage, transmitters 1 and 2 send codewords $X_{1,0}^{T_0}$ and $X_{2,0}^{T_0}$ with rates $R_1$ and $R_2$, respectively. In the second stage, using feedback, transmitter 1 decodes $X_{2,0}^{T_0}$ and transmitter 2 decodes $X_{1,0}^{T_0}$, which is possible provided the rates satisfy the corresponding decoding conditions. Then, transmitters 1 and 2 send $X_{1,T_0}^{2T_0}$ and $X_{2,T_0}^{2T_0}$, respectively, chosen so that for any $0 \le t \le T_0$ the second-stage transmissions allow each receiver to cancel the interference it experienced in the first stage. Combining what receiver 1 collects during the two stages, we then have that, for any $0 \le t \le T_0$, the codeword $X_{1,0}^{T_0}$ can be decoded at the second stage if $R_1 \le \mathrm{inr}/2$. A completely parallel argument yields that the codeword $X_{2,0}^{T_0}$ can be decoded at the second stage if $R_2 \le \mathrm{inr}/2$. All in all, after the two stages, the two codewords $X_{1,0}^{T_0}$ and $X_{2,0}^{T_0}$ can both be decoded as long as both conditions hold; in other words, the rate pair $(\mathrm{inr}/2, \mathrm{inr}/2)$ is achievable, which, assuming $\mathrm{inr} > \mathrm{snr}$, implies that feedback strictly increases the capacity region.

Gaussian BCs
In this section, we consider a continuous-time white Gaussian BC with m receivers, which is characterized by: for $i = 1, 2, \dots, m$,
$$Y_i(t) = \sqrt{\mathrm{snr}_i} \int_0^t X(s)\,ds + B_i(t), \qquad (43)$$
where $X$ is the channel input, which depends on $M_1, \dots, M_m$, the messages intended for the respective receivers, each $M_i$ being uniformly distributed over a finite alphabet $\mathcal{M}_i$ and independent of all other messages; $\mathrm{snr}_i$ is the signal-to-noise ratio of the channel for user $i$; and the $B_i(t)$ are (possibly correlated) standard Brownian motions. For $T, R_1, R_2, \dots, R_m, P > 0$, a $(T, (e^{TR_1}, \dots, e^{TR_m}), P)$-code for the BC (43) consists of m sets of integers $\mathcal{M}_i = \{1, 2, \dots, e^{TR_i}\}$, the message set for receiver $i$, $i = 1, 2, \dots, m$, an encoding function $X: \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_m \to C[0, T]$, which satisfies the power constraint
$$\frac{1}{T}\int_0^T X^2(s)\,ds \le P$$
with probability 1, and m decoding functions $g_i: C[0, T] \to \mathcal{M}_i$. The average probability of error for the $(T, (e^{TR_1}, e^{TR_2}, \dots, e^{TR_m}), P)$-code is defined as
$$P_e^{(T)} = \mathbb{P}\bigl(g_i(Y_{i,0}^T) \ne M_i \text{ for some } i\bigr).$$
A rate tuple $(R_1, R_2, \dots, R_m)$ is said to be achievable for the BC if there exists a sequence of $(T, (e^{TR_1}, e^{TR_2}, \dots, e^{TR_m}), P)$-codes with $P_e^{(T)} \to 0$ as $T \to \infty$. The capacity region of the BC is the closure of the set of all achievable rate tuples $(R_1, R_2, \dots, R_m)$.
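As an illustrative sketch, assuming the Brownian-motion form $Y_i(t) = \sqrt{\mathrm{snr}_i}\int_0^t X(s)\,ds + B_i(t)$ for the BC (with parameter values chosen arbitrarily by us), one can simulate the discretized broadcast channel as follows; the same input process $X$ drives both outputs, which differ only through their SNRs and noises:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 5.0, 500
dt = T / n
snr1, snr2, P = 2.0, 0.5, 1.0       # illustrative values, snr1 > snr2

X = rng.normal(0.0, np.sqrt(P), n)  # common channel input with power P
dB1 = rng.normal(0.0, np.sqrt(dt), n)
dB2 = rng.normal(0.0, np.sqrt(dt), n)

# Discretized outputs: Y_i accumulates sqrt(snr_i)*X dt plus its own noise.
Y1 = np.cumsum(np.sqrt(snr1) * X * dt + dB1)
Y2 = np.cumsum(np.sqrt(snr2) * X * dt + dB2)

print(np.mean(X**2))                # empirical power, close to P
```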
The following theorem explicitly characterizes the capacity region of the above BC, whose proof is postponed to Appendix G.
Remark 5.8. Theorem 5.7 can be heuristically derived using a similar argument employing the approximation approach as in Remark 5.3.
We are now ready to use the approximation approach to analyze the effects of feedback on continuous-time Gaussian BCs.
The following theorem says that feedback does not help physically degraded Gaussian BCs; its proof is inspired by the ideas in Section 4 and parallels the argument in [21].

Theorem 5.9. Consider the following continuous-time physically degraded Gaussian broadcast channel with one sender and two receivers, where $N_1, N_2 > 0$, $B_1, B_2$ are independent standard Brownian motions, and the channel input $X(s)$ is assumed to satisfy Conditions (d)-(f). Then, feedback does not increase the capacity region of the above channel.
Proof. Let $X$ be a $(T, (e^{TR_1}, e^{TR_2}), P)$-code. By the code construction, for $i = 1, 2$, it is possible to estimate the message $M_i$ from the channel output $Y_{i,0}^T$ with an arbitrarily low probability of error. Hence, by Fano's inequality, for $i = 1, 2$,
$$H(M_i \mid Y_{i,0}^T) \le T\varepsilon_{i,T},$$
where $\varepsilon_{i,T} \to 0$ as $T \to \infty$. It then follows that $TR_i \le I(M_i; Y_{i,0}^T) + T\varepsilon_{i,T}$. Now the Euler-Maruyama approximation with respect to the evenly spaced $\Delta_n$ of stepsize $\delta_n = T/n$ applied to the continuous-time physically degraded Gaussian BC yields discretized output increments $\Delta Y_1^{(n)}, \Delta Y_2^{(n)}$. Then, by Theorem 3.1, we may pass to the sampled outputs $\{Y_2(t_{n,i-1}): i = 1, 2, \dots, n\}$. Note that
$$H(\Delta Y_2^{(n)}) \le \frac{n}{2}\log\bigl(2\pi e(P\delta_n^2 + N_2\delta_n)\bigr),$$
and, bounding $H(\Delta Y_2^{(n)} \mid M_2)$ between the noise entropy and this maximum, it follows that there exists an $\alpha \in [0, 1]$ such that
$$H(\Delta Y_2^{(n)} \mid M_2) = \frac{n}{2}\log\bigl(2\pi e(\alpha P\delta_n^2 + N_2\delta_n)\bigr).$$
It then follows from Theorem 3.1 that the rate of user 2 is bounded accordingly. Next we consider the rate of user 1. Using Lemma 1 in [21] (an extension of the entropy power inequality), we obtain
$$H(\Delta Y_1^{(n)} \mid M_2) \ge \frac{n}{2}\log\bigl(2\pi e(\alpha P\delta_n^2 + N_1\delta_n)\bigr),$$
and furthermore, by Theorem 3.1, the corresponding bound on the rate of user 1. Now, by Theorem 5.7, we conclude that the feedback capacity region is exactly the same as the non-feedback capacity region; in other words, feedback does not increase the capacity region of a physically degraded continuous-time Gaussian BC.
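The key entropy computation in the proof above can be sanity-checked numerically: the per-unit-time difference between the output and noise entropies of the discretized channel, $\frac{n}{T}\cdot\frac{1}{2}\log\bigl((P\delta_n^2 + N_2\delta_n)/(N_2\delta_n)\bigr)$, converges to $P/(2N_2)$ as $n \to \infty$. A small sketch (parameter values and the function name are ours):

```python
import math

def rate_from_samples(P, N2, T, n):
    # Per-unit-time entropy difference of the Euler-Maruyama discretization:
    # (n/T) * (1/2) * log((P*d^2 + N2*d) / (N2*d)), with d = T/n.
    d = T / n
    return (n / T) * 0.5 * math.log((P * d**2 + N2 * d) / (N2 * d))

P, N2, T = 1.0, 2.0, 1.0
for n in [10, 100, 10000, 1000000]:
    print(n, rate_from_samples(P, N2, T, n))
# The values increase toward the continuous-time limit P/(2*N2) = 0.25.
```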
The following theorem says that feedback does help some stochastically degraded Gaussian BCs. Its proof, instead of directly employing the approximation theorem, uses the connections between continuous-time and discrete-time Gaussian channels and the notion of continuous-time directed information from Remark 3.6, both of which can find their source in the approximation theorem. We only provide a sketch of the proof, since it is largely based on a translated version of the argument in [70].

Theorem 5.10. Feedback strictly increases the capacity regions of certain continuous-time Gaussian broadcast channels.

Proof. Consider the following symmetric continuous-time Gaussian broadcast channel, where $B_1, B_2$ are independent standard Brownian motions, and $X$ satisfies the average power constraint $P$. By Theorem 5.7, without feedback, the capacity region is the set of rate pairs $(R_1, R_2)$ satisfying (45). With feedback, one can use the following variation [70] of the Schalkwijk-Kailath coding scheme [79] over $[0, T]$ at discrete time points $\{t_{n,i}\}$ that form an evenly spaced $\Delta_n$ of stepsize $\delta_n$: for the channel input, after some proper initialization, at time $t \in [t_{n,i}, t_{n,i+1})$, we send $X^{(n)}(t) = X_i^{(n)}(t)$ with $\mathbb{E}[(X_i^{(n)}(t))^2] = P$ for each $i$; the corresponding channel outputs for $t \in [t_{n,i}, t_{n,i+1}]$ are given by (46) and (47). Going through a completely parallel argument as in [70] and capitalizing on the fact that the SNRs in the channels (46) and (47) tend to 0 as $n$ tends to infinity, we derive the achievable rates (48), where $\rho^* > 0$ satisfies the condition
$$\rho^*\Bigl(1 + (P+1)\Bigl(1 + \frac{P(1-\rho^*)}{2}\Bigr)\Bigr) = P(P+2)(1-\rho^*)^2.$$
Note that, by Remark 3.6, the directed-information rates of the above scheme converge to the expressions in (48), which immediately implies that the corresponding rate pairs are achievable. The claim that feedback strictly increases the capacity region then follows from (45) and (48) and the fact that $\rho^* > 0$.
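The constant $\rho^*$ above is defined only implicitly. A simple bisection sketch (the function name is ours) recovers it numerically for a given power $P$, using the sign change of the defining equation between $\rho = 0$ and $\rho = 1$:

```python
def rho_star(P, tol=1e-12):
    # Solve rho*(1 + (P+1)*(1 + P*(1-rho)/2)) = P*(P+2)*(1-rho)**2 on (0,1).
    # f is positive at rho=0 (value P*(P+2)) and negative at rho=1, so a
    # root exists in between; bisect on the sign of f.
    f = lambda r: P * (P + 2) * (1 - r)**2 - r * (1 + (P + 1) * (1 + P * (1 - r) / 2))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(rho_star(1.0))   # the root lies strictly inside (0, 1)
```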

Conclusions and Future Work
For a continuous-time white Gaussian channel without feedback, the classical Shannon-Nyquist sampling theorem can convert it to a discrete-time Gaussian channel; however, such a link has long been missing when feedback/memory is present in the channel. In this paper, we establish sampling and approximation theorems as the missing links, which we believe will play important roles in the long run for further developing continuous-time information theory, particularly for communication scenarios where feedback/memory is present.
As an immediate application of our approximation theorem, we propose the approximation approach, an analog of the conventional sampling approach, for Gaussian feedback channels. It turns out that, like its non-feedback counterpart, the approximation approach can bring insights and intuition to the investigation of continuous-time Gaussian channels with possible feedback, and moreover, when complemented with relevant tools from stochastic calculus, can deliver rigorous treatments in the point-to-point and multi-user settings.
On the other hand, many questions remain unanswered and a number of directions need to be further explored. Below we list several research directions that look promising in the near future.
1) The first direction is to strengthen and generalize our sampling and approximation theorems.
Note that both Theorem 2.3 and Theorem 3.1 require Conditions (d)-(f), which are stronger than the typical average power constraint. While Conditions (d)-(f) are rather mild for practical considerations, the stronger assumptions in our theorems narrow their reach in some theoretical situations. For instance, despite the fact that our approximation theorem gives intuitive explanations for the rigorous treatment of continuous-time multi-user Gaussian channels in Section 5, it fails to rigorously establish Theorems 5.1 and 5.7. The stochastic calculus approach employed in Section 5 requires only the power constraints, which can be loosely explained by the fact that Girsanov's theorem (or, more precisely, its several variants) requires only rather weak conditions. It is certainly worthwhile to explore whether the assumptions in our sampling and approximation theorems can be relaxed, either in general or for some special settings.
Another topic in this direction is the rate of convergence in the sampling and approximation theorems. While the current versions of our theorems merely establish some limits, the rate of convergence would yield a more quantitative description of how fast those limits are approached.
One can also consider generalizing these two theorems to general Gaussian channels [42,36]. For this topic, note that there exist in-depth studies [37,48,38,43,47,44,8,46] on continuous-time point-to-point general Gaussian channels with possible feedback, for which information-theoretic connections with the discrete-time setting are somewhat lacking. A first step in this direction could be establishing sampling or approximation theorems for stationary Gaussian processes. Obviously, such theorems for stationary Gaussian processes can connect continuous-time stationary Gaussian channels to their discrete-time counterparts, for which the variational formulation of discrete-time stationary Gaussian feedback capacity in [53] proves to be rather effective.
2) The second direction is to further explore the possible applications of our sampling and approximation theorems in the following respects.
We have shown that feedback may increase the capacity regions of some continuous-time Gaussian BCs, but the capacity regions of such channels remain unknown in general. An immediate problem is to explicitly find the exact capacity regions of continuous-time Gaussian BCs using the approach employed in this work, as we have done for continuous-time Gaussian MACs. Further topics include exploring whether the ideas and techniques in this paper can be applied to other families of continuous-time multi-user Gaussian channels with possible feedback.
So far we have implicitly assumed infinite bandwidth and average power constraints, but our theorems can certainly go beyond these assumptions. For instance, one can consider examining continuous-time Gaussian channels with both a bandwidth limit and a peak power constraint, which are more reasonable assumptions for many practical communication scenarios, as they give a more accurate description of the limitations of the communication system. Little is known about the capacity of continuous-time Gaussian channels with such constraints except for some upper and lower bounds established in [71,80]. In stark contrast, discrete-time peak-power-constrained channels (including, but not limited to, Gaussian channels) have been better investigated: there has been a series of works on their capacity, such as [87,81,1,12,76,85,24,20], which feature relatively thorough discussions of different aspects of channel capacity, including the capacity-achieving distribution, bounds and asymptotics of the capacity, and numerical computation of the capacity. An immediate question is to explore whether the approximation approach can translate the aforementioned existing results in discrete time, or, more probably, help channel the ideas and techniques therein to the continuous-time setting. A further question is to explore whether there exists any randomized algorithm for the computation of the capacity of such a channel; as discussed in Remark 3.8, we believe our sampling theorems can be particularly helpful for numerically computing and optimizing the mutual information of a continuous-time Gaussian channel with a bandwidth limit and a peak power constraint.

Appendices

A Proof of Theorem 2.1
First of all, an application of Theorem 7.14 of [61] with Conditions (b) and (c) yields (49). Then one verifies that the assumptions of Lemma 7.7 of [61] are all satisfied (this lemma is stated under very general assumptions, which reduce exactly to Conditions (b), (c) and (49) when restricted to our setting), which implies that for any $w$, $\mu_{Y|W=w} \sim \mu_B$, where "$\sim$" is the standard notation for two measures being equivalent (i.e., one is absolutely continuous with respect to the other and vice versa), and moreover, the corresponding Radon-Nikodym derivative formula holds with probability 1, where we have rewritten $g(s, W_0^s, Y_0^s)$ as $g(s)$ for notational simplicity. Here we remark that the exponent could be rewritten in terms of $\int_0^T g(s)^2\,ds$, but we keep it as above for an easy comparison.
Note that it follows from the condition on $\int_0^T g(s)^2\,ds$ that
$$\mathbb{E}\Bigl[e^{-\int_0^T g(s)\,dB(s) - \frac{1}{2}\int_0^T g(s)^2\,ds}\Bigr] = 1.$$
Then, a parallel argument as in the proof of Theorem 7.1 of [61] further implies that for any $\Delta_n$, the sampled Radon-Nikodym derivative formula holds with probability 1, where we have defined $B(\Delta_n) \triangleq \{B(t_{n,0}), B(t_{n,1}), \dots, B(t_{n,n})\}$, and moreover, the corresponding conditional-expectation representation holds. Then, by definition, we obtain the expression for $I(W_0^T; Y(\Delta_n))$. Notice that it can be easily checked that $e^{-\int_0^T g(s)\,dY(s) + \frac{1}{2}\int_0^T g(s)^2\,ds}$ is integrable, which, together with the fact that $\Delta_n \subset \Delta_{n+1}$ for all $n$, further implies that the two conditional-expectation sequences are both martingales, and therefore, by Doob's martingale convergence theorem [19], they converge almost surely. Now, by Jensen's inequality, we obtain (52), and, by the fact that $\log x \le x$ for any $x > 0$, we obtain (53). It then follows from (52) and (53) that the relevant sequences are uniformly integrable. Applying the general Lebesgue dominated convergence theorem (see, e.g., Theorem 19 on Page 89 of [77]), we then obtain the desired convergence of expectations. A completely parallel argument yields the corresponding result for the other term.
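The identity $\mathbb{E}[e^{-\int_0^T g\,dB - \frac{1}{2}\int_0^T g^2\,ds}] = 1$ used above (the martingale property of the stochastic exponential) can be checked by Monte Carlo in a discretized setting. In the sketch below, the integrand $g$ is an arbitrary deterministic choice of ours for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, trials = 1.0, 200, 20000
dt = T / n

# Deterministic integrand g(s) = sin(2*pi*s), sampled on the grid.
t = np.arange(n) * dt
g = np.sin(2 * np.pi * t)

# Discretized stochastic exponential over many independent Brownian paths:
# Z = exp(-sum g_i dB_i - (1/2) sum g_i^2 dt).
dB = rng.normal(0.0, np.sqrt(dt), (trials, n))
Z = np.exp(-(dB * g).sum(axis=1) - 0.5 * (g**2).sum() * dt)

print(Z.mean())   # close to 1, as the martingale property predicts
```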

B Proof of Lemma 2.2
With Conditions (d)-(f), the proof of the existence and uniqueness of the solution to (6) is standard; see, e.g., Section 5.4 in [63]. So, in the following, we will only prove (10). For the stochastic differential equation (6), applying Condition (e), we deduce that there exists $L_1 > 0$ such that the corresponding bound holds. Then, applying the Gronwall inequality followed by a straightforward bounding analysis, we deduce that there exists $L_2 > 0$ such that the desired moment bound holds. Now, for any $\varepsilon > 0$, applying Doob's submartingale inequality, we obtain a bound which, by Condition (f), is finite provided that $\varepsilon$ is small enough.
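The Gronwall step above can be illustrated with a small discrete sketch: we saturate the hypothesis $u(t) \le a + L\int_0^t u(s)\,ds$ with equality on a fine grid and check the resulting bound $u(t) \le a\,e^{Lt}$ (the constants are arbitrary choices of ours):

```python
import math

# Discrete check of the Gronwall inequality: if u(t) <= a + L*int_0^t u(s) ds,
# then u(t) <= a*exp(L*t). We build u by saturating the hypothesis.
a, L, T, n = 1.0, 2.0, 1.0, 100000
dt = T / n

u, integral = [], 0.0
for i in range(n):
    ui = a + L * integral        # equality case of the hypothesis
    u.append(ui)
    integral += ui * dt          # left-endpoint Riemann sum of int_0^t u ds

print(u[-1], a * math.exp(L * T))   # the Gronwall bound a*e^{LT} dominates
```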

C Proof of Theorem 2.3
We proceed in the following steps.
Step 1. In this step, we establish the theorem assuming that there exists $C > 0$ such that for all $w_0^T \in C[0, T]$ and all $y_0^T \in C[0, T]$, (54) holds. By the definition of mutual information, (50) and (51), we obtain the corresponding expression for $I(W_0^T; Y_0^T)$, where, for notational simplicity, we have rewritten $g(s, W_0^s, Y_0^s)$ as $g(s)$.
Step 1.1. In this step, we prove that (55) holds in probability as $n$ tends to infinity. Let $\bar{Y}_{\Delta_n,0}^T$ denote the piecewise linear version of $Y_0^T$ with respect to $\Delta_n$; more precisely, for any $i = 0, 1, \dots, n$, $\bar{Y}_{\Delta_n}(t_{n,i}) = Y(t_{n,i})$, and for any $t_{n,i-1} < s < t_{n,i}$ with $s = \lambda t_{n,i-1} + (1-\lambda)t_{n,i}$ for some $0 < \lambda < 1$, $\bar{Y}_{\Delta_n}(s) = \lambda Y(t_{n,i-1}) + (1-\lambda)Y(t_{n,i})$. Let $\bar{g}_{\Delta_n}(s, W_0^s, \bar{Y}_{\Delta_n,0}^s)$ denote the piecewise "flat" version of $g(s, W_0^s, \bar{Y}_{\Delta_n,0}^s)$ with respect to $\Delta_n$; more precisely, for any $t_{n,i-1} \le s < t_{n,i}$, it equals $g(t_{n,i-1}, W_0^{t_{n,i-1}}, \bar{Y}_{\Delta_n,0}^{t_{n,i-1}})$, where we have used the fact stated in (56). To establish (57), notice that, by the Itô isometry [68], we only need to prove (58) as $n \to \infty$. To see this, we note that, by Conditions (d) and (e), there exists $L_1 > 0$ such that the bound (60) holds for any $s \in [0, T]$ with $t_{n,i-1} \le s < t_{n,i}$. Moreover, by Lemma 2.2 and Condition (f), both $\|Y_0^T\|^4$ and $\|W_0^T\|^4$ are integrable. Furthermore, by Condition (f), we deduce that (61) holds for any $t_{n,i-1} \le s < t_{n,i}$, for some $L_2 > 0$, and one easily verifies (62). It can be readily checked that (60), (61) and (62) imply (58), which in turn implies (56), as desired.
We now prove that (63) holds as $n$ tends to infinity, which will imply convergence in probability and furthermore (55). To establish (63), we first note the relevant decomposition; by (56), its first term vanishes as $n$ tends to infinity. It then follows that, to prove (63), we only need to prove that (64) holds if $\delta_{\Delta_n}$ is small enough; that is, we have to prove that the two terms in the above upper bound are both finite provided that $\delta_{\Delta_n}$ is small enough. For the first term, applying the Cauchy-Schwarz inequality, we obtain (65). It is well known that an application of Fatou's lemma yields (66), and by (60), we deduce that there exists $L_3 > 0$ such that the corresponding bound holds. Note that it follows from Doob's submartingale inequality that the relevant exponential moment is finite if $\delta_{\Delta_n}$ is small enough, and by Lemma 2.2, we also deduce the complementary bound if $\delta_{\Delta_n}$ is small enough, which, together with Condition (f), yields the finiteness of the first term in (65). A completely parallel argument yields that for the second term in (65),
$$\mathbb{E}\Bigl[e^{2\int_0^T (g(s) - \bar{g}_{\Delta_n}(s))\,dB(s) + \frac{1}{2}\int_0^T (g(s) - \bar{g}_{\Delta_n}(s))^2\,ds}\Bigr] < \infty,$$
which, together with (67), immediately implies (64), which in turn implies (63), as desired.
Step 1.2. In this step, we prove that (68) holds in probability as $n$ tends to infinity. First, note that by Theorem 7.23 of [61], we have the corresponding representation of the conditional density, where we have rewritten $g(s, w_0^s, Y_0^s)$ as $g(w_0^s)$ for notational simplicity. The desired convergence then follows from (51). Similarly, we have the analogous representation, and it then again follows from (51) that the analogous convergence holds. Now, we consider the following difference: applying the elementary inequality that $|e^x - e^y| \le (e^x + e^y)|x - y|$ for any $x, y \in \mathbb{R}$, we obtain the corresponding bound. Using (60), Condition (f) and the Itô isometry, we deduce the convergence of the relevant terms as $n \to \infty$. Now, using a similar argument as above with (54) and Lemma 2.2, we can show that for any constant $K$, the relevant exponential moments are finite provided that $n$ is large enough, which, coupled with a similar argument as in the derivation of (67), proves the required uniform integrability for $n$ large enough, and furthermore the convergence in (74), which further implies the desired convergence as $n$ tends to infinity. Now, using the shorthand notations $A_n$, $A$ for the two conditional expectations, a similar argument as in (70)-(74), together with the well-known fact about conditional expectations (see, e.g., Theorem 6.2.2 in [45]), allows us to conclude that as $n$ tends to infinity, $\mathbb{E}[\int_0^T g^2(s)\,ds \mid Y(\Delta_n)] \to \mathbb{E}[\int_0^T g^2(s)\,ds \mid Y_0^T]$ in probability, and furthermore (68), as desired.
Step 1.3. In this step, we show the convergence of $\{\mathbb{E}[F_n]\}$ and $\{\mathbb{E}[G_n]\}$ and further establish the theorem under the condition (54). Using the concavity of the log function and the fact that $\log x \le x$, we can obtain upper and lower bounds on $F_n$ and $G_n$. Furthermore, using a similar argument as in Step 1.1, we can show that these bounds converge as $n$ tends to infinity, and a parallel argument shows the corresponding statement for $\{G_n\}$. So, under the condition (54), we have shown that $\lim_{n \to \infty} I(W_0^T; Y(\Delta_n)) = I(W_0^T; Y_0^T)$.

Step 2. In this step, we will use the convergence established in Step 1 to prove the theorem without the condition (54).
Following Page 264 of [61], we define, for any $k$, the stopping time $\tau_k$. Then we again follow [61] and define a truncated version of $g$. Now, define a truncated version of $Y$ accordingly, which, as elaborated on Page 265 of [61], can be rewritten as (77). Note that for fixed $k$, the system in (77) satisfies the condition (54), and so the theorem holds true for it. To be more precise, let $\mu_{\tau_k,Y}$ and $\mu_{\tau_k,B}$ respectively denote the truncated versions of $\mu_Y$ and $\mu_B$ (from time 0 to time $\tau_k$). Applying Theorem 7.10 in [61], we obtain the corresponding mutual information identity. Notice that it can be easily verified that $\tau_k \to T$ as $k$ tends to infinity, which, together with the monotone convergence theorem, further yields that $I(W_0^{\tau_k}; Y_0^{\tau_k}) \to I(W_0^T; Y_0^T)$ monotone increasingly as $k$ tends to infinity. By Step 1, the sampled mutual informations converge for any fixed $k_i$, which means that there exists a sequence $\{n_i\}$ such that, as $i$ tends to infinity, the corresponding convergence holds monotone increasingly. Since $Y_0^{\tau_k}$ coincides with $Y_0^T$ on the interval $[0, \tau_k \wedge T]$, we conclude that as $i$ tends to infinity, $I(W_0^T; Y(\Delta_{n_i})) \to I(W_0^T; Y_0^T)$. A similar argument can readily be applied to any subsequence of $\{I(W_0^T; Y(\Delta_n))\}$, establishing the existence of a further subsubsequence that converges to $I(W_0^T; Y_0^T)$, which implies that $\lim_{n \to \infty} I(W_0^T; Y(\Delta_n)) = I(W_0^T; Y_0^T)$. The proof of the theorem is then complete.
Remark C.1. The arguments in the proof of Theorem 2.3 can be adapted to yield a sampling theorem for continuous-time minimum mean square error (MMSE), a quantity of central importance in estimation theory.
More precisely, consider the following continuous-time Gaussian feedback channel under the assumptions of Theorem 2.3. The MMSE is the limit of the MMSEs based on the samples with respect to $\Delta_n$. To see this, note that the above-mentioned convergence follows from the proven fact that $d\mu_Y(Y(\Delta_n))/d\mu_B$ and $d\mu_{Y|M}(Y(\Delta_n))/d\mu_B$ respectively converge to their continuous-time counterparts. Similarly, we can also conclude that, under the assumptions of Theorem 2.3, the causal MMSE is the limit of the sampled causal MMSEs.

D Proof of Theorem 3.1

In this section, we give the detailed proof of Theorem 3.1. We will first need the following lemma, which is parallel to Lemma 2.2.
We also need the following lemma, which is parallel to Theorem 10.2.2 in [41].
Lemma D.2. Assume Conditions (d)-(f). Then, there exists a constant C > 0 such that for all n, Proof. Note that, for any n, we have It then follows that (79) Now, for any t, choose n_0 such that t_{n,n_0} ≤ t < t_{n,n_0+1}. A recursive application of (79), coupled with Conditions (d) and (e), yields that, for some L > 0, Noticing that, for any s with t_{n,i} ≤ s < t_{n,i+1}, we have which, together with Condition (e) and the fact that for all n and i, implies that Since the constants in the two O(δ_{∆_n}) terms in (80) and (81) can be chosen uniformly over all n, a usual argument with the Gronwall inequality and Condition (f) completes the proof of the lemma.
We are now ready for the proof of Theorem 3.1.
Proof of Theorem 3.1. We proceed in two steps.
Step 1. In this step, we establish the theorem assuming that there exists a constant C > 0 such that, for all w_0^T ∈ C[0, T] and all y_0^T ∈ C[0, T], We first note that straightforward computations yield (here we use the shorter notations y^{(n)}_{t_{n,i}}, y^{(n)}_{t_{n,i-1}} for y^{(n)}(t_{n,i}), y^{(n)}(t_{n,i-1}), respectively) and which further lead to and (84) With (83) and (84), we have On the other hand, it is well known (see, e.g., [45]) that

Now, we compute
It can be easily checked that the second term on the right-hand side of the above equality converges to 0 in mean. For the first term, we have (85) Using a similar argument as above, we deduce that It then follows from (85) and (86) that, as n tends to infinity, We now establish the following convergence: where and let Note that, using an argument parallel to the derivation of (87), we can establish that, as n tends to infinity; and, similarly to the derivation of (75), from Conditions (d), (e) and (f) and Lemmas 2.2, D.1 and D.2, we deduce that, as n tends to infinity. Note also that we always have So, by the general Lebesgue dominated convergence theorem together with (89), (90) and (91), we have This establishes the theorem under the condition (82).
Step 2. In this step, we will use the convergence in Step 1 and establish the theorem without the condition (82).
Defining the stopping times τ_k, g^{(k)} and Y^{(k)} as in the proof of Theorem 2.3, we again have: For any fixed k, applying the Euler–Maruyama approximation in (11) and (12) to the above channel with respect to ∆_n, we obtain the process Y^{(n)}_{(k)}(·). Now, by the fact that where s_{∆_n} denotes the unique index n_0 such that t_{n,n_0} ≤ s < t_{n,n_0+1}. Now, using the easily verifiable fact that and Jensen's inequality, we deduce that where, for the last inequality, we have applied Fatou's lemma as in deriving (66). It then follows that which further implies that Now, using the fact that Y^{(n)} and Y^{(n)}_{(k)} coincide over [0, τ_k ∧ T], one verifies that, for any ε > 0, Using the easily verifiable fact that {τ_k} converges to T in probability uniformly over all n, and the fact that ε can be arbitrarily small, we conclude that, as k tends to infinity, uniformly over all n, Next, an application of the monotone convergence theorem, together with the fact that τ_k → T as k tends to infinity, yields that, monotone increasingly as k tends to infinity, By Step 1, for any fixed k_i, which means that there exists a sequence {n_i} such that, as i tends to infinity, Moreover, by (92), which further implies that The theorem then follows from a usual subsequence argument as in the proof of Theorem 2.3.
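The Euler–Maruyama discretization at the heart of the approximation theorem can be checked numerically. The following sketch is our own illustration (the drift tanh is a hypothetical stand-in for a Lipschitz feedback drift g): it drives coarse and fine Euler–Maruyama discretizations of dY = g(Y) dt + dB with the same underlying Brownian increments and observes the sup-norm gap shrink as the mesh is refined.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1.0
g = np.tanh                                 # stand-in Lipschitz drift

def euler_maruyama(n, dB_fine):
    """Euler-Maruyama path on an n-point grid, reusing the fine Brownian increments."""
    m = dB_fine.size // n
    dB = dB_fine.reshape(n, m).sum(axis=1)  # coarse increments: sums of fine ones
    dt, y, path = T / n, 0.0, [0.0]
    for _ in range(n):
        y = y + g(y) * dt + dB[len(path) - 1]
        path.append(y)
    return np.array(path)

n_fine = 4096
dB_fine = rng.normal(0.0, np.sqrt(T / n_fine), n_fine)
ref = euler_maruyama(n_fine, dB_fine)       # finest discretization as reference

errs = []
for n in (64, 256, 1024):
    path = euler_maruyama(n, dB_fine)
    errs.append(np.abs(path - ref[:: n_fine // n]).max())
print(errs)                                 # sup-norm gaps shrink as the mesh refines
```

Because the coarse and fine paths share the same noise, the gap reflects only the drift discretization, which is precisely the quantity controlled by the Gronwall-type estimate of Lemma D.2.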
Remark D.3. Parallel to Remark C.1, the arguments in the proof of Theorem 3.1 can be adapted to yield an approximation theorem in estimation theory. More precisely, consider the following continuous-time Gaussian feedback channel under the assumptions of Theorem 3.1: The MMSE is the limit of the approximated MMSE; namely, In more detail, the above-mentioned convergence follows from the fact that Then, using a similar argument as in the proof of Theorem 3.1, we can show which implies the claimed convergence. Similarly, we can also conclude that, under the assumptions of Theorem 3.1, the causal MMSE is the limit of the approximated causal MMSE; namely,

E Proof of Theorem 5.1

In this section, we give the proof of Theorem 5.1. For notational convenience only, we will assume m = 2; the case of generic m is completely parallel. We will first need the following lemma, which is a key component in our treatment of continuous-time Gaussian MACs.
Lemma E.1. For any ε > 0, there exist two independent Ornstein–Uhlenbeck processes {X_i(s) : s ≥ 0}, i = 1, 2, satisfying the following power constraint: such that for all T, and moreover, where Here (and often in the remainder of the paper) the subscript T means that the (conditional) mutual information is computed over the time period [0, T].
Proof. For a > 0, consider the following two independent Ornstein–Uhlenbeck processes X_i(t), i = 1, 2, given by where B_i, i = 1, 2, are independent standard Brownian motions. Obviously, for X_i defined as above, (93) is satisfied. A parallel version of the proof of Theorem 6.2.1 of [45] yields that It then follows from Theorem 6.4.1 in [45] (applied to the Ornstein–Uhlenbeck process X_1(t) + X_2(t)) that, as a → ∞, uniformly in T, which establishes (94). For i = 1, 2, define Ỹ_i As in the proof of Theorem 6.4.1 in [45], we deduce that, for i = 1, 2, I_T(X_i; Ỹ_i)/T tends to P_i/2 uniformly in T. Now, since X_1 and X_2 are independent, we have, for any fixed T, and which immediately implies (95). Finally, by the chain rule for mutual information, which, together with (94) and (95), implies (96).
Remark E.2. With X_i, i = 1, 2, regarded as channel inputs, (97) can be reinterpreted in terms of a white Gaussian MAC: for i ≠ j, I(X_i; Y), the reliable transmission rate of X_i when X_j is not known, can be arbitrarily close to I(X_i; Y | X_j), the reliable transmission rate of X_i when X_j is known. In other words, for white Gaussian MACs, knowledge of the other user's input does not help achieve a faster transmission rate, and therefore the other user's input can simply be treated as noise. A more intuitive explanation of this result is as follows: for the Ornstein–Uhlenbeck process X_i as specified in the proof, its power spectral density can be computed as which is "negligible" compared to that of the white Gaussian noise (which is the constant 1) as a tends to infinity. Lemma E.1 is a key ingredient for deriving the capacity regions of white Gaussian MACs.
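The "negligible power spectral density" observation can be checked concretely. For the standard parametrization dX = −aX dt + √(2aP) dB of a stationary Ornstein–Uhlenbeck process with variance P (consistent, up to normalization, with the processes used in the proof), the power spectral density is S(ω) = 2aP/(a² + ω²): its peak 2P/a vanishes as a → ∞ while the total power stays P. A small numerical sketch:

```python
import numpy as np

# Stationary OU process dX = -a X dt + sqrt(2 a P) dB has variance P and
# power spectral density S(w) = 2 a P / (a^2 + w^2).
P = 1.0
for a in (1.0, 10.0, 100.0):
    w = np.linspace(-1e4, 1e4, 400_001)     # frequency grid (includes w = 0)
    S = 2 * a * P / (a**2 + w**2)
    peak = S.max()                          # equals 2P/a, attained at w = 0
    power = S.sum() * (w[1] - w[0]) / (2 * np.pi)   # Riemann sum, approx. P
    print(a, peak, power)
```

As a grows, the peak 2P/a flattens toward zero while the spectrum spreads out, so against the unit-PSD white noise the other user's input looks locally negligible at every frequency.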
We also need a result on the information stability of continuous-time Gaussian processes. Let (U, V) = {(U(t), V(t)) : t ≥ 0} be a continuous Gaussian system (meaning that U(t), V(t) are jointly Gaussian stochastic processes). Let µ_U, µ_V and µ_{U,V} denote the probability distributions of U_0^T, V_0^T and their joint distribution, respectively. For any ε > 0, we denote by T_ε^{(T)} the ε-typical set: The pair (U, V) is said to be information stable [73] if for any ε > 0, The following lemma is a rephrased version of Theorem 6.6.2 in [45].
Lemma E.3. The Gaussian system (U, V ) is information stable provided that Lemma E.3 will be used in the proof of Theorem 5.1 to establish, roughly speaking, that almost all sequences are jointly typical.
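Information stability is the continuous-time analogue of the familiar AEP-style concentration of the information density. As a discrete-time sketch of the same phenomenon (our own illustration, not the Gaussian system of Lemma E.3), the normalized information density of n i.i.d. correlated Gaussian pairs concentrates around the mutual information, so almost all realizations are jointly typical:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, trials = 0.8, 2000, 200
I = -0.5 * np.log(1 - rho**2)          # mutual information per Gaussian pair

vals = []
for _ in range(trials):
    u = rng.standard_normal(n)
    v = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    # per-coordinate information density log[ p(u,v) / (p(u) p(v)) ]
    dens = (-0.5 * np.log(1 - rho**2)
            - (u**2 - 2 * rho * u * v + v**2) / (2 * (1 - rho**2))
            + (u**2 + v**2) / 2)
    vals.append(dens.mean())           # normalized information density

vals = np.array(vals)
frac_typical = np.mean(np.abs(vals - I) < 0.05)
print(I, frac_typical)                 # frac_typical should be close to 1
```

In the proof below, Lemma E.3 plays exactly this role for the continuous-time Gaussian system: it guarantees that the transmitted codeword pair lands in the typical set T_ε^{(T)} with probability tending to one.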
With Lemmas E.1 and E.3, Theorem 5.1 largely follows from a lengthy yet almost routine argument, which is included below due to a number of technical challenges in the proof.
Proof of Theorem 5.1. The converse part. In this part, we show that for any sequence of (T, (e^{TR_1}, e^{TR_2}), (P_1, P_2))-codes with P_e^{(T)} → 0 as T → ∞, the rate pair (R_1, R_2) must satisfy Fix T and consider the above-mentioned (T, (e^{TR_1}, e^{TR_2}), (P_1, P_2))-code. By the code construction, it is possible to estimate the messages (M_1, M_2) from the channel output Y_0^T with a low probability of error. Hence, the conditional entropy of (M_1, M_2) given Y_0^T must be small; more precisely, by Fano's inequality, where ε_T → 0 as T → ∞. Then, we have Now, we can bound the rate R_1 as follows: Conditioning on M_2 and applying Theorem 6.2.1 in [45], we have Noticing that X̃_2 = X_2, we then have which, together with (33), implies that R_1 ≤ P_1/2. A completely parallel argument yields R_2 ≤ P_2/2. The achievability part. In this part, we show that as long as (R_1, R_2) satisfies we can find a sequence of (T, (e^{TR_1}, e^{TR_2}), (P_1, P_2))-codes with P_e^{(T)} → 0 as T → ∞. The argument consists of the following steps.
Codebook generation: Fix T > 0 and ε > 0, and assume that X_1 and X_2 are independent Ornstein–Uhlenbeck processes over [0, T] with respective variances P_1 − ε and P_2 − ε, and that (R_1, R_2) satisfies (98). Generate e^{TR_1} independent codewords X_{1,i}, i ∈ {1, 2, . . . , e^{TR_1}}, of length T, according to the distribution of X_1. Similarly, generate e^{TR_2} independent codewords X_{2,j}, j ∈ {1, 2, . . . , e^{TR_2}}, of length T, according to the distribution of X_2. These codewords (which may not satisfy the power constraint in (33)) form the codebook, which is revealed to the senders and the receiver.
Encoding: To send message i ∈ M 1 , sender 1 sends the codeword X 1,i . Similarly, to send j ∈ M 2 , sender 2 sends X 2,j .
Decoding: For any fixed ε > 0, let T_ε^{(T)} denote the set of jointly typical (x_1, x_2, y) sequences, defined as follows: Here we remark that it is easy to check that the above Radon–Nikodym derivatives are all well defined; see, e.g., Theorem 7.7 of [61] for sufficient conditions for their existence. Based on the received output y ∈ C[0, T], the receiver chooses the pair (i, j) such that if such a pair (i, j) exists and is unique; otherwise, an error is declared. Moreover, an error is also declared if the chosen codeword does not satisfy the power constraint in (33).
Then P̂_e^{(T)}, the error probability of the above coding scheme (in which codewords violating the power constraint are allowed), can be upper bounded as follows: So, for any i, j ≠ 1, we have P̂_e^{(T)} ≤ P(π^{(T)}) + P(E_{11}^c) + e^{TR_1} P(E_{i1}) + e^{TR_2} P(E_{1j}) + e^{TR_1 + TR_2} P(E_{ij}). Using the well-known fact that an Ornstein–Uhlenbeck process is ergodic [60, 56], we deduce that P(π^{(T)}) → 0 as T → ∞. And by Lemma E.3 and Theorem 6.2.1 in [45], we have where we have used the independence of X_1 and X_2, and the consequent fact that I_T(X_1; X_2, Y) = I_T(X_1; X_2) + I_T(X_1; Y | X_2) = I_T(X_1; Y | X_2).
By Lemma E.1, one can choose independent Ornstein–Uhlenbeck processes X_1, X_2 such that I_T(X_1; Y | X_2)/T → (P_1 − ε)/2, I_T(X_2; Y | X_1)/T → (P_2 − ε)/2 and I_T(X_1, X_2; Y)/T → (P_1 + P_2 − 2ε)/2, uniformly in T. This implies that, with ε chosen sufficiently small, we have P̂_e^{(T)} → 0 as T → ∞. In other words, there exists a sequence of good codes (which may not satisfy the power constraint) with low average error probability. Now, from each of the above codes, we delete the worst half of the codewords (any codeword violating the power constraint will be deleted, since it must have error probability 1). Then, with only a slightly decreased transmission rate, the remaining codewords satisfy the power constraint and have small maximum error probability (and thus small average error probability P_e^{(T)}), which implies that the rate pair (R_1, R_2) is achievable.
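The "delete the worst half" step is the standard expurgation argument: by Markov's inequality, at most half of the codewords can have error probability at least twice the average, so keeping the better half at most doubles the guaranteed per-codeword error while reducing the rate by only (log 2)/T. A minimal numerical sketch with synthetic per-codeword error probabilities (illustrative values, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 1024                                   # codebook size e^{TR}
per_cw_err = rng.uniform(0.0, 0.02, M)     # synthetic per-codeword error probabilities
avg = per_cw_err.mean()                    # small average error probability

kept = np.sort(per_cw_err)[: M // 2]       # keep the better half of the codewords
# Markov's inequality: at most M/2 codewords have error >= 2*avg, so every
# kept codeword has error < 2*avg; the rate loss is only log(2)/T.
print(avg, kept.max())
```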
Remark E.4. The achievability part can be proven alternatively, as roughly described as follows: for arbitrarily small ε > 0, by Lemma E.1, one can choose independent Ornstein–Uhlenbeck processes X_i with respective variances P_i − ε, i = 1, 2, such that I_T(X_i; Y)/T approaches (P_i − ε)/2. Then, a parallel random coding argument, with X_j, j ≠ i, treated as noise at receiver i, shows that the rate pair ((P_1 − ε)/2, (P_2 − ε)/2) can be approached, which yields the achievability part.

F Proof of Theorem 5.4
For notational convenience only, we prove the case n = 2; the case of generic n is similar.
The converse part. In this part, we show that for any sequence of (T, (e^{TR_1}, e^{TR_2}), (P_1, P_2))-codes with P_e^{(T)} → 0, the rate pair (R_1, R_2) must satisfy R_1 ≤ a_{11}^2 P_1/2, R_2 ≤ a_{22}^2 P_2/2.
Fix T and consider the above-mentioned (T, (e^{TR_1}, e^{TR_2}), (P_1, P_2))-code. By the code construction, for i = 1, 2, it is possible to estimate the message M_i from the channel output Y_{i,0}^T with an arbitrarily low probability of error. Hence, by Fano's inequality, for i = 1, 2, where ε_{i,T} → 0 as T → ∞. We then have which implies that R_1 ≤ a_{11}^2 P_1/2. A parallel argument yields R_2 ≤ a_{22}^2 P_2/2. The proof of the converse part is then complete. The achievability part. We only sketch the proof of this part. For arbitrarily small ε > 0, by Lemma E.1, one can choose independent Ornstein–Uhlenbeck processes X_i with respective variances P_i − ε, i = 1, 2, such that I_T(X_i; Y)/T approaches a_{ii}^2(P_i − ε)/2. Then, a parallel random coding argument as in the proof of Theorem 5.1, with X_j, j ≠ i, treated as noise at receiver i, shows that the rate pair (a_{11}^2(P_1 − ε)/2, a_{22}^2(P_2 − ε)/2) can be approached, which yields the achievability part.

G Proof of Theorem 5.7
One of the important tools that play a key role in discrete-time network information theory is the entropy power inequality [17, 22], which can be applied to compare information-theoretic quantities involving different users. The following lemma, despite its strikingly different form, serves the typical function of a discrete-time entropy power inequality. We are now ready for the proof of Theorem 5.7.
Proof of Theorem 5.7. For notational convenience only, we prove the case when n = 2, the case when n is generic being parallel. The converse part. Without loss of generality, we assume that snr 1 ≥ snr 2 .
We will show that for any sequence of (T, (e^{TR_1}, e^{TR_2}), P)-codes with P_e^{(T)} → 0 as T → ∞, the rate pair (R_1, R_2) must satisfy Fix T and consider the above-mentioned (T, (e^{TR_1}, e^{TR_2}), P)-code. By the code construction, for i = 1, 2, it is possible to estimate the message M_i from the channel output Y_{i,0}^T with a low probability of error, which, together with (103) and (44), immediately implies the converse part. The achievability part. We only sketch the proof of this part. For an arbitrarily small ε > 0, by Theorem 6.4.1 in [45], one can choose an Ornstein–Uhlenbeck process X̃ with variance P − ε such that I_T(X̃; Y_i)/T approaches snr_i(P − ε)/2. For any 0 ≤ λ ≤ 1, let where X_1 and X_2 are independent copies of X̃. Then, by a similar argument as in the proof of Lemma E.1, we deduce that I_T(X_1; Y_1)/T and I_T(X_2; Y_2)/T approach snr_1 λ(P − ε)/2 and snr_2 (1 − λ)(P − ε)/2, respectively. Then, a parallel random coding argument as in the proof of Theorem 5.1 such that

• when encoding, X_i only carries the message meant for receiver i;
• when decoding, receiver i treats X_j, j ≠ i, as noise,

shows that the rate pair (snr_1 λ(P − ε)/2, snr_2 (1 − λ)(P − ε)/2) can be approached, which immediately establishes the achievability part.
Remark G.2. For the achievability part, instead of the power sharing scheme used in the proof, one can also employ the following time sharing scheme: set X to be X_1 for a λ fraction of the time, and X_2 for the remaining 1 − λ fraction. It is then straightforward to check that this scheme also achieves the rate pair (snr_1 λ(P − ε)/2, snr_2 (1 − λ)(P − ε)/2). This, from a different perspective, echoes the observation in [58] that time sharing achieves the capacity region of a white Gaussian BC as the bandwidth limit tends to infinity.
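The equivalence of the two schemes is simple arithmetic: under power sharing, user i's rate is snr_i times its power share over 2; under time sharing, it is its time fraction times the full-power rate, and the two expressions coincide for every λ. A quick sketch with hypothetical snr and power values:

```python
snr1, snr2, P = 4.0, 1.0, 2.0              # hypothetical parameters

def power_sharing(lam):
    # X = X1 + X2 with powers lam*P and (1 - lam)*P, other user treated as noise
    return (snr1 * lam * P / 2, snr2 * (1 - lam) * P / 2)

def time_sharing(lam):
    # full power P for a lam fraction of time (user 1), 1 - lam fraction (user 2)
    return (lam * snr1 * P / 2, (1 - lam) * snr2 * P / 2)

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, power_sharing(lam), time_sharing(lam))   # identical pairs
```

Both schemes therefore trace out the same straight-line segment between the two single-user corner points, which is the infinite-bandwidth BC region noted in [58].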