Continuity of Channel Parameters and Operations under Various DMC Topologies

We study the continuity of many channel parameters and operations under various topologies on the space of equivalent discrete memoryless channels (DMC). We show that mutual information, channel capacity, Bhattacharyya parameter, probability of error of a fixed code and optimal probability of error for a given code rate and block length are continuous under various DMC topologies. We also show that channel operations such as sums, products, interpolations and Arıkan-style transformations are continuous.


I. INTRODUCTION
Let X and Y be two finite sets and let W be a fixed channel with input alphabet X and output alphabet Y. It is well known that the input-output mutual information is continuous on the simplex of input probability distributions. Many other parameters that depend on the input probability distribution were shown to be continuous on the simplex in [1].
If X and Y are finite sets, the space of channels with input alphabet X and output alphabet Y can be naturally endowed with the topology of the Euclidean metric, or any other equivalent metric. It is well known that the channel capacity is continuous in this topology. If X and Y are arbitrary, one can construct a topology on the space of channels using the weak-* topology on the output alphabet. It was shown in [2] that the capacity is lower semi-continuous in this topology.
The continuity results mentioned in the previous paragraph do not take into account "equivalence" between channels. Two channels are said to be equivalent if each is degraded from the other. This means that each channel can be simulated from the other by local operations at the receiver. Two channels that are degraded from each other are completely equivalent from an operational point of view: both channels have exactly the same probability of error under optimal decoding for any fixed code. Moreover, any sub-optimal decoder for one channel can be transformed into a sub-optimal decoder for the other channel with the same probability of error and essentially the same computational complexity. This is why it makes sense, from an information-theoretic point of view, to identify equivalent channels and consider them as one point in the space of "equivalent channels".
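To make degradation and equivalence concrete (an illustrative sketch, not from the paper): post-processing the output through a channel Q yields a degraded channel, and when Q is an invertible relabeling the two channels are equivalent, so their MAP error probabilities coincide for every prior:

```python
def degrade(W, Q):
    """Post-process W's output by channel Q: (QoW)[x][z] = sum_y W[x][y] Q[y][z]."""
    return [[sum(W[x][y] * Q[y][z] for y in range(len(Q)))
             for z in range(len(Q[0]))] for x in range(len(W))]

def map_error(p, W):
    """Error probability of the MAP decoder: 1 - sum_y max_x p(x) W(y|x)."""
    return 1 - sum(max(p[x] * W[x][y] for x in range(len(p)))
                   for y in range(len(W[0])))

W = [[0.8, 0.1, 0.1],
     [0.1, 0.1, 0.8]]
relabel = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # deterministic output permutation
W2 = degrade(W, relabel)                     # equivalent to W (each degraded from the other)
```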
In [3], equivalent binary-input channels were identified with their L-density (i.e., the density of log-likelihood ratios). The space of equivalent binary-input channels was endowed with the topology of convergence in distribution of L-densities. Since the symmetric capacity and the Bhattacharyya parameter can be written as integrals of continuous functions with respect to the L-density [3], it immediately follows that these parameters are continuous in the L-density topology.
In [4], many topologies were constructed for the space of equivalent channels sharing a fixed input alphabet. In this paper, we study the continuity of several channel parameters and operations under these topologies.
The continuity of channel parameters is important both theoretically and practically. If a parameter (such as the optimal probability of error of a given code) is difficult to compute for a channel W, one can approximate it by computing the same parameter for a sequence of channels (W_n)_{n≥0} that converges to W in some topology. Another application of the continuity of channel parameters and operations is the study of the robustness of various system designs against imperfect specification of the channel.

II. PRELIMINARIES
We assume that the reader is familiar with the basic concepts of general topology. Any unfamiliar notation or concept is formally defined in the preliminaries section of [4]. Moreover, due to space limitations, we are only able to explain the intuition behind the definitions and state the main results. The proofs can be found in [5].

A. Measure-theoretic notations
If (M, Σ) is a measurable space, we denote the set of probability measures on (M, Σ) as P(M, Σ). Let P ∈ P(M, Σ), and let f : M → M′ be a measurable mapping from (M, Σ) to another measurable space (M′, Σ′). The push-forward probability measure of P by f is the probability measure f_#P on (M′, Σ′) defined as (f_#P)(B′) = P(f^{-1}(B′)) for every B′ ∈ Σ′. We denote the product of two probability measures P_1 ∈ P(M_1, Σ_1) and P_2 ∈ P(M_2, Σ_2) as P_1 × P_2.

B. Random mappings
Let M and M′ be two arbitrary sets and let Σ′ be a σ-algebra on M′. A random mapping from M to (M′, Σ′) is a mapping R from M to P(M′, Σ′), i.e., to every x ∈ M we associate a probability distribution R(x) on M′. R(x) can be interpreted as the probability distribution of the random output given that the input is x.
Let Σ be a σ-algebra on M. We say that R is a measurable random mapping from (M, Σ) to (M′, Σ′) if the mapping R_{B′} : M → ℝ defined as R_{B′}(x) = (R(x))(B′) is measurable for every B′ ∈ Σ′. Note that this definition of measurability is consistent with the measurability of ordinary mappings: let f be a mapping from M to M′ and let D_f : M → P(M′, Σ′) be the random mapping defined as D_f(x) = δ_{f(x)} for every x ∈ M, where δ_{f(x)} ∈ P(M′, Σ′) is the Dirac measure centered at f(x). D_f is a measurable random mapping if and only if f is a measurable mapping.
Let P be a probability measure on (M, Σ) and let R be a measurable random mapping from (M, Σ) to (M′, Σ′). The push-forward probability measure of P by R is the probability measure R_#P on (M′, Σ′) defined as (R_#P)(B′) = ∫_M (R(x))(B′) dP(x) for every B′ ∈ Σ′. Note that this definition is consistent with the push-forward of ordinary mappings: if f and D_f are as above, then (D_f)_#P = f_#P.
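In the discrete case the integral reduces to a weighted sum, (R_#P)({y}) = Σ_x P({x}) (R(x))({y}). A small sketch (an illustration under the assumption of finite sets, with measures as dicts):

```python
def push_forward(P, R):
    """Push-forward of the finite measure P (dict x -> prob) by the random
    mapping R (dict x -> dict y -> prob): (R#P)(y) = sum_x P(x) (R(x))(y)."""
    out = {}
    for x, px in P.items():
        for y, ry in R[x].items():
            out[y] = out.get(y, 0.0) + px * ry
    return out

P = {'a': 0.25, 'b': 0.75}
R = {'a': {0: 1.0},                 # Dirac row: the ordinary-mapping case D_f
     'b': {0: 0.5, 1: 0.5}}
image = push_forward(P, R)          # {0: 0.625, 1: 0.375}
```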

C. Meta-probability measures
Let X be a finite set. We denote the set of probability distributions on X as Δ_X. A meta-probability measure on X is a probability measure on the Borel sets of Δ_X. It is called a meta-probability measure because it is a probability measure on the space of probability distributions on X. We denote the set of meta-probability measures on X as MP(X).
Let f be a mapping from a finite set X to another finite set X′. f induces a push-forward mapping f_# taking probability distributions in Δ_X to probability distributions in Δ_{X′}. f_# in turn induces another push-forward mapping taking meta-probability measures in MP(X) to meta-probability measures in MP(X′). We denote this mapping as f_{##} and we call it the meta-push-forward mapping induced by f. The mapping f_{##} is continuous under both the weak-* and the total variation topologies.
Let X_1 and X_2 be two finite sets. Let Mul : Δ_{X_1} × Δ_{X_2} → Δ_{X_1×X_2} be the mapping defined as Mul(p_1, p_2) = p_1 × p_2. For every MP_1 ∈ MP(X_1) and MP_2 ∈ MP(X_2), we define the tensor product MP_1 ⊗ MP_2 = Mul_#(MP_1 × MP_2) ∈ MP(X_1 × X_2). The mapping (MP_1, MP_2) → MP_1 ⊗ MP_2 is continuous under both the weak-* and the total variation topologies.

D. DMC topologies
In this subsection, we describe the topologies that we introduced in [4]. A more detailed summary can be found in Section III of [5].
We denote the set of all channels with input alphabet X and output alphabet Y as DMC_{X,Y}. The space of channels with input alphabet X is defined as DMC_{X,*} = ⊔_{n≥1} DMC_{X,[n]}, where [n] = {1, . . ., n} and ⊔ is the disjoint union symbol. Two channels with input alphabet X are said to be equivalent if each is degraded from the other. The equivalence relation on DMC_{X,*} is denoted as R_{X,*}, and the space of equivalent channels is the quotient DMC^{(o)}_{X,*} = DMC_{X,*}/R_{X,*}. The topology on DMC_{X,Y} that is induced by the Euclidean metric d_{X,Y} is denoted as T_{X,Y}. The strong topology T_{s,X,*} on DMC_{X,*} is the finest topology that makes the inclusion mappings from DMC_{X,[n]} continuous for every n ≥ 1, and the quotient topology that it induces on DMC^{(o)}_{X,*} is denoted as T^{(o)}_{s,X,*}.
In [4], we introduced the noisiness metric d^{(o)}_{X,*} on DMC^{(o)}_{X,*}. This metric compares the "noisiness levels" of channels. The noisiness topology T^{(o)}_{X,*} is the topology induced by this metric. Let W be a channel with input alphabet X, and assume a uniform prior probability distribution on X. The meta-probability measure that describes the possible posterior probability distributions is called the Blackwell measure of W, and it is denoted as MP_W. Two channels are equivalent if and only if they have the same Blackwell measure [6]. Moreover, there is a canonical bijection between DMC^{(o)}_{X,*} and the set of balanced and finitely supported meta-probability measures MP_{bf}(X) [6]. This allows us to construct the weak-* topology and the total variation topology T^{(o)}_{TV,X,*} on DMC^{(o)}_{X,*} by transporting the corresponding topologies from MP_{bf}(X) to DMC^{(o)}_{X,*} through the canonical bijection. We showed in [4] that the weak-* topology is exactly the same as the noisiness topology T^{(o)}_{X,*}.
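For a finite channel the Blackwell measure is finitely supported and can be computed directly (a sketch, not the paper's code): each output y contributes its posterior under the uniform prior, weighted by the output probability, and posteriors with equal values are merged. Merging is what makes equivalent channels indistinguishable:

```python
def blackwell_measure(W):
    """Blackwell measure of W under the uniform prior, as {posterior: weight}."""
    nx = len(W)
    mp = {}
    for y in range(len(W[0])):
        py = sum(W[x][y] for x in range(nx)) / nx                 # P(Y = y)
        if py == 0:
            continue
        post = tuple(round(W[x][y] / (nx * py), 12) for x in range(nx))
        mp[post] = mp.get(post, 0.0) + py                         # merge equal posteriors
    return mp

bsc = [[0.9, 0.1], [0.1, 0.9]]
split = [[0.45, 0.45, 0.1],      # same channel with one output split in two:
         [0.05, 0.05, 0.9]]      # equivalent, hence the same Blackwell measure
```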

III. CHANNEL PARAMETERS AND OPERATIONS

A. Useful parameters
For every p ∈ Δ_X and every W ∈ DMC_{X,Y}, define I(p, W) as the mutual information I(X; Y) computed using the natural logarithm, where X is distributed as p and Y is the output of W when X is the input. The capacity of W is defined as C(W) = max_{p∈Δ_X} I(p, W). For every p ∈ Δ_X, the error probability of the MAP decoder of W under prior p is defined as P_e(p, W) = 1 − Σ_{y∈Y} max_{x∈X} p(x)W(y|x). Clearly, 0 ≤ P_e(p, W) ≤ 1.
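The maximization defining C(W) can be carried out numerically, e.g. with the Blahut–Arimoto algorithm (a standard method, not specific to this paper; a minimal pure-Python sketch):

```python
import math

def capacity(W, iterations=200):
    """Capacity in nats of the channel W[x][y], via Blahut-Arimoto iterations."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx
    c = 0.0
    for _ in range(iterations):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # D[x] = exp( KL( W(.|x) || q ) )
        D = [math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                          for y in range(ny) if W[x][y] > 0)) for x in range(nx)]
        s = sum(p[x] * D[x] for x in range(nx))
        p = [p[x] * D[x] / s for x in range(nx)]   # multiplicative update of the input law
        c = math.log(s)
    return c

bsc = [[0.9, 0.1], [0.1, 0.9]]
```

For a binary symmetric channel the uniform input is optimal, so the iteration converges immediately to log 2 − H_b(0.1) nats.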
For every W ∈ DMC_{X,Y}, define the Bhattacharyya parameter of W as Z(W) = (1/(|X|(|X| − 1))) Σ_{x,x'∈X, x≠x'} Σ_{y∈Y} √(W(y|x)W(y|x')). It was shown in [7] and [8] that Z(W) upper- and lower-bounds the error probability P_e(π_X, W), where π_X is the uniform distribution on X. An (n, M)-code C on the alphabet X is a subset of X^n satisfying |C| = M. The blocklength of C is n, and M is the size of the code. The rate of C is (1/n) log M, and it is measured in nats. The error probability of the ML decoder for the code C when it is used for a channel W ∈ DMC_{X,Y} is given by P_{e,C}(W) = 1 − (1/M) Σ_{y_1^n ∈ Y^n} max_{x_1^n ∈ C} Π_{i=1}^n W(y_i|x_i). The optimal error probability of (n, M)-codes for a channel W is given by P_{e,n,M}(W) = min_{C ⊂ X^n, |C| = M} P_{e,C}(W). It is well known that all these parameters depend only on the R_{X,Y}-equivalence class of W. Therefore, we can define those parameters for every Ŵ ∈ DMC^{(o)}_{X,*}, and we can show that they are continuous on the quotient space (see [5]).
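These quantities are directly computable for small channels and codes. A sketch (not the paper's code) using the non-binary Bhattacharyya parameter above and brute-force ML decoding:

```python
import math
from itertools import product

def bhattacharyya(W):
    """Z(W): average of sum_y sqrt(W(y|x)W(y|x')) over ordered pairs x != x'."""
    nx, ny = len(W), len(W[0])
    total = sum(math.sqrt(W[x][y] * W[xp][y])
                for x in range(nx) for xp in range(nx) if x != xp
                for y in range(ny))
    return total / (nx * (nx - 1))

def code_error(C, W):
    """ML error probability of the code C (tuples over X) on len(C[0]) uses of W,
    with the M codewords equiprobable."""
    n, ny = len(C[0]), len(W[0])
    correct = sum(max(math.prod(W[c[i]][y[i]] for i in range(n)) for c in C)
                  for y in product(range(ny), repeat=n))
    return 1 - correct / len(C)

bsc = [[0.9, 0.1], [0.1, 0.9]]
rep = [(0, 0, 0), (1, 1, 1)]       # length-3 repetition code, M = 2
```

For a BSC with crossover p, Z = 2√(p(1−p)), and ML decoding of the repetition code is majority voting, so its error probability is 3p²(1−p) + p³.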

B. Channel operations
For every two channels W_1 ∈ DMC_{X_1,Y_1} and W_2 ∈ DMC_{X_2,Y_2}, define the channel sum W_1 ⊕ W_2 ∈ DMC_{X_1⊔X_2, Y_1⊔Y_2} of W_1 and W_2 as (W_1 ⊕ W_2)(y|x) = W_1(y|x) if (x, y) ∈ X_1 × Y_1, (W_1 ⊕ W_2)(y|x) = W_2(y|x) if (x, y) ∈ X_2 × Y_2, and (W_1 ⊕ W_2)(y|x) = 0 otherwise, where X_1 ⊔ X_2 is the disjoint union of X_1 and X_2. W_1 ⊕ W_2 arises when the transmitter has two channels W_1 and W_2 at his disposal and can use exactly one of them at each channel use. We define the channel product W_1 ⊗ W_2 ∈ DMC_{X_1×X_2, Y_1×Y_2} as (W_1 ⊗ W_2)(y_1, y_2|x_1, x_2) = W_1(y_1|x_1)W_2(y_2|x_2). W_1 ⊗ W_2 arises when the transmitter has two channels W_1 and W_2 at his disposal and uses both of them at each channel use. Channel sums and products were first introduced by Shannon in [9].
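The two operations can be sketched on transition matrices (assuming the nested-list representation, with the inputs and outputs of the sum indexed by the disjoint union):

```python
def channel_sum(W1, W2):
    """W1 (+) W2: block-diagonal matrix on (X1 u X2) x (Y1 u Y2)."""
    n1, m1, n2, m2 = len(W1), len(W1[0]), len(W2), len(W2[0])
    W = [[0.0] * (m1 + m2) for _ in range(n1 + n2)]
    for x in range(n1):
        W[x][:m1] = W1[x]
    for x in range(n2):
        W[n1 + x][m1:] = W2[x]
    return W

def channel_product(W1, W2):
    """W1 (x) W2: (W1xW2)((y1,y2)|(x1,x2)) = W1(y1|x1) W2(y2|x2)."""
    return [[a * b for a in row1 for b in row2] for row1 in W1 for row2 in W2]
```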
For every W_1 ∈ DMC_{X,Y_1}, W_2 ∈ DMC_{X,Y_2} and every α ∈ [0, 1], define the α-interpolation [αW_1, (1 − α)W_2] ∈ DMC_{X, Y_1⊔Y_2} as [αW_1, (1 − α)W_2](y|x) = αW_1(y|x) if y ∈ Y_1, and [αW_1, (1 − α)W_2](y|x) = (1 − α)W_2(y|x) if y ∈ Y_2. Channel interpolation arises when a channel behaves as W_1 with probability α and as W_2 with probability 1 − α. The transmitter has no control over which behavior the channel chooses, but on the other hand, the receiver knows which behavior was chosen. Channel interpolations were used in [10] to construct interpolations between polar codes and Reed-Muller codes. Fix a binary operation * on X. For every W ∈ DMC_{X,Y}, define W^− ∈ DMC_{X,Y^2} and W^+ ∈ DMC_{X,Y^2×X} as W^−(y_1, y_2|u_1) = (1/|X|) Σ_{u_2∈X} W(y_1|u_1 * u_2)W(y_2|u_2) and W^+(y_1, y_2, u_1|u_2) = (1/|X|) W(y_1|u_1 * u_2)W(y_2|u_2). These operations generalize Arıkan's polarization transformations [11].
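The interpolation and the two polarization transforms can be sketched as follows (an illustration, assuming X = {0, ..., n−1}, the operation * given as a Python function, and output tuples flattened into a single index):

```python
def interpolate(alpha, W1, W2):
    """[aW1,(1-a)W2]: behave as W1 w.p. alpha (output lands in Y1), else as W2."""
    return [[alpha * v for v in W1[x]] + [(1 - alpha) * v for v in W2[x]]
            for x in range(len(W1))]

def minus_transform(W, op):
    """W-(y1,y2 | u1) = (1/|X|) sum_{u2} W(y1 | u1*u2) W(y2 | u2)."""
    n, m = len(W), len(W[0])
    return [[sum(W[op(u1, u2)][y1] * W[u2][y2] for u2 in range(n)) / n
             for y1 in range(m) for y2 in range(m)]
            for u1 in range(n)]

def plus_transform(W, op):
    """W+(y1,y2,u1 | u2) = (1/|X|) W(y1 | u1*u2) W(y2 | u2)."""
    n, m = len(W), len(W[0])
    return [[W[op(u1, u2)][y1] * W[u2][y2] / n
             for y1 in range(m) for y2 in range(m) for u1 in range(n)]
            for u2 in range(n)]

bsc = [[0.9, 0.1], [0.1, 0.9]]
Wminus = minus_transform(bsc, lambda a, b: a ^ b)   # Arikan's original * = XOR
```

For * = XOR on a BSC with crossover p, W^− behaves like a BSC with crossover 2p(1 − p).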
All the channel operations defined here can be "quotiented" by the equivalence relations. We just need to realize that the equivalence class of the resulting channel depends only on the equivalence classes of the channels that were used in the operation (see [5]). For example, for every Ŵ_1 ∈ DMC^{(o)}_{X_1,*} and Ŵ_2 ∈ DMC^{(o)}_{X_2,*}, we can define Ŵ_1 ⊕ Ŵ_2 ∈ DMC^{(o)}_{X_1⊔X_2,*} as the equivalence class of W_1 ⊕ W_2, where W_1 and W_2 are any channels in the equivalence classes Ŵ_1 and Ŵ_2, respectively. The other operations can be defined on the quotient spaces similarly. All these operations are continuous on the quotient spaces (see [5]).

IV. CONTINUITY IN THE STRONG TOPOLOGY
Since the channel parameters I, C, P_e, Z, P_{e,C} and P_{e,n,M} are defined on DMC_{X,[n]} for every n ≥ 1 (see Section III-A), they are also defined on DMC_{X,*}.

Theorem 1. Let U_X be the standard topology on Δ_X. We have:
• I : Δ_X × DMC^{(o)}_{X,*} → ℝ is continuous on (Δ_X × DMC^{(o)}_{X,*}, U_X ⊗ T^{(o)}_{s,X,*}).
• C : DMC^{(o)}_{X,*} → ℝ is continuous on (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}).
• P_e : Δ_X × DMC^{(o)}_{X,*} → ℝ is continuous on (Δ_X × DMC^{(o)}_{X,*}, U_X ⊗ T^{(o)}_{s,X,*}).
• Z : DMC^{(o)}_{X,*} → ℝ is continuous on (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}).
• For every code C on X, P_{e,C} : DMC^{(o)}_{X,*} → ℝ is continuous on (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}).
• For every n > 0 and every 1 ≤ M ≤ |X|^n, the mapping P_{e,n,M} : DMC^{(o)}_{X,*} → ℝ is continuous on (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}).

It is also possible to extend the definition of all the channel operations that were defined in Section III-B to the quotient spaces of equivalent channels.

Theorem 2. Assume that all the spaces of equivalent channels are endowed with the strong topology. We have:
• The channel sum (Ŵ_1, Ŵ_2) → Ŵ_1 ⊕ Ŵ_2 is continuous.
• The channel product (Ŵ_1, Ŵ_2) → Ŵ_1 ⊗ Ŵ_2 is continuous.
• For every α ∈ [0, 1], the channel interpolation (Ŵ_1, Ŵ_2) → [αŴ_1, (1 − α)Ŵ_2] is continuous.
• For any binary operation * on X, the mapping Ŵ → Ŵ^− from DMC^{(o)}_{X,*} to DMC^{(o)}_{X,*} is continuous.
• For any binary operation * on X, the mapping Ŵ → Ŵ^+ from DMC^{(o)}_{X,*} to DMC^{(o)}_{X,*} is continuous.
We also show that (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}) is strongly contractible to every point in DMC^{(o)}_{X,*} (see [5] for the definition of strong contractibility).

V. CONTINUITY IN THE NOISINESS/WEAK-* AND THE TOTAL VARIATION TOPOLOGIES

In order to study the continuity of the channel parameters and operations in these topologies, we need to express them in terms of the Blackwell measures.

A. Channel parameters
The following proposition shows that many channel parameters can be expressed as an integral of a continuous function with respect to the Blackwell measure.

Proposition 1. For every Ŵ ∈ DMC^{(o)}_{X,*}, we have:
• I(π_X, Ŵ) = log|X| − ∫_{Δ_X} H(p) dMP_Ŵ(p), where H(p) is the entropy of p.
• P_e(π_X, Ŵ) = 1 − ∫_{Δ_X} max_{x∈X} p(x) dMP_Ŵ(p).
• Z(Ŵ) = (1/(|X| − 1)) ∫_{Δ_X} Σ_{x≠x'} √(p(x)p(x')) dMP_Ŵ(p).
• For every code C ⊂ X^n, we have P_{e,C}(Ŵ) = 1 − (|X|^n/|C|) ∫_{Δ_X^n} max_{x_1^n ∈ C} Π_{i=1}^n p_i(x_i) dMP^n_Ŵ(p_1, . . ., p_n), where MP^n_Ŵ is the product measure on Δ^n_X obtained by multiplying MP_Ŵ with itself n times.

Note that we adopt the standard convention that 0 log 0 = 0.

Theorem 3. Let U_X be the standard topology on Δ_X and let T be either the noisiness/weak-* topology T^{(o)}_{X,*} or the total variation topology T^{(o)}_{TV,X,*}. We have:
• I : Δ_X × DMC^{(o)}_{X,*} → ℝ is continuous on (Δ_X × DMC^{(o)}_{X,*}, U_X ⊗ T).
• C, Z : DMC^{(o)}_{X,*} → ℝ are continuous on (DMC^{(o)}_{X,*}, T).
• P_e : Δ_X × DMC^{(o)}_{X,*} → ℝ is continuous on (Δ_X × DMC^{(o)}_{X,*}, U_X ⊗ T).
• For every code C on X, P_{e,C} : DMC^{(o)}_{X,*} → ℝ is continuous on (DMC^{(o)}_{X,*}, T).
• For every n > 0 and every 1 ≤ M ≤ |X|^n, the mapping P_{e,n,M} : DMC^{(o)}_{X,*} → ℝ is continuous on (DMC^{(o)}_{X,*}, T).
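Blackwell-measure integral identities of this kind, e.g. I(π_X, W) = log|X| − ∫ H(p) dMP_W(p) and P_e(π_X, W) = 1 − ∫ max_x p(x) dMP_W(p), can be checked numerically for a small channel (a sketch; both sides agree up to floating-point error):

```python
import math

def blackwell(W):
    """Blackwell measure under the uniform prior as a list of (weight, posterior)."""
    nx = len(W)
    return [(py, [W[x][y] / (nx * py) for x in range(nx)])
            for y in range(len(W[0]))
            for py in [sum(W[x][y] for x in range(nx)) / nx] if py > 0]

def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)

W = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
nx, ny = len(W), len(W[0])

# symmetric capacity from the definition ...
q = [sum(W[x][y] for x in range(nx)) / nx for y in range(ny)]
i_direct = sum(W[x][y] / nx * math.log(W[x][y] / q[y])
               for x in range(nx) for y in range(ny) if W[x][y] > 0)
# ... and from the integral log|X| - INT H(p) dMP_W(p)
i_blackwell = math.log(nx) - sum(w * entropy(p) for w, p in blackwell(W))

# MAP error under the uniform prior, both ways
pe_direct = 1 - sum(max(W[x][y] for x in range(nx)) / nx for y in range(ny))
pe_blackwell = 1 - sum(w * max(p) for w, p in blackwell(W))
```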

B. Channel operations
In the following, we show that the channel operations can be expressed in terms of Blackwell measures. We have all the tools to achieve this for the channel sum, the channel product and the channel interpolation. In order to express the channel polarization transformations in terms of the Blackwell measures, we need to introduce new definitions.
Let X be a finite set and let * be a binary operation on X. We say that * is uniformity preserving if the mapping (a, b) → (a * b, b) is a bijection from X^2 to itself [12]. For every a, b ∈ X, we denote the unique element c ∈ X satisfying c * b = a as c = a /* b. It was shown in [8] that a binary operation * is polarizing if and only if it is uniformity preserving and /* is strongly ergodic. Binary operations that are not uniformity preserving are not interesting for polarization theory because they do not preserve the symmetric capacity [8]. Therefore, we will focus only on polarization transformations that are based on uniformity preserving binary operations.
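Whether * is uniformity preserving can be checked, and /* tabulated, by brute force on a finite alphabet (an illustrative sketch):

```python
def is_uniformity_preserving(op, n):
    """* is uniformity preserving iff (a, b) -> (a*b, b) is a bijection of X^2."""
    return len({(op(a, b), b) for a in range(n) for b in range(n)}) == n * n

def right_division(op, n):
    """Tabulate /*: a /* b is the unique c with c * b = a."""
    table = {(op(c, b), b): c for c in range(n) for b in range(n)}
    return lambda a, b: table[(a, b)]

add3 = lambda a, b: (a + b) % 3      # a group operation, hence uniformity preserving
div3 = right_division(add3, 3)       # here a /* b = (a - b) mod 3
```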
Let * be a fixed uniformity preserving binary operation on X. Define the mapping C^{−,*} : Δ_X × Δ_X → Δ_X as (C^{−,*}(p_1, p_2))(u_1) = Σ_{u_2∈X} p_1(u_1 /* u_2) p_2(u_2). The probability distribution C^{−,*}(p_1, p_2) can be interpreted as follows: let X_1 and X_2 be two independent random variables in X that are distributed as p_1 and p_2 respectively, and let (U_1, U_2) be the random pair in X^2 defined as (U_1, U_2) = (X_1 * X_2, X_2); then C^{−,*}(p_1, p_2) is the probability distribution of U_1. For every MP_1, MP_2 ∈ MP(X), we define the (−,*)-convolution of MP_1 and MP_2 as MP_1 ⊛^{−,*} MP_2 = C^{−,*}_#(MP_1 × MP_2). For every p_1, p_2 ∈ Δ_X and every u_1 ∈ X, define C^{+,*}_{u_1}(p_1, p_2) ∈ Δ_X as the conditional probability distribution of U_2 given U_1 = u_1, i.e., (C^{+,*}_{u_1}(p_1, p_2))(u_2) = p_1(u_1 /* u_2) p_2(u_2) / (C^{−,*}(p_1, p_2))(u_1).
For every MP_1, MP_2 ∈ MP(X), we define the (+,*)-convolution of MP_1 and MP_2 as the meta-probability measure MP_1 ⊛^{+,*} MP_2 ∈ MP(X) satisfying ∫_{Δ_X} f d(MP_1 ⊛^{+,*} MP_2) = ∫_{Δ_X} ∫_{Δ_X} Σ_{u_1∈X} (C^{−,*}(p_1, p_2))(u_1) f(C^{+,*}_{u_1}(p_1, p_2)) dMP_1(p_1) dMP_2(p_2) for every bounded measurable function f on Δ_X. We have:

Proposition 2. The following holds:
• For every Ŵ_1 ∈ DMC^{(o)}_{X_1,*} and every Ŵ_2 ∈ DMC^{(o)}_{X_2,*}, we have MP_{Ŵ_1⊕Ŵ_2} = (|X_1|/(|X_1| + |X_2|)) MP′_{Ŵ_1} + (|X_2|/(|X_1| + |X_2|)) MP′_{Ŵ_2}, where MP′_{Ŵ_1} (respectively MP′_{Ŵ_2}) is the meta-push-forward of MP_{Ŵ_1} (respectively MP_{Ŵ_2}) by the canonical injection from X_1 (respectively X_2) into X_1 ⊔ X_2.
• For every Ŵ_1 ∈ DMC^{(o)}_{X_1,*} and every Ŵ_2 ∈ DMC^{(o)}_{X_2,*}, we have MP_{Ŵ_1⊗Ŵ_2} = MP_{Ŵ_1} ⊗ MP_{Ŵ_2}.
• For every α ∈ [0, 1] and every Ŵ_1, Ŵ_2 ∈ DMC^{(o)}_{X,*}, we have MP_{[αŴ_1,(1−α)Ŵ_2]} = α MP_{Ŵ_1} + (1 − α) MP_{Ŵ_2}.
• For every uniformity preserving binary operation * on X and every Ŵ ∈ DMC^{(o)}_{X,*}, we have MP_{Ŵ^−} = MP_Ŵ ⊛^{−,*} MP_Ŵ and MP_{Ŵ^+} = MP_Ŵ ⊛^{+,*} MP_Ŵ.
Note that the polarization transformation formulas in Proposition 2 generalize the formulas given by Raginsky in [13] for binary-input channels.
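For finitely supported Blackwell measures both convolutions are finite sums, so the polarization formula MP_{W^−} = MP_W ⊛^{−,*} MP_W can be checked directly (a sketch; measures are {posterior tuple: weight} dicts, op_inv implements /*, and posteriors are rounded so that equal atoms merge):

```python
def minus_conv(MP1, MP2, op_inv, n):
    """(-,*)-convolution: push-forward of MP1 x MP2 by C^{-,*}."""
    out = {}
    for p1, w1 in MP1.items():
        for p2, w2 in MP2.items():
            q = tuple(round(sum(p1[op_inv(u1, u2)] * p2[u2] for u2 in range(n)), 12)
                      for u1 in range(n))
            out[q] = out.get(q, 0.0) + w1 * w2
    return out

def plus_conv(MP1, MP2, op_inv, n):
    """(+,*)-convolution: atoms C^{+,*}_{u1}(p1,p2), weighted by P(U1 = u1)."""
    out = {}
    for p1, w1 in MP1.items():
        for p2, w2 in MP2.items():
            for u1 in range(n):
                lam = sum(p1[op_inv(u1, u2)] * p2[u2] for u2 in range(n))
                if lam == 0:
                    continue
                q = tuple(round(p1[op_inv(u1, u2)] * p2[u2] / lam, 12)
                          for u2 in range(n))
                out[q] = out.get(q, 0.0) + w1 * w2 * lam
    return out

xor = lambda a, b: a ^ b                      # for * = XOR, /* is XOR itself
mp_bsc = {(0.9, 0.1): 0.5, (0.1, 0.9): 0.5}   # Blackwell measure of BSC(0.1)
mp_minus = minus_conv(mp_bsc, mp_bsc, xor, 2)
```

Here mp_minus is the Blackwell measure of W^−, which for a BSC(0.1) with * = XOR is a BSC with crossover 2 · 0.1 · 0.9 = 0.18.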
Theorem 4. Assume that all the spaces of equivalent channels are endowed with the noisiness/weak-* topology or the total variation topology. We have:
• The channel sum, the channel product and the channel interpolation are continuous.
• For every uniformity preserving binary operation * on X, the mappings Ŵ → Ŵ^− and Ŵ → Ŵ^+ are continuous.
It is worth mentioning that Theorem 1 can be proven from Theorem 3 because the noisiness topology is coarser than the strong topology. The main reason why we provided another proof for Theorem 1 in [5] is to show that the quotient formulation of the strong topology makes it easy to work with.
The continuity of the channel sum and the channel product on the whole product space DMC^{(o)}_{X_1,*} × DMC^{(o)}_{X_2,*} is discussed in [5].

2017 IEEE International Symposium on Information Theory (ISIT)