Topological Structures on DMC Spaces †

Nasser, Rajai

doi:10.3390/e20050343


by Rajai Nasser

École Polytechnique Fédérale de Lausanne, Route Cantonale, 1015 Lausanne, Switzerland

† This paper is an extended version of our paper that is published in the International Symposium on Information Theory 2017 (ISIT 2017).

Entropy 2018, 20(5), 343; https://doi.org/10.3390/e20050343

Submission received: 25 March 2018 / Revised: 19 April 2018 / Accepted: 27 April 2018 / Published: 4 May 2018

(This article belongs to the Section Information Theory, Probability and Statistics)


Abstract

Two channels are said to be equivalent if they are degraded from each other. The space of equivalent channels with input alphabet $X$ and output alphabet $Y$ can be naturally endowed with the quotient of the Euclidean topology by the equivalence relation. A topology on the space of equivalent channels with fixed input alphabet $X$ and arbitrary but finite output alphabet is said to be natural if and only if it induces the quotient topology on the subspaces of equivalent channels sharing the same output alphabet. We show that every natural topology is $\sigma$-compact, separable and path-connected. The finest natural topology, which we call the strong topology, is shown to be compactly generated, sequential and $T_{4}$. On the other hand, the strong topology is not first-countable anywhere, hence it is not metrizable. We introduce a metric distance on the space of equivalent channels which compares the noise levels between channels. The induced metric topology, which we call the noisiness topology, is shown to be natural. We also study topologies that are inherited from the space of meta-probability measures by identifying channels with their Blackwell measures.

Keywords:

discrete memoryless channels; topology; Blackwell measure; total-variation distance

1. Introduction

This paper is an extended version of the paper published in International Symposium on Information Theory 2017 (ISIT 2017) [1].

A topology on a given set is a mathematical structure that allows us to formally talk about the neighborhood of a given point of the set. This makes it possible to define continuous mappings and converging sequences. Topological spaces generalize metric spaces, which are mathematical structures that specify distances between the points of the space. Links between information theory and topology were investigated in [2]. In this paper, we aim to construct meaningful topologies and metrics for the space of equivalent channels sharing a common input alphabet.

Let

X

and

Y

be two fixed finite sets. Every discrete memoryless channel (DMC) with input alphabet

X

and output alphabet

Y

can be determined by its transition probabilities. Since there are

| X | \times | Y |

such probabilities, the space of all channels from

X

to

Y

can be seen as a subset of

R^{| X | \times | Y |}

. Therefore, this space can be naturally endowed with the Euclidean metric, or any other equivalent metric. A generalization of this topology to infinite input and output alphabets was considered in [3].

There are a few drawbacks to this approach. For example, consider the case where

X = Y = F_{2} : = {0, 1}

. The binary symmetric channels

BSC (ϵ)

and

BSC (1 - ϵ)

have non-zero Euclidean distance if

ϵ \neq \frac{1}{2}

. On the other hand,

BSC (ϵ)

and

BSC (1 - ϵ)

are completely equivalent from an operational point of view: both channels have exactly the same probability of error under optimal decoding for any fixed code. Moreover, any sub-optimal decoder for one channel can be transformed into a sub-optimal decoder for the other channel without changing either the probability of error or the computational complexity. This is why it makes sense, from an information-theoretic point of view, to identify equivalent channels and consider them as one point in the space of “equivalent channels”.
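The following is a minimal numerical sketch of this observation, written in Python with NumPy (neither is used in the paper). The helper `bsc` and the matrix `flip` are hypothetical names introduced for illustration, and "degraded" is assumed here to have its usual meaning: one channel is obtained from the other by post-composition with a further channel.

```python
import numpy as np

def bsc(eps):
    """Transition matrix of BSC(eps): rows are inputs, columns are outputs."""
    return np.array([[1 - eps, eps],
                     [eps, 1 - eps]])

eps = 0.1
W1, W2 = bsc(eps), bsc(1 - eps)

# Euclidean (Frobenius) distance between the two transition matrices.
euclidean = np.linalg.norm(W1 - W2)

# Swapping the output symbols 0 and 1 is a deterministic channel (assumed
# here to be an admissible post-processing step).
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

# Post-composing either channel with the swap yields the other one,
# so each channel is degraded from the other.
degraded_both_ways = np.allclose(W1 @ flip, W2) and np.allclose(W2 @ flip, W1)
```

For this choice of ϵ the two matrices are far apart in the Euclidean sense, although each can be recovered from the other by relabelling the outputs.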

The limitation of the Euclidean metric is clearer when we consider channels with different output alphabets. For example,

BSC (\frac{1}{2})

and

BEC (1)

are completely equivalent but they do not have the same output alphabet, and so there is no way to compare them with the Euclidean metric because they do not belong to the same space.
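A small sketch of why these two channels can be regarded as equivalent, again assuming that degradation means post-composition with another channel. The matrices `erase_all` and `coin_flip` are hypothetical post-processing channels chosen for illustration, and the output alphabet of the erasure channel is ordered as (0, e, 1).

```python
import numpy as np

# BSC(1/2): outputs {0, 1}; the output is uniform whatever the input is.
bsc_half = np.array([[0.5, 0.5],
                     [0.5, 0.5]])

# BEC(1): outputs (0, e, 1); the erasure symbol e is always produced.
bec_one = np.array([[0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0]])

# Hypothetical channel from {0, 1} to (0, e, 1) that erases everything.
erase_all = np.array([[0.0, 1.0, 0.0],
                      [0.0, 1.0, 0.0]])

# Hypothetical channel from (0, e, 1) to {0, 1} that outputs a fair coin flip.
coin_flip = np.full((3, 2), 0.5)

# Each channel is obtained from the other by post-processing its output.
bec_from_bsc = np.allclose(bsc_half @ erase_all, bec_one)
bsc_from_bec = np.allclose(bec_one @ coin_flip, bsc_half)
```

The two transition matrices have different shapes, so no entrywise distance between them is defined, yet both products above reproduce the other channel.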

The standard approach to solving this problem is to find a “canonical sufficient statistic” and to represent each channel in terms of this sufficient statistic. This makes it possible to compare channels with different output alphabets. One standard sufficient statistic that has been widely used for binary-input channels is the log-likelihood ratio. Each binary-input channel can be represented as a density of log-likelihood ratios (called L-density in [4]). This representation makes it possible to “topologize” the space of “equivalent” binary-input channels by considering the topology of convergence in distribution [4]. A similar approach can be adopted for non-binary-input channels (see [5,6]). Another (equivalent) way to “topologize” the space of equivalent channels is by using the Le Cam deficiency distance [7].

One issue (which is secondary and only relevant for conceptual purposes) with the current formulation of this topology is that it does not allow us to see it as a “natural topology”. Consider a fixed output alphabet

Y

and let us focus on the space of “equivalent channels” from

X

to

Y

. Since this space is the quotient of the space of channels from

X

to

Y

, which is naturally topologized by the Euclidean metric, it seems that the most natural topology on this space is the quotient of the Euclidean topology by the equivalence relation. This motivates us to consider a topology on the space of “equivalent channels” with input alphabet

X

and arbitrary but finite output alphabet as natural if and only if it induces the quotient topology on the subspaces of “equivalent channels” from

X

to

Y

for any output alphabet

Y

. A legitimate question to ask now is whether the L-density topology is natural in this sense or not.

In this paper, we study general and particular natural topologies on DMC spaces. In Section 2, we provide a brief summary of the basic concepts and theorems in general topology. The measure-theoretic notations that we use are introduced in Section 3. The space of channels from

X

to

Y

and its topology is studied in Section 4. We formally define the equivalence relation between channels in Section 5. It is shown that the equivalence class of a channel can be determined by the distribution of its posterior probability distribution. This is the standard generalization of L-densities to non-binary-input channels. This distribution is called the Blackwell measure of the channel. In Section 6, we study the space of equivalent channels from

X

to

Y

and the quotient topology.

In Section 7, we define the space of equivalent channels with input alphabet

X

and we study the properties of general natural topologies. The finest natural topology, which we call the strong topology, is studied in Section 8. A metric for the space of equivalent channels is proposed in Section 9. The induced topology by this metric is called the noisiness topology. In Section 10, we study the topologies that are inherited from the space of meta-probability measures by identifying equivalent channels with their Blackwell measures. We show that the weak-* topology (which is the standard generalization of the L-density topology to non-binary-input channels) is exactly the same as the noisiness topology. The total variation topology is also investigated in Section 10. The Borel

σ

-algebra of Hausdorff natural topologies is studied in Section 11.

The continuity (under the topologies introduced here) of mappings that are relevant to information theory (such as capacity, mutual information, Bhattacharyya parameter, probability of error of a fixed code, optimal probability of error of a given rate and blocklength, channel sums and products, etc.) is studied in [8].

2. Preliminaries

In this section, we recall basic definitions and well known theorems in general topology. The reader who is already familiar with the basic concepts of topology may skip this section and refer to it later if necessary. Proofs of all non-referenced facts can be found in any standard textbook on General Topology (e.g., [9]). Definitions and theorems that may not be widely known can be found in Section 2.10, Section 2.14 and Section 2.15.

2.1. Set-Theoretic Notations

For every integer

n > 0

, we denote the set

{1, \dots, n}

as

[n]

.

The set of mappings from a set A to a set B is denoted as

B^{A}

.

Let A be a subset of B. The indicator mapping

𝟙_{A, B} : B \to {0, 1}

of A in B is defined as:

𝟙_{A, B} (x) = 𝟙_{x \in A} = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise.} \end{cases}

If the superset B is clear from the context, we simply write

𝟙_{A}

to denote the indicator mapping of A in B.

The power set of B is the set of subsets of B. Since every subset of B can be identified with its indicator mapping, we denote the power set of B as

2^{B} : = {0, 1}^{B}

.

A collection

A \subset 2^{B}

of subsets of B is said to be finer than another collection

A^{'} \subset 2^{B}

if

A^{'} \subset A

. If this is the case, we also say that

A^{'}

is coarser than

A

.

Let

{(A_{i})}_{i \in I}

be a collection of arbitrary sets indexed by I. The disjoint union of

{(A_{i})}_{i \in I}

is defined as

\underset{i \in I}{∐} A_{i} = ⋃_{i \in I} (A_{i} \times {i})

. For every

i \in I

, the

i^{\text{th}}

-canonical injection is the mapping

ϕ_{i} : A_{i} \to \underset{j \in I}{∐} A_{j}

defined as

ϕ_{i} (x_{i}) = (x_{i}, i)

. If no confusion can arise, we can identify

A_{i}

with

A_{i} \times {i}

through the canonical injection. Therefore, we can see

A_{i}

as a subset of

\underset{j \in I}{∐} A_{j}

for every

i \in I

.
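As an illustrative sketch (the function names `disjoint_union` and `canonical_injection` are hypothetical), the tagging construction can be written in a few lines of Python for a finite family of finite sets:

```python
def disjoint_union(family):
    """Return the set of tagged pairs (x, i) for every x in family[i]."""
    return {(x, i) for i, members in family.items() for x in members}

def canonical_injection(i):
    """Return the i-th canonical injection x -> (x, i)."""
    return lambda x: (x, i)

# Two sets that share the element 1; the tags keep the two copies apart.
family = {"a": {0, 1}, "b": {1, 2}}
union = disjoint_union(family)
phi_a = canonical_injection("a")
```

Although the two sets overlap, the disjoint union has four elements, because the common element 1 appears once with each tag.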

A relation R on a set T is a subset of

T \times T

. For every

x, y \in T

, we write

x R y

to denote

(x, y) \in R

.

A relation is said to be reflexive if

x R x

for every

x \in T

. It is symmetric if

x R y

implies

y R x

for every

x, y \in T

. It is anti-symmetric if

x R y

and

y R x

imply

x = y

for every

x, y \in T

. It is transitive if

x R y

and

y R z

imply

x R z

for every

x, y, z \in T

.

An order relation is a relation that is reflexive, anti-symmetric and transitive. An equivalence relation is a relation that is reflexive, symmetric and transitive.

Let R be an equivalence relation on T. For every

x \in T

, the set

\hat{x} = {y \in T : x R y}

is the R-equivalence class of x. The collection of R-equivalence classes, which we denote as

T / R

, forms a partition of T, and it is called the quotient space of T by R. The mapping

{Proj}_{R} : T \to T / R

defined as

{Proj}_{R} (x) = \hat{x}

for every

x \in T

is the projection mapping onto

T / R

.
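For a finite set, the quotient space and the projection mapping can be computed directly. The sketch below uses hypothetical helper names (`quotient`, `projection`) and assumes that the supplied predicate really is an equivalence relation; parity on {0, …, 5} is used as a toy example.

```python
def quotient(T, related):
    """Partition the finite set T into the classes of the relation `related`."""
    classes = []
    for x in T:
        for cls in classes:
            # Comparing with one representative suffices because the
            # relation is assumed to be symmetric and transitive.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return {frozenset(cls) for cls in classes}

def projection(x, classes):
    """Return the equivalence class that contains x."""
    return next(cls for cls in classes if x in cls)

T = range(6)
same_parity = lambda x, y: (x - y) % 2 == 0
classes = quotient(T, same_parity)
```

Here the quotient space has two points, the class of even numbers and the class of odd numbers, and `projection` sends each integer to its class.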

2.2. Topological Spaces

A topological space is a pair

(T, U)

, where

U \subset 2^{T}

is a collection of subsets of T satisfying:

$\emptyset \in U$ and $T \in U$ .
The intersection of a finite collection of members of $U$ is also a member of $U$ .
The union of an arbitrary collection of members of $U$ is also a member of $U$ .

If

(T, U)

is a topological space, we say that

U

is a topology on T.

The power set

2^{T}

of T is clearly a topology. It is called the discrete topology on T.

If

A

is an arbitrary collection of subsets of T, we can construct a topology on T starting from

A

as follows:

\bigcap_{\substack{A \subset V \subset 2^{T}, \\ V \text{ is a topology on } T}} V .

This is the coarsest topology on T that contains

A

. It is called the topology on T generated by

A

.
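On a finite set, the generated topology can be computed by brute force, since closing the collection under pairwise intersections and unions is then enough. The sketch below is only an illustration for finite T (the name `generated_topology` is hypothetical), not a general construction.

```python
from itertools import combinations

def generated_topology(T, collection):
    """Smallest topology on the finite set T containing every set in `collection`."""
    opens = {frozenset(), frozenset(T)} | {frozenset(A) for A in collection}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that new sets can be added safely.
        for U, V in combinations(list(opens), 2):
            for W in (U & V, U | V):
                if W not in opens:
                    opens.add(W)
                    changed = True
    return opens

T = {1, 2, 3}
topology = generated_topology(T, [{1, 2}, {2, 3}])
```

Starting from {1, 2} and {2, 3}, the only new open set that has to be added (besides ∅ and T) is their intersection {2}.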

Let

(T, U)

be a topological space. The subsets of T that are members of

U

are called the open sets of T. Complements of open sets are called closed sets. We can easily see that the closed sets satisfy the following:

∅ and T are closed.
The union of a finite collection of closed sets is closed.
The intersection of an arbitrary collection of closed sets is closed.

Let A be an arbitrary subset of T. The closure

cl (A)

of A is the smallest closed set containing A:

cl (A) = \bigcap_{\substack{A \subset F \subset T, \\ F \text{ is closed}}} F .

The interior

A^{\circ}

of A is the largest open subset of A:

A^{\circ} = \bigcup_{\substack{U \subset A, \\ U \text{ is open}}} U .

If

A \subset T

and

cl (A) = T

, we say that A is dense in T.
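The closure, interior and density definitions can be checked mechanically on a small finite example. The sketch below uses a hypothetical three-point topology and hypothetical helper names; it follows the two defining formulas literally.

```python
def closure(A, T, opens):
    """Intersection of all closed sets (complements of open sets) containing A."""
    result = set(T)
    for U in opens:
        closed = set(T) - set(U)
        if set(A) <= closed:
            result &= closed
    return result

def interior(A, opens):
    """Union of all open sets contained in A."""
    result = set()
    for U in opens:
        if set(U) <= set(A):
            result |= set(U)
    return result

# A topology on {1, 2, 3}, chosen only for illustration.
T = {1, 2, 3}
opens = [set(), {2}, {1, 2}, {2, 3}, {1, 2, 3}]
```

In this example the singleton {2} is dense, because the only closed set containing 2 is T itself, whereas {1} is already closed and {1, 3} has empty interior.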

(T, U)

is said to be separable if there exists a countable subset of T that is dense in T.

A subset O of T is said to be a neighborhood of

x \in T

if there exists an open set

U \in U

such that

x \in U \subset O

.

A neighborhood basis of

x \in T

is a collection

O

of neighborhoods of x such that for every neighborhood O of x, there exists

O^{'} \in O

such that

O^{'} \subset O

.

We say that

(T, U)

is first-countable if every point

x \in T

has a countable neighborhood basis.

A collection of open sets

B \subset U

is said to be a base for the topology

U

if every open set

U \in U

can be written as the union of elements of

B

.

We say that

(T, U)

is a second-countable space if the topology

U

has a countable base.

It is a well known fact that every second-countable space is first-countable and separable.

We say that a sequence

{(x_{n})}_{n \geq 0}

of elements of T converges to

x \in T

if for every neighborhood O of x, there exists

n_{0} \geq 0

such that for every

n \geq n_{0}

, we have

x_{n} \in O

. We say that x is a limit of the sequence

{(x_{n})}_{n \geq 0}

. Note that the limit does not need to be unique if there is no constraint on the topology.

2.3. Separation Axioms

(T, U)

is said to be a

T_{1}

-space if for every

x, y \in T, x \neq y

, there exists an open set

U \in U

such that

x \in U

and

y \notin U

. It is easy to see that

(T, U)

is

T_{1}

if and only if all singletons are closed.

(T, U)

is said to be a Hausdorff space (or

T_{2}

-space) if for every

x, y \in T, x \neq y

, there exist two open sets

U, V \in U

such that

x \in U

,

y \in V

and

U \cap V = \emptyset

.

If

(T, U)

is Hausdorff, the limit of every converging sequence is unique.

(T, U)

is said to be regular if for every

x \in T

and every closed set F not containing x, there exist two open sets

U, V \in U

such that

x \in U

,

F \subset V

and

U \cap V = \emptyset

.

(T, U)

is said to be normal if for every two disjoint closed sets A and B, there exist two open sets

U, V \in U

such that

A \subset U

,

B \subset V

and

U \cap V = \emptyset

.

If

(T, U)

is normal, disjoint closed sets can be separated by disjoint closed neighborhoods, i.e., for every two disjoint closed sets A and B, there exist two open sets

U, U^{'} \in U

and two closed sets

K, K^{'}

such that

A \subset U \subset K

,

B \subset U^{'} \subset K^{'}

and

K \cap K^{'} = \emptyset

.

(T, U)

is said to be a

T_{3}

-space if it is both

T_{1}

and regular.

(T, U)

is said to be a

T_{4}

-space if it is both

T_{1}

and normal.

It is easy to see that

T_{4} \Rightarrow T_{3} \Rightarrow T_{2} \Rightarrow T_{1}

.
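For finite spaces, the first two separation axioms can be tested exhaustively. The following sketch (helper names are hypothetical) applies the definitions to pairs of distinct points and compares the Sierpiński topology on {0, 1} with the discrete topology.

```python
from itertools import permutations

def is_t1(T, opens):
    """For every ordered pair x != y, some open set contains x but not y."""
    return all(any(x in U and y not in U for U in opens)
               for x, y in permutations(T, 2))

def is_hausdorff(T, opens):
    """Every two distinct points have disjoint open neighborhoods."""
    return all(any(x in U and y in V and not (U & V)
                   for U in opens for V in opens)
               for x, y in permutations(T, 2))

T = {0, 1}
sierpinski = [set(), {1}, {0, 1}]       # the point 0 has no open set of its own
discrete = [set(), {0}, {1}, {0, 1}]    # every subset is open
```

The Sierpiński space fails both tests because no open set contains 0 without containing 1, while the discrete space passes both.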

2.4. Relativization

If

(T, U)

is a topological space and A is an arbitrary subset of T, then A inherits a topology

U_{A}

from

(T, U)

as follows:

U_{A} = {A \cap U : U \in U} .

It is easy to check that

U_{A}

is a topology on A.

If

(T, U)

is first-countable (respectively second-countable, or Hausdorff), then

(A, U_{A})

is first-countable (respectively second-countable, or Hausdorff).

If

(T, U)

is normal and A is closed, then

(A, U_{A})

is normal.

The union of a countable number of separable subspaces is separable.

2.5. Continuous Mappings

Let

(T, U)

and

(S, V)

be two topological spaces. A mapping

f : T \to S

is said to be continuous if for every

V \in V

, we have

f^{- 1} (V) \in U

.

f : T \to S

is an open mapping if

f (U) \in V

whenever

U \in U

.

f : T \to S

is a closed mapping if

f (F)

is closed in S whenever F is closed in T.

A bijection

f : T \to S

is a homeomorphism if both f and

f^{- 1}

are continuous. In this case, for every

A \subset T

,

A \in U

if and only if

f (A) \in V

. This means that

(T, U)

and

(S, V)

have the same topological structure and share the same topological properties.

2.6. Compact Spaces and Sequentially Compact Spaces

(T, U)

is a compact space if every open cover of T admits a finite sub-cover, i.e., if

{(U_{i})}_{i \in I}

is a collection of open sets such that

T = ⋃_{i \in I} U_{i}

then there exists

n > 0

and

i_{1}, \dots, i_{n} \in I

such that

T = ⋃_{j = 1}^{n} U_{i_{j}}

.

If

(T, U)

is compact, then every closed subset of T is compact (with respect to the inherited topology).

If

f : T \to S

is a continuous mapping from a compact space

(T, U)

to an arbitrary topological space

(S, V)

, then

f (T)

is compact.

If A is a compact subset of a Hausdorff topological space, then A is closed.

(T, U)

is said to be locally compact if every point has at least one compact neighborhood. A compact space is automatically locally compact.

If

(T, U)

is Hausdorff and locally compact, then for every point

x \in T

and every neighborhood O of x, O contains a compact neighborhood of x.

A compact Hausdorff space is always normal.

(T, U)

is a σ-compact space if it is the union of a countable collection of compact subspaces.

(T, U)

is countably compact if every countable open cover of T admits a finite sub-cover. This is a weaker condition compared to compactness.

(T, U)

is said to be sequentially compact if every sequence in T has a converging subsequence. In general, neither compactness nor sequential compactness implies the other.

2.7. Connected Spaces

(T, U)

is a connected space if it satisfies one of the following equivalent conditions:

T cannot be written as the union of two disjoint non-empty open sets.
T cannot be written as the union of two disjoint non-empty closed sets.
The only subsets of T that are both open and closed are ∅ and T.
Every continuous mapping from T to $\{0, 1\}$ is constant, where $\{0, 1\}$ is endowed with the discrete topology.

(T, U)

is path-connected if every two points of T can be joined by a continuous path, i.e., for every

x, y \in T

, there exists a continuous mapping

f : [0, 1] \to T

such that

f (0) = x

and

f (1) = y

, where

[0, 1]

is endowed with the well known Euclidean topology (see Section 2.11 for the definition of the Euclidean metric and its induced topology).

A path-connected space is connected but the converse is not true in general.

A subset A of T is said to be connected (respectively path-connected) if

(A, U_{A})

is connected (respectively path-connected).

If

{(A_{i})}_{i \in I}

is a collection of connected (respectively path-connected) subsets of T such that

⋂_{i \in I} A_{i} \neq \emptyset

, then

⋃_{i \in I} A_{i}

is connected (respectively path-connected).

2.8. Product of Topological Spaces

Let

{(T_{i}, U_{i})}_{i \in I}

be a collection of topological spaces indexed by I. Let

T = \prod_{i \in I} T_{i}

be the product of this collection. For every

j \in I

, the

j^{\text{th}}

-canonical projection is the mapping

{Proj}_{j} : T \to T_{j}

defined as

{Proj}_{j} ({(x_{i})}_{i \in I}) = x_{j}

.

The product topology

U : = ⨂_{i \in I} U_{i}

on T is the coarsest topology that makes all the canonical projections continuous. It can be shown that

U

is generated by the collection of sets of the form

\prod_{i \in I} U_{i}

, where

U_{i} \in U_{i}

for all

i \in I

, and

U_{i} \neq T_{i}

for only finitely many

i \in I

.

The product of

T_{1}

(respectively, Hausdorff, regular,

T_{3}

, compact, connected, or path-connected) spaces is

T_{1}

(respectively, Hausdorff, regular,

T_{3}

, compact, connected, or path-connected).

2.9. Disjoint Union

Let

{(T_{i}, U_{i})}_{i \in I}

be a collection of topological spaces indexed by I. Let

T = \underset{i \in I}{∐} T_{i}

be the disjoint union of this collection. The disjoint union topology

U : = ⨁_{i \in I} U_{i}

on T is the finest topology which makes all the canonical injections continuous. It can be shown that

U \in U

if and only if

U \cap T_{i} \in U_{i}

for every

i \in I

.

A mapping

f : T \to S

from

(T, U)

to a topological space

(S, V)

is continuous if and only if it is continuous on

T_{i}

for every

i \in I

.

The disjoint union of

T_{1}

(respectively Hausdorff) spaces is

T_{1}

(respectively Hausdorff). The disjoint union of two or more non-empty spaces is always disconnected.

Products are distributive with respect to the disjoint union, i.e., if

(S, V)

is a topological space then

S \times (\underset{i \in I}{∐} T_{i}) = \underset{i \in I}{∐} (S \times T_{i})

and

V \otimes (⨁_{i \in I} U_{i}) = ⨁_{i \in I} (V \otimes U_{i})

.

2.10. Quotient Topology

Let

(T, U)

be a topological space and let R be an equivalence relation on T. The quotient topology on

T / R

is the finest topology that makes the projection mapping

{Proj}_{R}

continuous. It is given by

U / R = \{\hat{U} \subset T / R : {Proj}_{R}^{- 1} (\hat{U}) \in U\} .
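The defining formula can be evaluated directly when T is finite: a collection of equivalence classes is open in the quotient exactly when the union of its members, which is its preimage under the projection, is open in T. The sketch below uses a hypothetical three-point topology and a relation that identifies the points 1 and 3.

```python
from itertools import chain, combinations

def quotient_topology(opens, classes):
    """Collections of classes whose union (the preimage under Proj_R) is open in T."""
    open_sets = {frozenset(U) for U in opens}
    classes = [frozenset(c) for c in classes]
    subsets = chain.from_iterable(
        combinations(classes, k) for k in range(len(classes) + 1))
    return {frozenset(s) for s in subsets
            if frozenset().union(*s) in open_sets}

# A topology on {1, 2, 3}, chosen only for illustration.
opens = [set(), {2}, {1, 2}, {2, 3}, {1, 2, 3}]
classes = [{1, 3}, {2}]   # equivalence classes when 1 and 3 are identified
quotient = quotient_topology(opens, classes)
```

In this example the class {1, 3} is not open in the quotient, because {1, 3} is not open in T, whereas the class {2} is open.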

Lemma 1.

Let

f : T \to S

be a continuous mapping from

(T, U)

to

(S, V)

. If

f (x) = f (x^{'})

for every

x, x^{'} \in T

satisfying

x R x^{'}

, then we can define a transcendent mapping

f : T / R \to S

such that

f (\hat{x}) = f (x^{'})

for any

x^{'} \in \hat{x}

. f is well defined on

T / R

. Moreover, f is a continuous mapping from

(T / R, U / R)

to

(S, V)

.

If

(T, U)

is compact (respectively, connected, or path-connected), then

(T / R, U / R)

is compact (respectively, connected, or path-connected).

T / R

is said to be upper semi-continuous if for every

\hat{x} \in T / R

and every open set

U \in U

satisfying

\hat{x} \subset U

, there exists an open set

V \in U

such that

\hat{x} \subset V \subset U

, and V can be written as the union of members of

T / R

.

The following Lemma characterizes upper semi-continuous quotient spaces:

Lemma 2.

[9]

T / R

is upper semi-continuous if and only if

{Proj}_{R}

is a closed mapping.

The following theorem is very useful to prove many topological properties for the quotient space:

Theorem 1.

[9] Let

(T, U)

be a topological space, and let R be an equivalence relation on T such that

T / R

is upper semi-continuous and

\hat{x}

is a compact subset of T for every

\hat{x} \in T / R

. If

(T, U)

is Hausdorff (respectively, regular, locally compact, or second-countable) then

(T / R, U / R)

is Hausdorff (respectively, regular, locally compact, or second-countable).

2.11. Metric Spaces

A metric space is a pair

(M, d)

, where

d : M \times M \to R^{+}

satisfies:

$d (x, y) = 0$ if and only if $x = y$ for every $x, y \in M$ .
Symmetry: $d (x, y) = d (y, x)$ for every $x, y \in M$ .
Triangle inequality: $d (x, z) \leq d (x, y) + d (y, z)$ for every $x, y, z \in M$ .

If

(M, d)

is a metric space, we say that d is a metric (or distance) on M.

The Euclidean metric on

R^{n}

is defined as

d (x, y) = \sqrt{\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}

, where

x = {(x_{i})}_{1 \leq i \leq n}

and

y = {(y_{i})}_{1 \leq i \leq n}

.

R^{n}

is second countable. Moreover, a subset of

R^{n}

is compact if and only if it is bounded and closed.

For every

x \in M

and every

ϵ > 0

, we define the open ball of center x and radius

ϵ

as:

B_{ϵ} (x) = {y \in M : d (x, y) < ϵ} .

The metric topology

U_{d}

on M induced by d is the coarsest topology on M which makes d a continuous mapping from

M \times M

to

R^{+}

. It is generated by all the open balls.

The metric topology is always

T_{4}

and first-countable. Moreover,

(M, U_{d})

is separable if and only if it is second-countable.

Since every metric space is Hausdorff, a subset of a compact metric space is closed if and only if it is compact.

Every

σ

-compact metric space is second-countable.

For metric spaces, compactness and sequential compactness are equivalent.

A function

f : M_{1} \to M_{2}

from a metric space

(M_{1}, d_{1})

to a metric space

(M_{2}, d_{2})

is said to be uniformly continuous if for every

ϵ > 0

, there exists

δ > 0

such that for every

x, x^{'} \in M_{1}

satisfying

d_{1} (x, x^{'}) < δ

we have

d_{2} (f (x), f (x^{'})) < ϵ

.

If

f : M_{1} \to M_{2}

is a continuous mapping from a compact metric space

(M_{1}, d_{1})

to an arbitrary metric space

(M_{2}, d_{2})

, then f is uniformly continuous.

A topological space

(T, U)

is said to be metrizable if there exists a metric d on T such that

U

is the metric topology on T induced by d.

The disjoint union of metrizable spaces is always metrizable.

The following theorem shows that all separable metrizable spaces are characterized topologically:

Theorem 2.

[9] A topological space

(T, U)

is metrizable and separable if and only if it is Hausdorff, regular and second countable.

2.12. Complete Metric Spaces

A sequence

{(x_{n})}_{n \geq 0}

is said to be a Cauchy sequence in

(M, d)

if for every

ϵ > 0

, there exists

n_{0} \geq 0

such that for every

n_{1}, n_{2} \geq n_{0}

we have

d (x_{n_{1}}, x_{n_{2}}) < ϵ

.

Every converging sequence is Cauchy, but the converse is not true in general.

A metric space is said to be complete if every Cauchy sequence converges in it.

A closed subset of a complete space is always complete.

A complete subspace of an arbitrary metric space is always closed.

Every compact metric space is complete, but the converse is not true in general.

For every metric space

(M, d)

, there exists a superspace

(\bar{M}, \bar{d})

containing M such that:

$(\bar{M}, \bar{d})$ is complete.
M is dense in $(\bar{M}, \bar{d})$ .
$\bar{d} (x, y) = d (x, y)$ for every $x, y \in M$ .

The space

(\bar{M}, \bar{d})

is said to be a completion of

(M, d)

.

2.13. Polish Spaces and Baire Spaces

A topological space

(T, U)

that is both separable and completely metrizable (i.e., has a metrization that is complete) is called a Polish space.

A topological space is said to be a Baire space if the intersection of countably many dense open subsets is dense. The following facts can be found in [10]:

Every completely metrizable space is Baire.
Every compact Hausdorff space is Baire.
Every open subset of a Baire space is Baire.

2.14. Sequential Spaces

Sequential spaces were introduced by Franklin [11] to answer the following question: Assume we know all the converging sequences of a topological space. Is this enough to uniquely determine the topology of the space? Sequential spaces are the most general category of spaces for which converging sequences suffice to determine the topology.

Let

(T, U)

be a topological space. A subset

U \subset T

is said to be sequentially open if every sequence

{(x_{n})}_{n \geq 0}

that converges to a point of U lies eventually in U, i.e., there exists

n_{0} \geq 0

such that

x_{n} \in U

for every

n \geq n_{0}

. Clearly, every open subset of T is sequentially open, but the converse is not true in general.

A topological space

(T, U)

is said to be sequential if every sequentially open subset of T is open.

A mapping

f : T \to S

from a sequential topological space

(T, U)

to an arbitrary topological space

(S, V)

is continuous if and only if for every sequence

{(x_{n})}_{n \geq 0}

in T that converges to

x \in T

, the sequence

{(f (x_{n}))}_{n \geq 0}

converges to

f (x)

in

(S, V)

[11].

The following facts were shown in [11]:

Every first-countable space is sequential. Therefore, every metrizable space is sequential.
The quotient of a sequential space is sequential.
All closed and open subsets of a sequential space are sequential.
Every countably compact sequential Hausdorff space is sequentially compact.
A topological space is sequential if and only if it is the quotient of a metric space.

2.15. Compactly Generated Spaces

A topological space

(T, U)

is compactly generated if it is Hausdorff and for every subset F of T, F is closed if and only if

F \cap K

is closed for every compact subset K of T. Equivalently,

(T, U)

is compactly generated if it is Hausdorff and for every subset U of T, U is open in T if and only if

U \cap K

is open in K for every compact subset K of T.

The following facts can be found in [12]:

All locally compact Hausdorff spaces are compactly generated.
All first-countable Hausdorff spaces are compactly generated. Therefore, every metrizable space is compactly generated.
A Hausdorff quotient of a compactly generated space is compactly generated.
If $(T, U)$ is compactly generated and $(S, V)$ is Hausdorff locally compact, then $(T \times S, U \otimes V)$ is compactly generated.

3. Measure-Theoretic Notations

In this section, we introduce the measure-theoretic notations that we are using. We assume that the reader is familiar with the basic definitions and theorems of Measure Theory.

3.1. Probability Measures

If

A \subset 2^{M}

is a collection of subsets of M, we denote the

σ

-algebra that is generated by

A

as

σ (A)

.

The set of probability measures on

(M, Σ)

is denoted as

P (M, Σ)

. If the

σ

-algebra

Σ

is known from the context, we simply write

P (M)

to denote the set of probability measures.

If

P \in P (M, Σ)

and

{x}

is a measurable singleton, we simply write

P (x)

to denote

P ({x})

.

For every

P_{1}, P_{2} \in P (M, Σ)

, the total variation distance between

P_{1}

and

P_{2}

is defined as:

∥ P_{1} - P_{2} ∥_{T V} = sup_{A \in Σ} | P_{1} (A) - P_{2} (A) | .

The space

P (M, Σ)

is a complete metric space under the total variation distance.

3.2. Probabilities on Finite Sets

We always endow finite sets with their finest

σ

-algebra, i.e., the power set. In this case, every probability measure is completely determined by its value on singletons, i.e., if P is a measure on a finite set

X

, then for every

A \subset X

, we have

P (A) = \sum_{x \in A} P (x) .

If

X

is a finite set, we denote the set of probability distributions on

X

as

Δ_{X}

. Note that

Δ_{X}

is an

(| X | - 1)

-dimensional simplex in

R^{X}

. We always endow

Δ_{X}

with the total variation distance and its induced topology. For every

p_{1}, p_{2} \in Δ_{X}

, we have:

∥ p_{1} - p_{2} ∥_{T V} = \frac{1}{2} \sum_{x \in X} | p_{1} (x) - p_{2} (x) | = \frac{1}{2} {∥ p_{1} - p_{2} ∥}_{1} .
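The identity between the supremum form of the total variation distance and half of the L1 distance can be checked by brute force on a small alphabet. The sketch below uses two hypothetical distributions on a three-letter alphabet; the function names are illustrative only.

```python
from itertools import chain, combinations

def tv_by_sup(p1, p2):
    """Total variation as the supremum of |P1(A) - P2(A)| over all subsets A."""
    points = list(p1)
    events = chain.from_iterable(
        combinations(points, k) for k in range(len(points) + 1))
    return max(abs(sum(p1[x] - p2[x] for x in A)) for A in events)

def tv_by_l1(p1, p2):
    """Total variation as half of the L1 distance."""
    return 0.5 * sum(abs(p1[x] - p2[x]) for x in p1)

# Two distributions on {a, b, c}, chosen only for illustration.
p1 = {"a": 0.5, "b": 0.3, "c": 0.2}
p2 = {"a": 0.2, "b": 0.3, "c": 0.5}
```

The supremum is attained on the event where the first distribution puts more mass than the second (here the singleton containing a), which is why the two expressions agree up to floating-point rounding.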

Note that the total variation topology on

Δ_{X}

is the same as the one inherited from the Euclidean topology of

R^{X}

by relativization. Since

Δ_{X}

is a closed and bounded subset of

R^{X}

, it is compact.

3.3. Borel Sets and the Support of a Measure

Let

(T, U)

be a Hausdorff topological space. The Borel σ-algebra of

(T, U)

is the

σ

-algebra generated by

U

. We denote the Borel

σ

-algebra of

(T, U)

as

B (T, U)

. If the topology

U

is known from the context, we simply write

B (T)

to denote the Borel

σ

-algebra. The sets in

B (T)

are called the Borel sets of T.

The support of a probability measure

P \in P (T, B (T))

is the set of all points

x \in T

for which every neighborhood has a strictly positive measure:

supp (P) = {x \in T : P (O) > 0 for every neighborhood O of x} .

If P is a probability measure on a Polish space, then

P (T ∖ supp (P)) = 0

.

3.4. Convergence of Probability Measures and the Weak-* Topology

We have many notions of convergence of probability measures. If the measurable space does not have a topological structure, we have two notions of convergence:

The total-variation convergence: we say that a sequence ${(P_{n})}_{n \geq 0}$ of probability measures in $P (M, Σ)$ converges in total variation to $P \in P (M, Σ)$ if and only if $lim_{n \to \infty} {∥ P_{n} - P ∥}_{T V} = 0$ .
The strong convergence: we say that a sequence ${(P_{n})}_{n \geq 0}$ in $P (M, Σ)$ strongly converges to $P \in P (M, Σ)$ if and only if $lim_{n \to \infty} P_{n} (A) = P (A)$ for every $A \in Σ$ .

Clearly, total-variation convergence implies strong convergence. The converse is not true in general. However, if we are working in the Borel

σ

-algebra of a Polish space T and

{(P_{n})}_{n \geq 0}

strongly converges to a finitely supported probability measure P, then

\begin{matrix} {∥ P_{n} - P ∥}_{T V} & = sup_{B \in B (T)} | P_{n} (B) - P (B) | \\ & \leq sup_{B \in B (T)} (| P_{n} (B ∖ supp (P)) - P (B ∖ supp (P)) | + \sum_{x \in supp (P)} | P_{n} (x) - P (x) |) \\ & = sup_{B \in B (T)} (| P_{n} (B ∖ supp (P)) | + \sum_{x \in supp (P)} | P_{n} (x) - P (x) |) \\ & \leq | P_{n} (T ∖ supp (P)) | + \sum_{x \in supp (P)} | P_{n} (x) - P (x) | \\ & = | P_{n} (T ∖ supp (P)) - P (T ∖ supp (P)) | + \sum_{x \in supp (P)} | P_{n} (x) - P (x) | \overset{n \to \infty}{⟶} 0, \end{matrix}

which implies that

{(P_{n})}_{n \geq 0}

also converges to P in total variation. Therefore, in a Polish space, total variation convergence and strong convergence to finitely supported probability measures are equivalent.

Let

(T, U)

be a Hausdorff topological space. We say that a sequence

{(P_{n})}_{n \geq 0}

of probability measures in

P (T, B (T))

weakly-* converges to

P \in P (T, B (T))

if and only if for every bounded and continuous function f from T to

R

, we have

lim_{n \to \infty} \int_{T} f \cdot d P_{n} = \int_{T} f \cdot d P .

Note that many authors call this notion “weak convergence” rather than weak-* convergence. We will refrain from using the term “weak convergence” in order to be consistent with the functional analysis notation.

The weak-* topology on

P (T, B (T))

is the coarsest topology which makes the mappings

P \to \int_{T} f \cdot d P

continuous over

P (T, B (T))

, for every bounded and continuous function f from T to

R

.

3.5. Metrization of the Weak-* Topology

If

(T, U)

is a Polish space, the weak-* topology on

P (T, B (T))

is also Polish [13]. There are many known metrizations for the weak-* topology. One metrization that is particularly convenient for us is the Wasserstein metric.

The first Wasserstein distance on

P (T, B (T))

is defined as

W_{1} (P, P^{'}) = inf_{γ \in Γ (P, P^{'})} \int_{T \times T} d (x, x^{'}) \cdot d γ (x, x^{'}),

where

Γ (P, P^{'})

is the collection of all probability measures on

T \times T

with marginals P and

P^{'}

on the first and second factors respectively, and d is a metric on T that induces the topology

U

.

Γ (P, P^{'})

is called the set of couplings of P and

P^{'}

.

If d is bounded and

(T, d)

is separable and complete, then

W_{1}

metrizes the weak-* topology [13]. If

(T, U)

is compact, then

(P (T), W_{1})

is also compact [13].

If

D = sup_{x, x^{'} \in T} d (x, x^{'})

is the diameter of

(T, d)

, then

W_{1} (P, P^{'}) \leq D {∥ P - P^{'} ∥}_{T V}

[13]. In other words, the Wasserstein metric is controlled by total variation.
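
The bound of the Wasserstein metric by total variation can be checked numerically in a simple special case. The sketch below assumes that T = [0, 1] with d(x, x') = |x - x'| (so that D = 1) and that both measures are finitely supported; in this one-dimensional case W_1 equals the integral of the absolute difference of the two cumulative distribution functions. This special case and the helper names are assumptions made for illustration only.

```python
def wasserstein_1d(p, q):
    """W_1 between finitely supported distributions on the real line.

    Uses the one-dimensional formula W_1 = integral of |F_p - F_q|,
    valid for the metric d(x, x') = |x - x'|.
    """
    points = sorted(set(p) | set(q))
    total, cdf_gap = 0.0, 0.0
    for left, right in zip(points, points[1:]):
        cdf_gap += p.get(left, 0.0) - q.get(left, 0.0)  # F_p - F_q on [left, right)
        total += abs(cdf_gap) * (right - left)
    return total


def total_variation(p, q):
    """Half the L1 distance between two finitely supported distributions."""
    points = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in points)


# Hypothetical measures on [0, 1]; the diameter of [0, 1] is D = 1.
P = {0.0: 0.5, 1.0: 0.5}
Q = {0.0: 0.25, 0.5: 0.5, 1.0: 0.25}
D = 1.0
print(wasserstein_1d(P, Q), D * total_variation(P, Q))
```

In this example, moving a quarter of the mass from each endpoint to the midpoint is an optimal coupling, which agrees with the value returned by the cumulative distribution formula.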

3.6. Meta-Probability Measures

Let

X

be a finite set. A meta-probability measure on

X

is a probability measure on the Borel sets of

Δ_{X}

. It is called a meta-probability measure because it is a probability measure on the space of probability distributions on

X

.

We denote the set of meta-probability measures on

X

as

MP (X)

. Clearly,

MP (X) = P (Δ_{X})

.

A meta-probability measure MP on

X

is said to be balanced if it satisfies

\int_{Δ_{X}} p \cdot d MP (p) = π_{X},

where

π_{X}

is the uniform probability distribution on

X

.

We denote the set of all balanced meta-probability measures on

X

as

{MP}_{b} (X)

. The set of all balanced and finitely supported meta-probability measures on

X

is denoted as

{MP}_{b f} (X)

.

4. The Space of Channels from $X$ to $Y$

A discrete memoryless channel W is a 3-tuple

W = (X, Y, p_{W})

where

X

is a finite set that is called the input alphabet of W,

Y

is a finite set that is called the output alphabet of W, and

p_{W} : X \times Y \to [0, 1]

is a function satisfying

\forall x \in X, \sum_{y \in Y} p_{W} (x, y) = 1

.

For every

(x, y) \in X \times Y

, we denote

p_{W} (x, y)

as

W (y | x)

, which we interpret as the conditional probability of receiving y at the output, given that x is the input.

Let

{DMC}_{X, Y}

be the set of all channels having

X

as input alphabet and

Y

as output alphabet.

For every

W, W^{'} \in {DMC}_{X, Y}

, define the distance between W and

W^{'}

as follows:

d_{X, Y} (W, W^{'}) = \frac{1}{2} max_{x \in X} \sum_{y \in Y} | W^{'} (y | x) - W (y | x) | .

It is easy to check the following properties of

d_{X, Y}

:

$0 \leq d_{X, Y} (W, W^{'}) \leq 1$ .
$d_{X, Y} : {DMC}_{X, Y} \times {DMC}_{X, Y} \to R^{+}$ is a metric distance on ${DMC}_{X, Y}$ .

Throughout this paper, we always associate the space

{DMC}_{X, Y}

with the metric distance

d_{X, Y}

and the metric topology

T_{X, Y}

induced by it.
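
The distance between two channels with the same alphabets can be computed directly from its definition. The following sketch stores a channel as a dictionary of dictionaries, where W[x][y] is the probability of output y given input x, and uses exact rational arithmetic; this representation and the helper bsc (a binary symmetric channel) are illustrative assumptions.

```python
from fractions import Fraction


def channel_distance(W1, W2):
    """d_{X,Y}(W, W') = (1/2) max_x sum_y |W'(y|x) - W(y|x)|."""
    return max(
        sum(abs(W2[x][y] - W1[x][y]) for y in W1[x]) / 2 for x in W1
    )


def bsc(eps):
    """Binary symmetric channel with crossover probability eps."""
    return {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}


print(channel_distance(bsc(Fraction(1, 10)), bsc(Fraction(1, 5))))
print(channel_distance(bsc(Fraction(0)), bsc(Fraction(1))))
```

The second call compares the noiseless channel with the channel that always flips its input, which attains the maximal distance 1.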

For every

x \in X

, the mapping

y \to W (y | x)

is a probability distribution on

Y

. Therefore, every channel W can be seen as a collection of probability distributions on

Y

, and the collection is indexed by

x \in X

. This allows us to identify the space

{DMC}_{X, Y}

with

{(Δ_{Y})}^{X} = \prod_{x \in X} Δ_{Y}

, where

Δ_{Y}

is the set of probability distributions on

Y

. It is easy to see that the topology given by the metric

d_{X, Y}

on

{DMC}_{X, Y}

is the same as the product topology on

{(Δ_{Y})}^{X}

, which is also the same as the topology inherited from the Euclidean topology of

R^{X \times Y}

by relativization.

It is known that

Δ_{Y}

is a closed and bounded subset of

R^{Y}

. Therefore,

Δ_{Y}

is compact, which implies that

{(Δ_{Y})}^{X}

is compact. We conclude that the metric space

{DMC}_{X, Y} \equiv {(Δ_{Y})}^{X}

is compact. Moreover, since

Δ_{Y}

is a convex subset of

R^{Y}

, it is path-connected, hence

{DMC}_{X, Y} \equiv {(Δ_{Y})}^{X}

is path-connected as well.

If

W \in {DMC}_{X, Y}

and

V \in {DMC}_{Y, Z}

, we define the composition

V \circ W \in {DMC}_{X, Z}

of W and V as follows:

(V \circ W) (z | x) = \sum_{y \in Y} V (z | y) W (y | x), \forall x \in X, \forall z \in Z .

It is easy to see that the mapping

(W, V) \to V \circ W

from

{DMC}_{X, Y} \times {DMC}_{Y, Z}

to

{DMC}_{X, Z}

is continuous.
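
Channel composition is a product of stochastic matrices, which the following sketch spells out. It assumes the same dictionary-of-dictionaries representation of channels (W[x][y] is the probability of y given x); the helper names are hypothetical.

```python
from fractions import Fraction


def compose(V, W):
    """Return V∘W, where (V∘W)(z|x) = sum_y V(z|y) W(y|x)."""
    outputs = list(next(iter(V.values())))  # output alphabet Z of V
    return {
        x: {z: sum(V[y][z] * W[x][y] for y in W[x]) for z in outputs}
        for x in W
    }


def bsc(eps):
    """Binary symmetric channel with crossover probability eps."""
    return {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}


W = bsc(Fraction(1, 5))
V = bsc(Fraction(1, 10))
VW = compose(V, W)
print(VW[0][1])  # crossover probability of the cascade
```

Cascading two binary symmetric channels with crossover probabilities p and q gives a binary symmetric channel with crossover probability p(1 - q) + q(1 - p), which is what the example computes.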

For every mapping

f : X \to Y

, define the deterministic channel

D_{f} \in {DMC}_{X, Y}

as follows:

D_{f} (y | x) = \begin{cases} 1 & \text{if } y = f (x), \\ 0 & \text{otherwise} . \end{cases}

It is easy to see that if

f : X \to Y

and

g : Y \to Z

, then

D_{g} \circ D_{f} = D_{g \circ f}

.
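
The identity relating the composition of deterministic channels to the composition of the underlying mappings can be verified on a small example. In the sketch below, mappings are stored as dictionaries and channels as dictionaries of dictionaries; the alphabets and the mappings f and g are hypothetical.

```python
def deterministic_channel(f, inputs, outputs):
    """D_f(y|x) = 1 if y = f(x) and 0 otherwise; f is given as a dict."""
    return {x: {y: 1 if y == f[x] else 0 for y in outputs} for x in inputs}


def compose(V, W):
    """Return V∘W, where (V∘W)(z|x) = sum_y V(z|y) W(y|x)."""
    outputs = list(next(iter(V.values())))
    return {
        x: {z: sum(V[y][z] * W[x][y] for y in W[x]) for z in outputs}
        for x in W
    }


X, Y, Z = [0, 1, 2], ["a", "b"], ["u", "v"]
f = {0: "a", 1: "a", 2: "b"}
g = {"a": "v", "b": "u"}
g_after_f = {x: g[f[x]] for x in X}

lhs = compose(deterministic_channel(g, Y, Z), deterministic_channel(f, X, Y))
rhs = deterministic_channel(g_after_f, X, Z)
print(lhs == rhs)
```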

5. Equivalent Channels and Their Representation

Let

W \in {DMC}_{X, Y}

and

W^{'} \in {DMC}_{X, Z}

be two channels having the same input alphabet. We say that

W^{'}

is degraded from W if there exists a channel

V \in {DMC}_{Y, Z}

such that

W^{'} = V \circ W

. The channels W and

W^{'}

are said to be equivalent if each one is degraded from the other. In the rest of this section, we describe one way to check whether two given channels are equivalent.

Let

Δ_{X}

and

Δ_{Y}

be the space of probability distributions on

X

and

Y

respectively. Define

P_{W}^{o} \in Δ_{Y}

as

P_{W}^{o} (y) = \frac{1}{| X |} \sum_{x \in X} W (y | x), \forall y \in Y .

This can be interpreted as the probability distribution of the output when the input is uniformly distributed in

X

. The image of W is the set of output-symbols

y \in Y

having strictly positive probabilities:

Im (W) = {y \in Y : P_{W}^{o} (y) > 0} .

For every

y \in Im (W)

, define

W_{y}^{- 1} \in Δ_{X}

as follows:

W_{y}^{- 1} (x) = \frac{W (y | x)}{| X | P_{W}^{o} (y)}, \forall x \in X .

W_{y}^{- 1} (x)

can be interpreted as the posterior probability of x, given that the output is y, and assuming a uniform prior distribution on the input. In other words, if X is a random variable uniformly distributed in

X

and Y is the output of the channel W when X is the input, then:

$P_{W}^{o} (y) = P_{Y} (y)$ for every $y \in Y$ .
$W_{y}^{- 1} (x) = P_{X | Y} (x | y)$ for every $(x, y) \in X \times Im (W)$ .

Let

(x, y) \in X \times Y

. If

P_{W}^{o} (y) = P_{Y} (y) > 0

, we have

W (y | x) = P_{Y | X} (y | x) = \frac{P_{X, Y} (x, y)}{P_{X} (x)} = | X | P_{Y} (y) P_{X | Y} (x | y) = | X | P_{W}^{o} (y) W_{y}^{- 1} (x) .

On the other hand, if

P_{W}^{o} (y) = 0

, then we must have

W (y | x) = 0

. We conclude that

P_{W}^{o}

and the collection

{W_{y}^{- 1}}_{y \in Im (W)}

uniquely determine W.
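
The decomposition of a channel into its output distribution and its posteriors, and the reverse reconstruction, can be sketched as follows. The dictionary representation, the function names and the example channel are assumptions made for illustration; exact rational arithmetic is used so that the reconstruction can be compared with equality.

```python
from fractions import Fraction


def output_distribution(W):
    """P_W^o(y) = (1/|X|) sum_x W(y|x)."""
    outputs = list(next(iter(W.values())))
    return {y: Fraction(sum(W[x][y] for x in W)) / len(W) for y in outputs}


def posteriors(W):
    """W_y^{-1}(x) = W(y|x) / (|X| P_W^o(y)) for every y in Im(W)."""
    po = output_distribution(W)
    return {
        y: {x: W[x][y] / (len(W) * po[y]) for x in W}
        for y in po
        if po[y] > 0
    }


def reconstruct(po, post, inputs):
    """Recover W via W(y|x) = |X| P_W^o(y) W_y^{-1}(x), and 0 outside Im(W)."""
    n = len(inputs)
    return {
        x: {y: n * po[y] * post[y][x] if y in post else Fraction(0) for y in po}
        for x in inputs
    }


F = Fraction
W = {
    0: {"a": F(1, 2), "b": F(1, 2), "c": F(0)},
    1: {"a": F(1, 4), "b": F(3, 4), "c": F(0)},
}
po = output_distribution(W)
post = posteriors(W)
print(po, post, reconstruct(po, post, list(W)) == W)
```

The symbol c has zero output probability, so it lies outside the image of W and has no posterior.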

The Blackwell measure of W, denoted

{MP}_{W}

(in an earlier version of this work, I called

{MP}_{W}

the posterior meta-probability distribution of W; Maxim Raginsky kindly brought to my attention the fact that

{MP}_{W}

is called the Blackwell measure), is a probability distribution on

Δ_{X}

having masses

P_{W}^{o} (y)

on

W_{y}^{- 1}

for each

y \in Im (W)

:

{MP}_{W} (B) = \sum_{\begin{matrix} y \in Im (W), \\ W_{y}^{- 1} \in B \end{matrix}} P_{W}^{o} (y), \forall B \in B (Δ_{X}) .

Another way to express

{MP}_{W}

is as follows:

{MP}_{W} = \sum_{y \in Im (W)} P_{W}^{o} (y) \cdot δ_{W_{y}^{- 1}},

where

δ_{W_{y}^{- 1}}

is a Dirac measure centered at

W_{y}^{- 1} \in Δ_{X}

.
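
Since the Blackwell measure of a channel is a finite sum of point masses, it can be stored as a dictionary that maps each posterior (a tuple indexed by the sorted input symbols) to its mass. The sketch below follows this assumed representation; outputs that share the same posterior are merged into a single atom, exactly as in the defining sum. The example channel is hypothetical.

```python
from fractions import Fraction


def blackwell_measure(W):
    """Return MP_W as a dict {posterior (tuple over sorted inputs): mass}."""
    inputs = sorted(W)
    outputs = list(W[inputs[0]])
    n = len(inputs)
    mp = {}
    for y in outputs:
        po_y = Fraction(sum(W[x][y] for x in inputs)) / n
        if po_y == 0:
            continue  # y is outside Im(W) and carries no mass
        posterior = tuple(W[x][y] / (n * po_y) for x in inputs)
        mp[posterior] = mp.get(posterior, 0) + po_y
    return mp


F = Fraction
# Outputs "a" and "b" induce the same posterior, so they merge into one atom.
W = {
    0: {"a": F(1, 4), "b": F(1, 2), "c": F(1, 4)},
    1: {"a": F(1, 8), "b": F(1, 4), "c": F(5, 8)},
}
mp = blackwell_measure(W)
print(mp)
```

In this example the channel has three output symbols but its Blackwell measure has only two atoms.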

{MP}_{W}

can be interpreted as follows: after the receiver obtains the output of the channel, he can compute the posterior probabilities of the input as the conditional probability distribution of the input given the output symbol that he received. However, before receiving the output symbol, the receiver does not know what he will receive. He just has different probabilities for different possible output symbols. Therefore, the posterior probability distribution that will be computed by the receiver is itself random, and so we need a meta-probability measure to describe it.

{MP}_{W}

is exactly this meta-probability measure.

Since

Im (W)

is finite, the support of

{MP}_{W}

is finite and it consists of all points in

Δ_{X}

having strictly positive mass:

supp ({MP}_{W}) = {p \in Δ_{X} : {MP}_{W} (p) > 0} .

The rank of W is the size of the support of its Blackwell measure:

rank (W) = | supp ({MP}_{W}) | .

Notice that for every

x \in X

, we have

\begin{matrix} \int_{Δ_{X}} p (x) \cdot d {MP}_{W} (p) & = \sum_{p \in supp ({MP}_{W})} {MP}_{W} (p) \cdot p (x) = \sum_{y \in Im (W)} P_{W}^{o} (y) W_{y}^{- 1} (x) \\ & = \sum_{y \in Im (W)} \frac{1}{| X |} W (y | x) \overset{(a)}{=} \sum_{y \in Y} \frac{1}{| X |} W (y | x) = \frac{1}{| X |}, \end{matrix}

where (a) follows from the fact that

W (y | x) = 0

for every

y \notin Im (W)

. Therefore, we can write

\int_{Δ_{X}} p \cdot d {MP}_{W} (p) = π_{X},

(1)

where

π_{X}

is the uniform probability distribution on

X

. This shows that

{MP}_{W}

is a balanced meta-probability measure.
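
For a finitely supported meta-probability measure, the balance condition is a finite weighted average of the support points, which the following sketch evaluates. A measure is represented, by assumption, as a dictionary mapping each support point (a tuple of probabilities) to its mass; both example measures are hypothetical.

```python
from fractions import Fraction


def mean(mp):
    """Integral of p with respect to a finitely supported MP, coordinate by coordinate."""
    dim = len(next(iter(mp)))
    return tuple(sum(mass * p[i] for p, mass in mp.items()) for i in range(dim))


def is_balanced(mp):
    """Check whether the mean of MP is the uniform distribution."""
    dim = len(next(iter(mp)))
    return mean(mp) == tuple(Fraction(1, dim) for _ in range(dim))


F = Fraction
balanced = {(F(2, 3), F(1, 3)): F(9, 16), (F(2, 7), F(5, 7)): F(7, 16)}
unbalanced = {(F(1), F(0)): F(1)}  # a single atom at a vertex of the simplex
print(is_balanced(balanced), is_balanced(unbalanced))
```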

The following proposition characterizes the Blackwell measures of DMCs with input alphabet

X

:

Proposition 1.

[14] A meta-probability measure MP on

X

is the Blackwell measure of some DMC with input alphabet

X

if and only if MP is balanced and finitely supported.

Proof.

This proposition is known [14], but we provide a proof for completeness.

The above discussion shows that if MP is the Blackwell measure of some channel with input alphabet

X

, then it is balanced and finitely supported.

Now assume that MP is balanced and finitely supported, and let

Y = supp (MP)

. Define the channel

W \in {DMC}_{X, Y}

as

W (p | x) = | X | MP (p) p (x)

for every

x \in X

and every

p \in Y = supp (MP)

. For every

x \in X

, we have:

\sum_{p \in Y} W (p | x) = \sum_{p \in supp (MP)} | X | p (x) MP (p) = | X | \int_{Δ_{X}} p (x) \cdot d MP (p) = | X | π_{X} (x) = 1 .

Therefore, W is a valid channel. For every

p \in Y

, we have

P_{W}^{o} (p) = \frac{1}{| X |} \sum_{x \in X} W (p | x) = \frac{1}{| X |} \sum_{x \in X} | X | p (x) MP (p) = \sum_{x \in X} p (x) MP (p) = MP (p) > 0,

which implies that

Im (W) = Y

. For every

(x, p) \in X \times Y

we have:

W_{p}^{- 1} (x) = \frac{W (p | x)}{| X | P_{W}^{o} (p)} = \frac{| X | MP (p) p (x)}{| X | MP (p)} = p (x) .

Therefore,

W_{p}^{- 1} = p

for every

p \in Y

. For every Borel subset B of

Δ_{X}

, we have:

{MP}_{W} (B) = \sum_{\begin{matrix} p \in Im (W), \\ W_{p}^{- 1} \in B \end{matrix}} P_{W}^{o} (p) = \sum_{\begin{matrix} p \in supp (MP), \\ p \in B \end{matrix}} MP (p) = MP (B) .

We conclude that

{MP}_{W} = MP

. □
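
The construction used in the proof of Proposition 1 can be sketched as follows: the support points of the measure become the output symbols, and the transition probabilities are given by W(p | x) = |X| MP(p) p(x). The dictionary representation, the indexing of the inputs by 0, ..., |X| - 1 and the example measure are assumptions made for illustration.

```python
from fractions import Fraction


def channel_from_measure(mp):
    """Build W(p|x) = |X| * MP(p) * p(x), with output alphabet supp(MP)."""
    support = [p for p, mass in mp.items() if mass > 0]
    n = len(support[0])
    return {x: {p: n * mp[p] * p[x] for p in support} for x in range(n)}


def blackwell_measure(W):
    """Return MP_W as a dict {posterior (tuple over sorted inputs): mass}."""
    inputs = sorted(W)
    n = len(inputs)
    mp = {}
    for y in W[inputs[0]]:
        po_y = Fraction(sum(W[x][y] for x in inputs)) / n
        if po_y == 0:
            continue  # y is outside Im(W)
        posterior = tuple(W[x][y] / (n * po_y) for x in inputs)
        mp[posterior] = mp.get(posterior, 0) + po_y
    return mp


F = Fraction
# A balanced, finitely supported meta-probability measure on a binary alphabet.
MP = {(F(2, 3), F(1, 3)): F(9, 16), (F(2, 7), F(5, 7)): F(7, 16)}
W = channel_from_measure(MP)
print(all(sum(row.values()) == 1 for row in W.values()), blackwell_measure(W) == MP)
```

The rows of the constructed channel sum to one precisely because the measure is balanced; for an unbalanced measure the same formula would not yield a valid channel.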

In [4], equivalent representations for binary memoryless symmetric (BMS) channels (namely L, D and G densities) were provided. A necessary and sufficient condition for the degradation of a BMS channel

W^{'}

with respect to another BMS channel W was given in [4] in terms of the

| D |

-densities of W and

W^{'}

. It immediately follows from this condition that two BMS channels are equivalent if and only if they have the same

| D |

-densities. One can deduce from this that two BMS channels (with finite output alphabets) are equivalent if and only if they have the same Blackwell measure. The following proposition shows that this is also true for channels with arbitrary (but finite) input and output alphabets:

Proposition 2.

[14] Let

X, Y

and

Z

be three finite sets. Two channels

W \in {DMC}_{X, Y}

and

W^{'} \in {DMC}_{X, Z}

are equivalent if and only if

{MP}_{W} = {MP}_{W^{'}}

.

Proof.

This proposition is known [14], but we provide a proof in Appendix A for completeness. □
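
Proposition 2 suggests a simple equivalence test for channels with rational transition probabilities: compute the two Blackwell measures and compare them. The sketch below assumes the dictionary representation of channels and measures and relies on exact rational arithmetic; with floating-point entries the comparison would need a tolerance. All example channels are hypothetical.

```python
from fractions import Fraction


def blackwell_measure(W):
    """Return MP_W as a dict {posterior (tuple over sorted inputs): mass}."""
    inputs = sorted(W)
    n = len(inputs)
    mp = {}
    for y in W[inputs[0]]:
        po_y = Fraction(sum(W[x][y] for x in inputs)) / n
        if po_y == 0:
            continue  # y is outside Im(W)
        posterior = tuple(W[x][y] / (n * po_y) for x in inputs)
        mp[posterior] = mp.get(posterior, 0) + po_y
    return mp


def are_equivalent(W1, W2):
    """Equivalence test based on Proposition 2: equal Blackwell measures."""
    return blackwell_measure(W1) == blackwell_measure(W2)


F = Fraction
W3 = {  # three outputs, two of which induce the same posterior
    0: {"a": F(1, 4), "b": F(1, 2), "c": F(1, 4)},
    1: {"a": F(1, 8), "b": F(1, 4), "c": F(5, 8)},
}
W2 = {  # the same channel with outputs "a" and "b" merged into "u"
    0: {"u": F(3, 4), "v": F(1, 4)},
    1: {"u": F(3, 8), "v": F(5, 8)},
}
BSC = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(1, 10), 1: F(9, 10)}}
print(are_equivalent(W3, W2), are_equivalent(W3, BSC))
```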

Corollary 1.

If

W \in {DMC}_{X, Y}

and

rank (W) > | Z |

, then W is not equivalent to any channel in

{DMC}_{X, Z}

.

Proof.

Since

rank (W^{'}) = | supp ({MP}_{W^{'}}) | \leq | Z |

for every

W^{'} \in {DMC}_{X, Z}

, it is impossible for W to be equivalent to any channel

W^{'}

in

{DMC}_{X, Z}

. □

Corollary 2.

If

| X | = 1

, all channels with input alphabet

X

are equivalent.

6. Space of Equivalent Channels from $X$ to $Y$

6.1. The ${DMC}_{X, Y}^{(o)}$ Space

Let

X

and

Y

be two finite sets. Define the relation

R_{X, Y}^{(o)}

on

{DMC}_{X, Y}

as follows:

\forall W, W^{'} \in {DMC}_{X, Y}, W R_{X, Y}^{(o)} W^{'} \Leftrightarrow W is equivalent to W^{'} .

It is easy to see that

R_{X, Y}^{(o)}

is an equivalence relation on

{DMC}_{X, Y}

.

Definition 1.

The space of equivalent channels with input alphabet

X

and output alphabet

Y

is the quotient of the space of channels from

X

to

Y

by the equivalence relation:

{DMC}_{X, Y}^{(o)} = {DMC}_{X, Y} / R_{X, Y}^{(o)} .

We define the topology

T_{X, Y}^{(o)}

on

{DMC}_{X, Y}^{(o)}

as the quotient topology

T_{X, Y} / R_{X, Y}^{(o)}

.

Unless we explicitly state otherwise, we always associate

{DMC}_{X, Y}^{(o)}

with the quotient topology

T_{X, Y}^{(o)}

.

For every

W \in {DMC}_{X, Y}

, let

\hat{W} \in {DMC}_{X, Y}^{(o)}

be the

R_{X, Y}^{(o)}

-equivalence class containing W.

Lemma 3.

The projection mapping

Proj : {DMC}_{X, Y} \to {DMC}_{X, Y}^{(o)}

defined as

Proj (W) = \hat{W}

is continuous and closed.

Proof.

See Appendix B. □

Corollary 3.

For every

W \in {DMC}_{X, Y}

,

\hat{W}

is a compact subset of

{DMC}_{X, Y}

.

Proof.

Since

{DMC}_{X, Y}

is compact, its quotient

{DMC}_{X, Y}^{(o)} = {DMC}_{X, Y} / R_{X, Y}^{(o)}

is compact as well.

Let

Proj : {DMC}_{X, Y} \to {DMC}_{X, Y}^{(o)}

be as in Lemma 3. Since Proj is closed and since

{W}

is closed in

{DMC}_{X, Y}

,

{\hat{W}} = Proj ({W})

is closed in

{DMC}_{X, Y}^{(o)}

. Therefore,

\hat{W} = {Proj}^{- 1} ({\hat{W}})

is closed in

{DMC}_{X, Y}

because Proj is continuous. Now since

{DMC}_{X, Y}

is compact,

\hat{W}

is compact as well. □

Theorem 3.

{DMC}_{X, Y}^{(o)}

is a compact, path-connected and metrizable space.

Proof.

Since

{DMC}_{X, Y}

is compact and path-connected,

{DMC}_{X, Y}^{(o)} = {DMC}_{X, Y} / R_{X, Y}^{(o)}

is compact and path-connected as well.

Since the projection map Proj of Lemma 3 is closed, Lemma 2 implies that the quotient space

{DMC}_{X, Y}^{(o)} = {DMC}_{X, Y} / R_{X, Y}^{(o)}

is upper semi-continuous. On the other hand, Corollary 3 shows that all the members of

{DMC}_{X, Y}^{(o)}

are compact in

{DMC}_{X, Y}

. Therefore, the conditions of Theorem 1 are satisfied.

Since

{DMC}_{X, Y}

is a metric space, it is Hausdorff and regular. Moreover, since it can be seen as a subspace of

R^{| X | \cdot | Y |}

, it is also second-countable. By Theorem 1 we get that

{DMC}_{X, Y}^{(o)} = {DMC}_{X, Y} / R_{X, Y}^{(o)}

is Hausdorff, regular and second-countable, and from Theorem 2 we conclude that

{DMC}_{X, Y}^{(o)}

is separable and metrizable. □

6.2. Canonical Embedding and Canonical Identification

Let

X, Y_{1}

and

Y_{2}

be three finite sets such that

| Y_{1} | \leq | Y_{2} |

. We will show that there is a canonical embedding from

{DMC}_{X, Y_{1}}^{(o)}

to

{DMC}_{X, Y_{2}}^{(o)}

. In other words, there exists an explicitly constructible compact subset A of

{DMC}_{X, Y_{2}}^{(o)}

such that A is homeomorphic to

{DMC}_{X, Y_{1}}^{(o)}

. A and the homeomorphism depend only on

X, Y_{1}

and

Y_{2}

(this is why we say that they are canonical). Moreover, we can show that A depends only on

| Y_{1} |

,

X

and

Y_{2}

.

Lemma 4.

For every

W \in {DMC}_{X, Y_{1}}

and every injection f from

Y_{1}

to

Y_{2}

, W is equivalent to

D_{f} \circ W

.

Proof.

Clearly

D_{f} \circ W

is degraded from W. Now let

f^{'}

be any mapping from

Y_{2}

to

Y_{1}

such that

f^{'} (f (y_{1})) = y_{1}

for every

y_{1} \in Y_{1}

. We have

W = (D_{f^{'}} \circ D_{f}) \circ W = D_{f^{'}} \circ (D_{f} \circ W)

, and so W is also degraded from

D_{f} \circ W

. □
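
The argument of Lemma 4 can be traced on a small example: relabeling the outputs through an injection f and then applying any left inverse of f recovers the original channel. The alphabets, the channel and the mappings below are hypothetical, and channels are stored as dictionaries of dictionaries.

```python
from fractions import Fraction


def deterministic_channel(f, inputs, outputs):
    """D_f(y|x) = 1 if y = f(x) and 0 otherwise; f is given as a dict."""
    return {x: {y: 1 if y == f[x] else 0 for y in outputs} for x in inputs}


def compose(V, W):
    """Return V∘W, where (V∘W)(z|x) = sum_y V(z|y) W(y|x)."""
    outputs = list(next(iter(V.values())))
    return {
        x: {z: sum(V[y][z] * W[x][y] for y in W[x]) for z in outputs}
        for x in W
    }


Y1, Y2 = ["a", "b"], [1, 2, 3]
W = {
    0: {"a": Fraction(3, 4), "b": Fraction(1, 4)},
    1: {"a": Fraction(1, 3), "b": Fraction(2, 3)},
}
f = {"a": 1, "b": 3}                    # injection from Y1 to Y2
f_left_inv = {1: "a", 2: "a", 3: "b"}   # any map with f'(f(y1)) = y1

embedded = compose(deterministic_channel(f, Y1, Y2), W)                    # D_f ∘ W
recovered = compose(deterministic_channel(f_left_inv, Y2, Y1), embedded)   # D_f' ∘ (D_f ∘ W)
print(embedded[0], recovered == W)
```

The value of the left inverse on the unused symbol 2 is arbitrary, since that symbol has zero probability under the embedded channel.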

Corollary 4.

For every

W, W^{'} \in {DMC}_{X, Y_{1}}

and every two injections

f, g

from

Y_{1}

to

Y_{2}

, we have:

W R_{X, Y_{1}}^{(o)} W^{'} \Leftrightarrow (D_{f} \circ W) R_{X, Y_{2}}^{(o)} (D_{g} \circ W^{'}) .

Proof.

Since W is equivalent to

D_{f} \circ W

and

W^{'}

is equivalent to

D_{g} \circ W^{'}

, W is equivalent to

W^{'}

if and only if

D_{f} \circ W

is equivalent to

D_{g} \circ W^{'}

. □

For every

W \in {DMC}_{X, Y_{1}}

, we denote the

R_{X, Y_{1}}^{(o)}

-equivalence class of W as

\hat{W}

, and for every

W \in {DMC}_{X, Y_{2}}

, we denote the

R_{X, Y_{2}}^{(o)}

-equivalence class of W as

\tilde{W}

.

Proposition 3.

Let

f : Y_{1} \to Y_{2}

be any fixed injection from

Y_{1}

to

Y_{2}

. Define the mapping

F : {DMC}_{X, Y_{1}}^{(o)} \to {DMC}_{X, Y_{2}}^{(o)}

as

F (\hat{W}) = \tilde{D_{f} \circ W^{'}} = {Proj}_{2} (D_{f} \circ W^{'}),

where

W^{'} \in \hat{W}

and

{Proj}_{2} : {DMC}_{X, Y_{2}} \to {DMC}_{X, Y_{2}}^{(o)}

is the projection onto the

R_{X, Y_{2}}^{(o)}

-equivalence classes. We have:

F is well defined, i.e., ${Proj}_{2} (D_{f} \circ W^{'})$ does not depend on $W^{'} \in \hat{W}$ .
F is a homeomorphism between ${DMC}_{X, Y_{1}}^{(o)}$ and $F ({DMC}_{X, Y_{1}}^{(o)})$ .
F does not depend on f, i.e., F depends only on $X, Y_{1}$ and $Y_{2}$ .
$F ({DMC}_{X, Y_{1}}^{(o)})$ depends only on $| Y_{1} |$ , $X$ and $Y_{2}$ .
For every $W^{'} \in \hat{W}$ and every $W^{″} \in F (\hat{W})$ , $W^{'}$ is equivalent to $W^{″}$ .

Proof.

Corollary 4 implies that

{Proj}_{2} (D_{f} \circ W) = {Proj}_{2} (D_{f} \circ W^{'})

if and only if

W R_{X, Y_{1}}^{(o)} W^{'}

. Therefore,

{Proj}_{2} (D_{f} \circ W^{'})

does not depend on

W^{'} \in \hat{W}

, hence F is well defined. Corollary 4 also shows that

{Proj}_{2} (D_{f} \circ W^{'})

does not depend on the particular choice of the injection f, hence it is canonical (i.e., it depends only on

X, Y_{1}

and

Y_{2}

).

On the other hand, the mapping

W \to D_{f} \circ W

is a continuous mapping from

{DMC}_{X, Y_{1}}

to

{DMC}_{X, Y_{2}}

, and

{Proj}_{2}

is continuous. Therefore, the mapping

W \to {Proj}_{2} (D_{f} \circ W)

is a continuous mapping from

{DMC}_{X, Y_{1}}

to

{DMC}_{X, Y_{2}}^{(o)}

. Now since

{Proj}_{2} (D_{f} \circ W)

depends only on the

R_{X, Y_{1}}^{(o)}

-equivalence class

\hat{W}

of W, Lemma 1 implies that F is continuous. Moreover, we can see from Corollary 4 that F is an injection.

For every closed subset B of

{DMC}_{X, Y_{1}}^{(o)}

, B is compact since

{DMC}_{X, Y_{1}}^{(o)}

is compact, hence

F (B)

is compact because F is continuous. This implies that

F (B)

is closed in

{DMC}_{X, Y_{2}}^{(o)}

since

{DMC}_{X, Y_{2}}^{(o)}

is Hausdorff (as it is metrizable). Therefore, F is a closed mapping.

Now since F is an injection that is both continuous and closed, we can deduce that F is a homeomorphism between

{DMC}_{X, Y_{1}}^{(o)}

and

F ({DMC}_{X, Y_{1}}^{(o)}) \subset {DMC}_{X, Y_{2}}^{(o)}

. We would like now to show that

F ({DMC}_{X, Y_{1}}^{(o)})

depends only on

| Y_{1} |

,

X

and

Y_{2}

. Let

Y_{1}^{'}

be a finite set such that

| Y_{1} | = | Y_{1}^{'} |

. For every

W \in {DMC}_{X, Y_{1}^{'}}

, let

\bar{W} \in {DMC}_{X, Y_{1}^{'}}^{(o)}

be the

R_{X, Y_{1}^{'}}^{(o)}

-equivalence class of W.

Let

g : Y_{1}^{'} \to Y_{1}

be a fixed bijection from

Y_{1}^{'}

to

Y_{1}

and let

f^{'} = f \circ g

. Define

F^{'} : {DMC}_{X, Y_{1}^{'}}^{(o)} \to {DMC}_{X, Y_{2}}^{(o)}

as

F^{'} (\bar{W}) = \tilde{D_{f^{'}} \circ W^{'}} = {Proj}_{2} (D_{f^{'}} \circ W^{'}),

where

W^{'} \in \bar{W}

. As above,

F^{'}

is well defined, and it is a homeomorphism from

{DMC}_{X, Y_{1}^{'}}^{(o)}

to

F^{'} ({DMC}_{X, Y_{1}^{'}}^{(o)})

. We want to show that

F^{'} ({DMC}_{X, Y_{1}^{'}}^{(o)}) = F ({DMC}_{X, Y_{1}}^{(o)})

. For every

\bar{W} \in {DMC}_{X, Y_{1}^{'}}^{(o)}

, let

W^{'} \in \bar{W}

. We have

F^{'} (\bar{W}) = {Proj}_{2} (D_{f^{'}} \circ W^{'}) = {Proj}_{2} (D_{f} \circ (D_{g} \circ W^{'})) = F (\hat{D_{g} \circ W^{'}}) \in F ({DMC}_{X, Y_{1}}^{(o)}) .

Since this is true for every

\bar{W} \in {DMC}_{X, Y_{1}^{'}}^{(o)}

, we deduce that

F^{'} ({DMC}_{X, Y_{1}^{'}}^{(o)}) \subset F ({DMC}_{X, Y_{1}}^{(o)})

. By exchanging the roles of

Y_{1}

and

Y_{1}^{'}

and using the fact that

f = f^{'} \circ g^{- 1}

, we get

F ({DMC}_{X, Y_{1}}^{(o)}) \subset F^{'} ({DMC}_{X, Y_{1}^{'}}^{(o)})

. We conclude that

F ({DMC}_{X, Y_{1}}^{(o)}) = F^{'} ({DMC}_{X, Y_{1}^{'}}^{(o)})

, which means that

F ({DMC}_{X, Y_{1}}^{(o)})

depends only on

| Y_{1} |

,

X

and

Y_{2}

.

Finally, for every

W^{'} \in \hat{W}

and every

W^{″} \in F (\hat{W}) = \tilde{D_{f} \circ W^{'}}

,

W^{″}

is equivalent to

D_{f} \circ W^{'}

and

D_{f} \circ W^{'}

is equivalent to

W^{'}

(by Lemma 4), hence

W^{″}

is equivalent to

W^{'}

. □

Corollary 5.

If

| Y_{1} | = | Y_{2} |

, there exists a canonical homeomorphism from

{DMC}_{X, Y_{1}}^{(o)}

to

{DMC}_{X, Y_{2}}^{(o)}

depending only on

X, Y_{1}

and

Y_{2}

.

Proof.

Let f be a bijection from

Y_{1}

to

Y_{2}

. Define the mapping

F : {DMC}_{X, Y_{1}}^{(o)} \to {DMC}_{X, Y_{2}}^{(o)}

as

F (\hat{W}) = \tilde{D_{f} \circ W^{'}} = {Proj}_{2} (D_{f} \circ W^{'}),

where

W^{'} \in \hat{W}

and

{Proj}_{2} : {DMC}_{X, Y_{2}} \to {DMC}_{X, Y_{2}}^{(o)}

is the projection onto the

R_{X, Y_{2}}^{(o)}

-equivalence classes.

Also, define the mapping

F^{'} : {DMC}_{X, Y_{2}}^{(o)} \to {DMC}_{X, Y_{1}}^{(o)}

as

F^{'} (\tilde{V}) = \hat{D_{f^{- 1}} \circ V^{'}} = {Proj}_{1} (D_{f^{- 1}} \circ V^{'}),

where

V^{'} \in \tilde{V}

and

{Proj}_{1} : {DMC}_{X, Y_{1}} \to {DMC}_{X, Y_{1}}^{(o)}

is the projection onto the

R_{X, Y_{1}}^{(o)}

-equivalence classes.

Proposition 3 shows that F and

F^{'}

are well defined.

For every

W \in {DMC}_{X, Y_{1}}

, we have:

\begin{matrix} F^{'} (F (\hat{W})) & \overset{(a)}{=} F^{'} (\tilde{D_{f} \circ W}) \overset{(b)}{=} \hat{D_{f^{- 1}} \circ (D_{f} \circ W)} = \hat{W}, \end{matrix}

where (a) follows from the fact that

W \in \hat{W}

and (b) follows from the fact that

D_{f} \circ W \in \tilde{D_{f} \circ W}

.

We can similarly show that

F (F^{'} (\tilde{V})) = \tilde{V}

for every

\tilde{V} \in {DMC}_{X, Y_{2}}^{(o)}

. Therefore, both F and

F^{'}

are bijections. Proposition 3 now implies that F is a homeomorphism from

{DMC}_{X, Y_{1}}^{(o)}

to

F ({DMC}_{X, Y_{1}}^{(o)}) = {DMC}_{X, Y_{2}}^{(o)}

. Moreover, F depends only on

X, Y_{1}

and

Y_{2}

. □

Corollary 5 allows us to identify

{DMC}_{X, Y_{1}}^{(o)}

with

{DMC}_{X, Y_{2}}^{(o)}

whenever

| Y_{1} | = | Y_{2} |

. In the rest of this paper, we identify

{DMC}_{X, Y}^{(o)}

with

{DMC}_{X, [n]}^{(o)}

through the canonical identification, where

n = | Y |

and

[n] = {1, \dots, n}

.

Moreover, for every

1 \leq n \leq m

, Proposition 3 allows us to identify

{DMC}_{X, [n]}^{(o)}

with the canonical subspace of

{DMC}_{X, [m]}^{(o)}

that is homeomorphic to

{DMC}_{X, [n]}^{(o)}

. In the rest of this paper, we consider that

{DMC}_{X, [n]}^{(o)}

is a compact subspace of

{DMC}_{X, [m]}^{(o)}

.

Intuitively,

{DMC}_{X, [n]}^{(o)}

has a “lower dimension” compared to

{DMC}_{X, [m]}^{(o)}

. So one expects that the interior of

{DMC}_{X, [n]}^{(o)}

in

({DMC}_{X, [m]}^{(o)}, T_{X, [m]}^{(o)})

is empty if

m > n

. The following proposition shows that this intuition is accurate.

Proposition 4.

If

| X | \geq 2

, then for every

1 \leq n < m

, the interior of

{DMC}_{X, [n]}^{(o)}

in

({DMC}_{X, [m]}^{(o)}, T_{X, [m]}^{(o)})

is empty.

Proof.

See Appendix C. □

7. Space of Equivalent Channels

We would like to form the space of all equivalent channels having the same input alphabet

X

. The previous section showed that if

| Y_{1} | = | Y_{2} |

, there is a canonical identification between

{DMC}_{X, Y_{1}}^{(o)}

and

{DMC}_{X, Y_{2}}^{(o)}

. This shows that if we are interested in equivalent channels, it is sufficient to study the spaces

{DMC}_{X, [n]}

and

{DMC}_{X, [n]}^{(o)}

for every

n \geq 1

. Define the space

{DMC}_{X, *} = \underset{n \geq 1}{∐} {DMC}_{X, [n]} .

The subscript * indicates that the output alphabets of the considered channels are arbitrary but finite.

We define the equivalence relation

R_{X, *}^{(o)}

on

{DMC}_{X, *}

as follows:

\forall W, W^{'} \in {DMC}_{X, *}, W R_{X, *}^{(o)} W^{'} \Leftrightarrow W is equivalent to W^{'} .

Definition 2.

The space of equivalent channels with input alphabet

X

is the quotient of the space of channels with input alphabet

X

by the equivalence relation:

{DMC}_{X, *}^{(o)} = {DMC}_{X, *} / R_{X, *}^{(o)} .

For every

n \geq 1

and every

W, W^{'} \in {DMC}_{X, [n]}

, we have

W R_{X, *}^{(o)} W^{'}

if and only if

W R_{X, [n]}^{(o)} W^{'}

by definition. Therefore,

{DMC}_{X, [n]} / R_{X, *}^{(o)}

can be canonically identified with

{DMC}_{X, [n]} / R_{X, [n]}^{(o)} = {DMC}_{X, [n]}^{(o)}

. However, since we identified

{DMC}_{X, [n]}^{(o)}

with its image under the canonical embedding in

{DMC}_{X, [m]}^{(o)}

for every

m \geq n

, we have to make sure that these identifications are consistent with each other.

Remember that for every

m \geq n \geq 1

and every

W \in {DMC}_{X, [n]}

, we identified

\hat{W}

with

\tilde{D_{f} \circ W}

, where f is any injection from

[n]

to

[m]

,

\hat{W}

is the

R_{X, [n]}^{(o)}

-equivalence class of W and

\tilde{D_{f} \circ W}

is the

R_{X, [m]}^{(o)}

-equivalence class of

D_{f} \circ W

. Since

D_{f} \circ W

is equivalent to W (by Lemma 4), W is

R_{X, *}^{(o)}

-equivalent to

D_{f} \circ W

for every

W \in {DMC}_{X, [n]}

. We conclude that identifying

{DMC}_{X, [n]}^{(o)}

with its image under the canonical embedding in

{DMC}_{X, [m]}^{(o)}

for every

m \geq n \geq 1

is consistent with identifying

{DMC}_{X, [n]} / R_{X, *}^{(o)}

to

{DMC}_{X, [n]}^{(o)}

for every

n \geq 1

. Hence, we can write

{DMC}_{X, *}^{(o)} = ⋃_{n \geq 1} {DMC}_{X, [n]}^{(o)} .

For any

W, W^{'} \in {DMC}_{X, *}

, Proposition 2 shows that

W R_{X, *}^{(o)} W^{'}

if and only if

{MP}_{W} = {MP}_{W^{'}}

. Therefore, for every

\hat{W} \in {DMC}_{X, *}^{(o)}

, we can define the Blackwell measure of

\hat{W}

as

{MP}_{\hat{W}} : = {MP}_{W^{'}}

for any

W^{'} \in \hat{W}

. We also define the rank of

\hat{W}

as

rank (\hat{W}) = | supp ({MP}_{\hat{W}}) |

. Due to Proposition 2, we have

{DMC}_{X, [n]}^{(o)} = {\hat{W} \in {DMC}_{X, *}^{(o)} : rank (\hat{W}) \leq n} .

A subset A of

{DMC}_{X, *}^{(o)}

is said to be rank-bounded if there exists

n \geq 1

such that

A \subset {DMC}_{X, [n]}^{(o)}

. A is rank-unbounded if it is not rank-bounded.

7.1. Natural Topologies on ${DMC}_{X, *}^{(o)}$

Since

{DMC}_{X, *}^{(o)}

is the quotient of

{DMC}_{X, *}

and since

{DMC}_{X, *}

was not given any topology, there is no “standard topology” on

{DMC}_{X, *}^{(o)}

.

However, there are many properties that one may require from any “reasonable” topology on

{DMC}_{X, *}^{(o)}

. For example, one may require the continuity of all mappings that are relevant to information theory such as capacity, mutual information, probability of error of any fixed code, optimal probability of error of a given rate and blocklength, channel sums and products, etc. The continuity of these mappings under different topologies on

{DMC}_{X, *}^{(o)}

is studied in [8].

In this paper, we focus on one particular requirement that we consider the most basic property required from any “acceptable” topology on

{DMC}_{X, *}^{(o)}

:

Definition 3.

A topology

T

on

{DMC}_{X, *}^{(o)}

is said to be natural if it induces the quotient topology

T_{X, [n]}^{(o)}

on

{DMC}_{X, [n]}^{(o)}

for every

n \geq 1

.

We consider such a topology natural because

{DMC}_{X, [n]}^{(o)}

is a subset of

{DMC}_{X, *}^{(o)}

and the quotient topology

T_{X, [n]}^{(o)}

is the “standard” and “most natural” topology on

{DMC}_{X, [n]}^{(o)}

. Therefore, we do not want to induce any non-standard topology on

{DMC}_{X, [n]}^{(o)}

by relativization.

Before discussing any particular natural topology, we would like to discuss a few properties that are common to all natural topologies.

Proposition 5.

Every natural topology is σ-compact, separable and path-connected.

Proof.

Since

{DMC}_{X, *}^{(o)}

is the countable union of compact and separable subspaces (namely

{{DMC}_{X, [n]}^{(o)}}_{n \geq 1}

),

{DMC}_{X, *}^{(o)}

is

σ

-compact and separable.

On the other hand, since

⋂_{n \geq 1} {DMC}_{X, [n]}^{(o)} = {DMC}_{X, [1]}^{(o)} \neq \emptyset

and since

{DMC}_{X, [n]}^{(o)}

is path-connected for every

n \geq 1

, the union

{DMC}_{X, *}^{(o)} = ⋃_{n \geq 1} {DMC}_{X, [n]}^{(o)}

is path-connected. □

Proposition 6.

If

| X | \geq 2

and

T

is a natural topology, then every non-empty open set is rank-unbounded.

Proof.

Assume on the contrary that there exists a non-empty open set

U \in T

such that

U \subset {DMC}_{X, [n]}^{(o)}

for some

n \geq 1

.

U \cap {DMC}_{X, [n + 1]}^{(o)}

is open in

{DMC}_{X, [n + 1]}^{(o)}

because

T

is natural. On the other hand,

U \cap {DMC}_{X, [n + 1]}^{(o)} \subset U \subset {DMC}_{X, [n]}^{(o)}

. Proposition 4 now implies that

U \cap {DMC}_{X, [n + 1]}^{(o)} = \emptyset

. Therefore,

U = U \cap {DMC}_{X, [n]}^{(o)} \subset U \cap {DMC}_{X, [n + 1]}^{(o)} = \emptyset,

which is a contradiction. □

Corollary 6.

If

| X | \geq 2

and

T

is a natural topology, then for every

n \geq 1

, the interior of

{DMC}_{X, [n]}^{(o)}

in

({DMC}_{X, *}^{(o)}, T)

is empty.

Proposition 7.

If

| X | \geq 2

and

T

is a Hausdorff natural topology, then

({DMC}_{X, *}^{(o)}, T)

is not a Baire space.

Proof.

Fix

n \geq 1

. Since

T

is natural,

{DMC}_{X, [n]}^{(o)}

is a compact subset of

({DMC}_{X, *}^{(o)}, T)

. However,

T

is Hausdorff, so

{DMC}_{X, [n]}^{(o)}

is a closed subset of

({DMC}_{X, *}^{(o)}, T)

. Therefore,

{DMC}_{X, *}^{(o)} ∖ {DMC}_{X, [n]}^{(o)}

is open.

On the other hand, Corollary 6 shows that the interior of

{DMC}_{X, [n]}^{(o)}

in

({DMC}_{X, *}^{(o)}, T)

is empty. Therefore,

{DMC}_{X, *}^{(o)} ∖ {DMC}_{X, [n]}^{(o)}

is dense in

({DMC}_{X, *}^{(o)}, T)

.

Now since

⋂_{n \geq 1} ({DMC}_{X, *}^{(o)} ∖ {DMC}_{X, [n]}^{(o)}) = {DMC}_{X, *}^{(o)} ∖ (⋃_{n \geq 1} {DMC}_{X, [n]}^{(o)}) = \emptyset,

and since

{DMC}_{X, *}^{(o)} ∖ {DMC}_{X, [n]}^{(o)}

is open and dense in

({DMC}_{X, *}^{(o)}, T)

for every

n \geq 1

, we conclude that

({DMC}_{X, *}^{(o)}, T)

is not a Baire space. □

Corollary 7.

If

| X | \geq 2

, no natural topology on

{DMC}_{X, *}^{(o)}

can be completely metrizable.

Proof.

The corollary follows from Proposition 7 and the fact that every completely metrizable topology is both Hausdorff and Baire. □

Proposition 8.

If

| X | \geq 2

and

T

is a Hausdorff natural topology, then

({DMC}_{X, *}^{(o)}, T)

is not locally compact anywhere, i.e., for every

\hat{W} \in {DMC}_{X, *}^{(o)}

, there is no compact neighborhood of

\hat{W}

in

({DMC}_{X, *}^{(o)}, T)

.

Proof.

Assume on the contrary that there exists a compact neighborhood K of

\hat{W}

. There exists an open set U such that

\hat{W} \in U \subset K

.

Since K is compact and Hausdorff, it is a Baire space. Moreover, since U is an open subset of K, U is also a Baire space.

Fix

n \geq 1

. Since the interior of

{DMC}_{X, [n]}^{(o)}

in

({DMC}_{X, *}^{(o)}, T)

is empty, the interior of

U \cap {DMC}_{X, [n]}^{(o)}

in U is also empty. Therefore,

U ∖ {DMC}_{X, [n]}^{(o)}

is dense in U. On the other hand, since

T

is natural,

{DMC}_{X, [n]}^{(o)}

is compact which implies that it is closed because

T

is Hausdorff. Therefore,

U ∖ {DMC}_{X, [n]}^{(o)}

is open in U. Now since

⋂_{n \geq 1} (U ∖ {DMC}_{X, [n]}^{(o)}) = U ∖ (⋃_{n \geq 1} {DMC}_{X, [n]}^{(o)}) = \emptyset,

and since

U ∖ {DMC}_{X, [n]}^{(o)}

is open and dense in U for every

n \geq 1

, U is not Baire, which is a contradiction. Therefore, there is no compact neighborhood of

\hat{W}

in

({DMC}_{X, *}^{(o)}, T)

. □

8. Strong Topology on ${DMC}_{X, *}^{(o)}$

The first natural topology that we study is the strong topology

T_{s, X, *}^{(o)}

on

{DMC}_{X, *}^{(o)}

, which is the finest natural topology.

Since the spaces

{{DMC}_{X, [n]}}_{n \geq 1}

are disjoint and since there is no a priori way to (topologically) compare channels in

{DMC}_{X, [n]}

with channels in

{DMC}_{X, [n^{'}]}

for

n \neq n^{'}

, the “most natural” topology that we can define on

{DMC}_{X, *}

is the disjoint union topology

T_{s, X, *} : = ⨁_{n \geq 1} T_{X, [n]}

. Clearly, the space

({DMC}_{X, *}, T_{s, X, *})

is disconnected. Moreover,

T_{s, X, *}

is metrizable because it is the disjoint union of metrizable spaces. It is also

σ

-compact because it is the union of countably many compact spaces.

We added the subscript s to emphasize the fact that

T_{s, X, *}

is a strong topology (remember that the disjoint union topology is the finest topology that makes the canonical injections continuous).

Definition 4.

We define the strong topology

T_{s, X, *}^{(o)}

on

{DMC}_{X, *}^{(o)}

as the quotient topology

T_{s, X, *} / R_{X, *}^{(o)}

.

We refer to the open and closed sets in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

as strongly open and strongly closed sets respectively.

Let

Proj : {DMC}_{X, *} \to {DMC}_{X, *}^{(o)}

be the projection onto the

R_{X, *}^{(o)}

-equivalence classes, and for every

n \geq 1

let

{Proj}_{n} : {DMC}_{X, [n]} \to {DMC}_{X, [n]}^{(o)}

be the projection onto the

R_{X, [n]}^{(o)}

-equivalence classes. Due to the identifications that we made in Section 7, we have

Proj (W) = {Proj}_{n} (W)

for every

W \in {DMC}_{X, [n]}

. Therefore, for every

U \subset {DMC}_{X, *}^{(o)}

, we have

{Proj}^{- 1} (U) = \underset{n \geq 1}{∐} {Proj}_{n}^{- 1} (U \cap {DMC}_{X, [n]}^{(o)}) .

Hence,

\begin{matrix} U \in T_{s, X, *}^{(o)} & \overset{(a)}{\Leftrightarrow} {Proj}^{- 1} (U) \in T_{s, X, *} \\ & \overset{(b)}{\Leftrightarrow} {Proj}^{- 1} (U) \cap {DMC}_{X, [n]} \in T_{X, [n]}, \forall n \geq 1 \\ & \Leftrightarrow (\underset{n^{'} \geq 1}{∐} {Proj}_{n^{'}}^{- 1} (U \cap {DMC}_{X, [n^{'}]}^{(o)})) \cap {DMC}_{X, [n]} \in T_{X, [n]}, \forall n \geq 1 \\ & \Leftrightarrow {Proj}_{n}^{- 1} (U \cap {DMC}_{X, [n]}^{(o)}) \in T_{X, [n]}, \forall n \geq 1 \\ & \overset{(c)}{\Leftrightarrow} U \cap {DMC}_{X, [n]}^{(o)} \in T_{X, [n]}^{(o)}, \forall n \geq 1, \end{matrix}

where (a) and (c) follow from the properties of the quotient topology, and (b) follows from the properties of the disjoint union topology.

We conclude that

U \subset {DMC}_{X, *}^{(o)}

is strongly open in

{DMC}_{X, *}^{(o)}

if and only if

U \cap {DMC}_{X, [n]}^{(o)}

is open in

{DMC}_{X, [n]}^{(o)}

for every

n \geq 1

. This shows that the topology on

{DMC}_{X, [n]}^{(o)}

that is inherited from

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is exactly

T_{X, [n]}^{(o)}

. Therefore,

T_{s, X, *}^{(o)}

is a natural topology. On the other hand, if

T

is an arbitrary natural topology and

U \in T

, then

U \cap {DMC}_{X, [n]}^{(o)}

is open in

{DMC}_{X, [n]}^{(o)}

for every

n \geq 1

, so

U \in T_{s, X, *}^{(o)}

. We conclude that

T_{s, X, *}^{(o)}

is the finest natural topology.

We can also characterize the strongly closed subsets of

{DMC}_{X, *}^{(o)}

in terms of the closed sets of the

{DMC}_{X, [n]}^{(o)}

spaces:

\begin{matrix} F \text{ is strongly closed in } {DMC}_{X, *}^{(o)} & \Leftrightarrow {DMC}_{X, *}^{(o)} ∖ F \text{ is strongly open in } {DMC}_{X, *}^{(o)} \\ & \Leftrightarrow ({DMC}_{X, *}^{(o)} ∖ F) \cap {DMC}_{X, [n]}^{(o)} \text{ is open in } {DMC}_{X, [n]}^{(o)}, \forall n \geq 1 \\ & \Leftrightarrow {DMC}_{X, [n]}^{(o)} ∖ (F \cap {DMC}_{X, [n]}^{(o)}) \text{ is open in } {DMC}_{X, [n]}^{(o)}, \forall n \geq 1 \\ & \Leftrightarrow F \cap {DMC}_{X, [n]}^{(o)} \text{ is closed in } {DMC}_{X, [n]}^{(o)}, \forall n \geq 1 . \end{matrix}

Since

{DMC}_{X, [n]}^{(o)}

is metrizable for every

n \geq 1

, it is also normal. We can use this fact to prove that the strong topology on

{DMC}_{X, *}^{(o)}

is normal:

Lemma 5.

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is normal.

Proof.

See Appendix D. □

The following theorem shows that the strong topology satisfies many desirable properties.

Theorem 4.

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is a compactly generated, sequential and

T_{4}

space.

Proof.

Since

({DMC}_{X, *}, T_{s, X, *})

is metrizable, it is sequential. Therefore,

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

, which is the quotient of a sequential space, is sequential.

Let us now show that

{DMC}_{X, *}^{(o)}

is

T_{4}

. Fix

\hat{W} \in {DMC}_{X, *}^{(o)}

. For every

n \geq 1

,

{\hat{W}} \cap {DMC}_{X, [n]}^{(o)}

is either

{\hat{W}}

or ∅ depending on whether

\hat{W} \in {DMC}_{X, [n]}^{(o)}

or not. Since

{DMC}_{X, [n]}^{(o)}

is metrizable, it is

T_{1}

and so singletons are closed in

{DMC}_{X, [n]}^{(o)}

. We conclude that in all cases,

{\hat{W}} \cap {DMC}_{X, [n]}^{(o)}

is closed in

{DMC}_{X, [n]}^{(o)}

for every

n \geq 1

. Therefore,

{\hat{W}}

is strongly closed in

{DMC}_{X, *}^{(o)}

. This shows that

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is

T_{1}

. On the other hand, Lemma 5 shows that

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is normal. This means that

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is

T_{4}

, which implies that it is Hausdorff.

Now since

({DMC}_{X, *}, T_{s, X, *})

is metrizable, it is compactly generated. On the other hand, the quotient space

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

was shown to be Hausdorff. We conclude that

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is compactly generated. □

Corollary 8.

If

| X | \geq 2

,

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is not locally compact anywhere.

Proof.

Since

T_{s, X, *}^{(o)}

is a natural Hausdorff topology, Proposition 8 implies that

T_{s, X, *}^{(o)}

is not locally compact anywhere. □

Although

({DMC}_{X, *}, T_{s, X, *})

is second-countable (because it is a

σ

-compact metrizable space), the quotient space

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is not second-countable. In fact, we will show later that

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

fails to be first-countable (and hence it is not metrizable). This is one manifestation of the strength of the topology

T_{s, X, *}^{(o)}

. In order to show that

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is not first-countable, we need to characterize the converging sequences in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

.

A sequence

{({\hat{W}}_{n})}_{n \geq 1}

in

{DMC}_{X, *}^{(o)}

is said to be rank-bounded if

rank ({\hat{W}}_{n})

is bounded.

{({\hat{W}}_{n})}_{n \geq 1}

is rank-unbounded if it is not rank-bounded.

The following proposition shows that no rank-unbounded sequence converges in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

.

Proposition 9.

A sequence

{({\hat{W}}_{n})}_{n \geq 0}

converges in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

if and only if there exists

m \geq 1

such that

{\hat{W}}_{n} \in {DMC}_{X, [m]}^{(o)}

for every

n \geq 0

, and

{({\hat{W}}_{n})}_{n \geq 0}

converges in

({DMC}_{X, [m]}^{(o)}, T_{X, [m]}^{(o)})

.

Proof.

Assume that a sequence

{({\hat{W}}_{n})}_{n \geq 0}

in

{DMC}_{X, *}^{(o)}

is rank-unbounded. This cannot happen unless

| X | \geq 2

. In order to show that

{({\hat{W}}_{n})}_{n \geq 0}

does not converge, it is sufficient to show that there exists a subsequence of

{({\hat{W}}_{n})}_{n \geq 0}

which does not converge.

Let

{({\hat{W}}_{n_{k}})}_{k \geq 0}

be any subsequence of

{({\hat{W}}_{n})}_{n \geq 0}

where the rank strictly increases, i.e.,

rank ({\hat{W}}_{n_{k}}) < rank ({\hat{W}}_{n_{k^{'}}})

for every

0 \leq k < k^{'}

. We will show that

{({\hat{W}}_{n_{k}})}_{k \geq 0}

does not converge.

Assume on the contrary that

{({\hat{W}}_{n_{k}})}_{k \geq 0}

converges to

\hat{W} \in {DMC}_{X, *}^{(o)}

. Define the set

A = {{\hat{W}}_{n_{k}} : k \geq 0} ∖ {\hat{W}} .

For every

m \geq 1

, the set

A \cap {DMC}_{X, [m]}^{(o)}

contains finitely many points. This means that

A \cap {DMC}_{X, [m]}^{(o)}

is a finite union of singletons (which are closed in

{DMC}_{X, [m]}^{(o)}

), hence

A \cap {DMC}_{X, [m]}^{(o)}

is closed in

{DMC}_{X, [m]}^{(o)}

for every

m \geq 1

. Therefore A is closed in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

.

Now define

U = {DMC}_{X, *}^{(o)} ∖ A

. Since A is strongly closed, U is strongly open. Moreover, U contains

\hat{W}

, so U is a neighborhood of

\hat{W}

. Therefore, there exists

k_{0} \geq 0

such that

{\hat{W}}_{n_{k}} \in U

for every

k \geq k_{0}

. Now since the rank of

{({\hat{W}}_{n_{k}})}_{k \geq 0}

strictly increases, we can find

k \geq k_{0}

such that

rank ({\hat{W}}_{n_{k}}) > rank (\hat{W})

. This means that

{\hat{W}}_{n_{k}} \neq \hat{W}

and so

{\hat{W}}_{n_{k}} \in A

. Therefore,

{\hat{W}}_{n_{k}} \notin U

which is a contradiction.

We conclude that every converging sequence in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

must be rank-bounded.

Now let

{({\hat{W}}_{n})}_{n \geq 0}

be a rank-bounded sequence in

{DMC}_{X, *}^{(o)}

, i.e., there exists

m \geq 1

such that

{\hat{W}}_{n} \in {DMC}_{X, [m]}^{(o)}

for every

n \geq 0

. If

{({\hat{W}}_{n})}_{n \geq 0}

converges in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

then it converges in

{DMC}_{X, [m]}^{(o)}

since

{DMC}_{X, [m]}^{(o)}

is strongly closed.

Conversely, assume that

{({\hat{W}}_{n})}_{n \geq 0}

converges in

({DMC}_{X, [m]}^{(o)}, T_{X, [m]}^{(o)})

to

\hat{W} \in {DMC}_{X, [m]}^{(o)}

. Let O be any neighborhood of

\hat{W}

in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

. There exists a strongly open set U such that

\hat{W} \in U \subset O

. Since

U \cap {DMC}_{X, [m]}^{(o)}

is open in

({DMC}_{X, [m]}^{(o)}, T_{X, [m]}^{(o)})

, there exists

n_{0} > 0

such that

{\hat{W}}_{n} \in U \cap {DMC}_{X, [m]}^{(o)}

for every

n \geq n_{0}

. This implies that

{\hat{W}}_{n} \in O

for every

n \geq n_{0}

. Therefore

{({\hat{W}}_{n})}_{n \geq 0}

converges to

\hat{W}

in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

. □

Corollary 9.

If

| X | \geq 2

,

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is not first-countable anywhere, i.e., for every

\hat{W} \in {DMC}_{X, *}^{(o)}

, there is no countable neighborhood basis of

\hat{W}

.

Proof.

Fix

\hat{W} \in {DMC}_{X, *}^{(o)}

and assume on the contrary that

\hat{W}

admits a countable neighborhood basis

{O_{n}}_{n \geq 1}

in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

. For every

n \geq 1

, let

U_{n}^{'}

be a strongly open set such that

\hat{W} \in U_{n}^{'} \subset O_{n}

. Define

U_{n} = ⋂_{i = 1}^{n} U_{i}^{'}

.

U_{n}

is strongly open because it is the intersection of finitely many strongly open sets. Moreover,

U_{n} \subset O_{m}

for every

n \geq m

.

For every

n \geq 1

, Proposition 6 implies that

U_{n}

(which is non-empty and strongly open) is rank-unbounded, so it cannot be contained in

{DMC}_{X, [n]}^{(o)}

. Hence there exists

{\hat{W}}_{n} \in U_{n}

such that

{\hat{W}}_{n} \notin {DMC}_{X, [n]}^{(o)}

.

Since

{\hat{W}}_{n} \notin {DMC}_{X, [n]}^{(o)}

, we have

rank ({\hat{W}}_{n}) > n

for every

n \geq 1

. Therefore,

{({\hat{W}}_{n})}_{n \geq 1}

is rank-unbounded. Proposition 9 implies that

{({\hat{W}}_{n})}_{n \geq 1}

does not converge in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

.

Now let O be a neighborhood of

\hat{W}

in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

. Since

{O_{n}}_{n \geq 1}

is a neighborhood basis for

\hat{W}

, there exists

n_{0} \geq 1

such that

O_{n_{0}} \subset O

. For every

n \geq n_{0}

, we have

{\hat{W}}_{n} \in U_{n} \subset O_{n_{0}} \subset O

. This means that

{({\hat{W}}_{n})}_{n \geq 1}

converges to

\hat{W}

in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

which is a contradiction. Therefore,

\hat{W}

does not admit a countable neighborhood basis in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

. □

Compact Subspaces of $({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})$

It is well known that a subset of

R

is compact if and only if it is closed and bounded. The following proposition shows that a similar statement holds for

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

.

Proposition 10.

A subspace of

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is compact if and only if it is rank-bounded and strongly closed.

Proof.

If

| X | = 1

, all channels are equivalent to each other and so

{DMC}_{X, *}^{(o)} = {DMC}_{X, [1]}^{(o)}

consists of a single point. Therefore, all subsets of

{DMC}_{X, *}^{(o)}

are rank-bounded, compact and strongly closed.

Assume now that

| X | \geq 2

. Let A be a subspace of

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

. If A is rank-bounded and strongly closed, then there exists

n \geq 1

such that

A \subset {DMC}_{X, [n]}^{(o)}

. Since A is strongly closed,

A = A \cap {DMC}_{X, [n]}^{(o)}

is closed in

{DMC}_{X, [n]}^{(o)}

which is compact. Therefore, A is compact.

Now let A be a compact subspace of

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

. Since

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is Hausdorff, A is strongly closed. It remains to show that A is rank-bounded.

Assume on the contrary that A is rank-unbounded. We can construct a sequence

{({\hat{W}}_{n})}_{n \geq 0}

in A where the rank is strictly increasing, i.e.,

rank ({\hat{W}}_{n}) < rank ({\hat{W}}_{n^{'}})

for every

0 \leq n < n^{'}

. Since the rank of

{({\hat{W}}_{n})}_{n \geq 0}

is strictly increasing, every subsequence of

{({\hat{W}}_{n})}_{n \geq 0}

is rank-unbounded. Proposition 9 implies that no subsequence of

{({\hat{W}}_{n})}_{n \geq 0}

converges in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

. On the other hand, we have:

A is countably compact because it is compact.
Since A is strongly closed and since $({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})$ is a sequential space, A is sequential.
A is Hausdorff because $({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})$ is Hausdorff.

Now since every countably compact sequential Hausdorff space is sequentially compact [11], A must be sequentially compact. Therefore,

{({\hat{W}}_{n})}_{n \geq 0}

has a converging subsequence which is a contradiction. We conclude that A must be rank-bounded. □

9. The Noisiness Metric on DMC Spaces

Theorem 3 implies that

{DMC}_{X, [n]}^{(o)}

is metrizable for every

n \geq 1

. One might ask whether the spaces

{DMC}_{X, [n]}^{(o)}

are “simultaneously metrizable” in the sense that we can define a metric

d_{n}

on

{DMC}_{X, [n]}^{(o)}

for every

n \geq 1

in such a way that

d_{n}

is the restriction of

d_{n + 1}

for every

n \geq 1

. If this is the case, we can then define a metric on

{DMC}_{X, *}^{(o)} = ⋃_{n \geq 1} {DMC}_{X, [n]}^{(o)}

as

d (\hat{W}, {\hat{W}}^{'}) = d_{n} (\hat{W}, {\hat{W}}^{'})

for any

n \geq 1

satisfying

\hat{W}, {\hat{W}}^{'} \in {DMC}_{X, [n]}^{(o)}

. In this section, we will show that such metrics can be constructed.

9.1. Noisiness Metric on ${DMC}_{X, Y}^{(o)}$

For every

m \geq 1

, let

Δ_{[m] \times X}

be the space of probability distributions on

[m] \times X

.

Let

Y

be a finite set and let

W \in {DMC}_{X, Y}

. For every

p \in Δ_{[m] \times X}

, define

P_{c} (p, W)

as follows:

P_{c} (p, W) = sup_{D \in {DMC}_{Y, [m]}} \sum_{\begin{matrix} u \in [m], \\ x \in X, \\ y \in Y \end{matrix}} p (u, x) W (y | x) D (u | y) .

(2)

P_{c} (p, W)

can be interpreted as follows: let

(U, X)

be a pair of random variables distributed according to p, send X through the channel W, and let Y be the output of W in such a way that

U - X - Y

is a Markov chain. Let

\hat{U}

be the estimate of U obtained by applying a random decoder

D \in {DMC}_{Y, [m]}

. In this interpretation, p can be seen as a random encoder. The probability of correctly guessing U by using the decoder D is given by

\sum_{\begin{matrix} u \in [m], \\ x \in X, \\ y \in Y \end{matrix}} p (u, x) W (y | x) D (u | y) .

Therefore,

P_{c} (p, W)

is the optimal probability of correctly guessing U from Y. Note that we can take the supremum in (2) over only deterministic channels

D \in {DMC}_{Y, [m]}

because we can always choose an optimal decoder that is deterministic.
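As a minimal sketch of this remark, the following Python snippet computes the quantity in (2) in two ways: by choosing, for each output y, the symbol u that maximizes the sum over x of p(u, x) W(y | x), and by brute force over all deterministic decoders. The function names and the list-of-lists representation of p and W are illustrative assumptions, not notation from the paper.

```python
from itertools import product

def p_correct(p, W):
    """Optimal probability of guessing U from Y.

    p[u][x] : joint distribution of (U, X) on [m] x X
    W[x][y] : transition probabilities W(y | x)
    """
    m, nx, ny = len(p), len(W), len(W[0])
    total = 0.0
    for y in range(ny):
        # scores[u] = sum_x p(u, x) W(y | x) = P(U = u, Y = y)
        scores = [sum(p[u][x] * W[x][y] for x in range(nx)) for u in range(m)]
        total += max(scores)  # best deterministic guess for this output y
    return total

def p_correct_brute_force(p, W):
    """Same quantity, maximizing over every deterministic decoder y -> u."""
    m, nx, ny = len(p), len(W), len(W[0])
    best = 0.0
    for decoder in product(range(m), repeat=ny):
        value = sum(p[decoder[y]][x] * W[x][y]
                    for x in range(nx) for y in range(ny))
        best = max(best, value)
    return best

# Binary symmetric channel with crossover probability 0.1; U = X is uniform.
bsc = [[0.9, 0.1], [0.1, 0.9]]
p_identity = [[0.5, 0.0], [0.0, 0.5]]

print(p_correct(p_identity, bsc))
print(p_correct_brute_force(p_identity, bsc))
```

Both calls agree here (up to floating-point rounding) because the objective in (2) is linear in D, so the supremum is attained at a deterministic decoder.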

It is well known that if W is degraded from

W^{'}

, then

P_{c} (p, W) \leq P_{c} (p, W^{'})

for every

p \in Δ_{[m] \times X}

and every

m \geq 1

. It was shown in [15] that the converse is also true. Therefore, W is equivalent to

W^{'}

if and only if

P_{c} (p, W) = P_{c} (p, W^{'})

for every

p \in Δ_{[m] \times X}

and every

m \geq 1

. This shows that the quantity

P_{c} (p, W)

depends only on the

R_{X, Y}^{(o)}

-equivalence class of W. Therefore, if

\hat{W} \in {DMC}_{X, Y}^{(o)}

, we can define

P_{c} (p, \hat{W}) : = P_{c} (p, W^{'})

for any

W^{'} \in \hat{W}

.
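The two facts above can be checked numerically on small examples. The sketch below assumes the usual definition of degradedness (W is degraded from W' when W is obtained by feeding the output of W' into a further channel V); the helper names `compose` and `random_joint` and the specific channels are hypothetical choices made for illustration only.

```python
import random

def p_correct(p, W):
    """P_c(p, W) with p[u][x] a joint distribution and W[x][y] = W(y | x)."""
    nx, ny = len(W), len(W[0])
    return sum(max(sum(row[x] * W[x][y] for x in range(nx)) for row in p)
               for y in range(ny))

def compose(W, V):
    """Channel obtained by feeding the output of W into V."""
    return [[sum(W[x][k] * V[k][y] for k in range(len(V)))
             for y in range(len(V[0]))] for x in range(len(W))]

def random_joint(m, nx, rng):
    """Random joint distribution on [m] x X (normalized random weights)."""
    weights = [[rng.random() for _ in range(nx)] for _ in range(m)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

rng = random.Random(0)
W_prime = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]    # binary input, ternary output
V = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]         # post-processing channel
W = compose(W_prime, V)                          # W is degraded from W_prime

# Splitting output 0 of W_prime into two copies gives an equivalent channel.
split = [[0.25, 0.75, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
W_split = compose(W_prime, split)

gaps, diffs = [], []
for _ in range(200):
    p = random_joint(rng.randint(1, 4), 2, rng)
    gaps.append(p_correct(p, W_prime) - p_correct(p, W))
    diffs.append(abs(p_correct(p, W_prime) - p_correct(p, W_split)))

print(min(gaps), max(diffs))
```

On every sampled p, the degraded channel never does better than W_prime, and the split channel gives the same value as W_prime up to rounding. This is only a sanity check on finitely many encoders, not a proof.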

Define the noisiness distance

d_{X, Y}^{(o)} : {DMC}_{X, Y}^{(o)} \times {DMC}_{X, Y}^{(o)} \to R^{+}

as follows:

d_{X, Y}^{(o)} ({\hat{W}}_{1}, {\hat{W}}_{2}) = sup_{\begin{matrix} m \geq 1, \\ p \in Δ_{[m] \times X} \end{matrix}} | P_{c} (p, {\hat{W}}_{1}) - P_{c} (p, {\hat{W}}_{2}) | .

It is easy to see that

0 \leq d_{X, Y}^{(o)} ({\hat{W}}_{1}, {\hat{W}}_{2}) \leq 1

for every

{\hat{W}}_{1}, {\hat{W}}_{2} \in {DMC}_{X, Y}^{(o)}

. Moreover, we have:

$d_{X, Y}^{(o)} (\hat{W}, \hat{W}) = 0$ for every $\hat{W} \in {DMC}_{X, Y}^{(o)}$ .
For every ${\hat{W}}_{1}, {\hat{W}}_{2} \in {DMC}_{X, Y}^{(o)}$ , if $d_{X, Y}^{(o)} ({\hat{W}}_{1}, {\hat{W}}_{2}) = 0$ , then $P_{c} (p, {\hat{W}}_{1}) = P_{c} (p, {\hat{W}}_{2})$ for every $p \in Δ_{[m] \times X}$ and every $m \geq 1$ , which implies that the channels in ${\hat{W}}_{1}$ are equivalent to the channels in ${\hat{W}}_{2}$ , hence ${\hat{W}}_{1} = {\hat{W}}_{2}$ .
$d_{X, Y}^{(o)} ({\hat{W}}_{1}, {\hat{W}}_{2}) = d_{X, Y}^{(o)} ({\hat{W}}_{2}, {\hat{W}}_{1})$ for every ${\hat{W}}_{1}, {\hat{W}}_{2} \in {DMC}_{X, Y}^{(o)}$ .
$d_{X, Y}^{(o)} ({\hat{W}}_{1}, {\hat{W}}_{3}) \leq d_{X, Y}^{(o)} ({\hat{W}}_{1}, {\hat{W}}_{2}) + d_{X, Y}^{(o)} ({\hat{W}}_{2}, {\hat{W}}_{3})$ for every ${\hat{W}}_{1}, {\hat{W}}_{2}, {\hat{W}}_{3} \in {DMC}_{X, Y}^{(o)}$ .

This shows that

d_{X, Y}^{(o)}

is a metric on

{DMC}_{X, Y}^{(o)}

.

d_{X, Y}^{(o)}

is called the noisiness metric because it compares the “noisiness” of

{\hat{W}}_{1}

with that of

{\hat{W}}_{2}

: if

P_{c} (p, {\hat{W}}_{1})

is close to

P_{c} (p, {\hat{W}}_{2})

for every random encoder p, then

{\hat{W}}_{1}

and

{\hat{W}}_{2}

have close “noisiness levels”.
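Since the noisiness distance is a supremum over all m and all random encoders p, evaluating the difference of the two P_c values on any finite collection of encoders gives a lower bound on it. The sketch below does this with randomly sampled encoders; the sampling scheme, the cap `max_m` and the function names are assumptions made for illustration, and the result should not be read as the exact distance.

```python
import random

def p_correct(p, W):
    """P_c(p, W) with p[u][x] a joint distribution and W[x][y] = W(y | x)."""
    nx, ny = len(W), len(W[0])
    return sum(max(sum(row[x] * W[x][y] for x in range(nx)) for row in p)
               for y in range(ny))

def random_joint(m, nx, rng):
    """Random joint distribution on [m] x X (normalized random weights)."""
    weights = [[rng.random() for _ in range(nx)] for _ in range(m)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def noisiness_lower_bound(W1, W2, trials=2000, max_m=4, seed=0):
    """Largest |P_c(p, W1) - P_c(p, W2)| seen over randomly sampled encoders p."""
    rng = random.Random(seed)
    nx = len(W1)
    best = 0.0
    for _ in range(trials):
        p = random_joint(rng.randint(1, max_m), nx, rng)
        best = max(best, abs(p_correct(p, W1) - p_correct(p, W2)))
    return best

bsc_01 = [[0.9, 0.1], [0.1, 0.9]]     # binary symmetric channel, crossover 0.1
useless = [[0.5, 0.5], [0.5, 0.5]]    # output independent of the input
flipped = [[0.1, 0.9], [0.9, 0.1]]    # bsc_01 with relabelled outputs (equivalent)

print(noisiness_lower_bound(bsc_01, useless))
print(noisiness_lower_bound(bsc_01, flipped))
```

The second value is zero up to rounding because relabelling the outputs does not change the equivalence class, while the first is strictly positive, reflecting the different noisiness levels of the two channels.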

A natural question to ask is whether the metric topology on

{DMC}_{X, Y}^{(o)}

that is induced by

d_{X, Y}^{(o)}

is the same as the quotient topology

T_{X, Y}^{(o)}

that we defined in Section 6.1. To answer this question, we need the following lemma.

Lemma 6.

For every

W_{1}, W_{2} \in {DMC}_{X, Y}

, we have:

d_{X, Y}^{(o)} ({\hat{W}}_{1}, {\hat{W}}_{2}) \leq d_{X, Y} (W_{1}, W_{2}),

where

{\hat{W}}_{1}

and

{\hat{W}}_{2}

are the

R_{X, Y}^{(o)}

-equivalence classes of

W_{1}

and

W_{2}

respectively.

Proof.

See Appendix E. □

Proposition 11.

({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})

and

({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})

are topologically equivalent.

Proof.

Consider the projection mapping

Proj : {DMC}_{X, Y} \to {DMC}_{X, Y}^{(o)}

defined as

Proj (W) = \hat{W}

, where

\hat{W}

is the

R_{X, Y}^{(o)}

-equivalence class of W.

Lemma 6 implies that Proj is a continuous mapping from

({DMC}_{X, Y}, d_{X, Y})

to

({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})

. Now since

Proj (W) = Proj (W^{'})

whenever

W R_{X, Y}^{(o)} W^{'}

, Lemma 1 implies that the identity mapping

i d : {DMC}_{X, Y}^{(o)} \to {DMC}_{X, Y}^{(o)}

is continuous from

({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})

to

({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})

. We have:

For every $U \subset {DMC}_{X, Y}^{(o)}$ that is open in $({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})$ , $U = i d^{- 1} (U) \in T_{X, Y}^{(o)}$ because $i d$ is a continuous mapping from $({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})$ to $({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})$ .
For every $U \in T_{X, Y}^{(o)}$ , the set ${DMC}_{X, Y}^{(o)} ∖ U$ is closed in $({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})$ which is compact. Therefore, ${DMC}_{X, Y}^{(o)} ∖ U$ is a compact subset of $({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})$ . Now since $i d$ is continuous from $({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})$ to $({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})$ , ${DMC}_{X, Y}^{(o)} ∖ U = i d ({DMC}_{X, Y}^{(o)} ∖ U)$ is a compact subset of $({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})$ which is Hausdorff (because it is metric). This shows that ${DMC}_{X, Y}^{(o)} ∖ U$ is closed in $({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})$ , which implies that U is open in $({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})$ .

We conclude that

U \subset {DMC}_{X, Y}^{(o)}

is open in

({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})

if and only if it is open in

({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})

. □

Corollary 10.

({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})

is a compact path-connected metric space.

The reader might be wondering why we considered and studied the quotient topology

T_{X, Y}^{(o)}

while it is possible to explicitly define a metric on the space

{DMC}_{X, Y}^{(o)}

. There are two reasons:

The definition of $d_{X, Y}^{(o)}$ does not seem intuitive at first sight, and it is not clear why one would adopt it as a standard metric on ${DMC}_{X, Y}^{(o)}$ . Just being a metric is not convincing enough. On the other hand, the existence of a natural standard topology on ${DMC}_{X, Y}$ makes the quotient topology the most natural starting point.
If one wants to show that a mapping $f : {DMC}_{X, Y}^{(o)} \to S$ is continuous from $({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})$ to a topological space $(S, V)$ , it is much easier to prove it through the quotient topology $T_{X, Y}^{(o)}$ rather than proving it directly using the metric $d_{X, Y}^{(o)}$ . Therefore, it is important to show the topological equivalence between $({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})$ and $({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})$ .

It is worth mentioning that in the proof of Proposition 11, the only topological property of

({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})

that we used is its compactness. This means that we do not need Lemma 3 to prove Theorem 3. An alternative proof of Theorem 3 would be to show the compactness and path-connectedness by inheriting those properties from

{DMC}_{X, Y}

, and then show that

({DMC}_{X, Y}^{(o)}, T_{X, Y}^{(o)})

is topologically equivalent to

({DMC}_{X, Y}^{(o)}, d_{X, Y}^{(o)})

as in Proposition 11.

The main reason why we restricted ourselves to topological methods in Section 6.1 is that they might be useful if one wants to generalize our results to spaces of non-discrete channels. It might not be easy to find an explicit metric for those spaces, or even worse, those spaces might fail to be metrizable. Therefore, one might want to prove weaker topological properties such as being Hausdorff and/or regular. In such cases, the methods of Section 6.1 might be useful.

9.2. Noisiness Metric on ${DMC}_{X, *}^{(o)}$

For every

\hat{W}, {\hat{W}}^{'} \in {DMC}_{X, *}^{(o)}

, define the noisiness metric on

{DMC}_{X, *}^{(o)}

as follows:

d_{X, *}^{(o)} (\hat{W}, {\hat{W}}^{'}) : = d_{X, [n]}^{(o)} (\hat{W}, {\hat{W}}^{'}) where n \geq 1 satisfies \hat{W}, {\hat{W}}^{'} \in {DMC}_{X, [n]}^{(o)} .

d_{X, *}^{(o)} (\hat{W}, {\hat{W}}^{'})

is well defined because

d_{X, [n]}^{(o)} (\hat{W}, {\hat{W}}^{'})

does not depend on

n \geq 1

as long as

\hat{W}, {\hat{W}}^{'} \in {DMC}_{X, [n]}^{(o)}

. We can also express

d_{X, *}^{(o)}

as follows:

d_{X, *}^{(o)} ({\hat{W}}_{1}, {\hat{W}}_{2}) = sup_{\begin{matrix} m \geq 1, \\ p \in Δ_{[m] \times X} \end{matrix}} | P_{c} (p, {\hat{W}}_{1}) - P_{c} (p, {\hat{W}}_{2}) | .

It is easy to see that

d_{X, *}^{(o)}

is a metric on

{DMC}_{X, *}^{(o)}

. Let

T_{X, *}^{(o)}

be the metric topology on

{DMC}_{X, *}^{(o)}

that is induced by

d_{X, *}^{(o)}

. We call

T_{X, *}^{(o)}

the noisiness topology on

{DMC}_{X, *}^{(o)}

.

Clearly,

T_{X, *}^{(o)}

is natural because the restriction of

d_{X, *}^{(o)}

on

{DMC}_{X, [n]}^{(o)}

is exactly

d_{X, [n]}^{(o)}

, and the topology induced by

d_{X, [n]}^{(o)}

is

T_{X, [n]}^{(o)}

. If

| X | \geq 2

, Proposition 8 and Corollary 7 imply that

({DMC}_{X, *}^{(o)}, d_{X, *}^{(o)})

is neither complete nor locally compact.

Since

T_{s, X, *}^{(o)}

is the finest natural topology,

T_{s, X, *}^{(o)}

is finer than

T_{X, *}^{(o)}

. On the other hand, if

| X | \geq 2

,

T_{X, *}^{(o)}

is metrizable and

T_{s, X, *}^{(o)}

is not (because it is not first-countable). Therefore, if

| X | \geq 2

, the strong topology

T_{s, X, *}^{(o)}

is strictly finer than the noisiness topology

T_{X, *}^{(o)}

.

It is worth mentioning that Propositions 9 and 10 do not hold for

({DMC}_{X, *}^{(o)}, T_{X, *}^{(o)})

. It is easy to find a rank-unbounded sequence

{{\hat{W}}_{n}}_{n \geq 0}

which converges in

({DMC}_{X, *}^{(o)}, T_{X, *}^{(o)})

to a point

\hat{W} \in {DMC}_{X, *}^{(o)}

. The set

{{\hat{W}}_{n} : n \geq 0} \cup {\hat{W}}

is clearly compact and rank-unbounded.

10. Topologies from Blackwell Measures

We saw in Section 8 that for every

\hat{W} \in {DMC}_{X, *}^{(o)}

, a Blackwell measure

{MP}_{\hat{W}}

on

Δ_{X}

is defined. Moreover, Proposition 2 implies that

\hat{W}

is uniquely determined by

{MP}_{\hat{W}}

. Therefore, each

R_{X, *}^{(o)}

-equivalence class in

{DMC}_{X, *}^{(o)}

can be identified with its Blackwell measure. On the other hand, Proposition 1 shows that the collection of Blackwell measures of the channels with input alphabet

X

is the same as the collection of balanced and finitely supported meta-probability measures on

X

.
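To make the identification concrete, here is a minimal sketch that computes a Blackwell measure for a finite channel. It assumes the definition used earlier in the paper, namely the distribution of the posterior of the input given the output when the input is uniform; the function names and the use of exact `Fraction` arithmetic (so that equal posteriors merge exactly) are illustrative choices.

```python
from collections import defaultdict
from fractions import Fraction as F

def blackwell_measure(W):
    """Map each posterior (a tuple indexed by X) to its probability.

    W[x][y] = W(y | x), given as Fractions; the input is taken to be uniform.
    """
    nx, ny = len(W), len(W[0])
    measure = defaultdict(F)
    for y in range(ny):
        p_y = sum(W[x][y] for x in range(nx)) / nx   # P(Y = y) for uniform X
        if p_y == 0:
            continue                                  # unused outputs carry no mass
        posterior = tuple(W[x][y] / (nx * p_y) for x in range(nx))
        measure[posterior] += p_y                     # merge identical posteriors
    return dict(measure)

def mean(measure):
    """Barycenter of the measure, a point of the simplex over X."""
    nx = len(next(iter(measure)))
    return tuple(sum(prob * post[x] for post, prob in measure.items())
                 for x in range(nx))

W = [[F(7, 10), F(2, 10), F(1, 10)],
     [F(1, 10), F(3, 10), F(6, 10)]]
# Equivalent channel: output 0 split into two copies, plus one unused output.
W_equiv = [[F(7, 40), F(21, 40), F(2, 10), F(1, 10), F(0)],
           [F(1, 40), F(3, 40), F(3, 10), F(6, 10), F(0)]]

mp = blackwell_measure(W)
print(len(mp), mean(mp))
print(mp == blackwell_measure(W_equiv))
```

In this example the two equivalent channels yield the same finitely supported measure, and its barycenter is the uniform distribution on X, which is what "balanced" refers to.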

Therefore, the mapping

\hat{W} \to {MP}_{\hat{W}}

is a bijection from

{DMC}_{X, *}^{(o)}

to

{MP}_{b f} (X)

. We call this mapping the canonical bijection from

{DMC}_{X, *}^{(o)}

to

{MP}_{b f} (X)

. Similarly, the inverse mapping is called the canonical bijection from

{MP}_{b f} (X)

to

{DMC}_{X, *}^{(o)}

.

Since

Δ_{X}

is a metric space, there are many standard ways to construct topologies on

MP (X)

. If we choose any of these standard topologies on

MP (X)

and then relativize it to the subspace

{MP}_{b f} (X)

, we can construct topologies on

{DMC}_{X, *}^{(o)}

through the canonical bijection.

We saw in Section 3.4 that there are three topologies that can be constructed on

MP (X)

: the total variation topology, the strong convergence topology, and the weak-* topology. However, since every measure in

{MP}_{b f} (X)

is a finitely supported measure, strong convergence and total variation convergence are equivalent in

{MP}_{b f} (X)

(see Section 3.4). Therefore, it is sufficient to study the total-variation topology and the weak-* topology. We will start by studying the weak-* topology.

10.1. Weak-* Topology

We first note that in the case of binary input channels, the weak-* topology is equivalent to the topology induced by the convergence in distribution of D-densities (or L-densities, or G-densities) that was defined in [4]. Note also that the weak-* topology is equivalent to the topology that is induced by the Le Cam deficiency distance [7].

Consider the topology on

{DMC}_{X, *}^{(o)}

that is obtained by transporting the weak-* topology from

{MP}_{b f} (X)

to

{DMC}_{X, *}^{(o)}

through the canonical bijection

F_{can}

, i.e., we let

U \subset {DMC}_{X, *}^{(o)}

be open if and only if

F_{can}^{- 1} (U)

is weakly-* open. We will call this topology the weak-* topology on

{DMC}_{X, *}^{(o)}

.

In this section, we show that the weak-* topology is the same as the noisiness topology

T_{X, *}^{(o)}

. We will show this using the Wasserstein metric.

Since

Δ_{X}

is complete and separable, the 1st Wasserstein distance metrizes the weak-* topology [13]. Therefore, in order to show that the weak-* topology and the noisiness topology

T_{X, *}^{(o)}

are the same, it is sufficient to show that the canonical bijection

F_{can}

from

({MP}_{b f} (X), W_{1})

to

({DMC}_{X, *}^{(o)}, d_{X, *}^{(o)})

is a homeomorphism.

Note that since

Δ_{X}

is compact, the metric space

(MP (X), W_{1})

is compact as well [13].

Lemma 7.

For every

\hat{W}, {\hat{W}}^{'} \in {DMC}_{X, *}^{(o)}

, we have

d_{X, *}^{(o)} (\hat{W}, {\hat{W}}^{'}) \leq | X | \cdot W_{1} ({MP}_{\hat{W}}, {MP}_{{\hat{W}}^{'}})

.

Proof.

See Appendix F. □

Lemma 7 can also be expressed as follows: for every

MP, {MP}^{'} \in {MP}_{b f} (X)

, we have

d_{X, *}^{(o)} (F_{can} (MP), F_{can} ({MP}^{'})) \leq | X | \cdot W_{1} (MP, {MP}^{'})

. This shows that the canonical bijection

F_{can}

is continuous. Therefore, the weak-* topology is at least as strong as

T_{X, *}^{(o)}

. It remains to show that

F_{can}^{- 1}

is continuous. One approach to prove the continuity of

F_{can}^{- 1}

is to find a lower bound of

d_{X, *}^{(o)} (\hat{W}, {\hat{W}}^{'})

in terms of the Wasserstein metric, but this is tedious. We will follow another approach in order to show that the canonical bijection

F_{can}

is a homeomorphism. We need the following proposition:

Proposition 12.

The weak-* closure of

{MP}_{b f} (X)

is

{MP}_{b} (X)

.

Proof.

See Appendix G. □

Theorem 5.

The weak-* topology on

{DMC}_{X, *}^{(o)}

is the same as the noisiness topology

T_{X, *}^{(o)}

.

Proof.

Let

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

be a completion of

({DMC}_{X, *}^{(o)}, d_{X, *}^{(o)})

. Since

{MP}_{b} (X)

is the weak-* closure of

{MP}_{b f} (X)

(Proposition 12), we can extend the canonical bijection

F_{can} : {MP}_{b f} (X) \to {DMC}_{X, *}^{(o)}

to a mapping

\bar{F} : {MP}_{b} (X) \to {\bar{DMC}}_{X, *}^{(o)}

as follows:

\bar{F} (MP) = lim_{n \to \infty} F_{can} ({MP}_{n}),

(3)

where

{({MP}_{n})}_{n \geq 0}

is any sequence in

{MP}_{b f} (X)

that converges to

MP \in {MP}_{b} (X)

, and where the limit in (3) is taken inside

{\bar{DMC}}_{X, *}^{(o)}

. In order to show that

\bar{F}

is well defined, we have to make sure that the limit in (3) exists and that it does not depend on the sequence

{({MP}_{n})}_{n \geq 0}

.

Since the sequence

{({MP}_{n})}_{n \geq 0}

converges, it is a Cauchy sequence. Therefore, for every

ϵ > 0

there exists

n_{0} > 0

such that for every

n_{1}, n_{2} \geq n_{0}

we have

W_{1} ({MP}_{n_{1}}, {MP}_{n_{2}}) < \frac{ϵ}{| X |}

. By Lemma 7, we have

{\bar{d}}_{X, *}^{(o)} (F_{can} ({MP}_{n_{1}}), F_{can} ({MP}_{n_{2}})) = d_{X, *}^{(o)} (F_{can} ({MP}_{n_{1}}), F_{can} ({MP}_{n_{2}})) \leq | X | \cdot W_{1} ({MP}_{n_{1}}, {MP}_{n_{2}}) < ϵ .

Therefore,

{(F_{can} ({MP}_{n}))}_{n \geq 0}

is a Cauchy sequence in

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

which is complete, hence the limit in (3) exists. Now assume that

{({MP}_{n}^{'})}_{n \geq 0}

is another sequence in

{MP}_{b f} (X)

which converges to MP. We have:

\begin{matrix} lim_{n \to \infty} {\bar{d}}_{X, *}^{(o)} (F_{can} ({MP}_{n}), F_{can} ({MP}_{n}^{'})) & = lim_{n \to \infty} d_{X, *}^{(o)} (F_{can} ({MP}_{n}), F_{can} ({MP}_{n}^{'})) \\ & \overset{(a)}{\leq} lim_{n \to \infty} | X | \cdot W_{1} ({MP}_{n}, {MP}_{n}^{'}) \overset{(b)}{=} 0, \end{matrix}

where (a) follows from Lemma 7 and (b) follows from the fact that

{({MP}_{n})}_{n \geq 0}

and

{({MP}_{n}^{'})}_{n \geq 0}

converge to the same point. Therefore,

{(F_{can} ({MP}_{n}))}_{n \geq 0}

and

{(F_{can} ({MP}_{n}^{'}))}_{n \geq 0}

converge to the same point in

{\bar{DMC}}_{X, *}^{(o)}

. We conclude that

\bar{F}

is well defined.

Now fix

MP, {MP}^{'} \in {MP}_{b} (X)

and let

{({MP}_{n})}_{n \geq 0}

and

{({MP}_{n}^{'})}_{n \geq 0}

be two sequences in

{MP}_{b f} (X)

that converge to MP and

{MP}^{'}

respectively. We have:

\begin{matrix} {\bar{d}}_{X, *}^{(o)} (\bar{F} (MP), \bar{F} ({MP}^{'})) & = {\bar{d}}_{X, *}^{(o)} (lim_{n \to \infty} F_{can} ({MP}_{n}), lim_{n \to \infty} F_{can} ({MP}_{n}^{'})) \\ & \overset{(a)}{=} lim_{n \to \infty} {\bar{d}}_{X, *}^{(o)} (F_{can} ({MP}_{n}), F_{can} ({MP}_{n}^{'})) \\ & = lim_{n \to \infty} d_{X, *}^{(o)} (F_{can} ({MP}_{n}), F_{can} ({MP}_{n}^{'})) \\ & \overset{(b)}{\leq} lim_{n \to \infty} | X | \cdot W_{1} ({MP}_{n}, {MP}_{n}^{'}) \overset{(c)}{=} | X | \cdot W_{1} (MP, {MP}^{'}), \end{matrix}

where (a) and (c) follow from the fact that metric distances are continuous, and (b) follows from Lemma 7. Therefore,

\bar{F}

is continuous from

({MP}_{b} (X), W_{1})

to

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

. Moreover, since

{MP}_{b} (X)

is weakly-* closed in

MP (X)

which is compact,

{MP}_{b} (X)

is compact under the weak-* topology. Therefore for every weakly-* closed subset A of

{MP}_{b} (X)

, A is compact and so

\bar{F} (A)

is compact in

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

which is Hausdorff. This implies that

\bar{F} (A)

is closed in

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

for every weakly-* closed subset A of

{MP}_{b} (X)

. Therefore,

\bar{F}

is both continuous and closed. In particular,

\bar{F} ({MP}_{b} (X))

is closed in

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

. However,

\bar{F} ({MP}_{b} (X)) \supset \bar{F} ({MP}_{b f} (X)) = F_{can} ({MP}_{b f} (X)) = {DMC}_{X, *}^{(o)}

, and

{DMC}_{X, *}^{(o)}

is dense in

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

. Therefore, we must have

\bar{F} ({MP}_{b} (X)) = {\bar{DMC}}_{X, *}^{(o)}

. We conclude that

\bar{F}

is a homeomorphism from

({MP}_{b} (X), W_{1})

to

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

.

Now since

\bar{F} ({MP}_{b f} (X)) = {DMC}_{X, *}^{(o)}

, the restriction of

\bar{F}

to

{MP}_{b f} (X)

is a homeomorphism from

({MP}_{b f} (X), W_{1})

to

({DMC}_{X, *}^{(o)}, d_{X, *}^{(o)})

. However, the restriction of

\bar{F}

to

{MP}_{b f} (X)

is nothing but

F_{can}

. We conclude that the canonical bijection is a homeomorphism from

({MP}_{b f} (X), W_{1})

to

({DMC}_{X, *}^{(o)}, d_{X, *}^{(o)})

. Therefore, the weak-* topology on

{DMC}_{X, *}^{(o)}

is the same as the noisiness topology

T_{X, *}^{(o)}

. □

Since

({MP}_{b} (X), W_{1})

is homeomorphic to

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

, we can interpret this by saying that

{\bar{DMC}}_{X, *}^{(o)}

is the space of all equivalent channels with input alphabet

X

and arbitrary output alphabet (with arbitrary cardinality). Moreover, since

{DMC}_{X, *}^{(o)}

is dense in

({\bar{DMC}}_{X, *}^{(o)}, {\bar{d}}_{X, *}^{(o)})

, we can say that any channel with input alphabet

X

can be approximated in the noisiness/weak-* sense by a channel having a finite output alphabet.

10.2. Total Variation Topology

The total-variation metric distance

d_{T V, X, *}^{(o)}

on

{DMC}_{X, *}^{(o)}

is defined as

d_{T V, X, *}^{(o)} (\hat{W}, {\hat{W}}^{'}) = {∥ {MP}_{\hat{W}} - {MP}_{{\hat{W}}^{'}} ∥}_{T V} .

The total-variation topology

T_{T V, X, *}^{(o)}

is the metric topology that is induced by

d_{T V, X, *}^{(o)}

on

{DMC}_{X, *}^{(o)}

. We will refer to the open sets (respectively, closed sets, compact sets, …) of

T_{T V, X, *}^{(o)}

as TV-open (respectively, TV-closed, TV-compact, …). The same notation is also used for open sets of

{MP}_{b f} (X)

,

{MP}_{b} (X)

and

MP (X)

in the total variation topology.

Proposition 13.

If

| X | \geq 2

and

n \geq 2

, then

{DMC}_{X, [n]}^{(o)}

is not TV-compact in

{DMC}_{X, *}^{(o)}

.

Proof.

Let

p, p^{'} \in Δ_{X}

be such that

p \neq p^{'}

and

\frac{1}{2} p + \frac{1}{2} p^{'} = π_{X}

, where

π_{X}

is the uniform distribution on

X

. For every

n \geq 1

, define

p_{n}, p_{n}^{'} \in Δ_{X}

as

p_{n} = \frac{1}{n} p + (1 - \frac{1}{n}) π_{X},

and

p_{n}^{'} = \frac{1}{n} p^{'} + (1 - \frac{1}{n}) π_{X} .

Clearly,

\frac{1}{2} p_{n} + \frac{1}{2} p_{n}^{'} = π_{X}

for every

n \geq 1

.

Now let

{MP}_{n} = \frac{1}{2} δ_{p_{n}} + \frac{1}{2} δ_{p_{n}^{'}}

, where

δ_{p_{n}}

and

δ_{p_{n}^{'}}

are Dirac measures centered at

p_{n}

and

p_{n}^{'}

respectively. Clearly,

{MP}_{n}

is balanced and finitely supported for every

n \geq 1

. Let

{\hat{W}}_{n} = F_{can} ({MP}_{n})

. We have

| supp ({MP}_{{\hat{W}}_{n}}) | = | supp ({MP}_{n}) | = | {p_{n}, p_{n}^{'}} | = 2 .

Therefore,

{\hat{W}}_{n} \in {DMC}_{X, [2]}^{(o)} \subset {DMC}_{X, [m]}^{(o)}

for every

n \geq 1

and every

m \geq 2

. It is easy to see that

d_{T V, X, *}^{(o)} ({\hat{W}}_{n_{1}}, {\hat{W}}_{n_{2}}) = {∥ {MP}_{n_{1}} - {MP}_{n_{2}} ∥}_{T V} = 1

for every

n_{2} > n_{1} \geq 1

. Therefore, no subsequence of

{({MP}_{n})}_{n \geq 1}

can TV-converge. This means that

{DMC}_{X, [m]}^{(o)}

is not sequentially compact for any

m \geq 2

. Now since

T_{T V, X, *}^{(o)}

is metrizable, we conclude that

{DMC}_{X, [n]}^{(o)}

is not compact for any

n \geq 2

. □
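The construction in this proof can be checked numerically on a small case. The following is a minimal sketch, not part of the original argument: it assumes the binary input alphabet with p = (1, 0) and p' = (0, 1), represents a finitely supported meta-probability measure as a dictionary from points to masses, and takes the total-variation distance to be half the L1 distance between the mass functions (the convention under which two measures with disjoint supports are at distance 1). The helper names `mix`, `meta`, `mean` and `tv` are hypothetical.

```python
from fractions import Fraction as F

def mix(t, a, b):
    """Return t*a + (1-t)*b for two distributions given as tuples."""
    return tuple(t * x + (1 - t) * y for x, y in zip(a, b))

def meta(n, p, p_prime, pi):
    """MP_n = 1/2 delta_{p_n} + 1/2 delta_{p'_n}, stored as {point: mass}."""
    t = F(1, n)
    return {mix(t, p, pi): F(1, 2), mix(t, p_prime, pi): F(1, 2)}

def mean(mp):
    """Average distribution of a finitely supported measure: sum of mass * point."""
    dim = len(next(iter(mp)))
    return tuple(sum(m * q[i] for q, m in mp.items()) for i in range(dim))

def tv(mp1, mp2):
    """Total variation, assumed here to be half the L1 distance of the mass functions."""
    pts = set(mp1) | set(mp2)
    return sum(abs(mp1.get(q, 0) - mp2.get(q, 0)) for q in pts) / 2

# Illustrative choice for X = {0, 1}: p and p' average to the uniform distribution.
p, p_prime, pi = (F(1), F(0)), (F(0), F(1)), (F(1, 2), F(1, 2))
seq = [meta(n, p, p_prime, pi) for n in range(1, 6)]

# Every MP_n is balanced (its mean is uniform) ...
balanced = all(mean(mp) == pi for mp in seq)
# ... and any two distinct terms have disjoint supports, hence TV distance 1.
distances = {tv(seq[i], seq[j]) for i in range(5) for j in range(i + 1, 5)}
print(balanced, distances)
```

Exact rational arithmetic is used so that equality of support points is tested without rounding error.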

Corollary 11.

If

| X | \geq 2

, then

T_{T V, X, *}^{(o)}

is not a natural topology.

Proof.

If

T_{T V, X, *}^{(o)}

were natural,

{DMC}_{X, [2]}^{(o)}

would be compact, and this is not the case. □

Since the noisiness topology is the same as the weak-* topology,

T_{X, *}^{(o)}

is coarser than

T_{T V, X, *}^{(o)}

. On the other hand, since

T_{X, *}^{(o)}

is natural and

T_{T V, X, *}^{(o)}

is not,

T_{X, *}^{(o)}

is strictly coarser than

T_{T V, X, *}^{(o)}

when

| X | \geq 2

.

Note that the sequence

{({MP}_{n})}_{n \geq 1}

in the proof of Proposition 13 converges in the strong topology because of Proposition 9. Therefore,

T_{s, X, *}^{(o)}

is not finer than

T_{T V, X, *}^{(o)}

.

Although

T_{T V, X, *}^{(o)}

is not a natural topology itself, it has many properties of natural topologies.

Proposition 14.

If

| X | \geq 2

, every non-empty TV-open subset of

{DMC}_{X, *}^{(o)}

is rank-unbounded.

Proof.

Let U be a non-empty TV-open set of

{DMC}_{X, *}^{(o)}

. Let

\hat{W} \in U

and let

ϵ > 0

be such that

{\hat{W}}^{'} \in U

whenever

d_{T V, X, *}^{(o)} (\hat{W}, {\hat{W}}^{'}) < ϵ

.

Let p,

p^{'}

,

{(p_{n})}_{n \geq 1}

and

{(p_{n}^{'})}_{n \geq 1}

be as in Proposition 13. For every

n \geq 1

, define

{MP}_{n} \in MP (X)

as follows:

{MP}_{n} = (1 - \frac{ϵ}{4 n}) {MP}_{\hat{W}} + \frac{ϵ}{8 n^{2}} \cdot \sum_{i = 1}^{n} (δ_{p_{i}} + δ_{p_{i}^{'}}) .

Clearly,

{MP}_{n}

is balanced and finitely supported, so

{MP}_{n} \in {MP}_{b f} (X)

. Moreover,

d_{T V, X, *}^{(o)} (F_{can} ({MP}_{n}), \hat{W}) = {∥ {MP}_{n} - {MP}_{\hat{W}} ∥}_{T V} \leq \frac{ϵ}{2 n} < ϵ .

Therefore,

F_{can} ({MP}_{n}) \in U

for every

n \geq 1

. On the other hand,

supp ({MP}_{n}) \supset {p_{i}, p_{i}^{'} : 1 \leq i \leq n}

, which means that

| supp ({MP}_{n}) | \geq 2 n

and so

F_{can} ({MP}_{n}) \notin {DMC}_{X, [n]}^{(o)}

for every

n \geq 1

. We conclude that U is rank-unbounded. □
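The three properties of the perturbed measure used in this proof (unit mass and balance, closeness to the original measure, and support of size at least 2n) can be verified on an example. The sketch below is illustrative only: the binary alphabet, the base measure `mp_w`, the value of `eps`, and the helper names are assumptions, and total variation is again taken as half the L1 distance between mass functions.

```python
from fractions import Fraction as F

def point(n, p, pi):
    """p_n = (1/n) p + (1 - 1/n) pi, for distributions given as tuples."""
    t = F(1, n)
    return tuple(t * a + (1 - t) * b for a, b in zip(p, pi))

def perturbed(mp_w, n, eps, p, p_prime, pi):
    """MP_n = (1 - eps/(4n)) MP_W + eps/(8 n^2) * sum_{i=1..n} (delta_{p_i} + delta_{p'_i})."""
    out = {q: (1 - eps / (4 * n)) * m for q, m in mp_w.items()}
    for i in range(1, n + 1):
        for q in (point(i, p, pi), point(i, p_prime, pi)):
            out[q] = out.get(q, 0) + eps / (8 * n * n)
    return out

def mean(mp):
    """Average distribution of a finitely supported measure."""
    dim = len(next(iter(mp)))
    return tuple(sum(m * q[i] for q, m in mp.items()) for i in range(dim))

def tv(mp1, mp2):
    """Total variation, assumed to be half the L1 distance of the mass functions."""
    pts = set(mp1) | set(mp2)
    return sum(abs(mp1.get(q, 0) - mp2.get(q, 0)) for q in pts) / 2

# Illustrative data: X = {0, 1}, and MP_W is the Blackwell measure of a BSC(1/4).
p, p_prime, pi = (F(1), F(0)), (F(0), F(1)), (F(1, 2), F(1, 2))
mp_w = {(F(3, 4), F(1, 4)): F(1, 2), (F(1, 4), F(3, 4)): F(1, 2)}
eps = F(1, 2)

checks = []
for n in range(1, 7):
    mp_n = perturbed(mp_w, n, eps, p, p_prime, pi)
    checks.append((sum(mp_n.values()) == 1,           # probability measure
                   mean(mp_n) == pi,                  # balanced
                   tv(mp_n, mp_w) <= eps / (2 * n),   # within the bound used in the proof
                   len(mp_n) >= 2 * n))               # support has at least 2n points
print(all(all(c) for c in checks))
```

In this example the base measure happens to charge the points p_2 and p'_2, which shows that overlaps between the support of the original measure and the added points do not affect the argument.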

Corollary 12.

If

| X | \geq 2

, the TV-interior of

{DMC}_{X, [n]}^{(o)}

in

{DMC}_{X, *}^{(o)}

is empty.

Note that the sequence

{(F_{can} ({MP}_{n}))}_{n \geq 1}

in the proof of Proposition 14 is rank-unbounded and converges in total variation to

\hat{W}

. On the other hand, Proposition 9 implies that

{(F_{can} ({MP}_{n}))}_{n \geq 1}

does not converge in

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

. We conclude that

T_{T V, X, *}^{(o)}

is not finer than

T_{s, X, *}^{(o)}

.

Although

{DMC}_{X, [n]}^{(o)}

is not TV-compact if

| X | \geq 2

and

n \geq 2

, it is TV-complete:

Proposition 15.

For every

n \geq 1

,

{DMC}_{X, [n]}^{(o)}

is TV-complete in

{DMC}_{X, *}^{(o)}

.

Proof.

Let

{MP}_{b, n} (X)

be the set of balanced meta-probability measures whose support is of size at most n:

{MP}_{b, n} (X) = {MP \in {MP}_{b} (X) : | supp (MP) | \leq n} .

Since

({DMC}_{X, [n]}^{(o)}, d_{T V, X, *}^{(o)})

is isometric to

({MP}_{b, n} (X), ∥ \cdot ∥_{T V})

, and since

(MP (X), ∥ \cdot ∥_{T V})

is complete, it is sufficient to show that

{MP}_{b, n} (X)

is TV-closed in

MP (X)

.

Let MP be in the TV-closure of

{MP}_{b, n} (X)

. Since we are working in a metric space, there exists a sequence

{({MP}_{m})}_{m \geq 0}

in

{MP}_{b, n} (X)

that TV-converges to MP. Assume that

MP \notin {MP}_{b, n} (X)

. There exist

p_{1}, \dots, p_{n + 1} \in Δ_{X}

that are pairwise different and which satisfy

MP (p_{i}) > 0

for every

1 \leq i \leq n + 1

. Since

{({MP}_{m})}_{m \geq 0}

TV-converges to MP, there exists

m_{0} \geq 0

such that

{MP}_{m_{0}} (p_{i}) > 0

for every

1 \leq i \leq n + 1

. This contradicts the fact that

{MP}_{m_{0}} \in {MP}_{b, n} (X)

. Therefore,

MP \in {MP}_{b, n} (X)

for every MP in the TV-closure of

{MP}_{b, n} (X)

. This shows that

{MP}_{b, n} (X)

is TV-closed. Therefore,

{DMC}_{X, [n]}^{(o)}

is TV-complete in

{DMC}_{X, *}^{(o)}

. □

Proposition 16.

If

| X | \geq 2

,

({DMC}_{X, *}^{(o)}, T_{T V, X, *}^{(o)})

is neither Baire nor locally compact anywhere.

Proof.

Since

{DMC}_{X, [n]}^{(o)}

is TV-complete, it is TV-closed. Since it also has empty TV-interior, the same techniques that were used for natural topologies in Section 7.1 can be applied to

T_{T V, X, *}^{(o)}

. □

The above proposition shows that

({DMC}_{X, *}^{(o)}, T_{T V, X, *}^{(o)})

cannot be completely metrized. Note that since

({DMC}_{X, *}^{(o)}, d_{T V, X, *}^{(o)})

is isometric to

({MP}_{b f} (X), ∥ \cdot ∥_{T V})

, and since

(MP (X), ∥ \cdot ∥_{T V})

is complete, the completion of

({DMC}_{X, *}^{(o)}, d_{T V, X, *}^{(o)})

is isometric to the closure of

{MP}_{b f} (X)

in

(MP (X), ∥ \cdot ∥_{T V})

. It can be shown that the TV-closure of

{MP}_{b f} (X)

in

MP (X)

is the set of all balanced and countably supported meta-probability measures on

X

. Therefore, the completion of

({DMC}_{X, *}^{(o)}, d_{T V, X, *}^{(o)})

can be thought of as the space of equivalent channels from

X

to a countably infinite output alphabet. This allows us to say that any channel with input alphabet

X

and a countable output alphabet can be approximated in the total variation sense by a channel having a finite output alphabet.

11. The Natural Borel $σ$ -algebra on ${DMC}_{X, *}^{(o)}$

Let

T

be a Hausdorff natural topology on

{DMC}_{X, *}^{(o)}

. Since

T_{s, X, *}^{(o)}

is the finest natural topology, we have

T \subset T_{s, X, *}^{(o)}

. Therefore,

B (T) \subset B (T_{s, X, *}^{(o)})

.

On the other hand, for every

U \in T_{s, X, *}^{(o)}

and every

n \geq 1

, we have

U \cap {DMC}_{X, [n]}^{(o)} \in T_{X, [n]}^{(o)}

. However,

T

is a natural topology, so there must exist

U_{n} \in T

such that

U_{n} \cap {DMC}_{X, [n]}^{(o)} = U \cap {DMC}_{X, [n]}^{(o)}

. Since

U_{n} \in T

, we have

U_{n} \in B (T)

. Moreover,

{DMC}_{X, [n]}^{(o)}

is

T

-closed (because it is compact and

T

is Hausdorff). Therefore,

{DMC}_{X, [n]}^{(o)} \in B (T)

. This implies that

U \cap {DMC}_{X, [n]}^{(o)} = U_{n} \cap {DMC}_{X, [n]}^{(o)} \in B (T)

, hence

U = ⋃_{n \geq 1} (U \cap {DMC}_{X, [n]}^{(o)}) \in B (T) .

Since this is true for every

U \in T_{s, X, *}^{(o)}

, we have

T_{s, X, *}^{(o)} \subset B (T)

which implies that

B (T_{s, X, *}^{(o)}) \subset B (T)

. We conclude that all Hausdorff natural topologies on

{DMC}_{X, *}^{(o)}

have the same

σ

-algebra. This

σ

-algebra deserves to be called the natural Borel σ-algebra on

{DMC}_{X, *}^{(o)}

.

Note that for every

n \geq 1

, the inclusion mapping

i_{n} : {DMC}_{X, [n]}^{(o)} \to {DMC}_{X, *}^{(o)}

is continuous from

({DMC}_{X, [n]}^{(o)}, T_{X, [n]}^{(o)})

to

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

, hence it is measurable. Therefore, for every

B \in B (T_{s, X, *}^{(o)})

, we have

i_{n}^{- 1} (B) = B \cap {DMC}_{X, [n]}^{(o)} \in B (T_{X, [n]}^{(o)})

. In the following, we show a converse for this statement.

Fix

n \geq 1

and let

U \in T_{X, [n]}^{(o)}

. There exists

U^{'} \in T_{s, X, *}^{(o)}

such that

U = U^{'} \cap {DMC}_{X, [n]}^{(o)}

. Since

U^{'}

and

{DMC}_{X, [n]}^{(o)}

are respectively open and closed in the topology

T_{s, X, *}^{(o)}

, they are both in its Borel

σ

-algebra. Therefore,

U = U^{'} \cap {DMC}_{X, [n]}^{(o)} \in B (T_{s, X, *}^{(o)})

for every

U \in T_{X, [n]}^{(o)}

. This means that

T_{X, [n]}^{(o)} \subset B (T_{s, X, *}^{(o)})

and

B (T_{X, [n]}^{(o)}) \subset B (T_{s, X, *}^{(o)})

for every

n \geq 1

.

Assume now that

A \subset {DMC}_{X, *}^{(o)}

satisfies

A \cap {DMC}_{X, [n]}^{(o)} \in B (T_{X, [n]}^{(o)})

for every

n \geq 1

. This implies that

A \cap {DMC}_{X, [n]}^{(o)} \in B (T_{s, X, *}^{(o)})

for every

n \geq 1

, hence

A = ⋃_{n \geq 1} (A \cap {DMC}_{X, [n]}^{(o)}) \in B (T_{s, X, *}^{(o)}) .

We conclude that a subset A of

{DMC}_{X, *}^{(o)}

is in the natural Borel

σ

-algebra if and only if

A \cap {DMC}_{X, [n]}^{(o)} \in B (T_{X, [n]}^{(o)})

for every

n \geq 1

.

12. Discussion and Conclusions

The fact that the noisiness and weak-* topologies are the same gives us more freedom in proving theorems. Statements that can be hard to prove using the weak-* formulation might be easier to prove using the noisiness formulation.

The strong topology is too strong to be adopted as the “standard natural topology”. However, it can still be useful because it is relatively easy to work with as it has a quotient formulation. Moreover, since it is finer than the noisiness/weak-* topology, many statements that are true for the strong topology are also true for coarser topologies, e.g., any sequence that converges in the strong topology also converges in the noisiness/weak-* topology.

Although the total variation topology is not natural, it can still be useful because it is finer than the noisiness/weak-* topology.

Many interesting questions remain open: Are all natural topologies Hausdorff? Can we find more topological properties that are common for all natural topologies? Is there a coarsest natural topology? Is there a natural topology that is coarser than the noisiness/weak-* one?

Finding meaningful measures on

{DMC}_{X, *}^{(o)}

might be challenging. One might be tempted to require that the measure of

{DMC}_{X, [n]}^{(o)}

should be zero because it is “finite dimensional” whereas

{DMC}_{X, *}^{(o)}

is “infinite dimensional”. On the other hand, if

{DMC}_{X, [n]}^{(o)}

has a zero measure for every

n \geq 1

, the whole space

{DMC}_{X, *}^{(o)}

will have a zero measure because it is a countable union of these subspaces. Nevertheless, statements such as “the property

X

is true for almost all channels” can still make sense. One possible definition of null-sets is as follows: for every set A in the natural Borel

σ

-algebra, we say that A is a null-set if and only if there exists

n_{0} \geq 1

such that

P_{n} ({Proj}_{n}^{- 1} (A \cap {DMC}_{X, [n]}^{(o)})) = 0

for every

n \geq n_{0}

, where

{Proj}_{n}

is the projection onto the

R_{X, [n]}^{(o)}

-equivalence classes and

P_{n}

is the uniform probability measure on

{DMC}_{X, [n]} \equiv {(Δ_{[n]})}^{X}

. Another possible definition, which is weaker, is to say that A is a null-set if and only if

lim_{n \to \infty} P_{n} ({Proj}_{n}^{- 1} (A \cap {DMC}_{X, [n]}^{(o)})) = 0

.

Another notion of equivalence is the Shannon-equivalence that allows randomization at both the input and the output, as well as shared randomness between the transmitter and the receiver [16]. The Shannon deficiency that was introduced in [17] compares a particular channel with the Shannon-equivalence-class of another channel, but it is not a metric distance between Shannon-equivalence-classes. In [18], we provide a characterization of the Shannon ordering and we prove that some of the results of this paper hold for the space of Shannon-equivalent channels.

In [19], we introduce the notions of input-degradedness and input-equivalence. A channel W is said to be input-degraded from another channel

W^{'}

if W can be simulated from

W^{'}

by local operations at the input. In [19], we provide a characterization of input-degradedness and we prove that many of the results of this paper hold for the space of input-equivalent channels.

Acknowledgments

I would like to thank Emre Telatar and Mohammad Bazzi for helpful discussions. I am also grateful to Maxim Raginsky for his comments.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

BEC	Binary erasure channel
BSC	Binary symmetric channel
TV	Total variation
DMC	Discrete memoryless channel

Appendix A. Proof of Proposition 2

For every

A \subset Δ_{X}

, let

co (A)

be the convex hull of A. We say that

p \in A

is convex-extreme if it is an extreme point of

co (A)

, i.e., for every

p_{1}, \dots, p_{n} \in co (A)

and every

λ_{1}, \dots, λ_{n} > 0

satisfying

\sum_{i = 1}^{n} λ_{i} = 1

and

\sum_{i = 1}^{n} λ_{i} p_{i} = p

, we have

p_{1} = \dots = p_{n} = p

. It is easy to see that if A is finite, then the convex-extreme points of A coincide with the extreme points of

co (A)

. We denote the set of convex-extreme points of A as

CE (A)

.

Let

W \in {DMC}_{X, Y}

and

W^{'} \in {DMC}_{X, Z}

be such that

W^{'}

is degraded from W. There exists

V \in {DMC}_{Y, Z}

such that

W^{'} = V \circ W

. Let X be a random variable uniformly distributed in

X

, let Y be the output of W when X is the input, and let Z be the output of V when Y is the input in such a way that

X - Y - Z

is a Markov chain. Clearly,

P_{Z | X} (z | x) = W^{'} (z | x)

for every

(x, z) \in X \times Z

.

For every

z \in Z

, we have:

P_{W^{'}}^{o} (z) = P_{Z} (z) = \sum_{y \in Y} P_{Y} (y) P_{Z | Y} (z | y) = \sum_{y \in Im (W)} V (z | y) P_{W}^{o} (y) .

(A1)

Define

V^{- 1} \in {DMC}_{Im (W^{'}), Im (W)}

as

V^{- 1} (y | z) = P_{Y | Z} (y | z) = \frac{P_{Y} (y) P_{Z | Y} (z | y)}{P_{Z} (z)} = \frac{V (z | y) P_{W}^{o} (y)}{\sum_{y^{'} \in Im (W)} V (z | y^{'}) P_{W}^{o} (y^{'})} .

Note that for every

(y, z) \in Im (W) \times Im (W^{'})

, we have

V^{- 1} (y | z) = 0

if and only if

V (z | y) = 0

.

For every

(x, z) \in X \times Im (W^{'})

, we have:

\begin{matrix} {W_{z}^{'}}^{- 1} (x) & = P_{X | Z} (x | z) = \sum_{y \in Y} P_{X, Y | Z} (x, y | z) = \sum_{\begin{matrix} y \in Y, \\ P_{Y} (y) > 0 \end{matrix}} P_{X, Y | Z} (x, y | z) \\ = \sum_{y \in Im (W)} P_{Y | Z} (y | z) P_{X | Y, Z} (x | y, z) \overset{(a)}{=} \sum_{y \in Im (W)} V^{- 1} (y | z) P_{X | Y} (x | y) \\ = \sum_{y \in Im (W)} V^{- 1} (y | z) W_{y}^{- 1} (x), \end{matrix}

(A2)

where (a) follows from the fact that

X - Y - Z

is a Markov chain.

Equation (A2) shows that for every

z \in Im (W^{'})

, we have

{W_{z}^{'}}^{- 1} \in co ({W_{y}^{- 1} : y \in Im (W)}) = co (supp ({MP}_{W})) .

Therefore,

co (supp ({MP}_{W^{'}})) = co ({{W_{z}^{'}}^{- 1} : z \in Im (W^{'})}) \subset co (supp ({MP}_{W})) .

(A3)

Now for every

p \in Δ_{X}

, define

Y_{p} : = {y \in Im (W) : W_{y}^{- 1} = p} .

Similarly,

Z_{p} : = {z \in Im (W^{'}) : {W_{z}^{'}}^{- 1} = p} .

Let

p_{e x t} \in CE (supp ({MP}_{W}))

and let

z \in Im (W^{'})

. Equation (A2) shows that if

z \in Z_{p_{e x t}}

, then

V^{- 1} (y | z) = 0

for every

y \in Im (W) ∖ Y_{p_{e x t}}

. Now since

V^{- 1} (y | z) = 0 \Leftrightarrow V (z | y) = 0

for every

(y, z) \in Im (W) \times Im (W^{'})

, we deduce that if

z \in Z_{p_{e x t}}

then

V (z | y) = 0

for every

y \in Im (W) ∖ Y_{p_{e x t}}

.

Therefore,

\begin{matrix} {MP}_{W^{'}} (p_{e x t}) & = \sum_{z \in Z_{p_{e x t}}} P_{W^{'}}^{o} (z) \overset{(a)}{=} \sum_{z \in Z_{p_{e x t}}} \sum_{y \in Im (W)} V (z | y) P_{W}^{o} (y) \\ \overset{(b)}{=} \sum_{z \in Z_{p_{e x t}}} \sum_{y \in Y_{p_{e x t}}} V (z | y) P_{W}^{o} (y) \leq \sum_{z \in Im (W^{'})} \sum_{y \in Y_{p_{e x t}}} V (z | y) P_{W}^{o} (y) \\ = \sum_{y \in Y_{p_{e x t}}} P_{W}^{o} (y) = {MP}_{W} (p_{e x t}), \end{matrix}

(A4)

where (a) follows from Equation (A1), and (b) follows from the fact that for every

y \in Im (W) ∖ Y_{p_{e x t}}

, we have

V (z | y) = 0

.

Now assume that W and

W^{'}

are equivalent. Equation (A3) (applied twice) implies that we must have

co (supp ({MP}_{W^{'}})) = co (supp ({MP}_{W}))

which implies that

supp ({MP}_{W^{'}})

and

supp ({MP}_{W})

have the same convex-extreme points. Now fix a convex-extreme point

p_{e x t} \in CE (supp ({MP}_{W^{'}})) = CE (supp ({MP}_{W}))

. Equation (A4) (applied twice) implies that

{MP}_{W} (p_{e x t}) = {MP}_{W^{'}} (p_{e x t})

. By using Equation (A4) again we obtain:

\sum_{z \in Z_{p_{e x t}}} \sum_{y \in Y_{p_{e x t}}} V (z | y) P_{W}^{o} (y) = \sum_{z \in Im (W^{'})} \sum_{y \in Y_{p_{e x t}}} V (z | y) P_{W}^{o} (y),

hence

\sum_{z \in Im (W^{'}) ∖ Z_{p_{e x t}}} \sum_{y \in Y_{p_{e x t}}} V (z | y) P_{W}^{o} (y) = 0 .

However,

P_{W}^{o} (y) > 0

for every

y \in Y_{p_{e x t}}

. Therefore, for every

z \in Im (W^{'}) ∖ Z_{p_{e x t}}

and every

y \in Y_{p_{e x t}}

, we must have

V (z | y) = 0

(which implies that

V^{- 1} (y | z) = 0

). We conclude that for every

z \in Im (W^{'}) ∖ Z_{p_{e x t}}

, we can rewrite Equations (A1) and (A2) as:

P_{W^{'}}^{o} (z) = \sum_{y \in Im (W) ∖ Y_{p_{e x t}}} V (z | y) P_{W}^{o} (y),

and

{W_{z}^{'}}^{- 1} = \sum_{y \in Im (W) ∖ Y_{p_{e x t}}} V^{- 1} (y | z) W_{y}^{- 1} .

We can now repeat the above argument but on

supp ({MP}_{W}) ∖ {p_{e x t}}

and

supp ({MP}_{W^{'}}) ∖ {p_{e x t}}

instead of

supp ({MP}_{W})

and

supp ({MP}_{W^{'}})

. We deduce that

co (supp ({MP}_{W}) ∖ {p_{e x t}}) = co (supp ({MP}_{W^{'}}) ∖ {p_{e x t}})

so

supp ({MP}_{W}) ∖ {p_{e x t}}

and

supp ({MP}_{W^{'}}) ∖ {p_{e x t}}

have the same convex-extreme points. We can also prove that

{MP}_{W} (p_{e x t}^{'}) = {MP}_{W^{'}} (p_{e x t}^{'})

for every

p_{e x t}^{'} \in CE (supp ({MP}_{W^{'}}) ∖ {p_{e x t}}) = CE (supp ({MP}_{W}) ∖ {p_{e x t}})

.

Notice that any point of

supp ({MP}_{W})

(respectively

supp ({MP}_{W^{'}})

) becomes convex-extreme after removing a finite number of elements from

supp ({MP}_{W})

(respectively

supp ({MP}_{W^{'}})

). Therefore, after inductively applying the above argument a finite number of times, we can deduce that

supp ({MP}_{W}) = supp ({MP}_{W^{'}})

and

{MP}_{W} (p) = {MP}_{W^{'}} (p)

for every

p \in supp ({MP}_{W}) = supp ({MP}_{W^{'}})

, hence

{MP}_{W} = {MP}_{W^{'}}

.

Now let

W \in {DMC}_{X, Y}

and

W^{'} \in {DMC}_{X, Z}

be any two channels satisfying

{MP}_{W} = {MP}_{W^{'}}

. We have

supp ({MP}_{W}) = supp ({MP}_{W^{'}})

, and for every

p \in supp ({MP}_{W}) = supp ({MP}_{W^{'}})

, we have

\sum_{y \in Y_{p}} P_{W}^{o} (y) = {MP}_{W} (p) = {MP}_{W^{'}} (p) = \sum_{z \in Z_{p}} P_{W^{'}}^{o} (z) .

Define the channel

V \in {DMC}_{Y, Z}

as

V (z | y) = \{\begin{matrix} \frac{1}{| Z |} & if y \notin Im (W), \\ \frac{P_{W^{'}}^{o} (z)}{{MP}_{W^{'}} (W_{y}^{- 1})} & if y \in Im (W) and z \in Z_{W_{y}^{- 1}}, \\ 0 & otherwise . \end{matrix}

A simple calculation shows that

\sum_{z \in Z} V (z | y) = 1

for every

y \in Y

, so V is a valid channel.

Notice that for every

(y, z) \in Im (W) \times Im (W^{'})

, we have:

z \in Z_{W_{y}^{- 1}} \Leftrightarrow {W_{z}^{'}}^{- 1} = W_{y}^{- 1} \Leftrightarrow y \in Y_{{W_{z}^{'}}^{- 1}} .

Moreover, if

z \in Im (W^{'}) and y \in Y_{{W_{z}^{'}}^{- 1}}

, we have

{MP}_{W^{'}} (W_{y}^{- 1}) = {MP}_{W} ({W_{z}^{'}}^{- 1})

. Therefore, we can rewrite V as:

V (z | y) = \{\begin{matrix} \frac{P_{W^{'}}^{o} (z)}{{MP}_{W} ({W_{z}^{'}}^{- 1})} & if z \in Im (W^{'}) and y \in Y_{{W_{z}^{'}}^{- 1}}, \\ \frac{1}{| Z |} & if y \notin Im (W), \\ 0 & otherwise . \end{matrix}

Let

W^{″} = V \circ W \in {DMC}_{X, Z}

. For every

z \in Z ∖ Im (W^{'})

, Equation (A1) implies that:

\begin{matrix} P_{W^{″}}^{o} (z) & = \sum_{y \in Im (W)} V (z | y) P_{W}^{o} (y) \overset{(a)}{=} 0 = P_{W^{'}}^{o} (z), \end{matrix}

where (a) follows from the fact that

V (z | y) = 0

if

y \in Im (W)

and

z \notin Im (W^{'})

.

On the other hand, for every

z \in Im (W^{'})

, Equation (A1) implies that:

\begin{matrix} P_{W^{″}}^{o} (z) & = \sum_{y \in Im (W)} V (z | y) P_{W}^{o} (y) = \sum_{y \in Y_{{W_{z}^{'}}^{- 1}}} \frac{P_{W^{'}}^{o} (z)}{{MP}_{W} ({W_{z}^{'}}^{- 1})} P_{W}^{o} (y) \\ = \frac{P_{W^{'}}^{o} (z)}{{MP}_{W} ({W_{z}^{'}}^{- 1})} \sum_{y \in Y_{{W_{z}^{'}}^{- 1}}} P_{W}^{o} (y) = \frac{P_{W^{'}}^{o} (z)}{{MP}_{W} ({W_{z}^{'}}^{- 1})} {MP}_{W} ({W_{z}^{'}}^{- 1}) = P_{W^{'}}^{o} (z) . \end{matrix}

Therefore,

P_{W^{″}}^{o} (z) = P_{W^{'}}^{o} (z)

for every

z \in Z

, which implies that

Im (W^{″}) = Im (W^{'})

.

Now define

V^{- 1} \in {DMC}_{Im (W^{″}), Im (W)}

as

V^{- 1} (y | z) = \frac{V (z | y) P_{W}^{o} (y)}{\sum_{y^{'} \in Im (W)} V (z | y^{'}) P_{W}^{o} (y^{'})} .

Equation (A2) implies that for every

z \in Im (W^{″}) = Im (W^{'})

, we have:

\begin{matrix} {W_{z}^{″}}^{- 1} & = \sum_{y \in Im (W)} V^{- 1} (y | z) W_{y}^{- 1} \overset{(a)}{=} \sum_{y \in Y_{{W_{z}^{'}}^{- 1}}} V^{- 1} (y | z) W_{y}^{- 1} \\ = \sum_{y \in Y_{{W_{z}^{'}}^{- 1}}} V^{- 1} (y | z) {W_{z}^{'}}^{- 1} \overset{(b)}{=} \sum_{y \in Im (W)} V^{- 1} (y | z) {W_{z}^{'}}^{- 1} = {W_{z}^{'}}^{- 1}, \end{matrix}

where (a) and (b) follow from the fact that for every

(y, z) \in Im (W) \times Im (W^{″})

, we have

V^{- 1} (y | z) = 0

if and only if

V (z | y) = 0

.

We conclude that

P_{W^{″}}^{o} = P_{W^{'}}^{o}

, and for every

z \in Im (W^{″}) = Im (W^{'})

, we have

{W_{z}^{″}}^{- 1} = {W_{z}^{'}}^{- 1}

. Therefore,

W^{'} = W^{″} = V \circ W

and so

W^{'}

is degraded from W. By exchanging the roles of W and

W^{'}

we get that W is also degraded from

W^{'}

, hence W and

W^{'}

are equivalent.
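The second half of the proof is constructive, so it can be checked on a small example. The sketch below is illustrative only. It assumes the definitions P_W^o(y) = (1/|X|) Σ_x W(y|x) and W_y^{-1}(x) = W(y|x) / (|X| P_W^o(y)), stores a channel as a table `W[x][y]` of exact fractions, and uses hypothetical helper names that do not appear in the paper. The two example channels (a BSC(1/4) and a copy of it with one output split in two) are chosen purely for illustration.

```python
from fractions import Fraction as F

def output_dist(W):
    """P_W^o(y) = (1/|X|) * sum_x W(y|x), where W[x][y] = W(y|x)."""
    nx = len(W)
    return [sum(W[x][y] for x in range(nx)) / nx for y in range(len(W[0]))]

def posterior(W, y):
    """W_y^{-1}(x) = W(y|x) / (|X| P_W^o(y)); only meaningful when P_W^o(y) > 0."""
    nx = len(W)
    py = output_dist(W)[y]
    return tuple(W[x][y] / (nx * py) for x in range(nx))

def blackwell(W):
    """MP_W as {posterior: mass}, adding P_W^o(y) over outputs that share a posterior."""
    mp = {}
    for y, py in enumerate(output_dist(W)):
        if py > 0:
            q = posterior(W, y)
            mp[q] = mp.get(q, 0) + py
    return mp

def degrading_channel(W, W2):
    """V(z|y) built from the case formula in the text, assuming MP_W == MP_{W2}."""
    po, po2, mp2 = output_dist(W), output_dist(W2), blackwell(W2)
    nz = len(W2[0])
    V = []
    for y in range(len(W[0])):
        if po[y] == 0:                     # y outside Im(W): uniform row
            V.append([F(1, nz)] * nz)
            continue
        q = posterior(W, y)
        V.append([po2[z] / mp2[q] if po2[z] > 0 and posterior(W2, z) == q else F(0)
                  for z in range(nz)])
    return V

def compose(V, W):
    """(V o W)(z|x) = sum_y W(y|x) V(z|y)."""
    return [[sum(W[x][y] * V[y][z] for y in range(len(V))) for z in range(len(V[0]))]
            for x in range(len(W))]

# W splits the second output of the BSC(1/4) W2 into two outputs with the same posterior.
W = [[F(3, 4), F(1, 8), F(1, 8)], [F(1, 4), F(3, 8), F(3, 8)]]
W2 = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]

same_measure = blackwell(W) == blackwell(W2)
V = degrading_channel(W, W2)       # simulates W2 from W
V_back = degrading_channel(W2, W)  # simulates W from W2
print(same_measure, compose(V, W) == W2, compose(V_back, W2) == W)
```

Exact fractions are used because the construction tests posteriors for equality, which floating-point arithmetic would make unreliable.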

Appendix B. Proof of Lemma 3

We need the following lemma:

Lemma 8.

The relation

R_{X, Y}^{(o)}

is closed in

{DMC}_{X, Y} \times {DMC}_{X, Y}

.

Proof.

Define the mapping

f : {({DMC}_{X, Y})}^{2} \times {({DMC}_{Y, Y})}^{2} \to {({DMC}_{X, Y})}^{4}

as:

f (W, W^{'}, V, V^{'}) = (W, V^{'} \circ W^{'}, W^{'}, V \circ W) .

f is continuous because channel composition is continuous.

Define the set

A \subset {({DMC}_{X, Y})}^{4}

as:

A : = {(W, W, W^{'}, W^{'}) : W, W^{'} \in {DMC}_{X, Y}} .

It is easy to see that A is a closed subset of

{({DMC}_{X, Y})}^{4}

. We have:

f^{- 1} (A) = {(W, W^{'}, V, V^{'}) \in {({DMC}_{X, Y})}^{2} \times {({DMC}_{Y, Y})}^{2} : V^{'} \circ W^{'} = W, V \circ W = W^{'}} .

Since f is continuous and since A is a closed subset of

{({DMC}_{X, Y})}^{4}

,

f^{- 1} (A)

is a closed subset of

{({DMC}_{X, Y})}^{2} \times {({DMC}_{Y, Y})}^{2}

which is compact. Therefore,

f^{- 1} (A)

is compact.

Now define the mapping

g : {({DMC}_{X, Y})}^{2} \times {({DMC}_{Y, Y})}^{2} \to {({DMC}_{X, Y})}^{2}

as follows:

g (W, W^{'}, V, V^{'}) = (W, W^{'}) .

Since g is continuous and since

f^{- 1} (A)

is compact,

g (f^{- 1} (A))

is a compact subset of

{({DMC}_{X, Y})}^{2}

. Now notice that

\begin{matrix} g (f^{- 1} (A)) & = \{(W, W^{'}) \in {({DMC}_{X, Y})}^{2} : \exists V, V^{'} \in {DMC}_{Y, Y}, (W, W^{'}, V, V^{'}) \in f^{- 1} (A)\} \\ = \{(W, W^{'}) \in {({DMC}_{X, Y})}^{2} : \exists V, V^{'} \in {DMC}_{Y, Y}, V^{'} \circ W^{'} = W, V \circ W = W^{'}\} \\ = \{(W, W^{'}) \in {({DMC}_{X, Y})}^{2} : W is equivalent to W^{'}\} \\ = \{(W, W^{'}) \in {({DMC}_{X, Y})}^{2} : W R_{X, Y}^{(o)} W^{'}\} = R_{X, Y}^{(o)} . \end{matrix}

We conclude that

R_{X, Y}^{(o)}

is compact, hence it is also closed because

{({DMC}_{X, Y})}^{2}

is a metric space. □

Now we are ready to prove Lemma 3:

Let

Proj : {DMC}_{X, Y} \to {DMC}_{X, Y}^{(o)}

be defined as

Proj (W) = \hat{W}

. The continuity of Proj follows from the definition of the quotient topology.

Now let A be a closed subset of

{DMC}_{X, Y}

. We want to show that

Proj (A)

is closed.

Since A is closed in

{DMC}_{X, Y}

, the set

{DMC}_{X, Y} \times A

is closed in

{({DMC}_{X, Y})}^{2}

. On the other hand,

R_{X, Y}^{(o)}

is closed in

{({DMC}_{X, Y})}^{2}

by Lemma 8. Therefore,

({DMC}_{X, Y} \times A) \cap R_{X, Y}^{(o)}

is closed in

{({DMC}_{X, Y})}^{2}

which is compact, hence

({DMC}_{X, Y} \times A) \cap R_{X, Y}^{(o)}

is compact. We have:

\begin{matrix} ({DMC}_{X, Y} \times A) \cap R_{X, Y}^{(o)} & = \{(W, W^{'}) \in {({DMC}_{X, Y})}^{2} : W R_{X, Y}^{(o)} W^{'} and W^{'} \in A\} . \end{matrix}

Now define the mapping

g : {({DMC}_{X, Y})}^{2} \to {DMC}_{X, Y}

as

g (W, W^{'}) = W .

Let

A_{R} : = g (({DMC}_{X, Y} \times A) \cap R_{X, Y}^{(o)})

. Since g is continuous and since

({DMC}_{X, Y} \times A) \cap R_{X, Y}^{(o)}

is compact,

A_{R}

is also compact. We have:

\begin{matrix} A_{R} & = \{W \in {DMC}_{X, Y} : \exists W^{'} \in A, W R_{X, Y}^{(o)} W^{'}\} = {Proj}^{- 1} (Proj (A)) . \end{matrix}

Since

{DMC}_{X, Y}

is a metric space and since

A_{R}

is compact,

{Proj}^{- 1} (Proj (A)) = A_{R}

is closed in

{DMC}_{X, Y}

. On the other hand, we have

{Proj}^{- 1} ({DMC}_{X, Y}^{(o)} ∖ Proj (A)) = {DMC}_{X, Y} ∖ {Proj}^{- 1} (Proj (A))

, hence

{Proj}^{- 1} ({DMC}_{X, Y}^{(o)} ∖ Proj (A))

is open in

{DMC}_{X, Y}

, which implies that

{DMC}_{X, Y}^{(o)} ∖ Proj (A)

is open in

{DMC}_{X, Y}^{(o)}

. Therefore,

Proj (A)

is closed in

{DMC}_{X, Y}^{(o)}

.

Appendix C. Proof of Proposition 4

Let

\hat{U}

be an arbitrary non-empty open subset of

({DMC}_{X, [m]}^{(o)}, T_{X, [m]}^{(o)})

and let Proj be the projection onto the

R_{X, [m]}^{(o)}

-equivalence classes.

{Proj}^{- 1} (\hat{U})

is open in the metric space

({DMC}_{X, [m]}, d_{X, [m]})

. Therefore, there exist

W \in {DMC}_{X, [m]}

and

ϵ > 0

such that

{Proj}^{- 1} (\hat{U})

contains the open ball of center W and radius

ϵ

.

We will show that there exists

W^{'} \in {DMC}_{X, [m]}

such that

rank (W^{'}) = m > n

and

d_{X, [m]} (W, W^{'}) < ϵ

. If

rank (W) = m

, take

W^{'} = W

.

Assume that

rank (W) < m

. Let

P_{W}^{o} \in Δ_{[m]}

,

Im (W)

and

{W_{y}^{- 1} : y \in Im (W)}

be as in Section 5.

Let

{v_{y}}_{y \in [m]}

be a collection of m vectors in

R^{X}

such that:

$\sum_{y \in Im (W)} P_{W}^{o} (y) \cdot v_{y} = 0$ .
$\sum_{y \in [m] ∖ Im (W)} v_{y} = 0$ .
For every $y \in [m]$ , $\sum_{x \in X} v_{y} (x) = 0$ .
The vectors ${v_{y}}_{y \in [m]}$ are pairwise different.

Such a collection can always be found.

Let

0 < δ, δ^{'} < 1

and define

P_{W^{'}}^{o} \in R^{[m]}

as follows:

P_{W^{'}}^{o} (y) = \{\begin{matrix} (1 - δ) P_{W}^{o} (y) & if y \in Im (W), \\ \frac{δ}{m - | Im (W) |} & otherwise . \end{matrix}

Clearly,

P_{W^{'}}^{o} \in Δ_{[m]}

and

P_{W^{'}}^{o} (y) > 0

for every

y \in [m]

. Now for every

y \in [m]

, define

{W_{y}^{'}}^{- 1}

as follows:

\begin{matrix} {W_{y}^{'}}^{- 1} = \{\begin{matrix} (1 - δ) W_{y}^{- 1} + δ π_{X} + δ^{'} v_{y} & if y \in Im (W), \\ π_{X} + δ^{'} v_{y} & otherwise, \end{matrix} \end{matrix}

where

π_{X} \in Δ_{X}

is the uniform probability distribution on

X

. A simple calculation shows that

\sum_{y \in [m]} P_{W^{'}}^{o} (y) {W_{y}^{'}}^{- 1} = π_{X}

, and for every

y \in [m]

we have

\sum_{x \in X} {W_{y}^{'}}^{- 1} (x) = 1

.

Notice that for

y \in Im (W)

, since

0 < δ < 1

,

(1 - δ) W_{y}^{- 1} + δ π_{X}

lies inside the interior of the probability distribution simplex

Δ_{X}

. This means that for

δ^{'}

small enough,

(1 - δ) W_{y}^{- 1} + δ π_{X} + δ^{'} v_{y} \in Δ_{X}

for every

y \in Im (W)

, and

π_{X} + δ^{'} v_{y} \in Δ_{X}

for every

y \notin Im (W)

. For every

0 < δ < 1

, choose

δ^{'} : = δ^{'} (δ)

so that

0 < δ^{'} < δ

and

{W_{y}^{'}}^{- 1} \in Δ_{X}

for every

y \in [m]

.

It is easy to see that for

δ

small enough,

{W_{y_{1}}^{'}}^{- 1} \neq {W_{y_{2}}^{'}}^{- 1}

for every

y_{1}, y_{2} \in [m]

satisfying

y_{1} \neq y_{2}

. Define the channel

W^{'} \in {DMC}_{X, [m]}

as follows:

W^{'} (y | x) = | X | P_{W^{'}}^{o} (y) {W_{y}^{'}}^{- 1} (x) .

Since

P_{W^{'}}^{o} (y) > 0

for every

y \in [m]

, we have

supp ({MP}_{W^{'}}) = {{W_{y}^{'}}^{- 1} : y \in [m]}

. Therefore, there exists

δ_{0} > 0

such that for every

0 < δ < δ_{0}

, we have

rank (W^{'}) = m

. On the other hand, we have

lim_{δ \to 0} P_{W^{'}}^{o} = P_{W}^{o}

and

lim_{δ \to 0} {W_{y}^{'}}^{- 1} = W_{y}^{- 1}

for every

y \in Im (W)

. Therefore,

lim_{δ \to 0} W^{'} = W

(where the limit is taken in

({DMC}_{X, [m]}, d_{X, [m]})

). This shows that there exists

W^{'} \in {DMC}_{X, [m]}

such that

rank (W^{'}) = m > n

and

d_{X, [m]} (W, W^{'}) < ϵ

, which means that

W^{'} \in {Proj}^{- 1} (\hat{U})

and

W^{'}

is not equivalent to any channel in

{DMC}_{X, [n]}

(see Corollary 1). Therefore,

Proj (W^{'}) \in \hat{U}

and

Proj (W^{'}) \notin {DMC}_{X, [n]}^{(o)}

because

W^{'}

is not equivalent to any channel in

{DMC}_{X, [n]}

. This shows that every non-empty open subset of

{DMC}_{X, [m]}^{(o)}

is not contained in

{DMC}_{X, [n]}^{(o)}

. We conclude that the interior of

{DMC}_{X, [n]}^{(o)}

in

{DMC}_{X, [m]}^{(o)}

is empty.
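
The construction above can be checked numerically on a small instance. The following Python sketch is illustrative only: the helper names (`output_dist`, `posteriors`, `rank`, `perturb`, `dist`), the example channel and the particular vectors v_y are assumptions made for this example and do not come from the paper. It assumes that rank(W) is the number of distinct posteriors W_y^{-1}, takes |X| = 2 and m = 3, starts from a channel whose third output symbol is never used, and builds W' from the formulas above using exact rational arithmetic.

```python
from fractions import Fraction as F

def output_dist(W):
    """P_W^o(y) = (1/|X|) * sum_x W(y|x), with W stored as W[x][y]."""
    nx = len(W)
    return [sum(row[y] for row in W) / nx for y in range(len(W[0]))]

def posteriors(W):
    """Map each y in Im(W) to W_y^{-1}(x) = W(y|x) / (|X| * P_W^o(y))."""
    nx, P = len(W), output_dist(W)
    return {y: tuple(W[x][y] / (nx * P[y]) for x in range(nx))
            for y in range(len(P)) if P[y] > 0}

def rank(W):
    """Number of distinct posteriors (assumed to equal |supp(MP_W)|)."""
    return len(set(posteriors(W).values()))

def dist(W1, W2):
    """d(W1, W2) = (1/2) * max_x sum_y |W1(y|x) - W2(y|x)|."""
    return max(sum(abs(a - b) for a, b in zip(r1, r2))
               for r1, r2 in zip(W1, W2)) / 2

def perturb(W, v, delta, delta_p):
    """Build W' from W, the vectors v[y], delta and delta' as in the text."""
    nx, m = len(W), len(W[0])
    P, post = output_dist(W), posteriors(W)
    pi = F(1, nx)  # uniform distribution on X
    new_P, new_post = [], []
    for y in range(m):
        if y in post:  # y belongs to Im(W)
            new_P.append((1 - delta) * P[y])
            new_post.append([(1 - delta) * post[y][x] + delta * pi
                             + delta_p * v[y][x] for x in range(nx)])
        else:
            new_P.append(delta / (m - len(post)))
            new_post.append([pi + delta_p * v[y][x] for x in range(nx)])
    # W'(y|x) = |X| * P'(y) * W'_y^{-1}(x)
    return [[nx * new_P[y] * new_post[y][x] for y in range(m)]
            for x in range(nx)]

# Hypothetical example: a binary symmetric channel with an unused third output.
W = [[F(9, 10), F(1, 10), F(0)],
     [F(1, 10), F(9, 10), F(0)]]
# v_0 + v_1 = 0 (the weights P_W^o(0) and P_W^o(1) are equal), v_2 = 0,
# every v_y sums to 0, and the three vectors are pairwise different.
v = [(1, -1), (-1, 1), (0, 0)]
W_new = perturb(W, v, F(1, 100), F(1, 200))
print(rank(W), rank(W_new), dist(W, W_new))
```

In this instance the original channel has rank 2, the perturbed channel has rank 3, and the two channels are at distance 1/100, so shrinking δ moves W' arbitrarily close to W while keeping full rank.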

Appendix D. Proof of Lemma 5

Define

{DMC}_{X, [0]}^{(o)} = \emptyset

, which is strongly closed in

{DMC}_{X, *}^{(o)}

.

Let A and B be two disjoint strongly closed subsets of

{DMC}_{X, *}^{(o)}

. For every

n \geq 0

, let

A_{n} = A \cap {DMC}_{X, [n]}^{(o)}

and

B_{n} = B \cap {DMC}_{X, [n]}^{(o)}

. Since A and B are strongly closed in

{DMC}_{X, *}^{(o)}

,

A_{n}

and

B_{n}

are closed in

{DMC}_{X, [n]}^{(o)}

. Moreover,

A_{n} \cap B_{n} \subset A \cap B = \emptyset

.

Construct the sequences

{(U_{n})}_{n \geq 0}, {(U_{n}^{'})}_{n \geq 0}, {(K_{n})}_{n \geq 0}

and

{(K_{n}^{'})}_{n \geq 0}

recursively as follows:

U_{0} = U_{0}^{'} = K_{0} = K_{0}^{'} = \emptyset \subset {DMC}_{X, [0]}^{(o)}

. Since

A_{0} = B_{0} = \emptyset

, we have

A_{0} \subset U_{0} \subset K_{0}

and

B_{0} \subset U_{0}^{'} \subset K_{0}^{'}

. Moreover,

U_{0}

and

U_{0}^{'}

are open in

{DMC}_{X, [0]}^{(o)}

,

K_{0}

and

K_{0}^{'}

are closed in

{DMC}_{X, [0]}^{(o)}

, and

K_{0} \cap K_{0}^{'} = \emptyset

.

Now let

n \geq 1

and assume that we constructed

{(U_{i})}_{0 \leq i < n}, {(U_{i}^{'})}_{0 \leq i < n}, {(K_{i})}_{0 \leq i < n}

and

{(K_{i}^{'})}_{0 \leq i < n}

such that for every

0 \leq i < n

, we have

A_{i} \subset U_{i} \subset K_{i} \subset {DMC}_{X, [i]}^{(o)}

,

B_{i} \subset U_{i}^{'} \subset K_{i}^{'} \subset {DMC}_{X, [i]}^{(o)}

,

U_{i}

and

U_{i}^{'}

are open in

{DMC}_{X, [i]}^{(o)}

,

K_{i}

and

K_{i}^{'}

are closed in

{DMC}_{X, [i]}^{(o)}

, and

K_{i} \cap K_{i}^{'} = \emptyset

. Moreover, assume that

K_{i} \subset U_{i + 1}

and

K_{i}^{'} \subset U_{i + 1}^{'}

for every

0 \leq i < n - 1

.

Let

C_{n} = A_{n} \cup K_{n - 1}

and

D_{n} = B_{n} \cup K_{n - 1}^{'}

. Since

K_{n - 1}

and

K_{n - 1}^{'}

are closed in

{DMC}_{X, [n - 1]}^{(o)}

and since

{DMC}_{X, [n - 1]}^{(o)}

is closed in

{DMC}_{X, [n]}^{(o)}

, we can see that

K_{n - 1}

and

K_{n - 1}^{'}

are closed in

{DMC}_{X, [n]}^{(o)}

. Therefore,

C_{n}

and

D_{n}

are closed in

{DMC}_{X, [n]}^{(o)}

. Moreover, we have

\begin{aligned} C_{n} \cap D_{n} & = (A_{n} \cup K_{n - 1}) \cap (B_{n} \cup K_{n - 1}^{'}) \\ & = (A_{n} \cap B_{n}) \cup (A_{n} \cap K_{n - 1}^{'}) \cup (K_{n - 1} \cap B_{n}) \cup (K_{n - 1} \cap K_{n - 1}^{'}) \\ & \overset{(a)}{=} (A_{n} \cap K_{n - 1}^{'} \cap {DMC}_{X, [n - 1]}^{(o)}) \cup (K_{n - 1} \cap {DMC}_{X, [n - 1]}^{(o)} \cap B_{n}) \\ & = (A_{n - 1} \cap K_{n - 1}^{'}) \cup (K_{n - 1} \cap B_{n - 1}) \subset (K_{n - 1} \cap K_{n - 1}^{'}) \cup (K_{n - 1} \cap K_{n - 1}^{'}) = \emptyset, \end{aligned}

where (a) follows from the fact that

A_{n} \cap B_{n} = K_{n - 1} \cap K_{n - 1}^{'} = \emptyset

and the fact that

K_{n - 1} \subset {DMC}_{X, [n - 1]}^{(o)}

and

K_{n - 1}^{'} \subset {DMC}_{X, [n - 1]}^{(o)}

.

Since

{DMC}_{X, [n]}^{(o)}

is normal (because it is metrizable), and since

C_{n}

and

D_{n}

are closed disjoint subsets of

{DMC}_{X, [n]}^{(o)}

, there exist two sets

U_{n}, U_{n}^{'} \subset {DMC}_{X, [n]}^{(o)}

that are open in

{DMC}_{X, [n]}^{(o)}

and two sets

K_{n}, K_{n}^{'} \subset {DMC}_{X, [n]}^{(o)}

that are closed in

{DMC}_{X, [n]}^{(o)}

such that

C_{n} \subset U_{n} \subset K_{n}

,

D_{n} \subset U_{n}^{'} \subset K_{n}^{'}

and

K_{n} \cap K_{n}^{'} = \emptyset

. Clearly,

A_{n} \subset U_{n} \subset K_{n} \subset {DMC}_{X, [n]}^{(o)}

,

B_{n} \subset U_{n}^{'} \subset K_{n}^{'} \subset {DMC}_{X, [n]}^{(o)}

,

K_{n - 1} \subset U_{n}

and

K_{n - 1}^{'} \subset U_{n}^{'}

. This concludes the recursive construction.

Now define

U = ⋃_{n \geq 0} U_{n} = ⋃_{n \geq 1} U_{n}

and

U^{'} = ⋃_{n \geq 0} U_{n}^{'} = ⋃_{n \geq 1} U_{n}^{'}

. Since

A_{n} \subset U_{n}

for every

n \geq 1

, we have

\begin{matrix} A = A \cap {DMC}_{X, *}^{(o)} = A \cap (⋃_{n \geq 1} {DMC}_{X, [n]}^{(o)}) = ⋃_{n \geq 1} (A \cap {DMC}_{X, [n]}^{(o)}) = ⋃_{n \geq 1} A_{n} \subset ⋃_{n \geq 1} U_{n} = U . \end{matrix}

Moreover, for every

n \geq 1

we have

\begin{matrix} U \cap {DMC}_{X, [n]}^{(o)} = (⋃_{i \geq 1} U_{i}) \cap {DMC}_{X, [n]}^{(o)} \overset{(a)}{=} (⋃_{i \geq n} U_{i}) \cap {DMC}_{X, [n]}^{(o)} = ⋃_{i \geq n} (U_{i} \cap {DMC}_{X, [n]}^{(o)}), \end{matrix}

where (a) follows from the fact that

U_{i} \subset K_{i} \subset U_{i + 1}

for every

i \geq 0

, which means that the sequence

{(U_{i})}_{i \geq 1}

is increasing.

For every

i \geq n

, we have

{DMC}_{X, [n]}^{(o)} \subset {DMC}_{X, [i]}^{(o)}

and

U_{i}

is open in

{DMC}_{X, [i]}^{(o)}

, hence

U_{i} \cap {DMC}_{X, [n]}^{(o)}

is open in

{DMC}_{X, [n]}^{(o)}

. Therefore,

U \cap {DMC}_{X, [n]}^{(o)} = ⋃_{i \geq n} (U_{i} \cap {DMC}_{X, [n]}^{(o)})

is open in

{DMC}_{X, [n]}^{(o)}

. Since this is true for every

n \geq 1

, we conclude that U is strongly open in

{DMC}_{X, *}^{(o)}

.

We can similarly show that

B \subset U^{'}

and that

U^{'}

is strongly open in

{DMC}_{X, *}^{(o)}

. Finally, we have

\begin{aligned} U \cap U^{'} & = (⋃_{n \geq 1} U_{n}) \cap (⋃_{n^{'} \geq 1} U_{n^{'}}^{'}) = ⋃_{n \geq 1, n^{'} \geq 1} (U_{n} \cap U_{n^{'}}^{'}) \\ & \overset{(a)}{=} ⋃_{n \geq 1} (U_{n} \cap U_{n}^{'}) \subset ⋃_{n \geq 1} (K_{n} \cap K_{n}^{'}) = \emptyset, \end{aligned}

where (a) follows from the fact that for every

n \geq 1

and every

n^{'} \geq 1

, we have

U_{n} \cap U_{n^{'}}^{'} \subset U_{max {n, n^{'}}} \cap U_{max {n, n^{'}}}^{'}

because

{(U_{n})}_{n \geq 1}

and

{(U_{n}^{'})}_{n \geq 1}

are increasing. We conclude that

({DMC}_{X, *}^{(o)}, T_{s, X, *}^{(o)})

is normal.
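
The only place where metrizability enters the recursion is the choice of U_n, U_n', K_n and K_n' around the disjoint closed sets C_n and D_n. As a hedged illustration of how such sets can be obtained in a metric space, the Python sketch below uses the standard function sep(x) = d(x, C) / (d(x, C) + d(x, D)) and thresholds it at 1/3 and 2/3. The names `d_set` and `sep`, the thresholds, and the toy space (rational grid points of [0, 1] with the usual distance) are assumptions made for this example; the paper only invokes normality and does not prescribe this construction.

```python
from fractions import Fraction as F

def d_set(x, S):
    """Distance from the point x to the finite set S."""
    return min(abs(x - s) for s in S)

def sep(x, C, D):
    """Equals 0 on C and 1 on D; well defined because C and D are disjoint."""
    a, b = d_set(x, C), d_set(x, D)
    return a / (a + b)

# Hypothetical disjoint closed sets in the toy space.
C = {F(0), F(1, 10)}
D = {F(1, 2), F(1)}
grid = [F(k, 100) for k in range(101)]

# Strict inequalities give open sets, non-strict ones give closed sets,
# because sep is continuous on a metric space.
U = {x for x in grid if sep(x, C, D) < F(1, 3)}
K = {x for x in grid if sep(x, C, D) <= F(1, 3)}
U_prime = {x for x in grid if sep(x, C, D) > F(2, 3)}
K_prime = {x for x in grid if sep(x, C, D) >= F(2, 3)}

print(C <= U <= K, D <= U_prime <= K_prime, K.isdisjoint(K_prime))
```

The three printed checks correspond to the inclusions C_n ⊂ U_n ⊂ K_n, D_n ⊂ U_n' ⊂ K_n' and the disjointness of K_n and K_n' that the recursion relies on.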

Appendix E. Proof of Lemma 6

Let

W_{1}, W_{2} \in {DMC}_{X, Y}

, and let

{\hat{W}}_{1}

and

{\hat{W}}_{2}

be the

R_{X, Y}^{(o)}

-equivalence classes of

W_{1}

and

W_{2}

respectively.

Fix

m \geq 1

,

p \in Δ_{[m] \times X}

and

D \in {DMC}_{Y, [m]}

. We have:

\begin{aligned} \sum_{\substack{u \in [m], \\ x \in X, \\ y \in Y}} & p (u, x) W_{1} (y | x) D (u | y) \\ & = (\sum_{\substack{u \in [m], \\ x \in X, \\ y \in Y}} p (u, x) W_{2} (y | x) D (u | y)) + \sum_{\substack{u \in [m], \\ x \in X, \\ y \in Y}} p (u, x) \cdot (W_{1} (y | x) - W_{2} (y | x)) \cdot D (u | y) \\ & \leq (\sup_{D^{'} \in {DMC}_{Y, [m]}} \sum_{\substack{u \in [m], \\ x \in X, \\ y \in Y}} p (u, x) W_{2} (y | x) D^{'} (u | y)) + \sum_{\substack{u \in [m], \\ x \in X, \\ y \in Y}} p (u, x) \cdot (W_{1} (y | x) - W_{2} (y | x)) \cdot D (u | y) \\ & \leq P_{c} (p, W_{2}) + \sum_{\substack{u \in [m], \\ x \in X}} p (u, x) \cdot \sum_{\substack{y \in Y : \\ W_{1} (y | x) > W_{2} (y | x)}} (W_{1} (y | x) - W_{2} (y | x)) \cdot (\sum_{u^{'} \in [m]} D (u^{'} | y)) \\ & = P_{c} (p, W_{2}) + \sum_{\substack{u \in [m], \\ x \in X}} p (u, x) \cdot (\sum_{\substack{y \in Y : \\ W_{1} (y | x) > W_{2} (y | x)}} (W_{1} (y | x) - W_{2} (y | x))) \\ & \overset{(a)}{\leq} P_{c} (p, W_{2}) + \sum_{\substack{u \in [m], \\ x \in X}} p (u, x) \cdot d_{X, Y} (W_{1}, W_{2}) = P_{c} (p, W_{2}) + d_{X, Y} (W_{1}, W_{2}), \end{aligned}

where (a) follows from the fact that

\begin{aligned} \sum_{\substack{y \in Y : \\ W_{1} (y | x) > W_{2} (y | x)}} (W_{1} (y | x) - W_{2} (y | x)) & = \frac{1}{2} \sum_{y \in Y} | W_{1} (y | x) - W_{2} (y | x) | \\ & \leq \frac{1}{2} \sup_{x \in X} \sum_{y \in Y} | W_{1} (y | x) - W_{2} (y | x) | = d_{X, Y} (W_{1}, W_{2}) . \end{aligned}

Therefore,

P_{c} (p, W_{1}) = sup_{D \in {DMC}_{Y, [m]}} \sum_{\begin{matrix} u \in [m], \\ x \in X, \\ y \in Y \end{matrix}} p (u, x) W_{1} (y | x) D (u | y) \leq P_{c} (p, W_{2}) + d_{X, Y} (W_{1}, W_{2}) .

Similarly, we can show that

P_{c} (p, W_{2}) \leq P_{c} (p, W_{1}) + d_{X, Y} (W_{1}, W_{2})

, hence

| P_{c} (p, W_{1}) - P_{c} (p, W_{2}) | \leq d_{X, Y} (W_{1}, W_{2}) .

We conclude that

\begin{aligned} d_{X, Y}^{(o)} ({\hat{W}}_{1}, {\hat{W}}_{2}) & = \sup_{\substack{m \geq 1, \\ p \in Δ_{[m] \times X}}} | P_{c} (p, {\hat{W}}_{1}) - P_{c} (p, {\hat{W}}_{2}) | \\ & = \sup_{\substack{m \geq 1, \\ p \in Δ_{[m] \times X}}} | P_{c} (p, W_{1}) - P_{c} (p, W_{2}) | \\ & \leq d_{X, Y} (W_{1}, W_{2}) . \end{aligned}
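
The inequality | P_c(p, W_1) - P_c(p, W_2) | ≤ d_{X,Y}(W_1, W_2) can be spot-checked numerically. The Python sketch below is only an illustration under stated assumptions: it evaluates P_c through the standard fact that the supremum over decoders D is attained by a deterministic decoder, so that P_c(p, W) = Σ_y max_u Σ_x p(u, x) W(y | x). The function names, the alphabet sizes and the random test instances are hypothetical choices made for this example.

```python
import random

def p_correct(p, W):
    """P_c(p, W) = sum_y max_u sum_x p(u, x) W(y|x); p[u][x] and W[x][y]."""
    nx, ny = len(W), len(W[0])
    return sum(max(sum(pu[x] * W[x][y] for x in range(nx)) for pu in p)
               for y in range(ny))

def dist(W1, W2):
    """d(W1, W2) = (1/2) * max_x sum_y |W1(y|x) - W2(y|x)|."""
    return max(sum(abs(a - b) for a, b in zip(r1, r2))
               for r1, r2 in zip(W1, W2)) / 2

def random_dist(rng, k):
    """A random probability vector of length k."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [a / s for a in w]

rng = random.Random(0)
nx, ny = 3, 4  # hypothetical alphabet sizes |X| and |Y|
violations = 0
for _ in range(200):
    W1 = [random_dist(rng, ny) for _ in range(nx)]
    W2 = [random_dist(rng, ny) for _ in range(nx)]
    m = rng.randint(1, 5)
    flat = random_dist(rng, m * nx)  # a joint distribution on [m] x X
    p = [flat[u * nx:(u + 1) * nx] for u in range(m)]
    gap = abs(p_correct(p, W1) - p_correct(p, W2)) - dist(W1, W2)
    if gap > 1e-12:  # small tolerance for floating-point rounding
        violations += 1
print(violations)
```

A random search of this kind cannot prove the lemma; it only confirms that no counterexample appears among the sampled instances.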

Appendix F. Proof of Lemma 7

Let

γ \in Γ ({MP}_{\hat{W}}, {MP}_{{\hat{W}}^{'}})

be a measure on

Δ_{X} \times Δ_{X}

that couples

{MP}_{\hat{W}}

and

{MP}_{{\hat{W}}^{'}}

.

Let

S = supp ({MP}_{\hat{W}})

and

S^{'} = supp ({MP}_{{\hat{W}}^{'}})

be the supports of

{MP}_{\hat{W}}

and

{MP}_{{\hat{W}}^{'}}

respectively. Since

{MP}_{\hat{W}}

and

{MP}_{{\hat{W}}^{'}}

are finitely supported,

γ

is also finitely supported and its support is a subset of

S \times S^{'}

. Therefore, there exists a collection of coefficients

α_{p, p^{'}} \in [0, 1]

such that

γ = \sum_{\begin{matrix} p \in S, \\ p^{'} \in S^{'} \end{matrix}} α_{p, p^{'}} δ_{(p, p^{'})},

where

δ_{(p, p^{'})}

is a Dirac measure centered at

(p, p^{'}) \in Δ_{X} \times Δ_{X}

. Since

{MP}_{\hat{W}}

and

{MP}_{{\hat{W}}^{'}}

are the marginals of

γ

on the first and the second factors respectively, we have

{MP}_{\hat{W}} (p) = \sum_{p^{'} \in S^{'}} α_{p, p^{'}}

for every

p \in S

. Similarly,

{MP}_{{\hat{W}}^{'}} (p^{'}) = \sum_{p \in S} α_{p, p^{'}}

for every

p^{'} \in S^{'}

.

Let

Y = S \times S^{'}

and define the channels

W, W^{'} \in {DMC}_{X, Y}

as:

W (p, p^{'} | x) = | X | α_{p, p^{'}} \cdot p (x),

and

W^{'} (p, p^{'} | x) = | X | α_{p, p^{'}} \cdot p^{'} (x) .

For every

x \in X

, we have

\begin{aligned} \sum_{(p, p^{'}) \in Y} W (p, p^{'} | x) & = | X | \sum_{(p, p^{'}) \in S \times S^{'}} α_{p, p^{'}} \cdot p (x) = | X | \sum_{p \in S} {MP}_{\hat{W}} (p) \cdot p (x) \\ & = | X | \int_{Δ_{X}} p (x) \cdot d {MP}_{\hat{W}} (p) = | X | \frac{1}{| X |} = 1 . \end{aligned}

Similarly,

\sum_{(p, p^{'}) \in Y} W^{'} (p, p^{'} | x) = 1

. Therefore, W and

W^{'}

are valid channels.

For every

(p, p^{'}) \in Y

, we have

P_{W}^{o} (p, p^{'}) = \sum_{x \in X} \frac{1}{| X |} W (p, p^{'} | x) = \sum_{x \in X} α_{p, p^{'}} \cdot p (x) = α_{p, p^{'}} .

Therefore,

Im (W) = {(p, p^{'}) \in Y : α_{p, p^{'}} > 0}

. For every

(p, p^{'}) \in Im (W)

and every

x \in X

, we have:

W_{p, p^{'}}^{- 1} (x) = \frac{W (p, p^{'} | x)}{| X | P_{W}^{o} (p, p^{'})} = \frac{| X | α_{p, p^{'}} \cdot p (x)}{| X | α_{p, p^{'}}} = p (x),

hence

W_{p, p^{'}}^{- 1} = p

for every

(p, p^{'}) \in Im (W)

, which shows that

supp ({MP}_{W}) \subset S

. Similarly, we can show that

Im (W^{'}) = {(p, p^{'}) \in Y : α_{p, p^{'}} > 0},

supp ({MP}_{W^{'}}) \subset S^{'}

, and for every

(p, p^{'}) \in Y

,

P_{W^{'}}^{o} (p, p^{'}) = α_{p, p^{'}}

and

{W_{p, p^{'}}^{'}}^{- 1} = p^{'}

.

For every

p \in S

, we have:

{MP}_{W} (p) = \sum_{\begin{matrix} y \in Im (W), \\ W_{y}^{- 1} = p \end{matrix}} P_{W}^{o} (y) = \sum_{\begin{matrix} p^{'} \in S^{'}, \\ α_{p, p^{'}} > 0 \end{matrix}} α_{p, p^{'}} = \sum_{p^{'} \in S^{'}} α_{p, p^{'}} = {MP}_{\hat{W}} (p) > 0 .

This shows that

supp ({MP}_{W}) = S = supp ({MP}_{\hat{W}})

and

{MP}_{W} (p) = {MP}_{\hat{W}} (p)

for every

p \in S

. Therefore,

{MP}_{W} = {MP}_{\hat{W}}

and so W is equivalent to every channel in

\hat{W}

. Similarly, we can show that

{MP}_{W^{'}} = {MP}_{{\hat{W}}^{'}}

and

W^{'}

is equivalent to every channel in

{\hat{W}}^{'}

.

Let

\tilde{W}

and

{\tilde{W}}^{'}

be the

R_{X, Y}^{(o)}

-equivalence classes of W and

W^{'}

respectively. We can write

\hat{W} = \tilde{W}

and

{\hat{W}}^{'} = {\tilde{W}}^{'}

because of the canonical identification of

{DMC}_{X, Y}^{(o)}

with

{DMC}_{X, [n]}^{(o)}

, where

n = | Y |

. We have:

\begin{aligned} d_{X, *}^{(o)} (\hat{W}, {\hat{W}}^{'}) & = d_{X, Y}^{(o)} (\tilde{W}, {\tilde{W}}^{'}) \overset{(a)}{\leq} d_{X, Y} (W, W^{'}) = \frac{1}{2} \max_{x \in X} \sum_{(p, p^{'}) \in Y} | W (p, p^{'} | x) - W^{'} (p, p^{'} | x) | \\ & = \frac{1}{2} \max_{x \in X} \sum_{\substack{p \in S, \\ p^{'} \in S^{'}}} | | X | α_{p, p^{'}} \cdot p (x) - | X | α_{p, p^{'}} \cdot p^{'} (x) | = \frac{1}{2} | X | \max_{x \in X} \sum_{\substack{p \in S, \\ p^{'} \in S^{'}}} α_{p, p^{'}} \cdot | p (x) - p^{'} (x) | \\ & \leq \frac{1}{2} | X | \sum_{x \in X} \sum_{\substack{p \in S, \\ p^{'} \in S^{'}}} α_{p, p^{'}} \cdot | p (x) - p^{'} (x) | = \frac{1}{2} | X | \sum_{\substack{p \in S, \\ p^{'} \in S^{'}}} α_{p, p^{'}} \sum_{x \in X} | p (x) - p^{'} (x) | \\ & = \frac{1}{2} | X | \sum_{\substack{p \in S, \\ p^{'} \in S^{'}}} α_{p, p^{'}} ∥ p - p^{'} ∥_{1} = | X | \sum_{\substack{p \in S, \\ p^{'} \in S^{'}}} α_{p, p^{'}} d (p, p^{'}) = | X | \int_{Δ_{X} \times Δ_{X}} d (p, p^{'}) \cdot d γ (p, p^{'}), \end{aligned}

where (a) follows from Lemma 6, and

d (p, p^{'}) = \frac{1}{2} {∥ p - p^{'} ∥}_{1}

is the total variation distance between p and

p^{'}

. Therefore,

\begin{matrix} d_{X, *}^{(o)} (\hat{W}, {\hat{W}}^{'}) \leq | X | inf_{γ \in Γ ({MP}_{\hat{W}}, {MP}_{{\hat{W}}^{'}})} \int_{Δ_{X} \times Δ_{X}} d (p, p^{'}) \cdot d γ (p, p^{'}) = | X | \cdot W_{1} ({MP}_{\hat{W}}, {MP}_{{\hat{W}}^{'}}) . \end{matrix}
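
The coupling-based construction of W and W' can be traced on a small example. The Python sketch below is a minimal illustration, not the paper's code: the function names, the two Blackwell measures and the coupling α are hypothetical choices. It takes |X| = 2, two balanced measures with two atoms each, and the coupling that pairs the atoms diagonally; it then builds W and W' on the output alphabet S × S' and compares d_{X,Y}(W, W') with | X | times the transport cost of the coupling.

```python
from fractions import Fraction as F

def tv(p, q):
    """Total variation distance d(p, q) = (1/2) * ||p - q||_1."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def dist(W1, W2):
    """d(W1, W2) = (1/2) * max_x sum_y |W1(y|x) - W2(y|x)|."""
    return max(sum(abs(a - b) for a, b in zip(r1, r2))
               for r1, r2 in zip(W1, W2)) / 2

def channels_from_coupling(alpha, nx):
    """alpha maps (p, p') to its mass; channels are stored as W[x][index of (p, p')]."""
    pairs = sorted(alpha)  # fixes an ordering of the output alphabet S x S'
    W = [[nx * alpha[(p, q)] * p[x] for (p, q) in pairs] for x in range(nx)]
    W_prime = [[nx * alpha[(p, q)] * q[x] for (p, q) in pairs]
               for x in range(nx)]
    return W, W_prime

# Hypothetical balanced measures: mass 1/2 on each of p1, p2 and on each of q1, q2.
p1, p2 = (F(9, 10), F(1, 10)), (F(1, 10), F(9, 10))
q1, q2 = (F(8, 10), F(2, 10)), (F(2, 10), F(8, 10))
# A coupling with the right marginals that pairs p1 with q1 and p2 with q2.
alpha = {(p1, q1): F(1, 2), (p1, q2): F(0),
         (p2, q1): F(0), (p2, q2): F(1, 2)}

W, W_prime = channels_from_coupling(alpha, 2)
cost = sum(a * tv(p, q) for (p, q), a in alpha.items())
print(dist(W, W_prime), 2 * cost)
```

Here both channels have rows summing to 1, the channel distance is 1/10, and the bound | X | · Σ α d(p, p') equals 1/5, consistent with the chain of inequalities above.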

Appendix G. Proof of Proposition 12

If

| X | = 1

,

Δ_{X}

consists of a single probability distribution and

MP (X)

consists of a single meta-probability measure which is balanced and finitely supported, so

MP (X) = {MP}_{b} (X) = {MP}_{b f} (X)

.

Now assume that

| X | \geq 2

. We start by showing that

{MP}_{b} (X)

is weakly-* closed.

For every

x \in X

, consider the mapping

f_{x} : Δ_{X} \to R

defined as

f_{x} (p) = p (x)

. Clearly,

f_{x}

is bounded and continuous. Therefore, the mapping

F_{x} : MP (X) \to R

defined as

F_{x} (MP) = \int_{Δ_{X}} f_{x} d MP = \int_{Δ_{X}} p (x) \cdot d MP (p)

is continuous in the weak-* topology. Therefore,

F_{x}^{- 1} (\{\frac{1}{| X |}\})

is weakly-* closed. It is easy to see that

{MP}_{b} (X) = ⋂_{x \in X} F_{x}^{- 1} (\{\frac{1}{| X |}\})

. This proves that

{MP}_{b} (X)

, which is the finite intersection of weakly-* closed sets, is weakly-* closed.

It remains to show that

{MP}_{b f} (X)

is weakly-* dense in

{MP}_{b} (X)

. We will show that for every

ϵ > 0

and every

MP \in {MP}_{b} (X)

, there exists

{MP}^{'} \in {MP}_{b f} (X)

such that

W_{1} (MP, {MP}^{'}) < ϵ

.

Fix

0 < ϵ < 1

and let

MP \in {MP}_{b} (X)

be any balanced meta-probability measure on

X

, i.e., for every

x \in X

we have

\int_{Δ_{X}} p (x) d MP (p) = \frac{1}{| X |} .

Now fix

x \in X

. By the definition of the Lebesgue integral, there exists a finite partition

{B_{x, i}}_{1 \leq i \leq k_{x}}

of

Δ_{X}

and a sequence of positive numbers

{(b_{x, i})}_{1 \leq i \leq k_{x}}

such that for every

1 \leq i \leq k_{x}

,

B_{x, i}

is a Borel set of

Δ_{X}

,

b_{x, i} \leq p (x)

for every

p \in B_{x, i}

, and

\sum_{i = 1}^{k_{x}} b_{x, i} MP (B_{x, i}) \geq (\int_{Δ_{X}} p (x) \cdot d MP (p)) - \frac{ϵ}{12 | X |} = \frac{1}{| X |} - \frac{ϵ}{12 | X |} .

By applying the same reasoning on the function

1 - p (x) \geq 0

, we can find a finite partition

{C_{x, i}}_{1 \leq i \leq m_{x}}

of

Δ_{X}

and a sequence of positive numbers

{(c_{x, i})}_{1 \leq i \leq m_{x}}

such that for every

1 \leq i \leq m_{x}

,

C_{x, i}

is a Borel set of

Δ_{X}

,

c_{x, i} \geq p (x)

for every

p \in C_{x, i}

and

\sum_{i = 1}^{m_{x}} c_{x, i} MP (C_{x, i}) \leq (\int_{Δ_{X}} p (x) \cdot d MP (p)) + \frac{ϵ}{12 | X |} = \frac{1}{| X |} + \frac{ϵ}{12 | X |} .

Let d be the total variation distance on

Δ_{X}

, i.e.,

d (p, p^{'}) = \frac{1}{2} {∥ p - p^{'} ∥}_{1}

. Since

Δ_{X}

is compact, it can be covered by a finite number of open balls of radius

\frac{ϵ}{4}

, i.e., there exist h points

p_{1}^{'}, \dots, p_{h}^{'}

such that

Δ_{X} = ⋃_{i = 1}^{h} B_{\frac{ϵ}{4}} (p_{i}^{'}) = ⋃_{i = 1}^{h} \{p \in Δ_{X} : d (p, p_{i}^{'}) < \frac{ϵ}{4}\}

. For every

1 \leq i \leq h

, define the set

D_{i} = B_{\frac{ϵ}{4}} (p_{i}^{'}) ∖ (⋃_{1 \leq j < i} B_{\frac{ϵ}{4}} (p_{j}^{'})) .

Clearly, the sets

{D_{i}}_{1 \leq i \leq h}

are disjoint Borel sets that cover

Δ_{X}

. Let

n = h \times \prod_{x \in X} (k_{x} \cdot m_{x})

, and let

A_{1}, \dots, A_{n}

be the Borel sets obtained by intersecting the sets in the collections

{D_{1}, \dots, D_{h}}

,

{B_{x, i}}_{1 \leq i \leq k_{x}}

and

{C_{x, i}}_{1 \leq i \leq m_{x}}

for every

x \in X

. In other words,

\{A_{i} : 1 \leq i \leq n\} = \{D_{i} \cap ⋂_{x \in X} (B_{x, i_{x}} \cap C_{x, j_{x}}) : 1 \leq i \leq h, \text{ and } \forall x \in X, 1 \leq i_{x} \leq k_{x} \text{ and } 1 \leq j_{x} \leq m_{x}\} .

For every

1 \leq i \leq n

, let

l_{x, i} = b_{x, i^{'}}

where

i^{'}

is the unique integer satisfying

1 \leq i^{'} \leq k_{x}

and

A_{i} \subset B_{x, i^{'}}

. Similarly, let

u_{x, i} = c_{x, i^{″}}

where

i^{″}

is the unique integer satisfying

1 \leq i^{″} \leq m_{x}

and

A_{i} \subset C_{x, i^{″}}

. Clearly,

l_{x, i} \leq p (x) \leq u_{x, i}

for every

p \in A_{i}

. Moreover,

\sum_{i = 1}^{n} l_{x, i} MP (A_{i}) = \sum_{i = 1}^{k_{x}} b_{x, i} MP (B_{x, i}) \geq \frac{1}{| X |} - \frac{ϵ}{12 | X |},

and

\sum_{i = 1}^{n} u_{x, i} MP (A_{i}) = \sum_{i = 1}^{m_{x}} c_{x, i} MP (C_{x, i}) \leq \frac{1}{| X |} + \frac{ϵ}{12 | X |} .

For every

1 \leq i \leq n

, choose

p_{i} \in A_{i}

arbitrarily. Let

j_{i}

be the unique integer such that

A_{i} \subset D_{j_{i}}

. Since

D_{j_{i}} \subset B_{\frac{ϵ}{4}} (p_{j_{i}}^{'})

, we have

d (p, p_{j_{i}}^{'}) < \frac{ϵ}{4}

for every

p \in A_{i}

. Therefore,

d (p, p_{i}) \leq d (p, p_{j_{i}}^{'}) + d (p_{j_{i}}^{'}, p_{i}) < \frac{ϵ}{2}

for every

p \in A_{i}

.

Define the mapping

f : Δ_{X} \to Δ_{X}

as

f (p) = p_{i}

for every

p \in A_{i}

. Clearly,

d (p, f (p)) < \frac{ϵ}{2}

for every

p \in Δ_{X}

.

Now let

{MP}_{f} = f_{#} (MP)

, where

f_{#} (MP)

is the push-forward measure of MP by the mapping f, i.e.,

{MP}_{f} (B) = (f_{#} (MP)) (B) = MP (f^{- 1} (B))

for every Borel set B of

Δ_{X}

. We have:

{MP}_{f} (B) = \sum_{p_{i} \in B} MP (f^{- 1} ({p_{i}})) = \sum_{p_{i} \in B} MP (A_{i}) = \sum_{p_{i} \in B} α_{i},

where

α_{i} = MP (A_{i})

for every

1 \leq i \leq n

. Therefore,

{MP}_{f}

is finitely supported and

supp ({MP}_{f}) \subset {p_{i} : 1 \leq i \leq n} .

Moreover,

{MP}_{f} (p_{i}) = α_{i}

for every

1 \leq i \leq n

.

Now define the mapping

f_{\times} : Δ_{X} \to Δ_{X} \times Δ_{X}

as

f_{\times} (p) = (p, f (p))

, and define the measure

γ_{f}

on

Δ_{X} \times Δ_{X}

as the push-forward of MP by

f_{\times}

, i.e.,

γ_{f} (B) = MP (f_{\times}^{- 1} (B))

for every Borel set B of

Δ_{X} \times Δ_{X}

. It is easy to see that the marginals of

γ_{f}

on the first and second factors are MP and

{MP}_{f}

respectively. Therefore,

γ_{f}

is a coupling between MP and

{MP}_{f}

, hence

\begin{aligned} W_{1} (MP, {MP}_{f}) & = \inf_{γ \in Γ (MP, {MP}_{f})} \int_{Δ_{X} \times Δ_{X}} d (p, p^{'}) \cdot d γ (p, p^{'}) \leq \int_{Δ_{X} \times Δ_{X}} d (p, p^{'}) \cdot d γ_{f} (p, p^{'}) \\ & \overset{(a)}{=} \int_{Δ_{X}} d (p, f (p)) \cdot d MP (p) \overset{(b)}{\leq} \frac{ϵ}{2}, \end{aligned}

where (a) follows from the fact that

γ_{f}

is the push-forward of MP by

f_{\times} (p) = (p, f (p))

. (b) follows from the fact that

d (p, f (p)) < \frac{ϵ}{2}

for every

p \in Δ_{X}

. Therefore,

{MP}_{f}

well approximates MP and it is finitely supported. However,

{MP}_{f}

may not be balanced, so more work needs to be done in order to find a balanced and finitely supported meta-probability measure that well approximates MP.
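
The quantization step can be illustrated for a binary input alphabet, where a distribution p is determined by t = p(0) ∈ [0, 1] and d(p, p') = | t - t' |. The Python sketch below is a simplified illustration under stated assumptions: it starts from a measure that is already finitely supported (101 equally weighted atoms) so that everything is computable, it uses half-open intervals of length ϵ/2 as cells instead of the sets A_i (the refinement by the sets B_{x,i} and C_{x,i} is omitted), and the names `quantize`, `mp` and `mp_f` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

def quantize(mp, eps):
    """Push a finitely supported measure on [0, 1] forward by a cell map f.

    mp maps t = p(0) to its mass. The cells are the half-open intervals
    [j * eps/2, (j + 1) * eps/2), and f sends every atom to the smallest
    atom lying in the same cell, so |t - f(t)| < eps/2 for every atom t.
    """
    width = eps / 2
    rep = {}
    for t in sorted(mp):
        rep.setdefault(t // width, t)  # first atom seen in each cell
    f = {t: rep[t // width] for t in mp}
    pushed = defaultdict(F)
    for t, mass in mp.items():
        pushed[f[t]] += mass  # mass of the push-forward at f(t)
    return f, dict(pushed)

N, eps = 100, F(1, 5)
# Balanced in the binary case: the mean of t = p(0) under mp is 1/2.
mp = {F(k, N): F(1, N + 1) for k in range(N + 1)}
f, mp_f = quantize(mp, eps)

# Cost of the coupling gamma_f, which upper bounds W_1(MP, MP_f).
cost = sum(mass * abs(t - f[t]) for t, mass in mp.items())
mean_f = sum(mass * t for t, mass in mp_f.items())
print(len(mp_f), cost, mean_f)
```

In this instance the push-forward has 11 atoms and the coupling cost is 9/202 < ϵ/2, but its mean is 46/101 rather than 1/2, which shows concretely why a rebalancing step is still needed.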

For every

x \in X

, we have:

\begin{matrix} \int_{Δ_{X}} p (x) \cdot d {MP}_{f} (p) \overset{(a)}{=} \int_{Δ_{X}} (f (p)) (x) \cdot d MP (p) = \sum_{i = 1}^{n} p_{i} (x) MP (A_{i}) \overset{(b)}{\geq} \sum_{i = 1}^{n} l_{x, i} MP (A_{i}) \geq \frac{1}{| X |} - \frac{ϵ}{12 | X |}, \end{matrix}

where (a) follows from the fact that

{MP}_{f}

is the push-forward of MP by f. (b) follows from the fact that

p_{i} \in A_{i}

and so

p_{i} (x) \geq l_{x, i}

for every

1 \leq i \leq n

. Similarly, we have

\begin{matrix} \int_{Δ_{X}} p (x) \cdot d {MP}_{f} (p) = \sum_{i = 1}^{n} p_{i} (x) MP (A_{i}) \overset{(c)}{\leq} \sum_{i = 1}^{n} u_{x, i} MP (A_{i}) \leq \frac{1}{| X |} + \frac{ϵ}{12 | X |}, \end{matrix}

where (c) follows from the fact that

p_{i} \in A_{i}

and so

p_{i} (x) \leq u_{x, i}

for every

1 \leq i \leq n

. We conclude that for every

x \in X

, we have

\begin{matrix} |π_{X} (x) - \int_{Δ_{X}} p (x) \cdot d {MP}_{f} (p)| \leq \frac{ϵ}{12 | X |}, \end{matrix}

where

π_{X}

is the uniform distribution on

X

. Define

\tilde{p} \in Δ_{X}

as:

\tilde{p} = \int_{Δ_{X}} p \cdot d {MP}_{f} (p) .

For every

x \in X

, define

p^{'} (x) = \frac{6 (π_{X} (x) - \tilde{p} (x))}{ϵ} + \tilde{p} (x) .

Clearly,

\sum_{x \in X} p^{'} (x) = 1

. Moreover,

\begin{aligned} p^{'} (x) & = \frac{6 (π_{X} (x) - \tilde{p} (x))}{ϵ} + \tilde{p} (x) \\ & \overset{(a)}{\geq} \frac{6 (π_{X} (x) - π_{X} (x) - \frac{ϵ}{12 | X |})}{ϵ} + \frac{1}{| X |} - \frac{ϵ}{12 | X |} = \frac{1}{2 | X |} - \frac{ϵ}{12 | X |} \geq 0, \end{aligned}

where (a) follows from the fact that

| π_{X} (x) - \tilde{p} (x) | \leq \frac{ϵ}{12 | X |}

. We conclude that

p^{'} \in Δ_{X}

. Now define the meta-probability measure

{MP}^{'}

as follows:

{MP}^{'} = \frac{ϵ}{6} \cdot δ_{p^{'}} + (1 - \frac{ϵ}{6}) {MP}_{f},

where

δ_{p^{'}}

is a Dirac measure centered at

p^{'}

.

For every

x \in X

, we have

\begin{aligned} \int_{Δ_{X}} p (x) \cdot d {MP}^{'} (p) & = \frac{ϵ}{6} \cdot p^{'} (x) + (1 - \frac{ϵ}{6}) \int_{Δ_{X}} p (x) \cdot d {MP}_{f} (p) = \frac{ϵ}{6} \cdot p^{'} (x) + (1 - \frac{ϵ}{6}) \cdot \tilde{p} (x) \\ & = π_{X} (x) - \tilde{p} (x) + \frac{ϵ}{6} \cdot \tilde{p} (x) + (1 - \frac{ϵ}{6}) \tilde{p} (x) = π_{X} (x) . \end{aligned}

Therefore,

{MP}^{'}

is balanced and finitely supported. Moreover,

\begin{aligned} W_{1} (MP, {MP}^{'}) & \leq W_{1} (MP, {MP}_{f}) + W_{1} ({MP}_{f}, {MP}^{'}) \overset{(a)}{\leq} \frac{ϵ}{2} + {∥ {MP}_{f} - {MP}^{'} ∥}_{T V} \\ & = \frac{ϵ}{2} + {∥{MP}_{f} - (1 - \frac{ϵ}{6}) {MP}_{f} - \frac{ϵ}{6} \cdot δ_{p^{'}}∥}_{T V} \leq \frac{ϵ}{2} + {∥\frac{ϵ}{6} \cdot {MP}_{f}∥}_{T V} + {∥\frac{ϵ}{6} δ_{p^{'}}∥}_{T V} \\ & = \frac{ϵ}{2} + \frac{ϵ}{6} + \frac{ϵ}{6} < ϵ, \end{aligned}

where (a) follows from the fact that the first Wasserstein metric is upper bounded by the total variation distance multiplied by the diameter of

Δ_{X}

(which is equal to 1 in our case) [13]. We conclude that

{MP}_{b f} (X)

is dense in

{MP}_{b} (X)

which is weakly-* closed. Therefore,

{MP}_{b} (X)

is the weak-* closure of

{MP}_{b f} (X)

.
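
The rebalancing step can also be verified on a small example. The Python sketch below is illustrative only: the names `mean` and `rebalance`, the value ϵ = 1/2 and the two-atom measure `mp_f` are assumptions made for this example. It takes a finitely supported measure whose mean is within ϵ/(12 | X |) of the uniform distribution, computes p' from the formula above, and forms the mixture MP' = (ϵ/6) δ_{p'} + (1 - ϵ/6) MP_f with exact rational arithmetic.

```python
from fractions import Fraction as F

def mean(mp, nx):
    """Barycenter of a finitely supported measure mp: {p (tuple): mass}."""
    return tuple(sum(mass * p[x] for p, mass in mp.items())
                 for x in range(nx))

def rebalance(mp_f, eps, nx):
    """Return p' and MP' = (eps/6) * delta_{p'} + (1 - eps/6) * MP_f."""
    pi = F(1, nx)  # uniform distribution on X
    p_tilde = mean(mp_f, nx)
    p_prime = tuple(6 * (pi - p_tilde[x]) / eps + p_tilde[x]
                    for x in range(nx))
    # Holds whenever |pi(x) - p_tilde(x)| <= eps / (12 |X|), as in the text.
    assert all(c >= 0 for c in p_prime), "mean of mp_f is too far from uniform"
    mp_new = {p: (1 - eps / 6) * mass for p, mass in mp_f.items()}
    mp_new[p_prime] = mp_new.get(p_prime, F(0)) + eps / 6
    return p_prime, mp_new

# Hypothetical nearly balanced measure on distributions over a binary alphabet.
eps = F(1, 2)
mp_f = {(F(3, 5), F(2, 5)): F(1, 2),
        (F(41, 100), F(59, 100)): F(1, 2)}
p_prime, mp_new = rebalance(mp_f, eps, 2)
print(mean(mp_f, 2), p_prime, mean(mp_new, 2))
```

In this instance the mean of MP_f is (101/200, 99/200), the correcting atom is p' = (89/200, 111/200), and the mixture has mean exactly (1/2, 1/2), as the computation above predicts.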

References

1. Nasser, R. Topological structures on DMC spaces. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 3175–3179.
2. Martin, K. Topology in information theory in topology. Theor. Comput. Sci. 2008, 405, 75–87.
3. Schwarte, H. On weak convergence of probability measures, channel capacity and code error probabilities. IEEE Trans. Inf. Theory 1996, 42, 1549–1551.
4. Richardson, T.; Urbanke, R. Modern Coding Theory; Cambridge University Press: New York, NY, USA, 2008.
5. Rathi, V.; Urbanke, R. Density evolution, thresholds and the stability condition for non-binary LDPC codes. IEE Proc. Commun. 2005, 152, 1069–1074.
6. Bennatan, A.; Burshtein, D. Design and analysis of nonbinary LDPC codes for arbitrary discrete-memoryless channels. IEEE Trans. Inf. Theory 2006, 52, 549–583.
7. Cam, L.; Yang, G. Asymptotics in Statistics: Some Basic Concepts; Springer Series in Statistics; Springer: New York, NY, USA, 2000.
8. Nasser, R. Continuity of Channel Parameters and Operations under Various DMC Topologies. arXiv, 2017; arXiv:1701.04466.
9. Kelley, J. General Topology; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1975.
10. Munkres, J. Topology; Featured Titles for Topology Series; Prentice Hall: Upper Saddle River, NJ, USA, 2000.
11. Franklin, S. Spaces in which sequences suffice. Fundam. Math. 1965, 57, 107–115.
12. Steenrod, N.E. A convenient category of topological spaces. Mich. Math. J. 1967, 14, 133–152.
13. Villani, C. Topics in Optimal Transportation; American Mathematical Society: Providence, RI, USA, 2003.
14. Torgersen, E. Comparison of Statistical Experiments; Cambridge University Press: Cambridge, UK, 1991.
15. Buscemi, F. Degradable Channels, Less Noisy Channels, and Quantum Statistical Morphisms: An Equivalence Relation. Probl. Inf. Transm. 2016, 52, 201–213.
16. Shannon, C. A Note on a Partial Ordering for Communication Channels. Inf. Contr. 1958, 1, 390–397.
17. Raginsky, M. Shannon meets Blackwell and Le Cam: Channels, codes, and statistical experiments. In Proceedings of the 2011 IEEE International Symposium on Information Theory, Saint Petersburg, Russia, 31 July–5 August 2011; pp. 1220–1224.
18. Nasser, R. A Characterization of the Shannon Ordering of Communication Channels. In Proceedings of the 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017.
19. Nasser, R. On the input-degradedness and input-equivalence between channels. arXiv, 2017; arXiv:1702.00727.

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

