Topological Structures on DMC Spaces

Two channels are said to be equivalent if they are degraded from each other. The space of equivalent channels with input alphabet X and output alphabet Y can be naturally endowed with the quotient of the Euclidean topology by the equivalence relation. A topology on the space of equivalent channels with fixed input alphabet X and arbitrary but finite output alphabet is said to be natural if and only if it induces the quotient topology on the subspaces of equivalent channels sharing the same output alphabet. We show that every natural topology is σ-compact, separable and path-connected. The finest natural topology, which we call the strong topology, is shown to be compactly generated, sequential and T4. On the other hand, the strong topology is not first-countable anywhere, hence it is not metrizable. We introduce a metric distance on the space of equivalent channels which compares the noise levels between channels. The induced metric topology, which we call the noisiness topology, is shown to be natural. We also study topologies that are inherited from the space of meta-probability measures by identifying channels with their Blackwell measures.


Introduction
This paper is an extended version of the paper published in International Symposium on Information Theory 2017 (ISIT 2017) [1].
A topology on a given set is a mathematical structure that allows us to formally talk about the neighborhood of a given point of the set. This makes it possible to define continuous mappings and converging sequences. Topological spaces generalize metric spaces, which are mathematical structures that specify distances between the points of the space. Links between information theory and topology were investigated in [2]. In this paper, we aim to construct meaningful topologies and metrics for the space of equivalent channels sharing a common input alphabet. Let X and Y be two fixed finite sets. Every discrete memoryless channel (DMC) with input alphabet X and output alphabet Y can be determined by its transition probabilities. Since there are |X| × |Y| such probabilities, the space of all channels from X to Y can be seen as a subset of R^{|X|×|Y|}. Therefore, this space can be naturally endowed with the Euclidean metric, or any other equivalent metric. A generalization of this topology to infinite input and output alphabets was considered in [3].
There are a few drawbacks to this approach. For example, consider the case where X = Y = F₂ := {0, 1}. The binary symmetric channels BSC(ε) and BSC(1 − ε) have non-zero Euclidean distance if ε ≠ 1/2. On the other hand, BSC(ε) and BSC(1 − ε) are completely equivalent from an operational point of view: both channels have exactly the same probability of error under optimal decoding for any fixed code. Moreover, any sub-optimal decoder for one channel can be transformed into a sub-optimal decoder for the other channel without changing the probability of error or the computational complexity. This is why it makes sense, from an information-theoretic point of view, to identify equivalent channels and consider them as one point in the space of "equivalent channels".
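This operational equivalence is easy to check numerically. The sketch below is an illustration of ours (not part of the paper): it brute-forces the optimal, maximum-likelihood probability of error of the 3-bit repetition code over BSC(ε) and over BSC(1 − ε), and the two values coincide.

```python
from itertools import product

def bsc(eps):
    # Transition matrix of BSC(eps): rows are inputs, columns are outputs.
    return [[1 - eps, eps], [eps, 1 - eps]]

def seq_prob(W, x_seq, y_seq):
    # Probability of output sequence y_seq given input sequence x_seq
    # over the memoryless channel W.
    p = 1.0
    for x, y in zip(x_seq, y_seq):
        p *= W[x][y]
    return p

def optimal_error(W, code):
    # Optimal probability of error under a uniform prior on the codewords:
    # sum over outputs y of P(y) - max_c P(c, y), i.e., ML decoding.
    n = len(code[0])
    err = 0.0
    for y in product([0, 1], repeat=n):
        joint = [seq_prob(W, c, y) / len(code) for c in code]
        err += sum(joint) - max(joint)
    return err

code = [(0, 0, 0), (1, 1, 1)]  # 3-bit repetition code
e1 = optimal_error(bsc(0.1), code)
e2 = optimal_error(bsc(0.9), code)
# Both equal 3(0.1)^2(0.9) + (0.1)^3 = 0.028.
```

For BSC(0.9), the ML decoder simply inverts the majority vote, which is the decoder transformation alluded to above.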
The limitation of the Euclidean metric is clearer when we consider channels with different output alphabets. For example, BSC(1/2) and BEC(1) are completely equivalent but they do not have the same output alphabet, and so there is no way to compare them with the Euclidean metric because they do not belong to the same space.
The standard approach to solve this problem is to find a "canonical sufficient statistic" and find a representation of each channel in terms of this sufficient statistic. This makes it possible to compare channels with different output-alphabets. One standard sufficient statistic that has been widely used for binary-input channels is the log-likelihood ratio. Each binary-input channel can be represented as a density of log-likelihood ratios (called L-density in [4]). This representation makes it possible to "topologize" the space of "equivalent" binary-input channels by considering the topology of convergence in distribution [4]. A similar approach can be adopted for non-binary-input channels (see [5,6]). Another (equivalent) way to "topologize" the space of equivalent channels is by using the Le Cam deficiency distance [7].
One issue (which is secondary and only relevant for conceptual purposes) with the current formulation of this topology is that it does not allow us to see it as a "natural topology". Consider a fixed output alphabet Y and let us focus on the space of "equivalent channels" from X to Y. Since this space is the quotient of the space of channels from X to Y, which is naturally topologized by the Euclidean metric, it seems that the most natural topology on this space is the quotient of the Euclidean topology by the equivalence relation. This motivates us to consider a topology on the space of "equivalent channels" with input alphabet X and arbitrary but finite output alphabet as natural if and only if it induces the quotient topology on the subspaces of "equivalent channels" from X to Y for any output alphabet Y. A legitimate question to ask now is whether the L-density topology is natural in this sense or not.
In this paper, we study general and particular natural topologies on DMC spaces. In Section 2, we provide a brief summary of the basic concepts and theorems in general topology. The measure-theoretic notations that we use are introduced in Section 3. The space of channels from X to Y and its topology is studied in Section 4. We formally define the equivalence relation between channels in Section 5. It is shown that the equivalence class of a channel can be determined by the distribution of its posterior probability distribution. This is the standard generalization of L-densities to non-binary-input channels. This distribution is called the Blackwell measure of the channel. In Section 6, we study the space of equivalent channels from X to Y and the quotient topology.
In Section 7, we define the space of equivalent channels with input alphabet X and we study the properties of general natural topologies. The finest natural topology, which we call the strong topology, is studied in Section 8. A metric for the space of equivalent channels is proposed in Section 9. The topology induced by this metric is called the noisiness topology. In Section 10, we study the topologies that are inherited from the space of meta-probability measures by identifying equivalent channels with their Blackwell measures. We show that the weak-* topology (which is the standard generalization of the L-density topology to non-binary-input channels) is exactly the same as the noisiness topology. The total variation topology is also investigated in Section 10. The Borel σ-algebra of Hausdorff natural topologies is studied in Section 11.
The continuity (under the topologies introduced here) of mappings that are relevant to information theory (such as capacity, mutual information, Bhattacharyya parameter, probability of error of a fixed code, optimal probability of error of a given rate and blocklength, channel sums and products, etc.) is studied in [8].

Preliminaries
In this section, we recall basic definitions and well known theorems in general topology. The reader who is already familiar with the basic concepts of topology may skip this section and refer to it later if necessary. Proofs of all non-referenced facts can be found in any standard textbook on General Topology (e.g., [9]). Definitions and theorems that may not be widely known can be found in Sections 2.10, 2.14 and 2.15.

Set-Theoretic Notations
For every integer n > 0, we denote the set {1, . . . , n} as [n]. The set of mappings from a set A to a set B is denoted as B^A. Let A be a subset of B. The indicator mapping 1_{A,B} : B → {0, 1} of A in B is defined as 1_{A,B}(x) = 1 if x ∈ A, and 1_{A,B}(x) = 0 otherwise. If the superset B is clear from the context, we simply write 1_A to denote the indicator mapping of A in B.
The power set of B is the set of subsets of B. Since every subset of B can be identified with its indicator mapping, we denote the power set of B as 2^B := {0, 1}^B.
A collection A ⊂ 2^B of subsets of B is said to be finer than another collection A′ ⊂ 2^B if A′ ⊂ A. If this is the case, we also say that A′ is coarser than A.
Let (A_i)_{i∈I} be a collection of arbitrary sets indexed by I. The disjoint union of (A_i)_{i∈I} is defined as ⊔_{i∈I} A_i = ∪_{i∈I} (A_i × {i}). For every i ∈ I, the i-th canonical injection is the mapping φ_i : A_i → ⊔_{j∈I} A_j defined as φ_i(x_i) = (x_i, i). If no confusion can arise, we identify A_i with A_i × {i} through the canonical injection. Therefore, we can see A_i as a subset of ⊔_{j∈I} A_j for every i ∈ I.
A relation R on a set T is a subset of T × T. For every x, y ∈ T, we write xRy to denote (x, y) ∈ R. A relation is said to be reflexive if xRx for every x ∈ T. It is symmetric if xRy implies yRx for every x, y ∈ T. It is anti-symmetric if xRy and yRx imply x = y for every x, y ∈ T. It is transitive if xRy and yRz imply xRz for every x, y, z ∈ T.
An order relation is a relation that is reflexive, anti-symmetric and transitive. An equivalence relation is a relation that is reflexive, symmetric and transitive.
Let R be an equivalence relation on T. For every x ∈ T, the set x̂ = {y ∈ T : xRy} is the R-equivalence class of x. The collection of R-equivalence classes, which we denote as T/R, forms a partition of T, and it is called the quotient space of T by R. The mapping Proj_R : T → T/R defined as Proj_R(x) = x̂ for every x ∈ T is the projection mapping onto T/R.
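As a toy illustration of ours (not from the paper), the quotient construction can be computed directly for a finite set. The sketch below assumes that `related` is a genuine equivalence relation (reflexive, symmetric, transitive) and partitions T into its R-equivalence classes:

```python
def quotient(T, related):
    # Partition the finite iterable T into equivalence classes of `related`,
    # which is assumed to be reflexive, symmetric and transitive.
    classes = []
    for x in T:
        for c in classes:
            if related(x, c[0]):  # x belongs to the class represented by c[0]
                c.append(x)
                break
        else:
            classes.append([x])  # x starts a new equivalence class
    return classes

# Congruence modulo 3 on {0, ..., 6}:
parts = quotient(range(7), lambda a, b: a % 3 == b % 3)
# → [[0, 3, 6], [1, 4], [2, 5]]
```

The list `parts` plays the role of T/R, and the map sending x to its containing class is Proj_R.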

Topological Spaces
A topological space is a pair (T, U), where U ⊂ 2^T is a collection of subsets of T satisfying:
• ∅ ∈ U and T ∈ U.
• The intersection of a finite collection of members of U is also a member of U.
• The union of an arbitrary collection of members of U is also a member of U.
If (T, U) is a topological space, we say that U is a topology on T.
The power set 2^T of T is clearly a topology. It is called the discrete topology on T. If A is an arbitrary collection of subsets of T, we can construct a topology on T starting from A by taking the intersection of all topologies on T that contain A. This is the coarsest topology on T that contains A. It is called the topology on T generated by A.
Let (T, U) be a topological space. The subsets of T that are members of U are called the open sets of T. Complements of open sets are called closed sets. We can easily see that the closed sets satisfy the following:
• ∅ and T are closed.
• The union of a finite collection of closed sets is closed.
• The intersection of an arbitrary collection of closed sets is closed.
Let A be an arbitrary subset of T. The closure cl(A) of A is the smallest closed set containing A, i.e., the intersection of all closed sets containing A. The interior A° of A is the largest open subset of A, i.e., the union of all open subsets of A. A neighborhood of a point x ∈ T is a set that contains an open set containing x. We say that (T, U) is first-countable if every point x ∈ T has a countable neighborhood basis, i.e., a countable collection of neighborhoods of x such that every neighborhood of x contains a member of the collection.
A collection of open sets B ⊂ U is said to be a base for the topology U if every open set U ∈ U can be written as the union of elements of B.
We say that (T, U) is a second-countable space if the topology U has a countable base. It is a well-known fact that every second-countable space is first-countable and separable (i.e., it contains a countable dense subset). We say that a sequence (x_n)_{n≥0} of elements of T converges to x ∈ T if for every neighborhood O of x, there exists n₀ ≥ 0 such that for every n ≥ n₀, we have x_n ∈ O. We say that x is a limit of the sequence (x_n)_{n≥0}. Note that limits need not be unique in a general topological space.

Separation Axioms
(T, U) is said to be a T₁-space if for every x, y ∈ T with x ≠ y, there exists an open set U ∈ U such that x ∈ U and y ∉ U. It is easy to see that (T, U) is T₁ if and only if all singletons are closed. (T, U) is said to be a Hausdorff space (or T₂-space) if for every x, y ∈ T with x ≠ y, there exist two open sets U, V ∈ U such that x ∈ U, y ∈ V and U ∩ V = ∅.
If (T, U ) is Hausdorff, the limit of every converging sequence is unique.
(T, U ) is said to be regular if for every x ∈ T and every closed set F not containing x, there exist two open sets U, V ∈ U such that x ∈ U, F ⊂ V and U ∩ V = ∅.
(T, U) is said to be normal if for every two disjoint closed sets A and B, there exist two open sets U, V ∈ U such that A ⊂ U, B ⊂ V and U ∩ V = ∅. If (T, U) is normal, disjoint closed sets can be separated by disjoint closed neighborhoods, i.e., for every two disjoint closed sets A and B, there exist two open sets U, U′ ∈ U and two closed sets F, F′ such that A ⊂ U ⊂ F, B ⊂ U′ ⊂ F′ and F ∩ F′ = ∅. A T₃-space is a regular T₁-space, and a T₄-space is a normal T₁-space. It is easy to see that T₄ ⇒ T₃ ⇒ T₂ ⇒ T₁.

Relativization
If (T, U) is a topological space and A is an arbitrary subset of T, then A inherits a topology U_A = {U ∩ A : U ∈ U} from (T, U). It is easy to check that U_A is a topology on A. If (T, U) is first-countable (respectively second-countable, or Hausdorff), then (A, U_A) is first-countable (respectively second-countable, or Hausdorff).
If (T, U ) is normal and A is closed, then (A, U A ) is normal. The union of a countable number of separable subspaces is separable.

Continuous Mappings
Let (T, U) and (S, V) be two topological spaces. A mapping f : T → S is said to be continuous if f⁻¹(V) ∈ U for every V ∈ V. A bijection f : T → S is a homeomorphism if both f and f⁻¹ are continuous. In this case, for every A ⊂ T, A ∈ U if and only if f(A) ∈ V. This means that (T, U) and (S, V) have the same topological structure and share the same topological properties.

Compact Spaces and Sequentially Compact Spaces
(T, U) is a compact space if every open cover of T admits a finite sub-cover, i.e., if (U_i)_{i∈I} is a collection of open sets such that T = ∪_{i∈I} U_i, then there exist n > 0 and i₁, . . . , i_n ∈ I such that T = ∪_{j=1}^n U_{i_j}.
If (T, U ) is compact, then every closed subset of T is compact (with respect to the inherited topology).
If f : T → S is a continuous mapping from a compact space (T, U ) to an arbitrary topological space (S, V ), then f (T) is compact.
If A is a compact subset of a Hausdorff topological space, then A is closed.
(T, U ) is said to be locally compact if every point has at least one compact neighborhood. A compact space is automatically locally compact.
If (T, U ) is Hausdorff and locally compact, then for every point x ∈ T and every neighborhood O of x, O contains a compact neighborhood of x.
A compact Hausdorff space is always normal. (T, U ) is a σ-compact space if it is the union of a countable collection of compact subspaces. (T, U ) is countably compact if every countable open cover of T admits a finite sub-cover. This is a weaker condition compared to compactness.
(T, U ) is said to be sequentially compact if every sequence in T has a converging subsequence. In general, compactness does not imply sequential compactness nor the other way around. (T, U ) is path-connected if every two points of T can be joined by a continuous path. I.e., for every x, y ∈ T, there exists a continuous mapping f : [0, 1] → T such that f (0) = x and f (1) = y, where [0, 1] is endowed with the well known Euclidean topology (See Section 2.11 for the definition of the Euclidean metric and its induced topology).

Connected Spaces
(T, U) is said to be connected if T cannot be written as the union of two disjoint non-empty open sets. A path-connected space is connected, but the converse is not true in general.
A subset A of T is said to be connected (respectively path-connected) if (A, U A ) is connected (respectively path-connected).
If (A_i)_{i∈I} is a collection of connected (respectively path-connected) subsets of T such that ∩_{i∈I} A_i ≠ ∅, then ∪_{i∈I} A_i is connected (respectively path-connected).

Product of Topological Spaces
Let {(T_i, U_i)}_{i∈I} be a collection of topological spaces indexed by I. Let T = ∏_{i∈I} T_i be the product of this collection. For every j ∈ I, the j-th canonical projection is the mapping Proj_j : T → T_j defined as Proj_j((x_i)_{i∈I}) = x_j. The product topology U := ⊗_{i∈I} U_i on T is the coarsest topology that makes all the canonical projections continuous. It can be shown that U is generated by the collection of sets of the form ∏_{i∈I} U_i, where U_i ∈ U_i for all i ∈ I, and U_i = T_i for all but finitely many i ∈ I. The product of T₁ (respectively, Hausdorff, regular, T₃, compact, connected, or path-connected) spaces is T₁ (respectively, Hausdorff, regular, T₃, compact, connected, or path-connected).

Disjoint Union
Let {(T_i, U_i)}_{i∈I} be a collection of topological spaces indexed by I. Let T = ⊔_{i∈I} T_i be the disjoint union of this collection. The disjoint union topology U := ⊕_{i∈I} U_i on T is the finest topology which makes all the canonical injections continuous. It can be shown that U ∈ U if and only if U ∩ T_i ∈ U_i for every i ∈ I.
A mapping f : T → S from (T, U ) to a topological space (S, V ) is continuous if and only if it is continuous on T i for every i ∈ I.
The disjoint union of T 1 (respectively Hausdorff) spaces is T 1 (respectively Hausdorff). The disjoint union of two or more non-empty spaces is always disconnected.
Products are distributive with respect to the disjoint union, i.e., if (S, V) is a topological space, then S × ⊔_{i∈I} T_i and ⊔_{i∈I} (S × T_i) are homeomorphic.

Quotient Topology
Let (T, U) be a topological space and let R be an equivalence relation on T. The quotient topology on T/R is the finest topology that makes the projection mapping Proj_R continuous. It is given by U/R = {Û ⊂ T/R : Proj_R⁻¹(Û) ∈ U}.
Lemma 1. Let f : T → S be a continuous mapping from (T, U) to (S, V). If f(x) = f(x′) for every x, x′ ∈ T satisfying xRx′, then we can define a transcendent mapping f̂ : T/R → S such that f̂(x̂) = f(x) for any x ∈ x̂. f̂ is well defined on T/R. Moreover, f̂ is a continuous mapping from (T/R, U/R) to (S, V).
T/R is said to be upper semi-continuous if for every x̂ ∈ T/R and every open set U ∈ U satisfying x̂ ⊂ U, there exists an open set V ∈ U such that x̂ ⊂ V ⊂ U, and V can be written as the union of members of T/R.
The following Lemma characterizes upper semi-continuous quotient spaces:

Lemma 2. [9] T/R is upper semi-continuous if and only if Proj_R is a closed mapping.
The following theorem is very useful to prove many topological properties for the quotient space: Theorem 1. [9] Let (T, U) be a topological space, and let R be an equivalence relation on T such that T/R is upper semi-continuous and x̂ is a compact subset of T for every x̂ ∈ T/R. If (T, U) is Hausdorff (respectively, regular, locally compact, or second-countable) then (T/R, U/R) is Hausdorff (respectively, regular, locally compact, or second-countable).

Metric Spaces

If (M, d) is a metric space, we say that d is a metric (or distance) on M. The Euclidean metric on R^n is defined as d(x, y) = (∑_{i=1}^n (x_i − y_i)²)^{1/2} for every x = (x_i)_{1≤i≤n} and y = (y_i)_{1≤i≤n}. R^n is second-countable. Moreover, a subset of R^n is compact if and only if it is bounded and closed.
For every x ∈ M and every ε > 0, we define the open ball of center x and radius ε as B(x, ε) = {y ∈ M : d(x, y) < ε}. The metric topology U_d on M induced by d is the coarsest topology on M which makes d a continuous mapping from M × M to R⁺. It is generated by all the open balls.
The metric topology is always T 4 and first-countable. Moreover, (M, U d ) is separable if and only if it is second-countable.
Since every metric space is Hausdorff, we can see that every subset of a compact metric space is closed if and only if it is compact.
Every σ-compact metric space is second-countable. For metric spaces, compactness and sequential compactness are equivalent. A function f : M₁ → M₂ from a metric space (M₁, d₁) to a metric space (M₂, d₂) is said to be uniformly continuous if for every ε > 0, there exists δ > 0 such that for every x, x′ ∈ M₁ satisfying d₁(x, x′) < δ, we have d₂(f(x), f(x′)) < ε. If f : M₁ → M₂ is a continuous mapping from a compact metric space (M₁, d₁) to an arbitrary metric space (M₂, d₂), then f is uniformly continuous.
A topological space (T, U ) is said to be metrizable if there exists a metric d on T such that U is the metric topology on T induced by d.
The disjoint union of metrizable spaces is always metrizable.
The following theorem shows that all separable metrizable spaces are characterized topologically: Theorem 2. [9] A topological space (T, U ) is metrizable and separable if and only if it is Hausdorff, regular and second countable.

Complete Metric Spaces
A sequence (x_n)_{n≥0} is said to be a Cauchy sequence in (M, d) if for every ε > 0, there exists n₀ ≥ 0 such that for every n₁, n₂ ≥ n₀ we have d(x_{n₁}, x_{n₂}) < ε.
Every converging sequence is Cauchy, but the converse is not true in general. A metric space is said to be complete if every Cauchy sequence converges in it. A closed subset of a complete space is always complete.
A complete subspace of an arbitrary metric space is always closed. Every compact metric space is complete, but the converse is not true in general. For every metric space (M, d), there exists a complete metric space (M̄, d̄) containing M such that d̄ agrees with d on M and M is dense in (M̄, d̄). The space (M̄, d̄) is said to be a completion of (M, d).

Polish Spaces and Baire Spaces
A topological space (T, U ) that is both separable and completely metrizable (i.e., has a metrization that is complete) is called a Polish space.
A topological space is said to be a Baire space if the intersection of countably many dense open subsets is dense. The following facts can be found in [10]: every completely metrizable space is a Baire space, and every open subset of a Baire space is itself a Baire space.

Sequential Spaces
Sequential spaces were introduced by Franklin [11] to answer the following question: Assume we know all the converging sequences of a topological space. Is this enough to uniquely determine the topology of the space? Sequential spaces are the most general category of spaces for which converging sequences suffice to determine the topology.
Let (T, U) be a topological space. A subset U ⊂ T is said to be sequentially open if every sequence (x_n)_{n≥0} that converges to a point of U lies eventually in U, i.e., there exists n₀ ≥ 0 such that x_n ∈ U for every n ≥ n₀. Clearly, every open subset of T is sequentially open, but the converse is not true in general.
A topological space (T, U ) is said to be sequential if every sequentially open subset of T is open. A mapping f : T → S from a sequential topological space (T, U ) to an arbitrary topological space (S, V ) is continuous if and only if for every sequence (x n ) n≥0 in T that converges to x ∈ T, the sequence ( f (x n )) n≥0 converges to f (x) in (S, V ) [11].
The following facts were shown in [11]:
• Every first-countable space is sequential. Therefore, every metrizable space is sequential.
• The quotient of a sequential space is sequential.
• All closed and open subsets of a sequential space are sequential.
• Every countably compact sequential Hausdorff space is sequentially compact.
• A topological space is sequential if and only if it is the quotient of a metric space.

Compactly Generated Spaces
A topological space (T, U ) is compactly generated if it is Hausdorff and for every subset F of T, F is closed if and only if F ∩ K is closed for every compact subset K of T. Equivalently, (T, U ) is compactly generated if it is Hausdorff and for every subset U of T, U is open in T if and only if U ∩ K is open in K for every compact subset K of T.
The following facts can be found in [12]:
• All locally compact Hausdorff spaces are compactly generated.
• All first-countable Hausdorff spaces are compactly generated. Therefore, every metrizable space is compactly generated.
• A Hausdorff quotient of a compactly generated space is compactly generated.
• If (T, U) is compactly generated and (S, V) is Hausdorff and locally compact, then (T × S, U ⊗ V) is compactly generated.

Measure-Theoretic Notations
In this section, we introduce the measure-theoretic notations that we are using. We assume that the reader is familiar with the basic definitions and theorems of Measure Theory.

Probability Measures
If A ⊂ 2 M is a collection of subsets of M, we denote the σ-algebra that is generated by A as σ(A). The set of probability measures on (M, Σ) is denoted as P (M, Σ). If the σ-algebra Σ is known from the context, we simply write P (M) to denote the set of probability measures.
If P ∈ P(M, Σ) and {x} is a measurable singleton, we simply write P(x) to denote P({x}). For every P₁, P₂ ∈ P(M, Σ), the total variation distance between P₁ and P₂ is defined as ‖P₁ − P₂‖_TV = sup_{A∈Σ} |P₁(A) − P₂(A)|. The space P(M, Σ) is a complete metric space under the total variation distance.

Probabilities on Finite Sets
We always endow finite sets with their finest σ-algebra, i.e., the power set. In this case, every probability measure is completely determined by its value on singletons, i.e., if P is a probability measure on a finite set X, then for every A ⊂ X, we have P(A) = ∑_{x∈A} P(x). If X is a finite set, we denote the set of probability distributions on X as ∆_X. Note that ∆_X is an (|X| − 1)-dimensional simplex in R^X. We always endow ∆_X with the total variation distance and its induced topology. For every p₁, p₂ ∈ ∆_X, we have ‖p₁ − p₂‖_TV = (1/2) ∑_{x∈X} |p₁(x) − p₂(x)|. Note that the total variation topology on ∆_X is the same as the one inherited from the Euclidean topology of R^X by relativization. Since ∆_X is a closed and bounded subset of R^X, it is compact.
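For intuition, the two standard expressions for the total variation distance on a finite set (the supremum over events and the halved ℓ₁ norm) can be compared numerically. The following sketch is ours and purely illustrative:

```python
from itertools import combinations

def tv_sup(p, q):
    # sup over all events A ⊂ X of |P(A) - Q(A)|, by brute force
    # over all subsets of the alphabet.
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for A in combinations(range(n), r):
            best = max(best, abs(sum(p[i] - q[i] for i in A)))
    return best

def tv_half_l1(p, q):
    # (1/2) * sum_x |p(x) - q(x)|
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
# Both formulas give 0.5: the maximizing event is A = {x : p(x) > q(x)}.
```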

Borel Sets and the Support of A Measure
Let (T, U ) be a Hausdorff topological space. The Borel σ-algebra of (T, U ) is the σ-algebra generated by U . We denote the Borel σ-algebra of (T, U ) as B(T, U ). If the topology U is known from the context, we simply write B(T) to denote the Borel σ-algebra. The sets in B(T) are called the Borel sets of T.
The support of a probability measure P ∈ P(T, B(T)) is the set of all points x ∈ T for which every neighborhood has a strictly positive measure: supp(P) = {x ∈ T : P(O) > 0 for every neighborhood O of x}. If P is a probability measure on a Polish space, then P(T \ supp(P)) = 0.

Convergence of Probability Measures and the Weak- * Topology
We have many notions of convergence of probability measures. If the measurable space does not have a topological structure, we have two notions of convergence:
• The total-variation convergence: we say that a sequence (P_n)_{n≥0} of probability measures in P(M, Σ) converges in total variation to P ∈ P(M, Σ) if and only if lim_{n→∞} ‖P_n − P‖_TV = 0.
• The strong convergence: we say that a sequence (P_n)_{n≥0} in P(M, Σ) strongly converges to P ∈ P(M, Σ) if and only if lim_{n→∞} P_n(A) = P(A) for every A ∈ Σ.
Clearly, total-variation convergence implies strong convergence. The converse is not true in general. However, if we are working in the Borel σ-algebra of a Polish space T and (P_n)_{n≥0} strongly converges to a finitely supported probability measure P, then lim_{n→∞} P_n({x}) = P({x}) for every x ∈ supp(P) and lim_{n→∞} P_n(T \ supp(P)) = 0, which implies that (P_n)_{n≥0} also converges to P in total variation. Therefore, in a Polish space, total variation convergence and strong convergence to finitely supported probability measures are equivalent.
Let (T, U) be a Hausdorff topological space. We say that a sequence (P_n)_{n≥0} of probability measures in P(T, B(T)) weakly-* converges to P ∈ P(T, B(T)) if and only if for every bounded and continuous function f : T → R, we have lim_{n→∞} ∫_T f dP_n = ∫_T f dP. Note that many authors call this notion "weak convergence" rather than weak-* convergence. We will refrain from using the term "weak convergence" in order to be consistent with the terminology of functional analysis.
The weak-* topology on P(T, B(T)) is the coarsest topology which makes the mappings P → ∫_T f dP continuous on P(T, B(T)) for every bounded and continuous function f : T → R.

Metrization of the Weak-* Topology
If (T, U) is a Polish space, the weak-* topology on P(T, B(T)) is also Polish [13]. There are many known metrizations for the weak-* topology. One metrization that is particularly convenient for us is the Wasserstein metric.
The 1st-Wasserstein distance on P(T, B(T)) is defined as W₁(P, P′) = inf_{γ∈Γ(P,P′)} ∫_{T×T} d(x, y) dγ(x, y), where Γ(P, P′) is the collection of all probability measures on T × T with marginals P and P′ on the first and second factors respectively, and d is a metric on T that induces the topology U. Γ(P, P′) is called the set of couplings of P and P′. If d is bounded and (T, d) is separable and complete, then W₁ metrizes the weak-* topology [13]. If (T, U) is compact, then (P(T), W₁) is also compact [13]. Moreover, if d is upper-bounded by 1, then W₁(P, P′) ≤ ‖P − P′‖_TV [13]. In other words, the Wasserstein metric is controlled by the total variation distance.
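On the real line the coupling infimum has a well-known closed form, W₁(P, Q) = ∫ |F_P(t) − F_Q(t)| dt, which makes the metric easy to compute for finitely supported measures. The sketch below (ours, illustrative only) uses this 1-D special case; on ∆_X one would instead solve the coupling linear program directly:

```python
def w1_line(P, Q):
    # W1 between finitely supported measures on R, given as dicts
    # {point: mass}, via the CDF formula W1 = ∫ |F_P - F_Q| dt.
    xs = sorted(set(P) | set(Q))
    total, Fp, Fq = 0.0, 0.0, 0.0
    for a, b in zip(xs, xs[1:]):
        Fp += P.get(a, 0.0)  # the CDFs are constant on [a, b)
        Fq += Q.get(a, 0.0)
        total += abs(Fp - Fq) * (b - a)
    return total

# Moving a unit mass from 0 to 1 costs 1:
# w1_line({0: 1.0}, {1: 1.0}) → 1.0
# Splitting the mass half-way costs 0.5:
# w1_line({0: 0.5, 1: 0.5}, {0.5: 1.0}) → 0.5
```

Note how the second example has total variation distance 1 but Wasserstein distance 0.5: W₁ is sensitive to how far mass moves, not just how much.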

Meta-Probability Measures
Let X be a finite set. A meta-probability measure on X is a probability measure on the Borel sets of ∆ X . It is called a meta-probability measure because it is a probability measure on the space of probability distributions on X .
We denote the set of meta-probability measures on X as MP(X). Clearly, MP(X) = P(∆_X). A meta-probability measure MP on X is said to be balanced if it satisfies ∫_{∆_X} p · dMP(p) = π_X, where π_X is the uniform probability distribution on X. We denote the set of all balanced meta-probability measures on X as MP_b(X). The set of all balanced and finitely supported meta-probability measures on X is denoted as MP_bf(X).
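For a finitely supported meta-probability measure, balancedness is a finite linear constraint and can be checked directly. In the sketch below (ours, for illustration), a finitely supported meta-probability measure is stored as a dict mapping support points of ∆_X, represented as tuples, to their masses:

```python
def is_balanced(mp, n, tol=1e-12):
    # mp: {p: mass} with p a probability vector on an alphabet of size n.
    # Balanced means sum_p mass(p) * p(x) = 1/n for every x.
    mean = [sum(mass * p[x] for p, mass in mp.items()) for x in range(n)]
    return all(abs(m - 1.0 / n) <= tol for m in mean)

# Equal masses on (0.9, 0.1) and (0.1, 0.9) average to the uniform
# distribution, so this measure is balanced:
mp = {(0.9, 0.1): 0.5, (0.1, 0.9): 0.5}
# is_balanced(mp, 2) → True
# A point mass on a non-uniform distribution is not balanced:
# is_balanced({(1.0, 0.0): 1.0}, 2) → False
```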

The Space of Channels from X to Y
For every (x, y) ∈ X × Y, we denote p W (x, y) as W(y|x), which we interpret as the conditional probability of receiving y at the output, given that x is the input.
Let DMC_{X,Y} be the set of all channels having X as input alphabet and Y as output alphabet. For every W, W′ ∈ DMC_{X,Y}, define the distance between W and W′ as d_{X,Y}(W, W′) = max_{x∈X} ‖W(·|x) − W′(·|x)‖_TV, i.e., the maximum over the inputs of the total variation distance between the corresponding output distributions. It is easy to check that d_{X,Y} is a metric on DMC_{X,Y}. Throughout this paper, we always associate the space DMC_{X,Y} with the metric distance d_{X,Y} and the metric topology T_{X,Y} induced by it.
For every x ∈ X, the mapping y → W(y|x) is a probability distribution on Y. Therefore, every channel W can be seen as a collection of probability distributions on Y indexed by x ∈ X. This allows us to identify the space DMC_{X,Y} with (∆_Y)^X, where ∆_Y is the set of probability distributions on Y. It is easy to see that the topology given by the metric d_{X,Y} on DMC_{X,Y} is the same as the product topology on (∆_Y)^X, which is also the same as the topology inherited from the Euclidean topology of R^{X×Y} by relativization.
It is known that ∆_Y is a closed and bounded subset of R^Y. Therefore, ∆_Y is compact, which implies that (∆_Y)^X is compact. We conclude that the metric space (DMC_{X,Y}, d_{X,Y}) is compact. If W ∈ DMC_{X,Y} and V ∈ DMC_{Y,Z}, we define the composition V ∘ W ∈ DMC_{X,Z} of W and V as follows: (V ∘ W)(z|x) = ∑_{y∈Y} V(z|y)W(y|x) for every (x, z) ∈ X × Z. It is easy to see that the mapping (W, V) → V ∘ W is continuous. For every mapping f : X → Y, define the deterministic channel D_f ∈ DMC_{X,Y} as follows: D_f(y|x) = 1 if y = f(x), and D_f(y|x) = 0 otherwise.
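Channel composition is simply a product of transition matrices, which the short sketch below (ours, for illustration) makes concrete: composing two binary symmetric channels yields another BSC whose crossover probability is ε₁(1 − ε₂) + (1 − ε₁)ε₂.

```python
def compose(V, W):
    # (V ∘ W)(z|x) = sum_y V(z|y) * W(y|x);
    # transition matrices are indexed as [input][output].
    return [[sum(V[y][z] * W[x][y] for y in range(len(V)))
             for z in range(len(V[0]))]
            for x in range(len(W))]

def bsc(eps):
    return [[1 - eps, eps], [eps, 1 - eps]]

VW = compose(bsc(0.2), bsc(0.1))
# Crossover of the composition: 0.1*0.8 + 0.9*0.2 = 0.26,
# so VW is the transition matrix of BSC(0.26).
```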

Equivalent Channels and Their Representation
Let W ∈ DMC_{X,Y} and W′ ∈ DMC_{X,Z} be two channels having the same input alphabet. We say that W′ is degraded from W if there exists a channel V ∈ DMC_{Y,Z} such that W′ = V ∘ W. The channels W and W′ are said to be equivalent if each one is degraded from the other. In the rest of this section, we describe one way to check whether two given channels are equivalent.
Let ∆_X and ∆_Y be the spaces of probability distributions on X and Y respectively. Define P°_W ∈ ∆_Y as P°_W(y) = (1/|X|) ∑_{x∈X} W(y|x). This can be interpreted as the probability distribution of the output when the input is uniformly distributed in X. The image of W is the set of output symbols y ∈ Y having strictly positive probabilities: Im(W) = {y ∈ Y : P°_W(y) > 0}. For every y ∈ Im(W), define W⁻¹_y ∈ ∆_X as follows: W⁻¹_y(x) = W(y|x) / (|X| · P°_W(y)), ∀x ∈ X.
W⁻¹_y(x) can be interpreted as the posterior probability of x, given that the output is y, and assuming a uniform prior distribution on the input. In other words, if X is a random variable uniformly distributed in X and Y is the output of the channel W when X is the input, then W⁻¹_y(x) = P(X = x | Y = y) for every y ∈ Im(W) and every x ∈ X. On the other hand, if P°_W(y) = 0, then we must have W(y|x) = 0 for every x ∈ X. We conclude that P°_W and the collection {W⁻¹_y}_{y∈Im(W)} uniquely determine W. The Blackwell measure MP_W of W (in an earlier version of this work, I called MP_W the posterior meta-probability distribution of W; Maxim Raginsky thankfully brought to my attention the fact that MP_W is called the Blackwell measure) is a probability distribution on ∆_X having mass P°_W(y) on W⁻¹_y for each y ∈ Im(W). Another way to express MP_W is as follows: MP_W = ∑_{y∈Im(W)} P°_W(y) · δ_{W⁻¹_y}, where δ_{W⁻¹_y} is a Dirac measure centered at W⁻¹_y ∈ ∆_X. MP_W can be interpreted as follows: after the receiver obtains the output of the channel, he can compute the posterior probabilities of the input as the conditional probability distribution of the input given the output symbol that he received. However, before receiving the output symbol, the receiver does not know what he will receive; he just has different probabilities for different possible output symbols. Therefore, the posterior probability distribution that will be computed by the receiver is itself random, and so we need a meta-probability measure to describe it. MP_W is exactly this meta-probability measure.
Since Im(W) is finite, the support of MP_W is finite, and it consists of all points in ∆_X having strictly positive mass:

supp(MP_W) = {W^{-1}_y : y ∈ Im(W)}.

The rank of W is the size of the support of its Blackwell measure: rank(W) := |supp(MP_W)|. Notice that for every x ∈ X, we have

∫_{∆_X} p(x) dMP_W(p) = ∑_{y∈Im(W)} P^o_W(y) W^{-1}_y(x) = (1/|X|) ∑_{y∈Im(W)} W(y|x) =^{(a)} (1/|X|) ∑_{y∈Y} W(y|x) = 1/|X|,

where (a) follows from the fact that W(y|x) = 0 for every y ∉ Im(W). Therefore, we can write

∫_{∆_X} p · dMP_W(p) = π_X,

where π_X is the uniform probability distribution on X. This shows that MP_W is a balanced meta-probability measure.
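To make the preceding definitions concrete, here is a minimal sketch (in Python, with a hypothetical helper name not taken from the paper) that computes P^o_W, merges equal posteriors W^{-1}_y into the Blackwell measure, and reads off the rank:

```python
from collections import defaultdict

def blackwell_measure(W):
    """W[y][x] = W(y|x): transition probabilities of a DMC.
    Returns (P_out, MP) where P_out is the output distribution under a
    uniform input and MP maps each posterior (as a tuple) to its mass,
    merging output symbols with identical posteriors."""
    nX = len(W[0])
    # P^o_W(y) = (1/|X|) * sum_x W(y|x)
    P_out = [sum(row) / nX for row in W]
    MP = defaultdict(float)
    for y, row in enumerate(W):
        if P_out[y] > 0:  # y belongs to the image Im(W)
            # Posterior W^{-1}_y(x) = W(y|x) / (|X| * P^o_W(y))
            post = tuple(w / (nX * P_out[y]) for w in row)
            MP[post] += P_out[y]
    return P_out, dict(MP)

# A binary symmetric channel BSC(0.1); rows are indexed by the output y.
W = [[0.9, 0.1],
     [0.1, 0.9]]
P_out, MP = blackwell_measure(W)
rank = len(MP)  # rank(W) = |supp(MP_W)|
```

For this BSC, the Blackwell measure puts mass 0.5 on each of the posteriors (0.9, 0.1) and (0.1, 0.9), so rank(W) = 2, and the barycenter of MP_W is the uniform distribution, as a balanced measure requires.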
The following proposition characterizes the Blackwell measures of DMCs with input alphabet X.

Proposition 1 ([14]). A meta-probability measure MP on X is the Blackwell measure of some DMC with input alphabet X if and only if MP is balanced and finitely supported.
Proof. This proposition is known [14], but we provide a proof for completeness.
The above discussion shows that if MP is the Blackwell measure of some channel with input alphabet X, then it is balanced and finitely supported. Now assume that MP is balanced and finitely supported, and let Y = supp(MP). Define the channel W ∈ DMC_{X,Y} as W(p|x) = |X| · MP(p) · p(x) for every x ∈ X and every p ∈ Y = supp(MP). For every x ∈ X, we have:

∑_{p∈Y} W(p|x) = |X| ∑_{p∈supp(MP)} MP(p) p(x) = |X| ∫_{∆_X} p(x) dMP(p) = |X| · (1/|X|) = 1.

Therefore, W is a valid channel. For every p ∈ Y, we have

P^o_W(p) = (1/|X|) ∑_{x∈X} W(p|x) = MP(p) ∑_{x∈X} p(x) = MP(p) > 0,

which implies that Im(W) = Y. For every (x, p) ∈ X × Y we have:

W^{-1}_p(x) = W(p|x) / (|X| · P^o_W(p)) = |X| MP(p) p(x) / (|X| MP(p)) = p(x).

Therefore, W^{-1}_p = p for every p ∈ Y. For every Borel subset B of ∆_X, we have:

MP_W(B) = ∑_{p∈Y: W^{-1}_p∈B} P^o_W(p) = ∑_{p∈supp(MP)∩B} MP(p) = MP(B).

We conclude that MP_W = MP.
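The construction in this proof can be checked numerically. The following sketch (hypothetical helper name) builds the channel W(p|x) = |X| · MP(p) · p(x) from a balanced, finitely supported MP and verifies that each column sums to one:

```python
def channel_from_mp(MP, nX):
    """Given a balanced, finitely supported meta-probability measure MP
    (mapping posteriors p, as tuples of length nX, to masses), build the
    channel W with W(p|x) = |X| * MP(p) * p(x). The output symbols are
    the support points of MP themselves."""
    support = sorted(MP)  # fix an ordering of Y = supp(MP)
    W = [[nX * MP[p] * p[x] for x in range(nX)] for p in support]
    return support, W

# A balanced, finitely supported MP on a binary input alphabet.
MP = {(0.9, 0.1): 0.5, (0.1, 0.9): 0.5}
support, W = channel_from_mp(MP, 2)

# W is a valid channel: sum_p W(p|x) = 1 for every x, since MP is balanced.
for x in range(2):
    col = sum(W[y][x] for y in range(len(W)))
    assert abs(col - 1.0) < 1e-12
```

Each row y of the resulting W satisfies P^o_W(y) = MP(p) and W^{-1}_y = p, which is exactly the round-trip property MP_W = MP used in the proof.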
In [4], equivalent representations for binary memoryless symmetric (BMS) channels (namely, L-, D- and G-densities) were provided. A necessary and sufficient condition for the degradation of a BMS channel W with respect to another BMS channel W′ was given in [4] in terms of the |D|-densities of W and W′. It immediately follows from this condition that two BMS channels are equivalent if and only if they have the same |D|-density. One can deduce from this that two BMS channels (with finite output alphabets) are equivalent if and only if they have the same Blackwell measure. The following proposition shows that this is also true for channels with arbitrary (but finite) input and output alphabets:

Proposition 2. Two channels W ∈ DMC_{X,Y} and W′ ∈ DMC_{X,Z} are equivalent if and only if MP_W = MP_{W′}.

Proof. This proposition is known [14], but we provide a proof in Appendix A for completeness.

Corollary 1. If W ∈ DMC_{X,Y} and rank(W) > |Z|, then W is not equivalent to any channel in DMC_{X,Z}.
Proof. By Proposition 2, equivalent channels have the same Blackwell measure, hence the same rank. Since rank(W′) = |supp(MP_{W′})| ≤ |Im(W′)| ≤ |Z| for every W′ ∈ DMC_{X,Z}, it is impossible for W to be equivalent to any channel W′ in DMC_{X,Z}.

Corollary 2. If |X| = 1, all channels with input alphabet X are equivalent.

Space of Equivalent Channels from X to Y
The relation of being equivalent, which we denote by R^{(o)}_{X,Y}, is an equivalence relation on DMC_{X,Y}.

Definition 1. The space of equivalent channels with input alphabet X and output alphabet Y is the quotient of the space of channels from X to Y by the equivalence relation:

DMC^{(o)}_{X,Y} := DMC_{X,Y} / R^{(o)}_{X,Y}.

We define the topology T^{(o)}_{X,Y} on DMC^{(o)}_{X,Y} as the quotient topology T_{X,Y}/R^{(o)}_{X,Y}. Unless we explicitly state otherwise, we always associate DMC^{(o)}_{X,Y} with the quotient topology T^{(o)}_{X,Y}. With this topology, DMC^{(o)}_{X,Y} is a compact, path-connected and metrizable space.
Proof. Since DMC_{X,Y} is compact and path-connected, its quotient DMC^{(o)}_{X,Y} is compact and path-connected as well.
Since the projection map Proj of Lemma 3 is closed, Lemma 2 implies that the equivalence relation R^{(o)}_{X,Y} is upper semi-continuous. On the other hand, Corollary 3 shows that all the members of DMC^{(o)}_{X,Y} are compact subsets of DMC_{X,Y}. Therefore, the conditions of Theorem 1 are satisfied. Since DMC_{X,Y} is a metric space, it is Hausdorff and regular. Moreover, since it can be seen as a subspace of R^{|X|·|Y|}, it is also second-countable. By Theorem 1 we get that DMC^{(o)}_{X,Y} is Hausdorff, regular and second-countable, and from Theorem 2 we conclude that DMC^{(o)}_{X,Y} is separable and metrizable.

Canonical Embedding and Canonical Identification
Let X, Y_1 and Y_2 be three finite sets such that |Y_1| ≤ |Y_2|. We will show that there is a canonical embedding from DMC^{(o)}_{X,Y_1} into DMC^{(o)}_{X,Y_2}. In other words, there exist an explicitly constructable subset A of DMC^{(o)}_{X,Y_2} and an explicitly constructable homeomorphism from DMC^{(o)}_{X,Y_1} to A, where both A and the homeomorphism depend only on X, Y_1 and Y_2 (this is why we say that they are canonical). Moreover, we can show that A depends only on |Y_1|, X and Y_2.
Corollary 4. For every W, W′ ∈ DMC_{X,Y_1} and every two injections f, g from Y_1 to Y_2, we have: D_f ∘ W is equivalent to D_g ∘ W′ if and only if W is equivalent to W′.

For every W ∈ DMC_{X,Y_1}, we denote the R^{(o)}_{X,Y_1}-equivalence class of W as Ŵ.

Proposition 3. Let f : Y_1 → Y_2 be any fixed injection between Y_1 and Y_2. Define the mapping F : DMC^{(o)}_{X,Y_1} → DMC^{(o)}_{X,Y_2} by mapping the R^{(o)}_{X,Y_1}-equivalence class Ŵ of W to the R^{(o)}_{X,Y_2}-equivalence class of D_f ∘ W. Corollary 4 shows that F is well defined (F(Ŵ) does not depend on the choice of W ∈ Ŵ), and Lemma 1 implies that F is continuous. Moreover, we can see from Corollary 4 that F is an injection.
For every closed subset B of DMC^{(o)}_{X,Y_1}, B is compact (because DMC^{(o)}_{X,Y_1} is compact), so F(B) is compact, and hence closed in DMC^{(o)}_{X,Y_2} because DMC^{(o)}_{X,Y_2} is Hausdorff (as it is metrizable). Therefore, F is a closed mapping. Now since F is an injection that is both continuous and closed, we can deduce that F is a homeomorphism between DMC^{(o)}_{X,Y_1} and F(DMC^{(o)}_{X,Y_1}). We would now like to show that F(DMC^{(o)}_{X,Y_1}) depends only on |Y_1|, X and Y_2. Let Y′_1 be a finite set such that |Y′_1| = |Y_1|, and for every W′ ∈ DMC_{X,Y′_1}, let Ŵ′ be the R^{(o)}_{X,Y′_1}-equivalence class of W′.
Let g : Y′_1 → Y_1 be a fixed bijection from Y′_1 to Y_1 and let f′ = f ∘ g. Define F′ : DMC^{(o)}_{X,Y′_1} → DMC^{(o)}_{X,Y_2} by mapping Ŵ′ to the equivalence class of D_{f′} ∘ W′. As above, F′ is well defined, and it is a homeomorphism from DMC^{(o)}_{X,Y′_1} to F′(DMC^{(o)}_{X,Y′_1}). Since this holds for every W ∈ DMC_{X,Y_1}, we get F(DMC^{(o)}_{X,Y_1}) ⊆ F′(DMC^{(o)}_{X,Y′_1}). By exchanging the roles of Y_1 and Y′_1, we get the reverse inclusion, hence F(DMC^{(o)}_{X,Y_1}) = F′(DMC^{(o)}_{X,Y′_1}), which depends only on |Y_1|, X and Y_2.
Finally, for every W ∈ Ŵ and every W′ ∈ F(Ŵ), W′ is equivalent to D_f ∘ W and D_f ∘ W is equivalent to W (by Lemma 4), hence W′ is equivalent to W.
Proof. Let f be a bijection from Y_1 to Y_2. Define the mapping F : DMC^{(o)}_{X,Y_1} → DMC^{(o)}_{X,Y_2} by mapping Ŵ to the equivalence class of D_f ∘ W, and define the mapping F′ : DMC^{(o)}_{X,Y_2} → DMC^{(o)}_{X,Y_1} by mapping Ṽ to the equivalence class of D_{f^{-1}} ∘ V, where V ∈ Ṽ. Proposition 3 shows that F and F′ are well defined. For every W ∈ DMC_{X,Y_1}, we have:

F′(F(Ŵ)) =^{(a)} F′( class of D_f ∘ W ) =^{(b)} class of D_{f^{-1}} ∘ D_f ∘ W = Ŵ,

where (a) follows from the fact that W ∈ Ŵ and (b) follows from the fact that D_f ∘ W belongs to F(Ŵ). We can similarly show that F(F′(Ṽ)) = Ṽ for every Ṽ ∈ DMC^{(o)}_{X,Y_2}. Therefore, both F and F′ are bijections. Proposition 3 now implies that F is a homeomorphism from DMC^{(o)}_{X,Y_1} to DMC^{(o)}_{X,Y_2}. Moreover, F depends only on X, Y_1 and Y_2.

Corollary 5 allows us to identify DMC^{(o)}_{X,Y_1} with DMC^{(o)}_{X,Y_2} whenever |Y_1| = |Y_2|.
In the rest of this paper, we identify DMC^{(o)}_{X,Y_1} with its canonical image in DMC^{(o)}_{X,Y_2} whenever |Y_1| ≤ |Y_2|.

Proof. See Appendix C.

Space of Equivalent Channels
We would like to form the space of all equivalent channels having the same input alphabet X. The previous section showed that if |Y_1| = |Y_2|, there is a canonical identification between DMC^{(o)}_{X,Y_1} and DMC^{(o)}_{X,Y_2}. It is therefore sufficient to take one output alphabet of each size; we take the alphabets [n] := {1, …, n} for n ≥ 1, and define

DMC_{X,*} := ⋃_{n≥1} DMC_{X,[n]}.

The subscript * indicates that the output alphabets of the considered channels are arbitrary but finite.
We define the equivalence relation R^{(o)}_{X,*} on DMC_{X,*} as follows: W R^{(o)}_{X,*} W′ if and only if W is equivalent to W′.

Definition 2. The space of equivalent channels with input alphabet X is the quotient of the space of channels with input alphabet X by the equivalence relation:

DMC^{(o)}_{X,*} := DMC_{X,*} / R^{(o)}_{X,*}.

For every n ≥ 1 and every W, W′ ∈ DMC_{X,[n]}, we have W R^{(o)}_{X,[n]} W′ if and only if W R^{(o)}_{X,*} W′, so DMC^{(o)}_{X,[n]} can be seen as a subset of DMC^{(o)}_{X,*}. Remember that for every m ≥ n ≥ 1 and every W ∈ DMC_{X,[n]}, we identified Ŵ with its canonical image in DMC^{(o)}_{X,[m]}; these identifications are consistent with viewing each DMC^{(o)}_{X,[n]} inside DMC^{(o)}_{X,*}. For any W, W′ ∈ DMC_{X,*}, Proposition 2 shows that W R^{(o)}_{X,*} W′ if and only if MP_W = MP_{W′}. Since DMC^{(o)}_{X,*} is the quotient of DMC_{X,*} and since DMC_{X,*} was not given any topology, there is no "standard topology" on DMC^{(o)}_{X,*}. (A similar quotient space is studied in [8].)
In this paper, we focus on one particular requirement that we consider the most basic property required from any "acceptable" topology on DMC^{(o)}_{X,*}: a topology T on DMC^{(o)}_{X,*} is said to be natural if it induces the quotient topology T^{(o)}_{X,[n]} on the subspace DMC^{(o)}_{X,[n]} for every n ≥ 1. The reason why we consider such a topology as natural is because DMC^{(o)}_{X,[n]} already carries the quotient topology T^{(o)}_{X,[n]} as its standard topology. Before discussing any particular natural topology, we would like to discuss a few properties that are common to all natural topologies.

Proof. Assume on the contrary that there exists a non-empty open set
which is a contradiction.

Corollary 6. If |X| ≥ 2 and T is a natural topology, then for every n ≥ 1, the interior of DMC^{(o)}_{X,[n]} in (DMC^{(o)}_{X,*}, T) is empty.

Proof. Assume on the contrary that there exists a compact neighborhood K of Ŵ. There exists an open set U such that Ŵ ∈ U ⊆ K.
Since K is compact and Hausdorff, it is a Baire space. Moreover, since U is an open subset of K, U is also a Baire space.
Fix n ≥ 1. Since the interior of DMC^{(o)}_{X,[n]} in DMC^{(o)}_{X,*} is empty (Corollary 6), the interior of DMC^{(o)}_{X,[n]} ∩ U in U is also empty. Therefore, U \ DMC^{(o)}_{X,[n]} is dense in U. But U = ⋃_{n≥1} (DMC^{(o)}_{X,[n]} ∩ U) is a countable union of closed sets with empty interior in the Baire space U, which is a contradiction.

Strong Topology on DMC
The first natural topology that we study is the strong topology T^{(o)}_{s,X,*} on DMC^{(o)}_{X,*}, which is the finest natural topology.
Since the spaces {DMC_{X,[n]}}_{n≥1} are disjoint and since there is no a priori way to (topologically) compare channels in DMC_{X,[n]} with channels in DMC_{X,[n′]} for n ≠ n′, the "most natural" topology that we can define on DMC_{X,*} is the disjoint union topology T_{s,X,*} := ⊕_{n≥1} T_{X,[n]}. Clearly, the space (DMC_{X,*}, T_{s,X,*}) is disconnected. Moreover, T_{s,X,*} is metrizable because it is the disjoint union of metrizable spaces. It is also σ-compact because it is the union of countably many compact spaces.
We added the subscript s to emphasize the fact that T_{s,X,*} is a strong topology (remember that the disjoint union topology is the finest topology that makes the canonical injections continuous). We define the strong topology T^{(o)}_{s,X,*} on DMC^{(o)}_{X,*} as the quotient topology T_{s,X,*}/R^{(o)}_{X,*}. For a subset U of DMC^{(o)}_{X,*}, we have:

U ∈ T^{(o)}_{s,X,*} ⟺^{(a)} Proj^{-1}(U) ∈ T_{s,X,*} ⟺^{(b)} Proj^{-1}(U) ∩ DMC_{X,[n]} ∈ T_{X,[n]} for every n ≥ 1 ⟺^{(c)} U ∩ DMC^{(o)}_{X,[n]} ∈ T^{(o)}_{X,[n]} for every n ≥ 1,

where (a) and (c) follow from the properties of the quotient topology, and (b) follows from the properties of the disjoint union topology.
We conclude that a subset U of DMC^{(o)}_{X,*} is strongly open if and only if U ∩ DMC^{(o)}_{X,[n]} is open in DMC^{(o)}_{X,[n]} for every n ≥ 1. This shows that T^{(o)}_{s,X,*} is natural, and that every natural topology is coarser than it; hence T^{(o)}_{s,X,*} is the finest natural topology.
We can also characterize the strongly closed subsets of DMC^{(o)}_{X,*}: a subset A is strongly closed if and only if A ∩ DMC^{(o)}_{X,[n]} is closed in DMC^{(o)}_{X,[n]} for every n ≥ 1. Since DMC^{(o)}_{X,[n]} is metrizable for every n ≥ 1, it is also normal. We can use this fact to prove that the strong topology on DMC^{(o)}_{X,*} is normal. The following theorem shows that the strong topology satisfies many desirable properties.

Theorem 3. (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}) is a compactly generated, sequential and T4 space.
Proof. Since (DMC_{X,*}, T_{s,X,*}) is metrizable, it is sequential. Therefore, (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}), which is the quotient of a sequential space, is sequential.
Let us now show that (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}) is compactly generated. The following proposition (Proposition 9) shows that every rank-unbounded sequence (i.e., a sequence whose ranks are unbounded) does not converge in (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}).

Proof. Let (Ŵ_n)_{n≥0} be a rank-unbounded sequence in DMC^{(o)}_{X,*}; this cannot happen unless |X| ≥ 2. In order to show that (Ŵ_n)_{n≥0} does not converge, it is sufficient to show that there exists a subsequence of (Ŵ_n)_{n≥0} which does not converge.
Let (Ŵ_{n_k})_{k≥0} be any subsequence of (Ŵ_n)_{n≥0} along which the rank strictly increases, i.e., rank(Ŵ_{n_k}) < rank(Ŵ_{n_{k′}}) for every 0 ≤ k < k′. We will show that (Ŵ_{n_k})_{k≥0} does not converge.
Assume on the contrary that (Ŵ_{n_k})_{k≥0} converges to some Ŵ ∈ DMC^{(o)}_{X,*}, and let A := {Ŵ_{n_k} : k ≥ 0} \ {Ŵ} and U := DMC^{(o)}_{X,*} \ A. Since the ranks in A strictly increase, A ∩ DMC^{(o)}_{X,[n]} is finite (hence closed) for every n ≥ 1, so A is strongly closed and U is strongly open. Moreover, U contains Ŵ, so U is a neighborhood of Ŵ. Therefore, there exists k_0 ≥ 0 such that Ŵ_{n_k} ∈ U for every k ≥ k_0. Now since the rank of (Ŵ_{n_k})_{k≥0} strictly increases, we can find k ≥ k_0 such that rank(Ŵ_{n_k}) > rank(Ŵ). This means that Ŵ_{n_k} ≠ Ŵ and so Ŵ_{n_k} ∈ A. Therefore, Ŵ_{n_k} ∉ U, which is a contradiction.
We conclude that every converging sequence in (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}) must be rank-bounded. Now let (Ŵ_n)_{n≥0} be a rank-bounded sequence that converges to some Ŵ in a subspace DMC^{(o)}_{X,[m]}. For every strongly open set O containing Ŵ, the set O ∩ DMC^{(o)}_{X,[m]} is open in DMC^{(o)}_{X,[m]}, so there exists n_0 ≥ 0 such that Ŵ_n ∈ O ∩ DMC^{(o)}_{X,[m]} for every n ≥ n_0. This implies that Ŵ_n ∈ O for every n ≥ n_0. Therefore, (Ŵ_n)_{n≥0} converges to Ŵ in (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}).

Next, suppose that {U_n}_{n≥1} is a countable neighborhood basis at some point. For every n ≥ 1, Proposition 6 implies that U_n (which is non-empty and strongly open) is rank-unbounded, so it cannot be contained in DMC^{(o)}_{X,[n]}; picking Ŵ_n ∈ U_n \ DMC^{(o)}_{X,[n]}, we have rank(Ŵ_n) > n for every n ≥ 1. Therefore, (Ŵ_n)_{n≥1} is rank-unbounded, and Proposition 9 implies that (Ŵ_n)_{n≥1} does not converge in (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}).

Finally, let A be a compact subset of (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}). Since (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}) is Hausdorff, A is strongly closed. It remains to show that A is rank-bounded.
Assume on the contrary that A is rank-unbounded. We can construct a sequence (Ŵ_n)_{n≥0} in A whose rank is strictly increasing, i.e., rank(Ŵ_n) < rank(Ŵ_{n′}) for every 0 ≤ n < n′. Since the rank of (Ŵ_n)_{n≥0} is strictly increasing, every subsequence of (Ŵ_n)_{n≥0} is rank-unbounded. Proposition 9 implies that no subsequence of (Ŵ_n)_{n≥0} converges in (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}). On the other hand, we have:

• A is countably compact because it is compact.
• Since A is strongly closed in the sequential space (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}), A is itself sequential; moreover, A is Hausdorff because (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}) is Hausdorff.

Now since every countably compact sequential Hausdorff space is sequentially compact [11], A must be sequentially compact. Therefore, (Ŵ_n)_{n≥0} has a converging subsequence, which is a contradiction. We conclude that A must be rank-bounded.

Theorem 3 establishes the main topological properties of the strong topology on DMC^{(o)}_{X,*}. Recall moreover that DMC^{(o)}_{X,[n]} is metrizable for every n ≥ 1. One might ask whether the spaces DMC^{(o)}_{X,[n]} can be metrized by explicit and operationally meaningful metrics that are compatible with the quotient topology T^{(o)}_{X,[n]}. In this section, we will show that such metrics can be constructed.

Noisiness Metric on DMC^{(o)}_{X,Y}
For every m ≥ 1, every p ∈ ∆_{[m]×X} and every W ∈ DMC_{X,Y}, define

P_c(p, W) := sup_{D∈DMC_{Y,[m]}} ∑_{u∈[m]} ∑_{x∈X} ∑_{y∈Y} p(u, x) W(y|x) D(u|y).   (2)

P_c(p, W) can be interpreted as follows: let (U, X) be a pair of random variables distributed according to p, send X through the channel W, and let Y be the output of W in such a way that U − X − Y is a Markov chain. Let Û be the estimate of U obtained by applying a random decoder D ∈ DMC_{Y,[m]} to Y. In this interpretation, p can be seen as a random encoder. The probability of correctly guessing U by using the decoder D is given by

P[Û = U] = ∑_{u∈[m]} ∑_{x∈X} ∑_{y∈Y} p(u, x) W(y|x) D(u|y).

Therefore, P_c(p, W) is the optimal probability of correctly guessing U from Y. Note that we can take the supremum in (2) over only deterministic channels D ∈ DMC_{Y,[m]} because we can always choose an optimal decoder that is deterministic.
It is well known that if W is degraded from W′, then P_c(p, W) ≤ P_c(p, W′) for every p ∈ ∆_{[m]×X} and every m ≥ 1. It was shown in [15] that the converse is also true. Therefore, W is equivalent to W′ if and only if P_c(p, W) = P_c(p, W′) for every p ∈ ∆_{[m]×X} and every m ≥ 1. This shows that the quantity P_c(p, W) depends only on the equivalence class of W: for every Ŵ ∈ DMC^{(o)}_{X,Y}, we can define P_c(p, Ŵ) := P_c(p, W′) for any W′ ∈ Ŵ.
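Since an optimal decoder can be chosen deterministic, P_c(p, W) can be evaluated directly as ∑_y max_u ∑_x p(u, x) W(y|x): the MAP rule applied output symbol by output symbol. A minimal sketch, with a hypothetical helper name:

```python
def p_correct(p, W):
    """Optimal probability of guessing U from the channel output.
    p[u][x] = joint distribution of (U, X); W[y][x] = W(y|x).
    P_c(p, W) = sum_y max_u sum_x p(u, x) * W(y|x)  (MAP decoding)."""
    return sum(max(sum(pu[x] * row[x] for x in range(len(pu))) for pu in p)
               for row in W)

# Uniform U = X sent through a BSC(0.1): p(u, x) = 1/2 if u == x, else 0.
p = [[0.5, 0.0],
     [0.0, 0.5]]
W = [[0.9, 0.1],
     [0.1, 0.9]]
# MAP picks the more likely input for each output; success probability 0.9.
assert abs(p_correct(p, W) - 0.9) < 1e-12
```

One can also check the degradation monotonicity numerically: replacing W by a noisier BSC can only decrease `p_correct` for every encoder p.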
Define the noisiness distance d^{(o)}_{X,Y} : DMC^{(o)}_{X,Y} × DMC^{(o)}_{X,Y} → R_+ as follows:

d^{(o)}_{X,Y}(Ŵ_1, Ŵ_2) := sup_{m≥1, p∈∆_{[m]×X}} |P_c(p, Ŵ_1) − P_c(p, Ŵ_2)|.

It is easy to see that 0 ≤ d^{(o)}_{X,Y}(Ŵ_1, Ŵ_2) ≤ 1, and that d^{(o)}_{X,Y} is symmetric and satisfies the triangle inequality. Moreover, if d^{(o)}_{X,Y}(Ŵ_1, Ŵ_2) = 0, then P_c(p, Ŵ_1) = P_c(p, Ŵ_2) for every p ∈ ∆_{[m]×X} and every m ≥ 1, which implies that the channels in Ŵ_1 are equivalent to the channels in Ŵ_2, hence Ŵ_1 = Ŵ_2. This shows that d^{(o)}_{X,Y} is a metric distance on DMC^{(o)}_{X,Y}. It is called the noisiness metric because it compares the "noisiness" of Ŵ_1 with that of Ŵ_2: if P_c(p, Ŵ_1) is close to P_c(p, Ŵ_2) for every random encoder p, then Ŵ_1 and Ŵ_2 have close "noisiness levels".
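The supremum in the definition of d^{(o)}_{X,Y} cannot be evaluated by finite enumeration, but any finite family of encoders p certifies a lower bound on the distance. The following hedged sketch (all helper names hypothetical, and the sampling scheme is merely one crude heuristic, not the paper's method) samples random encoders:

```python
import random

def p_correct(p, W):
    # P_c(p, W) = sum_y max_u sum_x p(u, x) W(y|x): MAP success probability.
    return sum(max(sum(pu[x] * row[x] for x in range(len(pu))) for pu in p)
               for row in W)

def noisiness_lower_bound(W1, W2, m=2, trials=500, seed=0):
    """Lower-bounds d(W1^, W2^) = sup_p |P_c(p, W1) - P_c(p, W2)|
    by sampling random joint distributions p on [m] x X."""
    rng = random.Random(seed)
    nX = len(W1[0])
    best = 0.0
    for _ in range(trials):
        weights = [rng.random() for _ in range(m * nX)]
        total = sum(weights)
        p = [[weights[u * nX + x] / total for x in range(nX)] for u in range(m)]
        best = max(best, abs(p_correct(p, W1) - p_correct(p, W2)))
    return best

bsc1 = [[0.9, 0.1], [0.1, 0.9]]
bsc2 = [[0.8, 0.2], [0.2, 0.8]]
lb = noisiness_lower_bound(bsc1, bsc2)
# Every sampled encoder certifies d >= lb; for this pair of BSCs the
# supremum is 0.1, attained by the uniform "identity" encoder.
assert 0.0 < lb <= 0.1 + 1e-9
```

The design choice here reflects the definition: a lower bound needs only witnesses, whereas certifying an upper bound requires an argument over all encoders (as in the Lemma 6-style bounds of the text).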
A natural question to ask is whether the metric topology on DMC^{(o)}_{X,Y} induced by d^{(o)}_{X,Y} is the same as the quotient topology T^{(o)}_{X,Y} that we defined in Section 6.1. To answer this question, we need Lemma 6, which upper-bounds d^{(o)}_{X,Y}(Ŵ_1, Ŵ_2) in terms of the Euclidean distance between W_1 and W_2 for every W_1, W_2 ∈ DMC_{X,Y}. The reader might be wondering why we considered and studied the quotient topology T^{(o)}_{X,Y} at all, given that the metric d^{(o)}_{X,Y} induces the same topology. There are two reasons:

• The existence of a natural standard topology on DMC_{X,Y} makes the quotient topology the most natural starting point.
• If one wants to show that a mapping f from (DMC^{(o)}_{X,Y}, T^{(o)}_{X,Y}) to a topological space (S, V) is continuous, it is much easier to prove this through the quotient topology T^{(o)}_{X,Y} rather than directly using the metric d^{(o)}_{X,Y}.

It is worth mentioning that in the proof of Proposition 11, the only topological property of (DMC^{(o)}_{X,Y}, T^{(o)}_{X,Y}) that we used is its compactness. This means that we do not need Lemma 3 to prove Theorem 3. An alternative proof of Theorem 3 would be to show compactness and path-connectedness by inheriting those properties from DMC_{X,Y}, and then show metrizability as in Proposition 11. The main reason why we restricted ourselves to topological methods in Section 6.1 is because they might be useful if one wants to generalize our results to spaces of non-discrete channels. It might not be easy to find an explicit metric for those spaces, or even worse, those spaces might fail to be metrizable. Therefore, one might want to prove weaker topological properties such as being Hausdorff and/or regular. In such cases, the methods of Section 6.1 might be useful.

Noisiness Metric on DMC
We define the noisiness metric d^{(o)}_{X,*} on DMC^{(o)}_{X,*} as follows:

d^{(o)}_{X,*}(Ŵ_1, Ŵ_2) := sup_{m≥1, p∈∆_{[m]×X}} |P_c(p, Ŵ_1) − P_c(p, Ŵ_2)|.

It is easy to see that d^{(o)}_{X,*} is a metric distance on DMC^{(o)}_{X,*}. We call the induced metric topology T^{(o)}_{X,*} the noisiness topology; it can be shown to be natural.

Topologies from Blackwell Measures
We saw in Section 8 that every Ŵ ∈ DMC^{(o)}_{X,*} can be identified with its Blackwell measure MP_Ŵ (all channels in Ŵ share the same Blackwell measure by Proposition 2). On the other hand, Proposition 1 shows that the collection of Blackwell measures of the channels with input alphabet X is the same as the collection of balanced and finitely supported meta-probability measures on X.
Therefore, the mapping Ŵ → MP_Ŵ is a bijection from DMC^{(o)}_{X,*} to MP_{bf}(X), the set of balanced and finitely supported meta-probability measures on X. We call this mapping the canonical bijection from DMC^{(o)}_{X,*} to MP_{bf}(X). Since ∆_X is a metric space, there are many standard ways to construct topologies on MP(X). If we choose any of these standard topologies on MP(X) and then relativize it to the subspace MP_{bf}(X), we can construct topologies on DMC^{(o)}_{X,*} through the canonical bijection. We saw in Section 3.4 that there are three topologies that can be constructed on MP(X): the total variation topology, the strong convergence topology, and the weak-* topology. However, since every measure in MP_{bf}(X) is finitely supported, strong convergence and total variation convergence are equivalent in MP_{bf}(X) (see Section 3.4). Therefore, it is sufficient to study the total-variation topology and the weak-* topology. We will start by studying the weak-* topology.

Weak- * Topology
We first note that in the case of binary-input channels, the weak-* topology is equivalent to the topology induced by the convergence in distribution of D-densities (or L-densities, or G-densities) that was defined in [4]. Note also that the weak-* topology is equivalent to the topology induced by the Le Cam deficiency distance [7] on DMC^{(o)}_{X,*}. In this section, we show that the weak-* topology is the same as the noisiness topology T^{(o)}_{X,*}. We will show this using the Wasserstein metric.

Consider the topology on DMC^{(o)}_{X,*} that is induced by the weak-* topology on MP_{bf}(X) through the canonical bijection F_can. Since ∆_X is complete and separable, the 1st-Wasserstein distance W_1 metrizes the weak-* topology [13]. Therefore, in order to show that the weak-* topology and the noisiness topology T^{(o)}_{X,*} are the same, it is sufficient to show that the canonical bijection F_can : (MP_{bf}(X), W_1) → (DMC^{(o)}_{X,*}, d^{(o)}_{X,*}) is a homeomorphism. Note that since ∆_X is compact, the metric space (MP(X), W_1) is compact as well [13]. One can show that

d^{(o)}_{X,*}(F_can(MP), F_can(MP′)) ≤ |X| · W_1(MP, MP′),

which shows that the canonical bijection F_can is continuous. Therefore, the weak-* topology is at least as strong as T^{(o)}_{X,*}. It remains to show that F_can^{-1} is continuous. One approach to prove the continuity of F_can^{-1} is to find a lower bound on d^{(o)}_{X,*}(Ŵ, Ŵ′) in terms of the Wasserstein metric, but this is tedious. We will follow another approach in order to show that the canonical bijection F_can is a homeomorphism. We need the following proposition:

Proof. See Appendix G.

Turning to the total variation topology, assume |X| ≥ 2 and choose measures MP_n ∈ MP_{bf}(X), each supported on two points, with pairwise disjoint supports. Then all Ŵ_n := F_can(MP_n) lie in DMC^{(o)}_{X,[2]} and d^{(o)}_{TV,X,*}(Ŵ_{n_1}, Ŵ_{n_2}) = ||MP_{n_1} − MP_{n_2}||_{TV} = 1 for every n_2 > n_1 ≥ 1. Therefore, no subsequence of (MP_n)_{n≥1} can converge in total variation. This means that the total variation topology does not induce the quotient topology T^{(o)}_{X,[2]} on DMC^{(o)}_{X,[2]}, for otherwise DMC^{(o)}_{X,[2]} would be compact in the TV metric, and this is not the case.
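For a binary input alphabet, ∆_X can be parameterized by t = p(1) ∈ [0, 1], and the 1st-Wasserstein distance between two finitely supported meta-probability measures reduces to the integral of the absolute difference of their CDFs on the line. A sketch under that assumption (helper name hypothetical):

```python
def wasserstein1_binary(MP1, MP2):
    """1st-Wasserstein distance between two finitely supported
    meta-probability measures on a binary input alphabet.
    Each MP maps t = p(1) in [0, 1] to its mass; on the real line,
    W1(MP1, MP2) = integral of |CDF1(t) - CDF2(t)| dt."""
    pts = sorted(set(MP1) | set(MP2))
    w1, cdf_diff = 0.0, 0.0
    prev = pts[0]
    for t in pts:
        w1 += abs(cdf_diff) * (t - prev)   # flat CDF difference on [prev, t)
        cdf_diff += MP1.get(t, 0.0) - MP2.get(t, 0.0)
        prev = t
    return w1

# Blackwell measures of BSC(0.1) and BSC(0.2), parameterized by p(1):
MP_bsc1 = {0.1: 0.5, 0.9: 0.5}
MP_bsc2 = {0.2: 0.5, 0.8: 0.5}
# Optimal transport moves mass 0.5 by 0.1 twice, so W1 = 0.1.
assert abs(wasserstein1_binary(MP_bsc1, MP_bsc2) - 0.1) < 1e-9
```

This makes the weak-* convergence tangible: as the crossover probabilities of two BSCs approach each other, their Blackwell measures converge in W_1, even though (as discussed below for the TV distance) they stay at total variation distance 1 whenever their supports are disjoint.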
Since the noisiness topology is the same as the weak-* topology, it is coarser than the total variation topology. Now let Ŵ ∈ DMC^{(o)}_{X,*}, let ε > 0, and let U be the set of all Ŵ′ ∈ DMC^{(o)}_{X,*} satisfying d^{(o)}_{TV,X,*}(Ŵ, Ŵ′) < ε. Let p, p′, (p_n)_{n≥1} and (p′_n)_{n≥1} be as in Proposition 13, and for every n ≥ 1, define MP_n ∈ MP(X) accordingly. Clearly, MP_n is balanced and finitely supported, so MP_n ∈ MP_{bf}(X). Moreover, F_can(MP_n) ∉ DMC^{(o)}_{X,[n]} for every n ≥ 1. We conclude that U is rank-unbounded. Note that the sequence (F_can(MP_n))_{n≥1} in the proof of Proposition 14 is rank-unbounded and converges in total variation to Ŵ. On the other hand, Proposition 9 implies that (F_can(MP_n))_{n≥1} does not converge in (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}).

Proof. Let MP_{b,n}(X) be the set of balanced meta-probability measures whose support is of size at most n. Since (DMC^{(o)}_{X,[n]}, d^{(o)}_{TV,X,*}) is isometric to (MP_{b,n}(X), ||·||_{TV}), and since (MP(X), ||·||_{TV}) is complete, it is sufficient to show that MP_{b,n}(X) is TV-closed in MP(X).
Let MP be in the TV-closure of MP_{b,n}(X). Since we are working in a metric space, there exists a sequence (MP_m)_{m≥0} in MP_{b,n}(X) that TV-converges to MP. Assume that MP ∉ MP_{b,n}(X). Then there exist p_1, …, p_{n+1} ∈ ∆_X that are pairwise different and which satisfy MP(p_i) > 0 for every 1 ≤ i ≤ n+1. Since (MP_m)_{m≥0} TV-converges to MP, there exists m_0 ≥ 0 such that MP_{m_0}(p_i) > 0 for every 1 ≤ i ≤ n+1, i.e., MP_{m_0} has at least n+1 support points. This contradicts the fact that MP_{m_0} ∈ MP_{b,n}(X). Therefore, MP ∈ MP_{b,n}(X) for every MP in the TV-closure of MP_{b,n}(X). This shows that MP_{b,n}(X) is TV-closed, and hence (DMC^{(o)}_{X,[n]}, d^{(o)}_{TV,X,*}) is complete for every n ≥ 1.
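For finitely supported meta-probability measures, the total variation distance is simply half the ℓ1 distance between the mass functions. A short sketch (hypothetical helper name) illustrating why the TV topology is so much finer than the weak-* topology:

```python
def tv_distance(MP1, MP2):
    """Total variation distance between finitely supported
    meta-probability measures (mappings point -> mass):
    ||MP1 - MP2||_TV = (1/2) * sum_p |MP1(p) - MP2(p)|."""
    pts = set(MP1) | set(MP2)
    return 0.5 * sum(abs(MP1.get(p, 0.0) - MP2.get(p, 0.0)) for p in pts)

# Two Blackwell measures with disjoint supports sit at TV distance 1,
# no matter how close their support points are in the simplex.
MP1 = {(0.5, 0.5): 1.0}
MP2 = {(0.51, 0.49): 1.0}
assert tv_distance(MP1, MP2) == 1.0
```

The same pair is very close in the Wasserstein/weak-* sense, which is why the total variation topology is strictly finer than the noisiness/weak-* topology and, as shown in this section's arguments, fails to be natural.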

Discussion and Conclusions
The fact that the noisiness and weak- * topologies are the same gives us more freedom in proving theorems. Statements that can be hard to prove using the weak- * formulation might be easier to prove using the noisiness formulation.
The strong topology is too strong to be adopted as the "standard natural topology". However, it can still be useful because it is relatively easy to work with as it has a quotient formulation. Moreover, since it is finer than the noisiness/weak- * topology, many statements that are true for the strong topology are also true for coarser topologies, e.g., any sequence that converges in the strong topology also converges in the noisiness/weak- * topology.
Although the total variation topology is not natural, it can still be useful because it is finer than the noisiness/weak- * topology.
Many interesting questions remain open: Are all natural topologies Hausdorff? Can we find more topological properties that are common for all natural topologies? Is there a coarsest natural topology? Is there a natural topology that is coarser than the noisiness/weak- * one?
Finding meaningful measures on DMC^{(o)}_{X,*} might be challenging. One might be tempted to require that the measure of DMC^{(o)}_{X,[n]} be zero for every n ≥ 1 (since DMC^{(o)}_{X,[n]} has empty interior in any natural topology), but then DMC^{(o)}_{X,*} will have zero measure because it is a countable union of these subspaces. Nevertheless, statements such as "the property X is true for almost all channels" can still make sense. One possible definition of null-sets is as follows: for every set A in the natural Borel σ-algebra, we say that A is a null-set if and only if there exists n_0 ≥ 1 such that P_n(Proj_n^{-1}(A ∩ DMC^{(o)}_{X,[n]})) = 0 for every n ≥ n_0. Another notion of equivalence is the Shannon equivalence, which allows randomization at both the input and the output, as well as shared randomness between the transmitter and the receiver [16]. The Shannon deficiency that was introduced in [17] compares a particular channel with the Shannon-equivalence class of another channel, but it is not a metric distance between Shannon-equivalence classes. In [18], we provide a characterization of the Shannon ordering, and we prove that some of the results of this paper hold for the space of Shannon-equivalent channels.
In [19], we introduce the notions of input-degradedness and input-equivalence. A channel W′ is said to be input-degraded from a channel W if W′ can be simulated from W by local operations at the input. In [19], we provide a characterization of input-degradedness, and we prove that many of the results of this paper hold for the space of input-equivalent channels.
Acknowledgments: I would like to thank Emre Telatar and Mohammad Bazzi for helpful discussions. I am also grateful to Maxim Raginsky for his comments.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

BEC	Binary erasure channel
BSC	Binary symmetric channel
TV	Total variation
DMC	Discrete memoryless channel

Appendix A. Proof of Proposition 2
For every A ⊆ ∆_X, let co(A) be the convex hull of A. We say that p ∈ A is convex-extreme if it is an extreme point of co(A), i.e., if for every p_1, …, p_n ∈ co(A) and every λ_1, …, λ_n > 0 satisfying ∑_{i=1}^n λ_i = 1 and ∑_{i=1}^n λ_i p_i = p, we have p_1 = … = p_n = p. It is easy to see that if A is finite, then the convex-extreme points of A coincide with the extreme points of co(A). We denote the set of convex-extreme points of A as CE(A).
Let W ∈ DMC_{X,Y} and W′ ∈ DMC_{X,Z} be such that W′ is degraded from W. There exists V ∈ DMC_{Y,Z} such that W′ = V ∘ W. Let X be a random variable uniformly distributed in X, let Y be the output of W when X is the input, and let Z be the output of V when Y is the input, in such a way that X − Y − Z is a Markov chain. Clearly, P_{Z|X}(z|x) = W′(z|x) for every (x, z) ∈ X × Z.
For every z ∈ Z, we have:

P^o_{W′}(z) = ∑_{y∈Im(W)} V(z|y) P^o_W(y).   (A1)

For every (y, z) ∈ Im(W) × Im(W′), define V^{-1}(y|z) := V(z|y) P^o_W(y) / P^o_{W′}(z) = P_{Y|Z}(y|z); note that V^{-1}(y|z) = 0 if and only if V(z|y) = 0. For every (x, z) ∈ X × Im(W′), we have:

W′^{-1}_z(x) = ∑_{y∈Im(W)} V^{-1}(y|z) W^{-1}_y(x),   (A2)

which follows from the fact that X − Y − Z is a Markov chain. Equation (A2) shows that for every z ∈ Im(W′), W′^{-1}_z is a convex combination of the posteriors of W, i.e.,

co(supp(MP_{W′})) ⊆ co(supp(MP_W)).   (A3)

Now for every p ∈ ∆_X, define Y_p := {y ∈ Im(W) : W^{-1}_y = p}; similarly, Z_p := {z ∈ Im(W′) : W′^{-1}_z = p}. Let p_ext ∈ CE(supp(MP_W)) and let z ∈ Im(W′). Equation (A2) shows that if z ∈ Z_{p_ext}, then V^{-1}(y|z) = 0 for every y ∈ Im(W) \ Y_{p_ext} (since p_ext is convex-extreme, the convex combination in (A2) can only involve posteriors equal to p_ext). Now since V^{-1}(y|z) = 0 ⇔ V(z|y) = 0 for every (y, z) ∈ Im(W) × Im(W′), we deduce that if z ∈ Z_{p_ext} then V(z|y) = 0 for every y ∈ Im(W) \ Y_{p_ext}. Therefore,

MP_{W′}(p_ext) = ∑_{z∈Z_{p_ext}} P^o_{W′}(z) =^{(a)} ∑_{z∈Z_{p_ext}} ∑_{y∈Im(W)} V(z|y) P^o_W(y) =^{(b)} ∑_{z∈Z_{p_ext}} ∑_{y∈Y_{p_ext}} V(z|y) P^o_W(y) ≤ ∑_{y∈Y_{p_ext}} P^o_W(y) = MP_W(p_ext),   (A4)

where (a) follows from Equation (A1), and (b) follows from the fact that for every y ∈ Im(W) \ Y_{p_ext}, we have V(z|y) = 0. Now assume that W and W′ are equivalent. Equation (A3) (applied twice, once in each direction of degradation) implies that we must have co(supp(MP_W)) = co(supp(MP_{W′})), which implies that supp(MP_W) and supp(MP_{W′}) have the same convex-extreme points. Now fix a convex-extreme point p_ext ∈ CE(supp(MP_W)) = CE(supp(MP_{W′})). Equation (A4) (applied twice) implies that MP_W(p_ext) = MP_{W′}(p_ext). By using Equation (A4) again we obtain:

∑_{z∈Z_{p_ext}} ∑_{y∈Y_{p_ext}} V(z|y) P^o_W(y) = ∑_{y∈Y_{p_ext}} P^o_W(y).

However, P^o_W(y) > 0 for every y ∈ Y_{p_ext}. Therefore, for every z ∈ Im(W′) \ Z_{p_ext} and every y ∈ Y_{p_ext}, we must have V(z|y) = 0 (which implies that V^{-1}(y|z) = 0). We conclude that for every z ∈ Im(W′) \ Z_{p_ext}, we can rewrite Equations (A1) and (A2) as:

P^o_{W′}(z) = ∑_{y∈Im(W)\Y_{p_ext}} V(z|y) P^o_W(y)  and  W′^{-1}_z = ∑_{y∈Im(W)\Y_{p_ext}} V^{-1}(y|z) W^{-1}_y.

We can now repeat the above argument on supp(MP_W) \ {p_ext} and supp(MP_{W′}) \ {p_ext} instead of supp(MP_W) and supp(MP_{W′}). We deduce that co(supp(MP_W) \ {p_ext}) = co(supp(MP_{W′}) \ {p_ext}), so supp(MP_W) \ {p_ext} and supp(MP_{W′}) \ {p_ext} have the same convex-extreme points. We can also prove that MP_W(p′_ext) = MP_{W′}(p′_ext) for every p′_ext ∈ CE(supp(MP_W) \ {p_ext}). Notice that any point of supp(MP_W) (respectively, supp(MP_{W′})) becomes convex-extreme after removing a finite number of elements from supp(MP_W) (respectively, supp(MP_{W′})).
Therefore, after inductively applying the above argument a finite number of times, we can deduce that supp(MP_W) = supp(MP_{W′}) and MP_W(p) = MP_{W′}(p) for every p ∈ supp(MP_W) = supp(MP_{W′}), hence MP_W = MP_{W′}.

Now let W ∈ DMC_{X,Y} and W′ ∈ DMC_{X,Z} be any two channels satisfying MP_W = MP_{W′}. We have supp(MP_W) = supp(MP_{W′}), and for every p ∈ supp(MP_W) = supp(MP_{W′}), we have MP_W(p) = MP_{W′}(p). Define the channel V ∈ DMC_{Y,Z} as

V(z|y) := P^o_{W′}(z) · 1_{{W′^{-1}_z = W^{-1}_y}} / MP_W(W^{-1}_y)  if y ∈ Im(W),  and  V(z|y) := P^o_{W′}(z)  if y ∉ Im(W).

A simple calculation shows that ∑_{z∈Z} V(z|y) = 1 for every y ∈ Y, so V is a valid channel.
Notice that for every (y, z) ∈ Im(W) × Im(W′), we have V(z|y) = 0 if and only if W′^{-1}_z ≠ W^{-1}_y. Let W″ := V ∘ W. For every z ∈ Z \ Im(W′), Equation (A1) implies that:

P^o_{W″}(z) = ∑_{y∈Im(W)} V(z|y) P^o_W(y) =^{(a)} 0 = P^o_{W′}(z),

where (a) follows from the fact that V(z|y) = 0 if y ∈ Im(W) and z ∉ Im(W′). On the other hand, for every z ∈ Im(W′), Equation (A1) implies that:

P^o_{W″}(z) = ∑_{y∈Y_{W′^{-1}_z}} (P^o_{W′}(z) / MP_W(W′^{-1}_z)) P^o_W(y) = P^o_{W′}(z).

Therefore, P^o_{W″}(z) = P^o_{W′}(z) for every z ∈ Z, which implies that Im(W″) = Im(W′). Moreover, Equation (A2) implies that for every z ∈ Im(W″) = Im(W′), we have:

W″^{-1}_z = ∑_{y∈Im(W)} V^{-1}(y|z) W^{-1}_y = W′^{-1}_z,

where we used the fact that for every (y, z) ∈ Im(W) × Im(W′), V^{-1}(y|z) = 0 if and only if V(z|y) = 0, i.e., unless W^{-1}_y = W′^{-1}_z. We conclude that P^o_{W″} = P^o_{W′}, and for every z ∈ Im(W″) = Im(W′), we have W″^{-1}_z = W′^{-1}_z. Therefore, W′ = W″ = V ∘ W, and so W′ is degraded from W. By exchanging the roles of W and W′, we get that W is also degraded from W′, hence W and W′ are equivalent.
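The degradation channel constructed in this proof can be checked numerically. The sketch below (hypothetical helper names) builds V(z|y) = P^o_{W′}(z) · 1[W′^{-1}_z = W^{-1}_y] / MP_W(W^{-1}_y) and verifies that V ∘ W = W′. It assumes every output symbol lies in the image, and it compares posterior tuples with exact float equality, which works for this hand-picked example; in general, posteriors should be merged up to a tolerance.

```python
def blackwell(W):
    """For W[y][x] = W(y|x), return the output distribution under a
    uniform input, the list of posteriors, and the Blackwell measure."""
    nX = len(W[0])
    P_out = [sum(row) / nX for row in W]
    posts = [tuple(w / (nX * q) for w in row) if q > 0 else None
             for row, q in zip(W, P_out)]
    MP = {}
    for q, p in zip(P_out, posts):
        if q > 0:
            MP[p] = MP.get(p, 0.0) + q
    return P_out, posts, MP

def degradation_channel(W, Wp):
    """Assuming MP_W = MP_W', build V (indexed V[z][y]) with W' = V o W."""
    _, posts, MP = blackwell(W)
    Pp, postsp, _ = blackwell(Wp)
    return [[Pp[z] / MP[posts[y]] if postsp[z] == posts[y] else 0.0
             for y in range(len(W))]
            for z in range(len(Wp))]

W  = [[0.9, 0.1], [0.1, 0.9]]                   # BSC(0.1)
Wp = [[0.45, 0.05], [0.45, 0.05], [0.1, 0.9]]   # same Blackwell measure, split output

V = degradation_channel(W, Wp)
VW = [[sum(V[z][y] * W[y][x] for y in range(2)) for x in range(2)]
      for z in range(3)]
for z in range(3):
    for x in range(2):
        assert abs(VW[z][x] - Wp[z][x]) < 1e-12  # V o W = W'
```

By symmetry, swapping the roles of W and W′ produces the channel degrading W′ to W, so the two channels are equivalent, exactly as the proof concludes.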
Since DMC_{X,Y} is a metric space and since A^R is compact, Proj^{-1}(Proj(A)) = A^R is closed in DMC_{X,Y}. Now let π_X ∈ ∆_X be the uniform probability distribution on X. Notice that for y ∈ Im(W), since 0 < δ < 1, the point (1 − δ)W^{-1}_y + δπ_X lies inside the interior of the probability simplex ∆_X. This means that for δ′ small enough, (1 − δ)W^{-1}_y + δπ_X + δ′v_y ∈ ∆_X for every y ∈ Im(W), and π_X + δ′v_y ∈ ∆_X for every y ∉ Im(W). For every 0 < δ < 1, choose δ′ := δ′(δ) so that 0 < δ′ < δ and W′^{-1}_y ∈ ∆_X for every y ∈ [m]. It is easy to see that for δ small enough, W′^{-1}_{y_1} ≠ W′^{-1}_{y_2} for every y_1, y_2 ∈ [m] satisfying y_1 ≠ y_2. Define the channel W′ ∈ DMC_{X,[m]} accordingly. Finally, (a) follows from the fact that U_i ⊆ K_i ⊆ U_{i+1} for every i ≥ 0, which means that the sequence (U_i)_{i≥1} is increasing.
For every i ≥ n, we have DMC^{(o)}_{X,[n]} ⊆ DMC^{(o)}_{X,[i]}, and (a) follows from the fact that for every n ≥ 1 and every n′ ≥ 1, we have U_n ∩ U′_{n′} ⊆ U_{max{n,n′}} ∩ U′_{max{n,n′}}, because (U_n)_{n≥1} and (U′_n)_{n≥1} are increasing. We conclude that (DMC^{(o)}_{X,*}, T^{(o)}_{s,X,*}) is normal.

Appendix G. Proof of Proposition 12
If |X | = 1, ∆ X consists of a single probability distribution and MP (X ) consists of a single meta-probability measure which is balanced and finitely supported, so MP (X ) = MP b (X ) = MP b f (X ). Now assume that |X | ≥ 2. We start by showing that MP b (X ) is weakly- * closed.
For every x ∈ X, consider the mapping f_x : ∆_X → R defined as f_x(p) = p(x). Clearly, f_x is bounded and continuous. Therefore, the mapping F_x : MP(X) → R defined as F_x(MP) = ∫_{∆_X} f_x dMP is continuous in the weak-* topology. Therefore, F_x^{-1}({1/|X|}) is weakly-* closed. It is easy to see that MP_b(X) = ⋂_{x∈X} F_x^{-1}({1/|X|}). This proves that MP_b(X), which is a finite intersection of weakly-* closed sets, is weakly-* closed. It remains to show that MP_{bf}(X) is weakly-* dense in MP_b(X). We will show that for every ε > 0 and every MP ∈ MP_b(X), there exists MP′ ∈ MP_{bf}(X) such that W_1(MP, MP′) < ε.
Fix 0 < ε < 1 and let MP ∈ MP_b(X) be any balanced meta-probability measure on X, i.e., for every x ∈ X we have ∫_{∆_X} p(x) dMP(p) = 1/|X|.
Now fix x ∈ X. By the definition of the Lebesgue integral, there exist a finite partition {B_{x,i}}_{1≤i≤k_x} of ∆_X and a sequence of positive numbers (b_{x,i})_{1≤i≤k_x} such that for every 1 ≤ i ≤ k_x, B_{x,i} is a Borel set of ∆_X, b_{x,i} ≤ p(x) for every p ∈ B_{x,i}, and ∑_{i=1}^{k_x} b_{x,i} MP(B_{x,i}) approximates ∫_{∆_X} p(x) dMP(p) from below to within the desired accuracy. By applying the same reasoning to the function 1 − p(x) ≥ 0, we can find a finite partition {C_{x,i}}_{1≤i≤m_x} of ∆_X and a sequence of positive numbers (c_{x,i})_{1≤i≤m_x} such that for every 1 ≤ i ≤ m_x, C_{x,i} is a Borel set of ∆_X, c_{x,i} ≥ p(x) for every p ∈ C_{x,i}, and ∑_{i=1}^{m_x} c_{x,i} MP(C_{x,i}) approximates ∫_{∆_X} p(x) dMP(p) from above to within the desired accuracy. Let d be the total variation distance on ∆_X, i.e., d(p, p′) = (1/2)||p − p′||_1. Since ∆_X is compact, it can be covered by a finite number of open balls of radius ε/4, i.e., there exist h points p_1, …, p_h such that ∆_X = ⋃_{i=1}^h {p ∈ ∆_X : d(p, p_i) < ε/4}. For every 1 ≤ i ≤ h, define the set