Abstract
We study the continuity of many channel parameters and operations under various topologies on the space of equivalent discrete memoryless channels (DMC). We show that mutual information, channel capacity, Bhattacharyya parameter, probability of error of a fixed code and optimal probability of error for a given code rate and block length are continuous under various DMC topologies. We also show that channel operations such as sums, products, interpolations and Arıkan-style transformations are continuous.
1. Introduction
This paper is an extended version of our paper that was published at the International Symposium on Information Theory 2017 (ISIT 2017) [1].
Let $\mathcal{X}$ and $\mathcal{Y}$ be two finite sets, and let W be a fixed channel with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$. It is well known that the input-output mutual information is continuous on the simplex of input probability distributions. Many other parameters that depend on the input probability distribution were shown to be continuous on the simplex in [2].
Polyanskiy studied in [3] the continuity of the Neyman–Pearson function for a binary hypothesis test that arises in the analysis of channel codes. He showed that for arbitrary input and output alphabets, this function is continuous in the input distribution in the total variation topology. He also showed that under some regularity assumptions, this function is continuous in the weak-∗ topology.
If $\mathcal{X}$ and $\mathcal{Y}$ are finite sets, the space of channels with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ can naturally be endowed with the topology of the Euclidean metric, or any other equivalent metric. It is well known that the channel capacity is continuous in this topology. If $\mathcal{X}$ and $\mathcal{Y}$ are arbitrary, one can construct a topology on the space of channels using the weak-∗ topology on the output alphabet. It was shown in [4] that the capacity is lower semi-continuous in this topology.
The continuity results that are mentioned in the previous paragraph do not take into account “equivalence” between channels. Two channels are said to be equivalent if they are degraded from each other. This means that each channel can be simulated from the other by local operations at the receiver. Two channels that are degraded from each other are completely equivalent from an operational point of view: both channels have exactly the same probability of error under optimal decoding for any fixed code. Moreover, any sub-optimal decoder for one channel can be transformed to a sub-optimal decoder for the other channel with the same probability of error and essentially the same computational complexity. This is why it makes sense, from an information-theoretic point of view, to identify equivalent channels and consider them as one point in the space of “equivalent channels”.
In [5], equivalent binary-input channels were identified with their L-density (i.e., the density of log-likelihood ratios). The space of equivalent binary-input channels was endowed with the topology of convergence in distribution of L-densities. Since the symmetric capacity (i.e., the input-output mutual information with uniformly distributed input) and the Bhattacharyya parameter can each be written as an integral of a continuous function with respect to the L-density [5], it immediately follows that these parameters are continuous in the L-density topology.
In [6], many topologies were constructed for the space of equivalent channels sharing a fixed input alphabet. In this paper, we study the continuity of many channel parameters and operations under these topologies. The continuity of channel parameters and operations might be helpful in the following two problems:
- If a parameter (such as the optimal probability of error of a given code) is difficult to compute for a channel W, one can approximate it by computing the same parameter for a sequence of channels that converges to W in some topology where the parameter is continuous.
- The study of the robustness of a communication system against the imperfect specification of the channel.
In Section 2, we introduce the preliminaries for this paper. In Section 3, we recall the main results of [6] that we need here. In Section 4, we introduce the channel parameters and operations that we investigate in this paper. In Section 5, we study the continuity of these parameters and operations in the quotient topology of the space of equivalent channels with fixed input and output alphabets. The continuity in the strong topology of the space of equivalent channels sharing the same input alphabet is studied in Section 6. Finally, the continuity in the noisiness/weak-∗ and the total variation topologies is studied in Section 7.
2. Preliminaries
We assume that the reader is familiar with the basic concepts of General Topology. The main concepts and theorems that we need can be found in the Preliminaries Section of [6].
2.1. Set-Theoretic Notations
For every integer $n \geq 1$, we denote the set $\{1, \ldots, n\}$ as $[n]$.
The set of mappings from a set A to a set B is denoted as $B^A$.
Let A be a subset of B. The indicator mapping $\mathbb{1}_{A,B} : B \to \{0,1\}$ of A in B is defined as:
$$\mathbb{1}_{A,B}(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise.} \end{cases}$$
If the superset B is clear from the context, we simply write $\mathbb{1}_A$ to denote the indicator mapping of A in B.
The power set of B is the set of subsets of B. Since every subset of B can be identified with its indicator mapping, we denote the power set of B as $2^B = \{0,1\}^B$.
Let be a collection of arbitrary sets indexed by I. The disjoint union of is defined as . For every , the i-th-canonical injection is the mapping defined as . If no confusions can arise, we can identify with through the canonical injection. Therefore, we can see as a subset of for every .
Let R be an equivalence relation on a set T. For every , the set is the R-equivalence class of x. The collection of R-equivalence classes, which we denote as , forms a partition of T, and it is called the quotient space of T by R. The mapping defined as for every is the projection mapping onto .
2.2. Topological Notations
A topological space $(T, \mathcal{U})$ is said to be contractible to $x_0 \in T$ if there exists a continuous mapping $H : T \times [0,1] \to T$ such that $H(x,0) = x$ and $H(x,1) = x_0$ for every $x \in T$, where $[0,1]$ is endowed with the Euclidean topology. $(T, \mathcal{U})$ is strongly contractible to $x_0$ if we also have $H(x_0,t) = x_0$ for every $t \in [0,1]$.
Intuitively, T is contractible if it can be “continuously shrunk” to a single point $x_0$. If this “continuous shrinking” can be done without moving $x_0$, T is strongly contractible.
Note that contractibility is a very strong notion of connectedness: every contractible space is path-connected and simply connected. Moreover, all its homotopy, homology and cohomology groups of order $\geq 1$ are zero.
Let be a collection of topological spaces indexed by I. The product topology on is denoted by . The disjoint union topology on is denoted by .
The following lemma is useful to show the continuity of many functions.
Lemma 1.
Let and be two compact topological spaces, and let be a continuous function on . For every and every , there exists a neighborhood of s such that for every , we have:
Proof.
See Appendix A. ☐
2.3. Quotient Topology
Let be a topological space, and let R be an equivalence relation on T. The quotient topology on is the finest topology that makes the projection mapping continuous. It is given by:
Lemma 2.
Let be a continuous mapping from to . If for every satisfying , then we can define a transcendent mapping such that for any . f is well defined on . Moreover, f is a continuous mapping from to .
Let and be two topological spaces, and let R be an equivalence relation on T. Consider the equivalence relation on defined as if and only if and . A natural question to ask is whether the canonical bijection between and is a homeomorphism. It turns out that this is not the case in general. The following theorem, which is widely used in Algebraic Topology, provides a sufficient condition:
Theorem 1.
[7] If is locally compact and Hausdorff, then the canonical bijection between , and is a homeomorphism.
Corollary 1.
Let and be two topological spaces, and let and be two equivalence relations on T and S, respectively. Define the equivalence relation R on as if and only if and . If and are locally compact and Hausdorff, then the canonical bijection between and is a homeomorphism.
Proof.
We just need to apply Theorem 1 twice. Define the equivalence relation on as follows: if and only if and . Since is locally compact and Hausdorff, Theorem 1 implies that the canonical bijection from to is a homeomorphism. Let us identify these two spaces through the canonical bijection.
Now, define the equivalence relation on as follows: if and only if and . Since is locally compact and Hausdorff, Theorem 1 implies that the canonical bijection from to is a homeomorphism.
Since we identified and through the canonical bijection (which is a homeomorphism), can be seen as an equivalence relation on . It is easy to see that the canonical bijection from to is a homeomorphism. We conclude that the canonical bijection from to is a homeomorphism. ☐
2.4. Measure-Theoretic Notations
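If $(M, \Sigma)$ is a measurable space, we denote the set of probability measures on $(M, \Sigma)$ as $\mathcal{P}(M, \Sigma)$. If the $\sigma$-algebra $\Sigma$ is known from the context, we simply write $\mathcal{P}(M)$ to denote the set of probability measures.
If $P \in \mathcal{P}(M, \Sigma)$ and $\{x\}$ is a measurable singleton, we simply write $P(x)$ to denote $P(\{x\})$.
For every $P_1, P_2 \in \mathcal{P}(M, \Sigma)$, the total variation distance between $P_1$ and $P_2$ is defined as:
$$\|P_1 - P_2\|_{TV} = \sup_{A \in \Sigma} |P_1(A) - P_2(A)|.$$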
- The push-forward probability measure: Let P be a probability measure on , and let be a measurable mapping from to another measurable space . The push-forward probability measure of P by f is the probability measure on defined as for every . A measurable mapping is integrable with respect to if and only if is integrable with respect to P. Moreover, The mapping from to is continuous if these spaces are endowed with the total variation topology: where (a) follows from Property 1 of [8].
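- Probability measures on finite sets: We always endow finite sets with their finest $\sigma$-algebra, i.e., the power set. In this case, every probability measure is completely determined by its value on singletons, i.e., if P is a probability measure on a finite set $\mathcal{X}$, then for every $A \subset \mathcal{X}$, we have:
$$P(A) = \sum_{x \in A} P(x).$$
If $\mathcal{X}$ is a finite set, we denote the set of probability distributions on $\mathcal{X}$ as $\Delta_{\mathcal{X}}$. Note that $\Delta_{\mathcal{X}}$ is an $(|\mathcal{X}| - 1)$-dimensional simplex in $\mathbb{R}^{\mathcal{X}}$. We always endow $\Delta_{\mathcal{X}}$ with the total variation distance and its induced topology. For every $p_1, p_2 \in \Delta_{\mathcal{X}}$, we have:
$$\|p_1 - p_2\|_{TV} = \frac{1}{2} \sum_{x \in \mathcal{X}} |p_1(x) - p_2(x)|.$$
- Products of probability measures: We denote the product of two measurable spaces $(M_1, \Sigma_1)$ and $(M_2, \Sigma_2)$ as $(M_1 \times M_2, \Sigma_1 \otimes \Sigma_2)$. If $P_1 \in \mathcal{P}(M_1, \Sigma_1)$ and $P_2 \in \mathcal{P}(M_2, \Sigma_2)$, we denote the product of $P_1$ and $P_2$ as $P_1 \times P_2$. If $\mathcal{P}(M_1, \Sigma_1)$, $\mathcal{P}(M_2, \Sigma_2)$ and $\mathcal{P}(M_1 \times M_2, \Sigma_1 \otimes \Sigma_2)$ are endowed with the total variation topology, the mapping $(P_1, P_2) \to P_1 \times P_2$ is a continuous mapping (see Appendix B).
- Borel sets and the support of a probability measure: Let $(T, \mathcal{U})$ be a Hausdorff topological space. The Borel $\sigma$-algebra of $(T, \mathcal{U})$ is the $\sigma$-algebra generated by $\mathcal{U}$. We denote the Borel $\sigma$-algebra of $(T, \mathcal{U})$ as $\mathcal{B}(T, \mathcal{U})$. If the topology $\mathcal{U}$ is known from the context, we simply write $\mathcal{B}(T)$ to denote the Borel $\sigma$-algebra. The sets in $\mathcal{B}(T)$ are called the Borel sets of T.
The support of a measure $P \in \mathcal{P}(T, \mathcal{B}(T))$ is the set of all points $x \in T$ for which every neighborhood has a strictly positive measure:
$$\mathrm{supp}(P) = \{x \in T : P(O) > 0 \text{ for every neighborhood } O \text{ of } x\}.$$
If P is a probability measure on a Polish space, then $P(\mathrm{supp}(P)) = 1$.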
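On a finite set, the total variation distance and the push-forward by an ordinary mapping are easy to compute directly. The following is a minimal sketch, not part of the original development: the function names `tv_distance` and `push_forward`, the dictionary representation of distributions and the example mapping `parity` are all illustrative assumptions. It shows on one example that pushing two distributions forward by the same mapping does not increase their total variation distance.

```python
def tv_distance(p, q):
    """Total variation distance between two finitely supported distributions.

    p and q are dicts mapping symbols to probabilities (hypothetical encoding).
    """
    symbols = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in symbols)


def push_forward(p, f):
    """Push-forward of p by f: the mass of every x is moved to f(x)."""
    image = {}
    for x, mass in p.items():
        image[f(x)] = image.get(f(x), 0.0) + mass
    return image


def parity(x):
    """Example mapping from {0, 1, 2} to {0, 1}."""
    return x % 2


p = {0: 0.5, 1: 0.25, 2: 0.25}
q = {0: 0.25, 1: 0.25, 2: 0.5}

d_before = tv_distance(p, q)
d_after = tv_distance(push_forward(p, parity), push_forward(q, parity))
print(d_before, d_after)
```

Here the mapping merges the symbols 0 and 2, which is exactly where p and q differ, so the two push-forward distributions coincide.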
2.5. Random Mappings
Let M and be two arbitrary sets, and let be a -algebra on . A random mapping from M to is a mapping R from M to . For every , can be interpreted as the probability distribution of the random output given that the input is x.
Let be a -algebra on M. We say that R is a measurable random mapping from to if the mapping defined as is measurable for every .
Note that this definition of measurability is consistent with the measurability of ordinary mappings: let f be a mapping from M to , and let be the random mapping defined as for every , where is a Dirac measure centered at . We have:
where (a) and (b) follow from the fact that is either one or zero, depending on whether or not.
Let P be a probability measure on , and let R be a measurable random mapping from to . The push-forward probability measure of P by R is the probability measure on defined as:
Note that this definition is consistent with the push-forward of ordinary mappings: if f and are as above, then for every , we have:
Proposition 1.
Let R be a measurable random mapping from to . If is a -measurable mapping, then the mapping is a measurable mapping from to . Moreover, for every , we have:
Proof.
See Appendix C. ☐
Corollary 2.
If is bounded and -measurable, then the mapping:
is bounded and Σ-measurable. Moreover, for every , we have:
Proof.
Write (where and ), and use the fact that every bounded measurable function is integrable over any probability distribution. ☐
Lemma 3.
For every measurable random mapping R from to , the push-forward mapping is continuous from to under the total variation topology.
Proof.
See Appendix D. ☐
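For finite sets, a random mapping is a table of conditional distributions, and its push-forward is a weighted mixture of the rows. The sketch below is an illustration under assumed names (`push_forward_kernel`, `tv_distance`, and the toy kernel `R` are hypothetical, and distributions are encoded as dictionaries). It checks on one example that the push-forward does not increase the total variation distance, which is the kind of estimate that makes Lemma 3 plausible in the finite case.

```python
def tv_distance(p, q):
    """Total variation distance between two finitely supported distributions."""
    symbols = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in symbols)


def push_forward_kernel(p, R):
    """Push-forward of p by the random mapping R.

    R[x] is the output distribution given input x, so the result assigns
    to each output y the mass sum_x p(x) * R[x][y].
    """
    out = {}
    for x, px in p.items():
        for y, r in R[x].items():
            out[y] = out.get(y, 0.0) + px * r
    return out


# A toy random mapping from {"a", "b"} to {0, 1}.
R = {"a": {0: 0.9, 1: 0.1}, "b": {0: 0.2, 1: 0.8}}
p = {"a": 0.5, "b": 0.5}
q = {"a": 1.0, "b": 0.0}

d_in = tv_distance(p, q)
d_out = tv_distance(push_forward_kernel(p, R), push_forward_kernel(q, R))
print(d_in, d_out)
```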
Lemma 4.
Let be a Polish (This assumption can be dropped. We assumed that is Polish just to avoid working with Moore–Smith nets.) topology on M, and let be an arbitrary topology on . Let R be a measurable random mapping from to . Moreover, assume that R is a continuous mapping from to when the latter space is endowed with the weak-∗ topology. Under these assumptions, the push-forward mapping is continuous from to under the weak-∗ topology.
Proof.
See Appendix D. ☐
2.6. Meta-Probability Measures
Let be a finite set. A meta-probability measure on is a probability measure on the Borel sets of . It is called a meta-probability measure because it is a probability measure on the space of probability distributions on .
We denote the set of meta-probability measures on as . Clearly, .
A meta-probability measure MP on is said to be balanced if it satisfies:
where is the uniform probability distribution on .
We denote the set of all balanced meta-probability measures on as . The set of all balanced and finitely-supported meta-probability measures on is denoted as .
The following lemma is useful to show the continuity of functions defined on .
Lemma 5.
Let be a compact topological space, and let be a continuous function on . The mapping defined as:
is continuous, where is endowed with the weak-∗ topology.
Proof.
See Appendix E. ☐
Let f be a mapping from a finite set to another finite set . f induces a push-forward mapping taking probability distributions in to probability distributions in . is continuous because and are endowed with the total variation distance. in turn induces another push-forward mapping taking meta-probability measures in to meta-probability measures in . We denote this mapping as , and we call it the meta-push-forward mapping induced by f. Since is a continuous mapping from to , is a continuous mapping from to under both the weak-∗ and the total variation topologies.
Let and be two finite sets. Let be defined as . For every and , we define the tensor product of and as .
Note that since , and are endowed with the total variation topology, is a continuous mapping from to . Therefore, is a continuous mapping from to under both the weak-∗ and the total variation topologies. On the other hand, Appendix B and Appendix F imply that the mapping from to is continuous under both the weak-∗ and the total variation topologies. We conclude that the tensor product is continuous under both of these topologies.
3. The Space of Equivalent Channels
In this section, we summarize the main results of [6].
3.1. Space of Channels from to
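A discrete memoryless channel W is a three-tuple $W = (\mathcal{X}, \mathcal{Y}, p_W)$ where $\mathcal{X}$ is a finite set that is called the input alphabet of W, $\mathcal{Y}$ is a finite set that is called the output alphabet of W and $p_W : \mathcal{X} \times \mathcal{Y} \to [0,1]$ is a function satisfying $\sum_{y \in \mathcal{Y}} p_W(x,y) = 1$ for every $x \in \mathcal{X}$.
For every $(x,y) \in \mathcal{X} \times \mathcal{Y}$, we denote $p_W(x,y)$ as $W(y|x)$, which we interpret as the conditional probability of receiving y at the output, given that x is the input.
Let $\mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ be the set of all channels having $\mathcal{X}$ as the input alphabet and $\mathcal{Y}$ as the output alphabet.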
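For every $W, W' \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$, define the distance between W and $W'$ as follows:
$$d_{\mathcal{X},\mathcal{Y}}(W, W') = \frac{1}{2} \max_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} |W'(y|x) - W(y|x)|.$$
We always endow $\mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ with the metric distance $d_{\mathcal{X},\mathcal{Y}}$. This metric makes $\mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ a compact path-connected metric space. The metric topology on $\mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ that is induced by $d_{\mathcal{X},\mathcal{Y}}$ is denoted as $\mathcal{T}_{\mathcal{X},\mathcal{Y}}$.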
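To make convergence in this metric space concrete, the sketch below represents a channel as a row-stochastic matrix and measures how far apart two channels are. It rests on an assumption: the distance is taken to be the largest total variation distance between corresponding rows, which is one natural reading of the definition above. The names `channel_distance` and `bsc` are illustrative only.

```python
def channel_distance(W1, W2):
    """Assumed channel metric: worst-case total variation between rows.

    W[x][y] is the probability of output y given input x.
    """
    return 0.5 * max(
        sum(abs(a - b) for a, b in zip(row1, row2))
        for row1, row2 in zip(W1, W2)
    )


def bsc(eps):
    """Binary symmetric channel with crossover probability eps."""
    return [[1.0 - eps, eps], [eps, 1.0 - eps]]


# A sequence of channels converging to BSC(0.1).
target = bsc(0.1)
distances = [channel_distance(bsc(0.1 + 10.0 ** -k), target) for k in range(1, 6)]
print(distances)
```

Under this assumed metric, the distance between two binary symmetric channels is the absolute difference of their crossover probabilities, so the printed sequence shrinks by a factor of ten at each step.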
3.2. Equivalence between Channels
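Let $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ and $W' \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}'}$ be two channels having the same input alphabet. We say that $W'$ is degraded from W if there exists a channel $V' \in \mathrm{DMC}_{\mathcal{Y},\mathcal{Y}'}$ such that:
$$W'(y'|x) = \sum_{y \in \mathcal{Y}} V'(y'|y)\, W(y|x).$$
W and $W'$ are said to be equivalent if each one is degraded from the other.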
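Degradation can be illustrated with small matrices: passing the output of a channel through a second channel at the receiver yields a degraded channel. The sketch below is only an illustration; `compose`, `bsc`, and the relabeling channels `shift` and `unshift` are hypothetical names. It shows that two cascaded BSC(0.1) channels give a BSC(0.18), and that relabeling the output symbols produces a channel that is equivalent to the original one, since each can be recovered from the other.

```python
def compose(V, W):
    """Channel W followed by channel V.

    W[x][y] is the probability of y given x, V[y][z] the probability of z given y.
    """
    return [
        [sum(row[y] * V[y][z] for y in range(len(V))) for z in range(len(V[0]))]
        for row in W
    ]


def bsc(eps):
    """Binary symmetric channel with crossover probability eps."""
    return [[1.0 - eps, eps], [eps, 1.0 - eps]]


# BSC(0.1) followed by BSC(0.1) is degraded from BSC(0.1).
degraded = compose(bsc(0.1), bsc(0.1))

# A binary erasure channel with outputs ordered as (0, erasure, 1).
bec = [[0.7, 0.3, 0.0], [0.0, 0.3, 0.7]]

# Deterministic relabelings of the three output symbols and the inverse relabeling.
shift = [[1.0 if z == (y + 1) % 3 else 0.0 for z in range(3)] for y in range(3)]
unshift = [[1.0 if z == (y - 1) % 3 else 0.0 for z in range(3)] for y in range(3)]

relabeled = compose(shift, bec)          # degraded from bec
recovered = compose(unshift, relabeled)  # bec is degraded from relabeled
print(degraded, relabeled, recovered)
```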
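Let $\Delta_{\mathcal{X}}$ and $\Delta_{\mathcal{Y}}$ be the space of probability distributions on $\mathcal{X}$ and $\mathcal{Y}$, respectively. Define $P_W^o \in \Delta_{\mathcal{Y}}$ as $P_W^o(y) = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} W(y|x)$ for every $y \in \mathcal{Y}$. The image of W is the set of output-symbols having strictly positive probabilities:
$$\mathrm{Im}(W) = \{y \in \mathcal{Y} : P_W^o(y) > 0\}.$$
For every $y \in \mathrm{Im}(W)$, define $W_y^{-1} \in \Delta_{\mathcal{X}}$ as follows:
$$W_y^{-1}(x) = \frac{W(y|x)}{|\mathcal{X}|\, P_W^o(y)}, \quad \forall x \in \mathcal{X}.$$
For every $(x,y) \in \mathcal{X} \times \mathrm{Im}(W)$, we have $W(y|x) = |\mathcal{X}|\, P_W^o(y)\, W_y^{-1}(x)$. On the other hand, if $x \in \mathcal{X}$ and $y \in \mathcal{Y} \setminus \mathrm{Im}(W)$, we have $W(y|x) = 0$. This shows that $P_W^o$ and the collection $\{W_y^{-1}\}_{y \in \mathrm{Im}(W)}$ uniquely determine W.
The Blackwell measure (denoted $\mathrm{MP}_W$) of W is a meta-probability measure on $\mathcal{X}$ defined as:
$$\mathrm{MP}_W = \sum_{y \in \mathrm{Im}(W)} P_W^o(y)\, \delta_{W_y^{-1}},$$
where $\delta_{W_y^{-1}}$ is a Dirac measure centered at $W_y^{-1}$. In an earlier version of this work, I called $\mathrm{MP}_W$ the posterior meta-probability distribution of W. Maxim Raginsky kindly brought to my attention the fact that $\mathrm{MP}_W$ is called the Blackwell measure.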
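It is known that a meta-probability measure MP on $\mathcal{X}$ is the Blackwell measure of some discrete memoryless channel (DMC) with input alphabet $\mathcal{X}$ if and only if it is balanced and finitely supported [9].
It is also known that two channels $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ and $W' \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}'}$ are equivalent if and only if $\mathrm{MP}_W = \mathrm{MP}_{W'}$ [9].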
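The two facts above can be checked numerically on small examples. The sketch below is an illustration under stated assumptions: the Blackwell measure is computed as the distribution of the posterior of the input given the output when the input is uniform, a finitely supported meta-probability measure is encoded as a dictionary from posterior vectors to weights, and the names `blackwell_measure`, `is_balanced` and `entropy` are hypothetical. Exact rational arithmetic is used so that equal posteriors merge exactly.

```python
import math
from collections import defaultdict
from fractions import Fraction


def blackwell_measure(W):
    """Blackwell measure of W as a dict {posterior tuple: weight}.

    W[x][y] is the probability of y given x; the input is taken uniform.
    """
    n_in = len(W)
    measure = defaultdict(Fraction)
    for y in range(len(W[0])):
        p_out = sum(W[x][y] for x in range(n_in)) / Fraction(n_in)
        if p_out == 0:
            continue  # y is outside the image of W
        posterior = tuple(W[x][y] / (n_in * p_out) for x in range(n_in))
        measure[posterior] += p_out
    return dict(measure)


def is_balanced(mp, n_in):
    """Check that the mean of the posteriors is the uniform distribution."""
    return all(
        sum(w * post[x] for post, w in mp.items()) == Fraction(1, n_in)
        for x in range(n_in)
    )


def entropy(p):
    """Entropy in nats of a probability vector."""
    return -sum(float(t) * math.log(float(t)) for t in p if t > 0)


half, quarter = Fraction(1, 2), Fraction(1, 4)
# Binary erasure channel with erasure probability 1/2.
bec = [[half, half, 0], [0, half, half]]
# Same channel, but the erasure symbol is split into two indistinguishable copies.
bec_split = [[half, quarter, quarter, 0], [0, quarter, quarter, half]]

mp = blackwell_measure(bec)
mp_split = blackwell_measure(bec_split)

# Standard identity I(X;Y) = H(X) - H(X|Y) for uniform input, in nats.
sym_info = math.log(2) - sum(float(w) * entropy(post) for post, w in mp.items())
print(mp == mp_split, is_balanced(mp, 2), sym_info)
```

Splitting the erasure symbol changes the output alphabet but not the Blackwell measure, which matches the fact that the two channels are equivalent.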
3.3. Space of Equivalent Channels from to
Let and be two finite sets. Define the equivalence relation on as follows:
The space of equivalent channels with input alphabet and output alphabet is the quotient of by the equivalence relation:
Quotient topology:
We define the topology on as the quotient topology . We always associate with the quotient topology .
We have shown in [6] that is a compact, path-connected and metrizable space.
If and are two finite sets of the same size, there exists a canonical homeomorphism between and [6]. This allows us to identify with , where and .
Moreover, for every , there exists a canonical subspace of that is homeomorphic to [6]. Therefore, we can consider as a compact subspace of .
Noisiness metric:
For every , let be the space of probability distributions on . Let be a finite set, and let . For every , define as follows:
The quantity depends only on the -equivalence class of W (see [6]). Therefore, if , we can define for any .
Define the noisiness distance as follows:
We have shown in [6] that is topologically equivalent to .
3.4. Space of Equivalent Channels with Input Alphabet
The space of channels with input alphabet is defined as:
We define the equivalence relation on as follows:
The space of equivalent channels with input alphabet is the quotient of by the equivalence relation:
For every and every , we identify the -equivalence class of W with the -equivalence class of it. This allows us to consider as a subspace of . Moreover,
Since any two equivalent channels have the same Blackwell measure, we can define the Blackwell measure of as for any . The rank of is the size of the support of its Blackwell measure:
We have:
A topology on is said to be natural if and only if it induces the quotient topology on for every .
Every natural topology is -compact, separable and path-connected [6]. On the other hand, if , a Hausdorff natural topology is not Baire, and it is not locally compact anywhere [6]. This implies that no natural topology can be completely metrized if .
Strong topology on :
We associate with the disjoint union topology . The space is disconnected, metrizable and -compact [6].
The strong topology on is the quotient of by :
We call open and closed sets in as strongly-open and strongly-closed sets, respectively. If A is a subset of , then A is strongly open if and only if is open in for every . Similarly, A is strongly closed if and only if is closed in for every .
We have shown in [6] that is the finest natural topology. The strong topology is sequential, compactly generated and [6]. On the other hand, if , the strong topology is not first-countable anywhere [6]; hence, it is not metrizable.
Noisiness metric:
Define the noisiness metric on as follows:
is well-defined because does not depend on as long as . We can also express as follows:
The metric topology on that is induced by is called the noisiness topology on , and it is denoted as . We have shown in [6] that is a natural topology that is strictly coarser than .
Topologies from Blackwell measures:
The mapping is a bijection from to . We call this mapping the canonical bijection from to .
Since is a metric space, there are many standard ways to construct topologies on . If we choose any of these standard topologies on and then relativize it to the subspace , we can construct topologies on through the canonical bijection.
In [6], we studied the weak-∗ and the total variation topologies. We showed that the weak-∗ topology is exactly the same as the noisiness topology.
The total-variation metric distance on is defined as:
The total-variation topology is the metric topology that is induced by on . We proved in [6] that if , we have:
- is not natural, nor Baire, hence it is not completely metrizable.
- is not locally compact anywhere.
4. Channel Parameters and Operations
4.1. Useful Parameters
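Let $\Delta_{\mathcal{X}}$ be the space of probability distributions on $\mathcal{X}$. For every $p \in \Delta_{\mathcal{X}}$ and every $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$, define $I(p, W)$ as the mutual information $I(X;Y)$, where X is distributed as p and Y is the output of W when X is the input. The mutual information is computed using the natural logarithm. The capacity of W is defined as $C(W) = \sup_{p \in \Delta_{\mathcal{X}}} I(p, W)$.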
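Both quantities can be evaluated numerically for small channels. The following sketch computes the mutual information in nats directly from its definition and approximates the capacity with the Blahut–Arimoto iteration, which is a standard numerical method swapped in here for illustration and is not used in this paper. The names `mutual_information` and `capacity`, the tolerance and the iteration cap are assumptions.

```python
import math


def mutual_information(p, W):
    """Mutual information in nats for input distribution p and channel W.

    W[x][y] is the probability of output y given input x.
    """
    n_out = len(W[0])
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(p):
        for y in range(n_out):
            if px > 0 and W[x][y] > 0:
                total += px * W[x][y] * math.log(W[x][y] / q[y])
    return total


def capacity(W, tol=1e-12, max_iter=100000):
    """Blahut-Arimoto iteration; returns (capacity estimate in nats, input law)."""
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in
    lower = 0.0
    for _ in range(max_iter):
        q = [sum(p[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        # c[x] = exp of the divergence between row x and the output law q.
        c = [
            math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                         for y in range(n_out) if W[x][y] > 0))
            for x in range(n_in)
        ]
        z = sum(p[x] * c[x] for x in range(n_in))
        lower, upper = math.log(z), math.log(max(c))  # bounds on the capacity
        p = [p[x] * c[x] / z for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower, p


bsc = [[0.9, 0.1], [0.1, 0.9]]
z_channel = [[1.0, 0.0], [0.5, 0.5]]

bsc_info = mutual_information([0.5, 0.5], bsc)
bsc_capacity, _ = capacity(bsc)
z_capacity, z_input = capacity(z_channel)
print(bsc_info, bsc_capacity, z_capacity, z_input)
```

For the binary symmetric channel the uniform input is optimal, so the mutual information at the uniform input and the capacity estimate coincide. For the Z-channel the optimal input is not uniform, and the iteration has to move away from its starting point.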
For every , the error probability of the MAP decoder of W under prior p is defined as:
Clearly, .
For every , define the Bhattacharyya parameter of W as:
It is easy to see that .
It was shown in [10,11] that , where is the uniform distribution on .
An -code on the alphabet is a subset of such that . The integer n is the block length of , and M is the size of the code. The rate of is , and it is measured in nats. The error probability of the ML decoder for the code when it is used for a channel is given by:
The optimal error probability of -codes for a channel W is given by:
The following proposition shows that all the above parameters are continuous:
Proposition 2.
We have:
- is continuous, concave in p and convex in W.
- is continuous and convex.
- is continuous, concave in p and concave in W.
- is continuous.
- For every code on , is continuous.
- For every and every , the mapping is continuous.
Proof.
These facts are well known, especially the continuity of I, its concavity in p and its convexity in W [12]. Since C is the supremum of a family of mappings that are convex in W, it is also convex in W. For a proof of the continuity of C, see Appendix G. The continuity of Z, and follows immediately from their definitions. Moreover, since is the minimum of a finite number of continuous mappings, it is continuous. The concavity of in p and in W can also be easily seen from the definition. ☐
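As a numerical illustration of Proposition 2, the sketch below evaluates the MAP error probability and a Bhattacharyya parameter along a sequence of channels converging to a binary symmetric channel. Two assumptions are made: the MAP error probability is computed as one minus the sum over outputs of the largest joint probability, and the Bhattacharyya parameter is taken to be the average over ordered pairs of distinct inputs of the usual pairwise Bhattacharyya sum, which reduces to the standard definition for binary inputs. All function names are illustrative.

```python
import math


def map_error_probability(p, W):
    """Error probability of the MAP decoder under prior p (assumed formula)."""
    return 1.0 - sum(
        max(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))
    )


def bhattacharyya(W):
    """Average pairwise Bhattacharyya parameter (assumed generalisation)."""
    n_in, n_out = len(W), len(W[0])
    total = sum(
        math.sqrt(W[x][y] * W[xp][y])
        for x in range(n_in)
        for xp in range(n_in)
        if x != xp
        for y in range(n_out)
    )
    return total / (n_in * (n_in - 1))


def bsc(eps):
    """Binary symmetric channel with crossover probability eps."""
    return [[1.0 - eps, eps], [eps, 1.0 - eps]]


uniform = [0.5, 0.5]
pe_limit = map_error_probability(uniform, bsc(0.1))
z_limit = bhattacharyya(bsc(0.1))

# Parameter gaps along a sequence of channels converging to BSC(0.1).
pe_gaps, z_gaps = [], []
for k in range(1, 6):
    W_k = bsc(0.1 + 10.0 ** -k)
    pe_gaps.append(abs(map_error_probability(uniform, W_k) - pe_limit))
    z_gaps.append(abs(bhattacharyya(W_k) - z_limit))
print(pe_limit, z_limit, pe_gaps, z_gaps)
```

Both lists of gaps shrink as the channels approach the limit, which is what continuity predicts.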
4.2. Channel Operations
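If $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ and $V \in \mathrm{DMC}_{\mathcal{Y},\mathcal{Z}}$, we define the composition $V \circ W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Z}}$ of W and V as follows:
$$(V \circ W)(z|x) = \sum_{y \in \mathcal{Y}} V(z|y)\, W(y|x), \quad \forall x \in \mathcal{X},\ \forall z \in \mathcal{Z}.$$
For every function $f : \mathcal{X} \to \mathcal{Y}$, define the deterministic channel $D_f \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ as follows:
$$D_f(y|x) = \begin{cases} 1 & \text{if } y = f(x), \\ 0 & \text{otherwise.} \end{cases}$$
It is easy to see that if $f : \mathcal{X} \to \mathcal{Y}$ and $g : \mathcal{Y} \to \mathcal{Z}$, then $D_g \circ D_f = D_{g \circ f}$.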
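The compatibility between deterministic channels and function composition can be verified mechanically. In the sketch below, alphabets are encoded as ranges of integers and the names `deterministic_channel`, `compose`, `f` and `g` are illustrative assumptions.

```python
def deterministic_channel(f, n_in, n_out):
    """Channel that outputs f(x) with probability one on input x."""
    return [[1.0 if f(x) == y else 0.0 for y in range(n_out)] for x in range(n_in)]


def compose(V, W):
    """Channel W followed by channel V (rows index inputs)."""
    return [
        [sum(row[y] * V[y][z] for y in range(len(V))) for z in range(len(V[0]))]
        for row in W
    ]


def f(x):
    """Example mapping from {0, ..., 5} to {0, 1, 2}."""
    return x % 3


def g(y):
    """Example mapping from {0, 1, 2} to {0, 1}."""
    return y % 2


D_f = deterministic_channel(f, 6, 3)
D_g = deterministic_channel(g, 3, 2)
D_gf = deterministic_channel(lambda x: g(f(x)), 6, 2)

cascade = compose(D_g, D_f)
print(cascade == D_gf)
```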
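For every two channels $W_1 \in \mathrm{DMC}_{\mathcal{X}_1,\mathcal{Y}_1}$ and $W_2 \in \mathrm{DMC}_{\mathcal{X}_2,\mathcal{Y}_2}$, define the channel sum $W_1 \oplus W_2 \in \mathrm{DMC}_{\mathcal{X}_1 \coprod \mathcal{X}_2,\, \mathcal{Y}_1 \coprod \mathcal{Y}_2}$ of $W_1$ and $W_2$ as:
$$(W_1 \oplus W_2)(y, i \,|\, x, j) = \begin{cases} W_i(y|x) & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$
$W_1 \oplus W_2$ arises when the transmitter has two channels $W_1$ and $W_2$ at its disposal, and it can use exactly one of them at each channel use. It is an easy exercise to check that $e^{C(W_1 \oplus W_2)} = e^{C(W_1)} + e^{C(W_2)}$ (remember that we compute the mutual information using the natural logarithm).
We define the channel product $W_1 \otimes W_2 \in \mathrm{DMC}_{\mathcal{X}_1 \times \mathcal{X}_2,\, \mathcal{Y}_1 \times \mathcal{Y}_2}$ of $W_1$ and $W_2$ as:
$$(W_1 \otimes W_2)(y_1, y_2 \,|\, x_1, x_2) = W_1(y_1|x_1)\, W_2(y_2|x_2).$$
$W_1 \otimes W_2$ arises when the transmitter has two channels $W_1$ and $W_2$ at its disposal, and it uses both of them at each channel use. It is an easy exercise to check that $C(W_1 \otimes W_2) = C(W_1) + C(W_2)$, or equivalently $e^{C(W_1 \otimes W_2)} = e^{C(W_1)} \cdot e^{C(W_2)}$. Channel sums and products were first introduced by Shannon in [13].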
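For every $W_1 \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}_1}$, $W_2 \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}_2}$ and every $\lambda \in [0,1]$, we define the $\lambda$-interpolation $[\lambda W_1, (1-\lambda) W_2] \in \mathrm{DMC}_{\mathcal{X},\, \mathcal{Y}_1 \coprod \mathcal{Y}_2}$ between $W_1$ and $W_2$ as:
$$[\lambda W_1, (1-\lambda) W_2](y, i \,|\, x) = \begin{cases} \lambda W_1(y|x) & \text{if } i = 1, \\ (1-\lambda) W_2(y|x) & \text{if } i = 2. \end{cases}$$
Channel interpolation arises when a channel behaves as $W_1$ with probability $\lambda$ and as $W_2$ with probability $1-\lambda$. The transmitter has no control on which behavior the channel chooses, but on the other hand, the receiver knows which one was chosen. Channel interpolations were used in [14] to construct interpolations between polar codes and Reed–Muller codes.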
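The three operations have simple matrix descriptions, sketched below under assumed conventions: a channel is a list of rows indexed by inputs, the disjoint union of two alphabets is encoded by concatenation, and pairs of symbols are listed in lexicographic order. The names `channel_sum`, `channel_product`, `interpolation` and `mutual_information` are illustrative. The checks use standard identities for product inputs and for side information known to the receiver.

```python
import math


def mutual_information(p, W):
    """Mutual information in nats for input distribution p and channel W."""
    n_out = len(W[0])
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(n_out)]
    return sum(
        p[x] * W[x][y] * math.log(W[x][y] / q[y])
        for x in range(len(p))
        for y in range(n_out)
        if p[x] > 0 and W[x][y] > 0
    )


def channel_sum(W1, W2):
    """Block-diagonal channel on the disjoint unions of inputs and outputs."""
    n1, n2 = len(W1[0]), len(W2[0])
    return [row + [0.0] * n2 for row in W1] + [[0.0] * n1 + row for row in W2]


def channel_product(W1, W2):
    """Both channels used in parallel; pairs are in lexicographic order."""
    return [[a * b for a in r1 for b in r2] for r1 in W1 for r2 in W2]


def interpolation(lam, W1, W2):
    """Channel acting as W1 with probability lam and as W2 otherwise."""
    return [
        [lam * a for a in r1] + [(1.0 - lam) * b for b in r2]
        for r1, r2 in zip(W1, W2)
    ]


bsc = [[0.9, 0.1], [0.1, 0.9]]
bec = [[0.7, 0.3, 0.0], [0.0, 0.3, 0.7]]
noiseless = [[1.0, 0.0], [0.0, 1.0]]
p1, p2, lam = [0.3, 0.7], [0.6, 0.4], 0.25

# Sum of two noiseless binary channels: a noiseless channel with four inputs.
sum_info = mutual_information([0.25] * 4, channel_sum(noiseless, noiseless))

# Product channel with a product input: the mutual informations add up.
p12 = [a * b for a in p1 for b in p2]
prod_info = mutual_information(p12, channel_product(bsc, bec))

# Interpolation: the receiver knows which channel acted, so the values mix.
interp_info = mutual_information(p1, interpolation(lam, bsc, bec))
print(sum_info, prod_info, interp_info)
```

The first check is an instance of the capacity identity for channel sums: each noiseless binary channel has capacity $\ln 2$, and the sum has capacity $\ln 4 = \ln(e^{\ln 2} + e^{\ln 2})$.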
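Now, fix a binary operation ∗ on $\mathcal{X}$. For every $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$, define $W^- \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}^2}$ and $W^+ \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}^2 \times \mathcal{X}}$ as:
$$W^-(y_1, y_2 \,|\, u_1) = \frac{1}{|\mathcal{X}|} \sum_{u_2 \in \mathcal{X}} W(y_1 | u_1 * u_2)\, W(y_2 | u_2),$$
and:
$$W^+(y_1, y_2, u_1 \,|\, u_2) = \frac{1}{|\mathcal{X}|}\, W(y_1 | u_1 * u_2)\, W(y_2 | u_2).$$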
These operations generalize Arıkan’s polarization transformations [15].
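A small numerical sketch can make these transformations concrete. It assumes the usual form of the two transformations: the minus channel sees both outputs and averages over a uniform second input, and the plus channel additionally observes the first input. The names `minus_transform`, `plus_transform`, `xor` and `mutual_information` are illustrative. For a binary erasure channel with erasure probability $\epsilon$ and the XOR operation, the symmetric mutual informations are known to be $(1-\epsilon)^2 \ln 2$ and $(1-\epsilon^2) \ln 2$ nats.

```python
import math
from itertools import product


def mutual_information(p, W):
    """Mutual information in nats for input distribution p and channel W."""
    n_out = len(W[0])
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(n_out)]
    return sum(
        p[x] * W[x][y] * math.log(W[x][y] / q[y])
        for x in range(len(p))
        for y in range(n_out)
        if p[x] > 0 and W[x][y] > 0
    )


def minus_transform(W, op):
    """Input u1, outputs (y1, y2); the second input u2 is uniform and hidden."""
    size, n_out = len(W), len(W[0])
    return [
        [sum(W[op(u1, u2)][y1] * W[u2][y2] for u2 in range(size)) / size
         for y1, y2 in product(range(n_out), repeat=2)]
        for u1 in range(size)
    ]


def plus_transform(W, op):
    """Input u2, outputs (y1, y2, u1); the first input u1 is uniform and known."""
    size, n_out = len(W), len(W[0])
    return [
        [W[op(u1, u2)][y1] * W[u2][y2] / size
         for y1, y2, u1 in product(range(n_out), range(n_out), range(size))]
        for u2 in range(size)
    ]


def xor(a, b):
    """Binary operation on {0, 1} used in Arikan's original construction."""
    return a ^ b


bec = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]  # erasure probability 1/2
uniform = [0.5, 0.5]

info = mutual_information(uniform, bec)
info_minus = mutual_information(uniform, minus_transform(bec, xor))
info_plus = mutual_information(uniform, plus_transform(bec, xor))
print(info, info_minus, info_plus)
```

The two transformed channels split the total mutual information unevenly while preserving its sum, which is the polarization effect exploited in [15].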
Proposition 3.
We have:
- The mapping from to is continuous.
- The mapping from to the space is continuous.
- The mapping from to is continuous.
- The mapping from to is continuous.
- For any binary operation ∗ on , the mapping from to is continuous.
- For any binary operation ∗ on , the mapping from to is continuous.
Proof.
The continuity immediately follows from the definitions. ☐
5. Continuity on
It is well known that the parameters defined in Section 4.1 depend only on the -equivalence class of W. Therefore, we can define those parameters for any through the transcendent mapping (defined in Lemma 2). The following proposition shows that those parameters are continuous on :
Proposition 4.
We have:
- is continuous and concave in p.
- is continuous.
- is continuous and concave in p.
- is continuous.
- For every code on , is continuous.
- For every and every , the mapping is continuous.
Proof.
Since the corresponding parameters are continuous on (Proposition 2), Lemma 2 implies that they are continuous on . The only cases that need a special treatment are those of I and Z. We will only prove the continuity of I since the proof of continuity of Z is similar.
Define the relation R on as:
It is easy to see that depends only on the R-equivalence class of . Since I is continuous on , Lemma 2 implies that the transcendent mapping of I is continuous on . On the other hand, since is locally compact, Theorem 1 implies that can be identified with , and the two spaces have the same topology. Therefore, I is continuous on . ☐
With the exception of channel composition, all the channel operations that were defined in Section 4.2 can also be “quotiented”. We just need to realize that the equivalence class of the resulting channel depends only on the equivalence classes of the channels that were used in the operation. Let us illustrate this in the case of channel sums:
Let and and assume that is degraded from and is degraded from . There exists and such that and . It is easy to see that , which shows that is degraded from . This was proven by Shannon in [16].
Therefore, if is equivalent to and is equivalent to , then is equivalent to . This allows us to define the channel sum for every and every as for any and any , where is the -equivalence class of .
With the exception of channel composition, we can “quotient” all the channel operations of Section 4.2 in a similar fashion. Moreover, we can show that they are continuous:
Proposition 5.
We have:
- The mapping from to is continuous.
- The mapping from to is continuous.
- The mapping from to is continuous.
- For any binary operation ∗ on , the mapping from to is continuous.
- For any binary operation ∗ on , the mapping from to is continuous.
Proof.
We only prove the continuity of the channel sum because the proof of continuity of the other operations is similar.
Let be the projection onto the -equivalence classes. Define the mapping as . Clearly, f is continuous.
Now, define the equivalence relation R on as:
The discussion before the proposition shows that depends only on the R-equivalence class of . Lemma 2 now shows that the transcendent map of f defined on is continuous.
Notice that can be identified with . Therefore, we can define f on through this identification. Moreover, since and are locally compact and Hausdorff, Corollary 1 implies that the canonical bijection between and is a homeomorphism.
Now, since the mapping f on is just the channel sum, we conclude that the mapping from to is continuous. ☐
6. Continuity in the Strong Topology
The following lemma provides a way to check whether a mapping defined on is continuous:
Lemma 6.
Let be an arbitrary topological space. A mapping is continuous on if and only if it is continuous on for every .
Proof.
☐
Since the channel parameters I, C, , Z, and are defined on for every (see Section 5), they are also defined on . The following proposition shows that those parameters are continuous in the strong topology:
Proposition 6.
Let be the standard topology on . We have:
- is continuous on and concave in p.
- is continuous on .
- is continuous on and concave in p.
- is continuous on .
- For every code on , is continuous on .
- For every and every , the mapping is continuous on .
Proof.
The continuity of and immediately follows from Proposition 4 and Lemma 6. Since the proofs of the continuity of I and Z are similar, we only prove the continuity for I.
Due to the distributivity of the product with respect to disjoint unions, we have:
and:
Therefore, is the disjoint union of the spaces . Moreover, I is continuous on for every . We conclude that I is continuous on .
Define the relation R on as follows: if and only if and . Since depends only on the R-equivalence class of , Lemma 2 shows that the transcendent map of I is a continuous mapping from to . On the other hand, since is locally compact and Hausdorff, Theorem 1 implies that can be identified with . Therefore, I is continuous on . ☐
It is also possible to extend the definition of all the channel operations that were defined in Section 5 to . Moreover, it is possible to show that many channel operations are continuous in the strong topology:
Proposition 7.
Assume that all equivalent channel spaces are endowed with the strong topology. We have:
- The mapping from to is continuous.
- The mapping from to is continuous.
- The mapping from to is continuous.
- For any binary operation ∗ on , the mapping from to is continuous.
- For any binary operation ∗ on , the mapping from to is continuous.
Proof.
We only prove the continuity of the channel interpolation because the proof of the continuity of other operations is similar.
Let be the standard topology on . Due to the distributivity of the product with respect to disjoint unions, we have:
and:
Therefore, the space is the topological disjoint union of the spaces .
For every , let be the projection onto the -equivalence classes, and let be the canonical injection from to .
Define the mapping as:
where n is the unique integer satisfying . and are the and -equivalence classes of and , respectively.
Due to Proposition 3 and due to the continuity of and , the mapping f is continuous on for every . Therefore, f is continuous on .
Let be the equivalence relation defined on as follows: if and only if and . Furthermore, define the equivalence relation R on as follows: if and only if and .
Since depends only on the R-equivalence class of , Lemma 2 implies that the transcendent mapping of f is continuous on .
Since is Hausdorff and locally compact, Theorem 1 implies that the canonical bijection from to is a homeomorphism. On the other hand, since and are Hausdorff and locally compact, Corollary 1 implies that the canonical bijection from to is a homeomorphism. We conclude that the channel interpolation is continuous on . ☐
Corollary 3.
is strongly contractible to every point in .
Proof.
Fix . Define the mapping as . H is continuous by Proposition 7. We also have and for every . Moreover, for every . Therefore, is strongly contractible to every point in . ☐
The reader might be wondering why channel operations such as the channel sum were not shown to be continuous on the whole space instead of the smaller space . The reason is that we cannot apply Corollary 1 to and since neither , nor is locally compact (under the strong topology).
One potential method to show the continuity of the channel sum on is as follows: let R be the equivalence relation on defined as if and only if and . We can identify with through the canonical bijection. Using Lemma 2, it is easy to see that the mapping is continuous from to .
It was shown in [17] that the topology is homeomorphic to through the canonical bijection, where is the coarsest topology that is both compactly generated and finer than . Therefore, the mapping is continuous on . This means that if is compactly generated, we will have , and so, the channel sum will be continuous on . Note that although and are compactly generated, their product might not be compactly generated.
7. Continuity in the Noisiness/Weak-∗ and the Total Variation Topologies
We need to express the channel parameters and operations in terms of the Blackwell measures.
7.1. Channel Parameters
The following proposition shows that many channel parameters can be expressed as an integral of a continuous function with respect to the Blackwell measure:
Proposition 8.
For every , we have:
where is the entropy of p and is the product measure on obtained by multiplying with itself n times. Note that we adopt the standard convention that .
Proof.
By choosing any representative channel and replacing by in the definitions of the channel parameters, all the above formulas immediately follow. Let us show how this works for :
where (a) is true because for . ☐
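As an illustrative aside, the kind of identity stated in Proposition 8 can be checked numerically. The Python sketch below is not part of the original development. It assumes that the Blackwell measure of a channel is the law of the posterior distribution of the input given the output under a uniform input, and that the formula for I reads I(W) = log|X| minus the integral of the entropy H(p) against the Blackwell measure. The function names are hypothetical, and entropies are in nats.

```python
import math
from collections import defaultdict

def blackwell_measure(W):
    """W[x][y] = P(y | x). Returns {posterior tuple: mass} under a uniform input.

    Assumption: the Blackwell measure is the law of the posterior of the input
    given the output. Posteriors are rounded so that equal atoms are merged.
    """
    n_in, n_out = len(W), len(W[0])
    measure = defaultdict(float)
    for y in range(n_out):
        p_y = sum(W[x][y] for x in range(n_in)) / n_in  # output probability
        if p_y > 0.0:
            posterior = tuple(round(W[x][y] / (n_in * p_y), 12) for x in range(n_in))
            measure[posterior] += p_y
    return dict(measure)

def entropy(p):
    """Shannon entropy in nats, with the convention 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def symmetric_capacity_from_measure(measure, n_in):
    """log|X| minus the expected posterior entropy under the measure."""
    return math.log(n_in) - sum(m * entropy(p) for p, m in measure.items())

def symmetric_capacity_direct(W):
    """Mutual information between a uniform input and the output, in nats."""
    n_in, n_out = len(W), len(W[0])
    total = 0.0
    for y in range(n_out):
        p_y = sum(W[x][y] for x in range(n_in)) / n_in
        for x in range(n_in):
            if W[x][y] > 0.0:
                total += (W[x][y] / n_in) * math.log(W[x][y] / p_y)
    return total

# A binary symmetric channel and an equivalent channel with one output split in two.
bsc = [[0.9, 0.1], [0.1, 0.9]]
split = [[0.45, 0.45, 0.1], [0.05, 0.05, 0.9]]
print(blackwell_measure(bsc))
print(blackwell_measure(split))
```

The two channels in the usage lines are degraded from each other, so under the stated assumption they should produce the same atoms and masses, which is the reason the representative channel can be chosen freely in the proof.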
Proposition 9.
Let be the standard topology on . We have:
- is continuous on and concave in p.
- is continuous on .
- is continuous on and concave in p.
- is continuous on .
- For every code on , is continuous on .
- For every and every , the mapping is continuous on .
Proof.
We associate the space with the weak-∗ topology. Define the mapping:
as follows:
Lemma 5 implies that is continuous. On the other hand, Proposition 8 shows that . Therefore, I is continuous on . We can prove the continuity of and Z similarly.
Now, define the mapping as:
Fix , and let . Since is compact (under the weak-∗ topology), Lemma 1 implies the existence of a weakly-∗ open neighborhood of MP such that for every and every . Therefore, for every and every , we have:
hence,
Similarly, we can show that . This shows that for every . Therefore, is continuous. However, , so C is continuous on .
Now for every , define the mapping backward-recursively as follows:
- .
- For every , define:
Clearly is continuous. Now, let , and assume that is continuous. If we let , Lemma 5 implies that the mapping defined as:
is continuous. However, , so is also continuous. Therefore, is continuous. By noticing that , we conclude that is continuous on . Moreover, since is the minimum of a finite family of continuous mappings, it is continuous. ☐
It is worth mentioning that Proposition 6 also follows from Proposition 9 because the noisiness topology is coarser than the strong topology.
Corollary 4.
All the mappings in Proposition 9 are also continuous if we replace the noisiness topology with the total variation topology .
Proof.
This is true because is finer than . ☐
7.2. Channel Operations
In the following, we show that we can express the channel operations in terms of Blackwell measures. We have all the tools to achieve this for the channel sum, channel product and channel interpolation. In order to express the channel polarization transformations in terms of the Blackwell measures, we need to introduce new definitions.
Let be a finite set, and let ∗ be a binary operation on a finite set . We say that ∗ is uniformity preserving if the mapping is a bijection from to itself [18]. For every , we denote the unique element satisfying as . Note that is a binary operation, and it is uniformity preserving. is called the right-inverse of ∗. It was shown in [11] that a binary operation is polarizing if and only if it is uniformity preserving and its inverse is strongly ergodic.
Binary operations that are not uniformity preserving are not interesting for polarization theory because they do not preserve the symmetric capacity [11]. Therefore, we will only focus on polarization transformations that are based on uniformity preserving operations.
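For small alphabets, the uniformity preserving property can be checked by brute force. The Python sketch below is only an illustration. It assumes that the mapping in the definition is (a, b) ↦ (a ∗ b, b), which is the convention consistent with the description of the right-inverse as the unique c satisfying c ∗ b = a. The function names are hypothetical.

```python
from itertools import product

def is_uniformity_preserving(op, elements):
    """Check that (a, b) -> (op(a, b), b) is a bijection of X x X.

    Assumption: this is the mapping used in the definition. On a finite set,
    a self-map is a bijection exactly when its image has full size.
    """
    images = {(op(a, b), b) for a, b in product(elements, repeat=2)}
    return len(images) == len(elements) ** 2

def right_inverse(op, elements):
    """Return a table inv with inv[(a, b)] = the unique c such that op(c, b) == a."""
    if not is_uniformity_preserving(op, elements):
        raise ValueError("operation is not uniformity preserving")
    return {(op(c, b), b): c for c, b in product(elements, repeat=2)}

# Addition modulo 4 passes the check, multiplication modulo 4 does not.
Z4 = list(range(4))
add_mod4 = lambda a, b: (a + b) % 4
mul_mod4 = lambda a, b: (a * b) % 4
print(is_uniformity_preserving(add_mod4, Z4), is_uniformity_preserving(mul_mod4, Z4))
```

For addition modulo 4, the table returned by `right_inverse` is subtraction modulo 4, as one would expect.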
Let ∗ be a fixed uniformity preserving operation on . Define the mapping as
The probability distribution can be interpreted as follows: let and be two independent random variables in that are distributed as and , respectively, and let be the random pair in defined as , or equivalently . is the probability distribution of .
Clearly, is continuous. Therefore, the push-forward mapping is continuous from to under both the weak-∗ and the total variation topologies (see Section 2.6). For every , we define the -convolution of and as:
Since the product of meta-probability measures is continuous under both the weak-∗ and the total variation topologies (Appendix B and Appendix F), the -convolution is also continuous under these topologies.
For every and every , define as:
The probability distribution can be interpreted as follows: if and are as above, is the conditional probability distribution of given .
Define the mapping as follows:
where is a Dirac measure centered at .
If and are as above, is the meta-probability measure that describes the possible conditional probability distributions of that are seen by someone having knowledge of . Clearly, is a random mapping from to . In Appendix H, we show that is a measurable random mapping. We also show in Appendix H that is a continuous mapping from to when the latter space is endowed with the weak-∗ topology. Lemmas 3 and 4 now imply that the push-forward mapping is continuous under both the weak-∗ and the total variation topologies.
For every , we define the -convolution of and as:
Since the product of meta-probability measures is continuous under both the weak-∗ and the total variation topologies (Appendix B and Appendix F), the -convolution is also continuous under these topologies.
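At the level of individual probability distributions, the two mappings that underlie these convolutions are easy to compute on a finite alphabet. The Python sketch below is an illustration under an assumption: the pair of random variables in the interpretation above is taken to be related by (X1, X2) = (U1 ∗ U2, U2), so that the joint law of (U1, U2) is p1(u1 ∗ u2) p2(u2). The function name is hypothetical, and the alphabet is identified with {0, ..., q − 1}.

```python
def minus_and_plus(p1, p2, op):
    """Return the law of U1 and the conditional laws of U2 given U1 = u1.

    Assumption: X1 ~ p1 and X2 ~ p2 are independent and (X1, X2) = (op(U1, U2), U2),
    so that P(U1 = u1, U2 = u2) = p1[op(u1, u2)] * p2[u2].
    """
    q = len(p1)
    joint = [[p1[op(u1, u2)] * p2[u2] for u2 in range(q)] for u1 in range(q)]
    minus = [sum(row) for row in joint]  # distribution of U1
    plus = {
        u1: [joint[u1][u2] / minus[u1] for u2 in range(q)]
        for u1 in range(q)
        if minus[u1] > 0.0  # the conditional law is only defined on the support of U1
    }
    return minus, plus

# Binary example with XOR as the operation.
xor = lambda a, b: a ^ b
minus, plus = minus_and_plus([0.9, 0.1], [0.8, 0.2], xor)
print(minus)
print(plus)
```

Averaging the conditional laws in `plus` with the weights in `minus` recovers the law of U2, which equals p2 under the stated assumption. The restriction to the support of U1 mirrors the fact that the conditional distribution is only meaningful where the law of U1 is positive.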
Proposition 10.
We have:
- For every and , we have: where (respectively ) is the meta-push-forward of (respectively ) by the canonical injection from (respectively ) to .
- For every and , we have:
- For every and every , we have:
- For every uniformity preserving binary operation ∗ on , and every , we have:
- For every uniformity preserving binary operation ∗ on and every , we have:
Proof.
See Appendix I. ☐
Note that the polarization transformation formulas in Proposition 10 generalize the formulas given by Raginsky in [19] for binary-input channels.
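The channel product formula lends itself to a direct numerical check. The Python sketch below is an illustration only. It assumes that the formula for the channel product states that the Blackwell measure of the product channel is the product of the two Blackwell measures, where a pair of posteriors is identified with the product distribution on the product input alphabet. It also assumes that the Blackwell measure is the law of the posterior of the input under a uniform input. All function names are hypothetical.

```python
from collections import defaultdict

def blackwell_measure(W):
    """W[x][y] = P(y | x). Returns {posterior tuple: mass} under a uniform input."""
    n_in, n_out = len(W), len(W[0])
    measure = defaultdict(float)
    for y in range(n_out):
        p_y = sum(W[x][y] for x in range(n_in)) / n_in
        if p_y > 0.0:
            posterior = tuple(round(W[x][y] / (n_in * p_y), 12) for x in range(n_in))
            measure[posterior] += p_y
    return dict(measure)

def product_channel(W1, W2):
    """Rows are indexed by (x1, x2) and columns by (y1, y2), in lexicographic order."""
    return [[a * b for a in row1 for b in row2] for row1 in W1 for row2 in W2]

def product_of_measures(m1, m2):
    """Product measure, with (p1, p2) mapped to the product distribution p1 x p2."""
    out = defaultdict(float)
    for p1, w1 in m1.items():
        for p2, w2 in m2.items():
            joint = tuple(round(a * b, 12) for a in p1 for b in p2)
            out[joint] += w1 * w2
    return dict(out)

# A binary symmetric channel and a binary erasure channel.
bsc = [[0.9, 0.1], [0.1, 0.9]]
bec = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
lhs = blackwell_measure(product_channel(bsc, bec))
rhs = product_of_measures(blackwell_measure(bsc), blackwell_measure(bec))
print(sorted(lhs.items()))
print(sorted(rhs.items()))
```

Under the stated assumptions, the two printed lists should agree up to floating-point error.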
Proposition 11.
Assume that all equivalent channel spaces are endowed with the noisiness/weak-∗ or the total variation topology. We have:
- The mapping from to is continuous.
- The mapping from to is continuous.
- The mapping from to is continuous.
- For every uniformity preserving binary operation ∗ on , the mapping from to is continuous.
- For every uniformity preserving binary operation ∗ on , the mapping from to is continuous.
Proof.
The proposition directly follows from Proposition 10 and the fact that all the meta-probability measure operations that are involved in the formulas are continuous under both the weak-∗ and the total variation topologies. ☐
Corollary 5.
Both and are strongly contractible to every point in .
Proof.
The proof is the same as that of Corollary 3. ☐
8. Discussion and Conclusions
Section 5 and Section 6 show that the quotient topology is relatively easy to work with. If one is interested in the space of equivalent channels sharing the same input and output alphabets, then using the quotient formulation of the topology seems to be the easiest way to prove theorems.
The continuity of the channel sum and the channel product on the whole product space remains an open problem. As we mentioned in Section 6, it is sufficient to prove that the product topology is compactly generated.
Acknowledgments
I would like to thank Emre Telatar and Mohammad Bazzi for helpful discussions. I am also grateful to Maxim Raginsky for his comments.
Conflicts of Interest
The author declares no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| DMC | Discrete memoryless channel |
| TV | Total variation |
Appendix A. Proof of Lemma 1
Fix , and let . Since f is continuous, there exists a neighborhood of in such that for every , we have . Moreover, since products of open sets form a base for the product topology, there exists an open neighborhood of s in and an open neighborhood of t in T such that .
Since and are compact, the product space is also compact. On the other hand, we have , so is an open cover of . Therefore, there exist and such that .
Now, fix , and define . Since is the intersection of finitely many open sets containing s, is an open neighborhood of s in . Let and . Since , there exists such that . Since , we have , and so, . Therefore, , hence:
However, this is true for every . Therefore,
Appendix B. Continuity of the Product of Measures
For every subset A of and every , define . Similarly, for every , define . Let and . We have:
This shows that the product of measures is continuous under the total variation topology.
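A standard inequality of this type is that the total variation distance between two product measures is at most the sum of the total variation distances between the factors. The Python sketch below checks this bound on a small finite example. It is an illustration under the assumption that the total variation distance is normalized as half the L1 distance, and it is not claimed to reproduce the exact chain of inequalities used above. The function names are hypothetical.

```python
def tv(p, q):
    """Total variation distance between two finite distributions (half the L1 norm)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def product_measure(p, q):
    """Product distribution on the product space, flattened in lexicographic order."""
    return [a * b for a in p for b in q]

p1, p1_alt = [0.5, 0.5], [0.6, 0.4]
p2, p2_alt = [0.2, 0.3, 0.5], [0.25, 0.25, 0.5]

lhs = tv(product_measure(p1, p2), product_measure(p1_alt, p2_alt))
rhs = tv(p1, p1_alt) + tv(p2, p2_alt)
print(lhs <= rhs)
```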
Appendix C. Proof of Proposition 1
Define the mapping as follows:
For every , define the mapping as follows:
Clearly, for every we have:
- for all .
- for all .
- .
Moreover, for every fixed , we have:
- is -measurable.
- takes values in .
For every , let . Since is -measurable, we have for every . Now, for every , define the mapping as follows:
Since the random mapping R is measurable and since , the mapping is -measurable for every . Therefore, is -measurable for every . Moreover, for every , we have:
where (a) follows from the monotone convergence theorem. We conclude that G is -measurable because it is the point-wise limit of -measurable functions. On the other hand, we have:
Therefore,
where (a) and (b) follow from the monotone convergence theorem.
Appendix D. Continuity of the Push-Forward by a Random Mapping
Let R be a measurable random mapping from to . Let . Define the signed measure , and let be the Jordan measure decomposition of . It is easy to see that . For every , we have:
where (a) follows from the fact that for every . We can similarly show that:
Therefore,
This shows that the push-forward mapping from to is continuous under the total variation topology. This concludes the proof of Lemma 3.
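On finite spaces, a random mapping is a row-stochastic matrix, and a bound of the kind obtained above reduces to the familiar fact that pushing two distributions through the same stochastic matrix cannot increase their total variation distance. The Python sketch below illustrates this on one example. It assumes the half-L1 normalization of the total variation distance, and the names are hypothetical.

```python
def tv(p, q):
    """Total variation distance between two finite distributions (half the L1 norm)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def push_forward(R, P):
    """Push P forward by the random mapping R, where R[x] is a distribution on the target."""
    n_out = len(R[0])
    return [sum(P[x] * R[x][y] for x in range(len(P))) for y in range(n_out)]

# A random mapping from a three-point space to a two-point space.
R = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
P = [0.5, 0.3, 0.2]
P_alt = [0.4, 0.4, 0.2]

before = tv(P, P_alt)
after = tv(push_forward(R, P), push_forward(R, P_alt))
print(after <= before)
```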
Now, assume that is a Polish topology on M and is an arbitrary topology on . Let R be a measurable random mapping from to . Moreover, assume that R is a continuous mapping from to when the latter space is endowed with the weak-∗ topology. Let be a sequence of probability measures in that weakly-∗ converges to .
Let be a bounded and continuous mapping. Define the mapping as follows:
For every sequence converging to x in M, the sequence weakly-∗ converges to in because of the continuity of R. This implies that the sequence converges to . Since is a Polish topology (hence, metrizable and sequential [20]), this shows that G is a bounded and continuous mapping from to . Therefore, we have:
where (a) and (c) follow from Corollary 2, and (b) follows from the fact that weakly-∗ converges to P. This shows that weakly-∗ converges to . Now, since is Polish, the weak-∗ topology on is metrizable [21]; hence, it is sequential [20]. This shows that the push-forward mapping from to is continuous under the weak-∗ topology.
Appendix E. Proof of Lemma 5
For every , define the mapping as . Clearly is continuous for every . Therefore, the mapping defined as:
is continuous in the weak-∗ topology of .
Fix , and let . Since is continuous, there exists a weakly-∗ open neighborhood of MP such that for every . On the other hand, Lemma 1 implies the existence of an open neighborhood of s in such that for every , we have:
Clearly is an open neighborhood of in . For every , we have:
where (a) follows from the fact that is a meta-probability measure and for every . We conclude that F is continuous.
Appendix F. Weak-∗ Continuity of the Product of Meta-Probability Measures
Let and be two sequences that weakly-∗ converge to and in and , respectively. Let be a continuous and bounded mapping. Define the mapping as follows:
Fix . Since is continuous, Lemma 5 implies that F is continuous. Therefore, the mapping is continuous on , which implies that it is also bounded because is compact. Therefore,
because weakly-∗ converges to . This means that there exists such that for every , we have:
On the other hand, since F is continuous and since is compact under the weak-∗ topology [21], Lemma 1 implies the existence of a weakly-∗ open neighborhood of such that for every and every . Moreover, since weakly-∗ converges to , there exists such that for every .
Therefore, for every , we have:
where (a) follows from the fact that for every . Therefore,
where (a) and (b) follow from Fubini’s theorem. We conclude that weakly-∗ converges to . Therefore, the product of meta-probability measures is weakly-∗ continuous.
Appendix G. Continuity of the Capacity
Since the mapping I is continuous and since the space is compact, the mapping I is uniformly continuous, i.e., for every , there exists such that for every , if and , then
Let be such that . For every , we have , so we must have . Therefore,
Therefore,
Similarly, we can show that . This implies that ; hence, C is continuous.
Appendix H. Measurability and Continuity of $C^{+,\ast}$
Let us first show that the random mapping is measurable. We need to show that the mapping is measurable for every , where:
For every , define the set:
Clearly, is open in (and so it is measurable). The mapping is defined on , and it is clearly continuous. Therefore, for every , is measurable. We have:
where (a) follows from the fact that if and only if and . This shows that is measurable for every . Therefore, is a measurable random mapping.
Let be a sequence converging to in . Since is continuous, we have for every . Therefore, for every , there exists such that for every , we have . Let . For every , we have . Therefore, for every continuous and bounded mapping , we have:
where (b) follows from the continuity of g and and the continuity of on for every . (a) follows from the fact that:
We conclude that the mapping is a continuous mapping from to when the latter space is endowed with the weak-∗ topology.
Appendix I. Proof of Proposition 10
Let and . Fix and , and let and be the output alphabets of and , respectively. We may assume without loss of generality that and .
Let . We have:
For every , we have:
On the other hand, for every , we have:
Therefore, , where is the canonical injection from to .
Similarly, for every , we have and , where is the canonical injection from to . For every , we have:
Therefore,
This shows the first formula of Proposition 10.
For every , we have:
For every , we have:
For every , we have:
Therefore,
This shows the second formula of Proposition 10.
Now, let and . Fix and , and let and be the output alphabets of and , respectively. We may assume without loss of generality that and . Let . If , then W is equivalent to and . If , then W is equivalent to and .
Assume now that . For every , we have:
For every , we have:
Similarly, for every , we have and . Therefore,
Therefore,
This shows the third formula of Proposition 10.
Now, let , and let ∗ be a uniformity preserving binary operation on . Fix , and let be the output alphabet of W. We may assume without loss of generality that .
Let be two independent random variables uniformly distributed in . Let and . Send and through two independent copies of W, and let and be the output, respectively.
For every , we have:
For every , we have:
For every , we have:
Therefore,
This shows the fourth formula of Proposition 10.
For every , we have:
Therefore,
For every , we have:
For every , we have:
Therefore,
This shows the fifth and last formula of Proposition 10.
References
- Nasser, R. Continuity of Channel Parameters and Operations under Various DMC Topologies. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 3185–3189.
- Polyanskiy, Y.; Poor, H.V.; Verdu, S. Channel Coding Rate in the Finite Blocklength Regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
- Polyanskiy, Y. Saddle Point in the Minimax Converse for Channel Coding. IEEE Trans. Inf. Theory 2013, 59, 2576–2595.
- Schwarte, H. On weak convergence of probability measures, channel capacity and code error probabilities. IEEE Trans. Inf. Theory 1996, 42, 1549–1551.
- Richardson, T.; Urbanke, R. Modern Coding Theory; Cambridge University Press: New York, NY, USA, 2008.
- Nasser, R. Topological Structures on DMC spaces. arXiv, 2017; arXiv:1701.04467.
- Engelking, R. General Topology; Monografie Matematyczne: Warsaw, Poland, 1977.
- Schieler, C.; Cuff, P. The Henchman Problem: Measuring Secrecy by the Minimum Distortion in a List. IEEE Trans. Inf. Theory 2016, 62, 3436–3450.
- Torgersen, E. Comparison of Statistical Experiments; Encyclopedia of Mathematics and its Applications, Cambridge University Press: Cambridge, UK, 1991.
- Şaşoğlu, E.; Telatar, E.; Arıkan, E. Polarization for Arbitrary Discrete Memoryless Channels. In Proceedings of the IEEE Information Theory Workshop, Taormina, Italy, 11–16 October 2009; pp. 144–148.
- Nasser, R. An Ergodic Theory of Binary Operations, Part II: Applications to Polarization. IEEE Trans. Inf. Theory 2017, 63, 1063–1083.
- Cover, T.; Thomas, J. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
- Shannon, C. The zero error capacity of a noisy channel. IRE Trans. Inf. Theory 1956, 2, 8–19.
- Mondelli, M.; Hassani, S.H.; Urbanke, R.L. From Polar to Reed-Muller Codes: A Technique to Improve the Finite-Length Performance. IEEE Trans. Inf. Theory 2014, 62, 3084–3091.
- Arıkan, E. Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels. IEEE Trans. Inf. Theory 2009, 55, 3051–3073.
- Shannon, C. A Note on a Partial Ordering for Communication Channels. Inform. Contr. 1958, 1, 390–397.
- Steenrod, N.E. A convenient category of topological spaces. Michigan Math. J. 1967, 14, 133–152.
- Nasser, R. An Ergodic Theory of Binary Operations, Part I: Key Properties. IEEE Trans. Inf. Theory 2016, 62, 6931–6952.
- Raginsky, M. Channel Polarization and Blackwell Measures. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 56–60.
- Franklin, S. Spaces in which sequences suffice. Fundam. Math. 1965, 57, 107–115.
- Villani, C. Topics in Optimal Transportation; Graduate studies in mathematics, American Mathematical Society: Madison, WI, USA, 2003.
© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).