Abstract
We study the problem of communicating over a discrete memoryless two-way channel using non-adaptive schemes, under a zero probability of error criterion. We derive single-letter inner and outer bounds for the zero-error capacity region, based on random coding, linear programming, linear codes, and the asymptotic spectrum of graphs. Among other results, we provide a single-letter outer bound based on a combination of Shannon's vanishing-error capacity region and a two-way analogue of the linear programming bound for point-to-point channels, which, in contrast to the one-way case, is generally better than both. Moreover, we establish an outer bound for the zero-error capacity region of a two-way channel via the asymptotic spectrum of graphs, and show that this bound can be achieved in certain cases.
1. Introduction
The problem of reliable communication over a discrete memoryless two-way channel (DM-TWC) was originally introduced and investigated by Shannon [1] in a seminal paper that marked the inception of multi-user information theory. A DM-TWC is characterized by a quadruple of finite input and output alphabets $\mathcal{X}_A$, $\mathcal{X}_B$, $\mathcal{Y}_A$, $\mathcal{Y}_B$, and a conditional probability distribution $P(y_A, y_B \mid x_A, x_B)$, where $x_A \in \mathcal{X}_A$, $x_B \in \mathcal{X}_B$, $y_A \in \mathcal{Y}_A$, $y_B \in \mathcal{Y}_B$. The channel is memoryless in the sense that channel uses are independent, that is, for any i,
$$P\left(y_{A,i}, y_{B,i} \mid x_A^i, x_B^i, y_A^{i-1}, y_B^{i-1}\right) = P\left(y_{A,i}, y_{B,i} \mid x_{A,i}, x_{B,i}\right).$$
In [1], Shannon provided inner and outer bounds for the vanishing-error capacity region of the DM-TWC, in the general setting where the users are allowed to adapt their transmissions on the fly based on past observations. We note that Shannon's inner bound is tight for non-adaptive schemes, namely when the users map their messages to codewords in advance. The non-adaptive DM-TWC is also sometimes called the restricted DM-TWC [2]. Shannon's inner and outer bounds were later improved by utilizing auxiliary random variable techniques [3,4,5], and sufficient conditions under which his bounds coincide have been obtained [6,7]. However, despite much effort, the capacity region of a general DM-TWC under the vanishing-error criterion remains elusive. In fact, a strong indication of the inherent difficulty of the problem is given by Blackwell's binary multiplying channel, a simple, deterministic, common-output channel whose capacity region remains unknown to this day [4,5,8,9,10].
In yet another seminal work, Shannon proposed and studied the zero-error capacity of the point-to-point discrete memoryless channel [2], also known as the Shannon capacity of a graph. This problem has been extensively studied since, most notably in [11,12], yet remains generally unsolved. In this paper, we consider the problem of zero-error communication over a DM-TWC. We limit our discussion to the case of non-adaptive schemes, for which the capacity region is known in the vanishing-error case [1]. Despite the obvious difficulty of the problem (the point-to-point zero-error capacity is a special case), its two-way nature adds a new combinatorial dimension that renders it interesting to study. To the best of our knowledge, this problem has not been addressed before, except in the special case of the binary multiplying channel, where upper and lower bounds on the non-adaptive zero-error sum-capacity have been obtained [13,14,15]. Our bounds are partially based on generalizations of these ideas; an earlier short version of this work appeared in [16].
The problem of non-adaptive communication over a DM-TWC can be formulated as follows. Alice and Bob would like to simultaneously convey messages $m_A \in [M_A]$ and $m_B \in [M_B]$, respectively, to each other over $n$ uses of the DM-TWC. To that end, Alice maps her message to an input sequence (codeword) $x_A^n(m_A) \in \mathcal{X}_A^n$ using an encoding function $f_A : [M_A] \to \mathcal{X}_A^n$, and Bob maps his message to an input sequence (codeword) $x_B^n(m_B) \in \mathcal{X}_B^n$ using an encoding function $f_B : [M_B] \to \mathcal{X}_B^n$. We call the pair of codeword collections $(\mathcal{C}_A, \mathcal{C}_B)$ a codebook pair. Note that the encoding functions depend only on the messages, and not on the outputs observed during transmission, hence the name non-adaptive. When the transmissions end, Alice and Bob observe the resulting (random) channel outputs $y_A^n$ and $y_B^n$, respectively, and attempt to decode the message sent by their counterpart without error. When this is possible, that is, when there exist decoding functions $g_A$ and $g_B$ such that $g_A(m_A, y_A^n) = m_B$ and $g_B(m_B, y_B^n) = m_A$, for all message pairs, with probability one, the codebook pair (or the encoding functions) is called uniquely decodable. A rate pair $(R_A, R_B) = \left(\frac{1}{n}\log M_A, \frac{1}{n}\log M_B\right)$ is achievable for the DM-TWC if a uniquely decodable code exists for some n. The non-adaptive zero-error capacity region of a DM-TWC is the closure of the set of all achievable rate pairs, and is denoted here by $\mathcal{C}_0$. Moreover, the non-adaptive zero-error sum-capacity of a DM-TWC, denoted by $C_{0,\mathrm{sum}}$, is the supremum of the sum-rate $R_A + R_B$ taken over all achievable rate pairs.
The main objective of this paper is to provide several single-letter outer and inner bounds on the non-adaptive zero-error capacity region of the DM-TWC. The remainder of this paper is organized as follows. In Section 2, we provide some necessary mathematical preliminaries, discussing in particular the characterization of the zero-error DM-TWC capacity via confusion graphs, its behavior under graph homomorphisms, and one-shot zero-error communication. Section 3 is devoted to three general outer bounds on the zero-error capacity region of the DM-TWC, which are based on Shannon's vanishing-error non-adaptive capacity region, a two-way analogue of the linear programming bound for point-to-point channels, and the Shannon capacity of a graph. In Section 4, we provide two general inner bounds, using random coding and random linear codes, respectively. In Section 5, we establish outer bounds for certain types of DM-TWCs via the asymptotic spectra of graphs, and also explicitly construct uniquely decodable codebook pairs achieving the outer bound. Some concluding remarks appear in Section 6.
2. Preliminaries
2.1. Shannon Capacity of a Graph
Let $G = (V, E)$ be a graph with vertex set V and edge set E. Two vertices $u, v \in V$ are adjacent, denoted as $u \sim v$, if there is an edge between $u$ and $v$, that is, $\{u, v\} \in E$. An independent set in G is a subset of pairwise non-adjacent vertices. A maximum independent set is an independent set with the largest possible number of vertices. The size of a maximum independent set in G is called the independence number of G, denoted by $\alpha(G)$. The complement of a graph G, denoted by $\overline{G}$, is a graph with the same vertex set, where two distinct vertices of $\overline{G}$ are adjacent if and only if they are not adjacent in G. We write $K_n$ and $\overline{K_n}$ for the complete graph (containing all possible edges) and the empty graph (containing no edges) over n vertices, respectively.
Let $G$ and $H$ be two graphs. The strong product (or normal product) of the graphs G and H is the graph $G \boxtimes H$ such that
- (1) the vertex set of $G \boxtimes H$ is the Cartesian product $V(G) \times V(H)$;
- (2) two vertices $(g, h)$ and $(g', h')$ are adjacent if and only if one of the following holds: (a) $g = g'$ and $h \sim h'$; (b) $g \sim g'$ and $h = h'$; (c) $g \sim g'$ and $h \sim h'$.
The n-fold strong product of the graph G with itself is denoted by $G^{\boxtimes n}$. The Shannon capacity of the graph G was defined in [2] to be
$$\Theta(G) \triangleq \lim_{n \to \infty} \frac{1}{n} \log \alpha\left(G^{\boxtimes n}\right) = \sup_{n} \frac{1}{n} \log \alpha\left(G^{\boxtimes n}\right),$$
where the limit exists and equals the supremum by Fekete's lemma. We note that throughout the paper all logarithms are taken to base 2.
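As a concrete sanity check (our illustration, not part of the paper), the following Python snippet builds a strong product with networkx and verifies that $\alpha(C_5) = 2$ while $\alpha(C_5 \boxtimes C_5) = 5$, which already yields the classical lower bound $\Theta(C_5) \geq \frac{1}{2}\log 5$:

```python
import itertools
import networkx as nx

def strong_product(G, H):
    """Strong product G x H: two distinct vertex pairs are adjacent iff
    each coordinate is equal or adjacent in the respective factor."""
    P = nx.Graph()
    P.add_nodes_from(itertools.product(G.nodes, H.nodes))
    for (g, h), (g2, h2) in itertools.combinations(list(P.nodes), 2):
        if (g == g2 or G.has_edge(g, g2)) and (h == h2 or H.has_edge(h, h2)):
            P.add_edge((g, h), (g2, h2))
    return P

def independence_number(G):
    # alpha(G) equals the clique number of the complement; fine for tiny graphs.
    return max(len(c) for c in nx.find_cliques(nx.complement(G)))

C5 = nx.cycle_graph(5)  # the pentagon
print(independence_number(C5))                      # 2
print(independence_number(strong_product(C5, C5)))  # 5
```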
The disjoint union of the graphs G and H is the graph $G \sqcup H$ such that $V(G \sqcup H) = V(G) \sqcup V(H)$ and $E(G \sqcup H) = E(G) \sqcup E(H)$. A graph homomorphism from G to H, denoted by $\phi : G \to H$, is a mapping $\phi : V(G) \to V(H)$ such that if $u \sim v$ in G, then $\phi(u) \sim \phi(v)$ in H. We write $G \leq H$ if there exists a graph homomorphism from the complement of G to the complement of H.
In [17], Zuiddam introduced the notion of the asymptotic spectrum of graphs, and provided a dual characterization of the Shannon capacity of a graph by applying Strassen's theory of asymptotic spectra; the Lovász theta number [12], the fractional clique cover number, the complement of the fractional orthogonal rank [18], and the fractional Haemers bound over any field [11,19,20] are specific elements of the asymptotic spectrum (also called spectral points).
Theorem 1
([17]). Let $\mathcal{G}$ be a collection of graphs that is closed under the disjoint union ⊔ and the strong product ⊠, and that also contains the single-vertex graph $K_1$. Define the asymptotic spectrum $\Delta(\mathcal{G})$ as the set of all mappings $F : \mathcal{G} \to \mathbb{R}_{\geq 0}$ such that for all $G, H \in \mathcal{G}$:
- (1) if $G \leq H$, then $F(G) \leq F(H)$;
- (2) $F(G \sqcup H) = F(G) + F(H)$;
- (3) $F(G \boxtimes H) = F(G) \cdot F(H)$;
- (4) $F(K_1) = 1$.
Then, $\Theta(G) = \min_{F \in \Delta(\mathcal{G})} \log F(G)$. In other words, $\Theta(G) \leq \log F(G)$ for every $F \in \Delta(\mathcal{G})$, and the minimum is attained by some spectral point.
As remarked in [17], the Shannon capacity itself (more precisely, the mapping $G \mapsto 2^{\Theta(G)}$) is in general not an element of $\Delta(\mathcal{G})$. In fact, it is not additive under ⊔ by a result of Alon [21], and also not multiplicative under ⊠ by a result of Haemers [11]. In Section 3.3, to derive an outer bound for the zero-error capacity of a DM-TWC, we will employ the multiplicativity of $F$ under the ⊠ operation for $F \in \Delta(\mathcal{G})$.
2.2. Confusion Graphs of Channels
In this subsection, we characterize the zero-error capacity of a discrete memoryless point-to-point channel, as well as the zero-error capacity region of a DM-TWC, in terms of suitably defined graphs. The point-to-point characterization is well known and goes back to Shannon [2], and the DM-TWC case is a natural generalization thereof.
A discrete memoryless point-to-point channel consists of a finite input alphabet $\mathcal{X}$, a finite output alphabet $\mathcal{Y}$, and a conditional probability distribution $W(y \mid x)$, where $x \in \mathcal{X}$, $y \in \mathcal{Y}$. The channel is memoryless in the sense that $W(y^n \mid x^n) = \prod_{i=1}^{n} W(y_i \mid x_i)$. Suppose that a transmitter would like to convey a message $m \in [M]$ to a receiver over the channel. To that end, the transmitter sends an input sequence $x^n(m)$ using an encoding function $f : [M] \to \mathcal{X}^n$, and the receiver, after observing the corresponding channel outputs $y^n$, guesses the message using a decoding function $g : \mathcal{Y}^n \to [M]$. This pair $(f, g)$ is called an $(M, n)$ code, and such a code is called uniquely decodable if $g(y^n) = m$ holds for any $m$ and any correspondingly possible $y^n$. A rate R is called achievable if a uniquely decodable $(2^{nR}, n)$ code exists for some n. The zero-error capacity of the channel is defined as the supremum of all achievable rates.
A channel W is associated with a confusion graph G, whose vertex set is the input alphabet $\mathcal{X}$, and where two vertices $x, x'$ are adjacent, denoted as $x \sim x'$, if and only if there exists an output that is possible under both of them, that is, some $y \in \mathcal{Y}$ such that $W(y \mid x) > 0$ and $W(y \mid x') > 0$. It is easy to verify that an $(M, n)$ code is uniquely decodable if and only if its codebook is an independent set of the graph $G^{\boxtimes n}$, the n-fold strong product of the graph G. Consequently, the zero-error capacity of a point-to-point channel is equal to the Shannon capacity of its confusion graph G. Note that there are infinitely many distinct channels with the same confusion graph, and all of these channels have the same zero-error capacity.
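The construction of a confusion graph is mechanical; here is an illustrative helper (the dict-based channel representation W[x][y] = P(y|x) is our own choice), applied to the noisy typewriter channel, whose confusion graph is the pentagon:

```python
import networkx as nx

def confusion_graph(W):
    """Confusion graph of a point-to-point channel W[x][y] = P(y|x):
    two inputs are adjacent iff some output is possible under both."""
    G = nx.Graph()
    G.add_nodes_from(W)
    xs = list(W)
    for i, x in enumerate(xs):
        for x2 in xs[i + 1:]:
            if any(W[x].get(y, 0) > 0 and W[x2].get(y, 0) > 0 for y in W[x]):
                G.add_edge(x, x2)
    return G

# Noisy typewriter: input x yields x or x+1 (mod 5), each with probability 1/2.
W = {x: {x: 0.5, (x + 1) % 5: 0.5} for x in range(5)}
print(sorted(confusion_graph(W).edges()))  # [(0, 1), (0, 4), (1, 2), (2, 3), (3, 4)]
```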
We now proceed to similarly associate a DM-TWC with a collection of confusion graphs, which will then be shown to characterize its zero-error capacity region. To that end, note that when Alice sends a letter $a \in \mathcal{X}_A$, the resulting channel from Bob back to Alice at that same instant is the point-to-point channel $P(y_A \mid a, x_B)$. This channel is associated with a confusion graph $G_a$, whose vertex set is $\mathcal{X}_B$ and where two vertices $x_B, x'_B$ are adjacent, denoted in this case by $x_B \sim_a x'_B$, if and only if there exists some $y_A \in \mathcal{Y}_A$ such that both $P(y_A \mid a, x_B) > 0$ and $P(y_A \mid a, x'_B) > 0$, where
$$P(y_A \mid x_A, x_B) = \sum_{y_B \in \mathcal{Y}_B} P(y_A, y_B \mid x_A, x_B).$$
Symmetrically, when Bob sends a letter $b \in \mathcal{X}_B$, the resulting channel from Alice to Bob at that same instant is associated with a confusion graph $H_b$, whose vertex set is $\mathcal{X}_A$, and where two vertices $x_A, x'_A$ are adjacent, denoted in this case by $x_A \sim_b x'_A$, if and only if there exists some $y_B \in \mathcal{Y}_B$ such that both $P(y_B \mid x_A, b) > 0$ and $P(y_B \mid x'_A, b) > 0$, where
$$P(y_B \mid x_A, x_B) = \sum_{y_A \in \mathcal{Y}_A} P(y_A, y_B \mid x_A, x_B).$$
Based on the foregoing discussion, a DM-TWC can be decomposed into a collection of discrete memoryless point-to-point channels, and is hence associated with a corresponding collection of confusion graphs, denoted by $\mathcal{G} = \left(\{G_a\}_{a \in \mathcal{X}_A}, \{H_b\}_{b \in \mathcal{X}_B}\right)$, where $V(G_a) = \mathcal{X}_B$ and $V(H_b) = \mathcal{X}_A$. The following useful observation is immediate, and in particular shows that the zero-error capacity region of a DM-TWC is a function of its confusion graphs only. Thus, from here on, we will sometimes identify the channel with its collection of confusion graphs.
Proposition 1.
Consider a DM-TWC associated with the collection of confusion graphs $\mathcal{G} = (\{G_a\}_{a \in \mathcal{X}_A}, \{H_b\}_{b \in \mathcal{X}_B})$. A codebook pair $(\mathcal{C}_A, \mathcal{C}_B)$ is uniquely decodable for the channel if and only if for any $x_A^n \in \mathcal{C}_A$ and $x_B^n \in \mathcal{C}_B$, it holds that $\mathcal{C}_B$ is an independent set of $G_{x_{A,1}} \boxtimes \cdots \boxtimes G_{x_{A,n}}$, and $\mathcal{C}_A$ is an independent set of $H_{x_{B,1}} \boxtimes \cdots \boxtimes H_{x_{B,n}}$.
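Proposition 1 can be tested directly by brute force. The sketch below (using our illustrative dict-of-graphs representation, with the reconstructed notation $G_a$, $H_b$) checks whether a codebook pair is uniquely decodable:

```python
from itertools import combinations

def collide(graphs, steer, w, w2):
    """True iff w and w2 are equal-or-adjacent in every coordinate of the
    strong product of the graphs selected coordinate-wise by `steer`."""
    return all(u == v or graphs[s].has_edge(u, v)
               for s, u, v in zip(steer, w, w2))

def uniquely_decodable(CA, CB, G, H):
    """Brute-force rendering of Proposition 1. G[a] is the confusion graph on
    Bob's alphabet when Alice sends a; H[b] is the symmetric counterpart."""
    return (all(not collide(G, xa, u, v)
                for xa in CA for u, v in combinations(CB, 2)) and
            all(not collide(H, xb, u, v)
                for xb in CB for u, v in combinations(CA, 2)))
```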
In particular, we see that the capacity region depends only on the corresponding confusion graphs $\mathcal{G}$. Hence, in the sequel, we will write $\mathcal{C}_0(\mathcal{G})$ and $C_{0,\mathrm{sum}}(\mathcal{G})$ to represent the capacity region and the sum-capacity, respectively. We will also often identify the channel with its confusion graphs, and refer to it as $\mathcal{G}$ when it is clear from the context. This also leads to the following immediate observation, analogous to the point-to-point case.
Proposition 2.
If two DM-TWCs have the same confusion graphs up to some relabeling of the input symbols, then their zero-error capacity regions coincide.
This further immediately implies:
Proposition 3.
$\mathcal{C}_0$ depends only on the conditional marginal distributions $P(y_A \mid x_A, x_B)$ and $P(y_B \mid x_A, x_B)$.
The strong product of two DM-TWCs $\mathcal{G}_1$ and $\mathcal{G}_2$, denoted by $\mathcal{G}_1 \boxtimes \mathcal{G}_2$, refers to a DM-TWC having input alphabets $\mathcal{X}_{A,1} \times \mathcal{X}_{A,2}$ and $\mathcal{X}_{B,1} \times \mathcal{X}_{B,2}$, as well as confusion graphs
$$G_{(a_1, a_2)} = G_{a_1} \boxtimes G_{a_2}, \qquad H_{(b_1, b_2)} = H_{b_1} \boxtimes H_{b_2}.$$
Considering the zero-error sum-capacity with respect to the strong product, we have the lemma below.
Lemma 1.
$C_{0,\mathrm{sum}}(\mathcal{G}_1 \boxtimes \mathcal{G}_2) \geq C_{0,\mathrm{sum}}(\mathcal{G}_1) + C_{0,\mathrm{sum}}(\mathcal{G}_2)$.
Proof.
To prove this lemma, it suffices to show that for any uniquely decodable codebook pairs for the channels $\mathcal{G}_1$ and $\mathcal{G}_2$, of sizes $(M_{A,1}, M_{B,1})$ and $(M_{A,2}, M_{B,2})$ respectively, there exists an $(M_{A,1} M_{A,2}, M_{B,1} M_{B,2})$ uniquely decodable codebook pair for the associated product channel $\mathcal{G}_1 \boxtimes \mathcal{G}_2$. To that end, assuming without loss of generality a common blocklength, let $\mathcal{C}_A$ and $\mathcal{C}_B$ consist of all symbol-wise pairings of codewords from the two constituent codebooks.
It is easy to verify that $(\mathcal{C}_A, \mathcal{C}_B)$ is uniquely decodable for the product channel. Moreover, $|\mathcal{C}_A| = M_{A,1} M_{A,2}$ and $|\mathcal{C}_B| = M_{B,1} M_{B,2}$. The lemma follows. □
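The codebook combination used in the proof is simply a symbol-wise pairing; a minimal sketch, assuming a common blocklength for the two constituent codebooks:

```python
def product_codebook(C1, C2):
    """Pair equal-length codewords symbol-wise: one use of the product
    channel corresponds to one use of each constituent channel."""
    return [tuple(zip(w1, w2)) for w1 in C1 for w2 in C2]

CA = product_codebook([(0, 1), (1, 0)], [(2, 2), (3, 2)])
print(len(CA), CA[0])  # 4 ((0, 2), (1, 2))
```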
2.3. Dual Graph Homomorphisms
In this subsection, we study the behavior of the zero-error capacity region of a DM-TWC under graph homomorphisms, generalizing a similar analysis from the point-to-point channel case [2]. Let $\mathcal{G} = (\{G_a\}_{a \in \mathcal{X}_A}, \{H_b\}_{b \in \mathcal{X}_B})$ and $\mathcal{G}' = (\{G'_{a'}\}_{a' \in \mathcal{X}'_A}, \{H'_{b'}\}_{b' \in \mathcal{X}'_B})$ be two collections of confusion graphs corresponding to two DM-TWCs. A dual graph homomorphism from $\mathcal{G}$ to $\mathcal{G}'$ is a pair of mappings $(\phi, \psi)$, where $\phi : \mathcal{X}_A \to \mathcal{X}'_A$ and $\psi : \mathcal{X}_B \to \mathcal{X}'_B$, such that
- (1) if $x_B \sim x'_B$ in $\overline{G_a}$, then $\psi(x_B) \sim \psi(x'_B)$ in $\overline{G'_{\phi(a)}}$; and
- (2) if $x_A \sim x'_A$ in $\overline{H_b}$, then $\phi(x_A) \sim \phi(x'_A)$ in $\overline{H'_{\psi(b)}}$.
It is easy to see that the dual graph homomorphism is a natural generalization of the standard graph homomorphism of two graphs, in the sense that both are adjacency preserving. We write $\mathcal{G} \leq \mathcal{G}'$ if there exists a dual graph homomorphism from $\mathcal{G}$ to $\mathcal{G}'$. Then:
Lemma 2.
If $\mathcal{G} \leq \mathcal{G}'$, and the graphs in $\mathcal{G}$ and $\mathcal{G}'$ do not have self-loops, then $\mathcal{C}_0(\mathcal{G}) \subseteq \mathcal{C}_0(\mathcal{G}')$.
Proof.
Suppose $\mathcal{G} \leq \mathcal{G}'$ via a dual graph homomorphism $(\phi, \psi)$, and suppose $(\mathcal{C}_A, \mathcal{C}_B)$ is a uniquely decodable codebook pair of length n for the DM-TWC $\mathcal{G}$. Let $\mathcal{C}'_A = \{\phi(x_A^n) : x_A^n \in \mathcal{C}_A\}$ and $\mathcal{C}'_B = \{\psi(x_B^n) : x_B^n \in \mathcal{C}_B\}$, where $\phi$ and $\psi$ are applied coordinate-wise.
We now show that $(\mathcal{C}'_A, \mathcal{C}'_B)$ is a uniquely decodable codebook pair for the DM-TWC $\mathcal{G}'$. To that end, it suffices to show that for any distinct $x_A^n, \bar{x}_A^n \in \mathcal{C}_A$ and distinct $x_B^n, \bar{x}_B^n \in \mathcal{C}_B$, we have
Indeed, since $(\mathcal{C}_A, \mathcal{C}_B)$ is a uniquely decodable codebook pair, there exist coordinates $i, j$ such that $x_{B,i} \sim \bar{x}_{B,i}$ in $\overline{G_{x_{A,i}}}$ and $x_{A,j} \sim \bar{x}_{A,j}$ in $\overline{H_{x_{B,j}}}$. By the definition of $(\phi, \psi)$, we have that $\psi(x_{B,i}) \sim \psi(\bar{x}_{B,i})$ in $\overline{G'_{\phi(x_{A,i})}}$ and $\phi(x_{A,j}) \sim \phi(\bar{x}_{A,j})$ in $\overline{H'_{\psi(x_{B,j})}}$, implying (1). It is also evident that $|\mathcal{C}'_A| = |\mathcal{C}_A|$ and $|\mathcal{C}'_B| = |\mathcal{C}_B|$. The lemma now follows by taking the union over all uniquely decodable codebook pairs for $\mathcal{G}$. □
2.4. One-Shot Zero-Error Communication
In this subsection, we consider the problem of zero-error communication over a DM-TWC with only a single channel use by the two parties (i.e., $n = 1$). We refer to the associated set of achievable rate pairs as the one-shot zero-error capacity region, and to the associated sum-rate as the one-shot zero-error sum-capacity. Recall that the one-shot zero-error capacity of a point-to-point channel is simply the logarithm of the independence number of its confusion graph; this quantity yields a lower bound on the zero-error capacity of the channel, and also provides an infinite-letter expression for the capacity when evaluated over the product graph. It is therefore interesting to study the analogue of the independence number in the two-way case, which in particular yields an inner bound on the zero-error capacity region of the DM-TWC. For simplicity of exposition, we will focus here on the one-shot zero-error sum-capacity only.
For convenience, we first define some notions. Let $\mathcal{G}$ be a DM-TWC with confusion graphs $\{G_a\}_{a \in \mathcal{X}_A}$ and $\{H_b\}_{b \in \mathcal{X}_B}$. A pair of subsets $S \subseteq \mathcal{X}_A$ and $T \subseteq \mathcal{X}_B$ is called a dual clique pair of the DM-TWC if S is a clique in each $H_b$ for $b \in T$, and T is a clique in each $G_a$ for $a \in S$. A pair of subsets $S \subseteq \mathcal{X}_A$ and $T \subseteq \mathcal{X}_B$ is called a dual independent pair of the DM-TWC if T is an independent set of the graph $G_a$ for each $a \in S$, and S is an independent set of the graph $H_b$ for each $b \in T$. A maximum dual independent pair is a dual independent pair with the largest possible product of sizes $|S| \cdot |T|$. This product is called the independence product of $\mathcal{G}$, denoted by $\alpha(\mathcal{G})$. According to the definition, the one-shot zero-error sum-capacity of the DM-TWC is $\log \alpha(\mathcal{G})$. It is also readily seen that if two channels have the same confusion graphs up to some relabeling of input symbols, then they have the same collections of dual clique pairs and dual independent pairs, and hence the same one-shot zero-error sum-capacity.
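For tiny alphabets, the independence product can be computed by exhaustive search. An illustrative sketch (the representation is ours), evaluated on Blackwell's binary multiplying channel, where the independence product is 2 and the one-shot zero-error sum-capacity is therefore 1 bit:

```python
from itertools import chain, combinations
import networkx as nx

def powerset(xs):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def independence_product(XA, XB, G, H):
    """Largest |S|*|T| over dual independent pairs (S, T)."""
    best = 0
    for S in powerset(XA):
        for T in powerset(XB):
            if all(not G[a].has_edge(u, v)
                   for a in S for u, v in combinations(T, 2)) and \
               all(not H[b].has_edge(u, v)
                   for b in T for u, v in combinations(S, 2)):
                best = max(best, len(S) * len(T))
    return best

def make_graph(nodes, edges):
    g = nx.Graph()
    g.add_nodes_from(nodes)
    g.add_edges_from(edges)
    return g

# Blackwell's multiplying channel (common output y = x_A * x_B): sending 0
# renders the counterpart's symbols indistinguishable, sending 1 reveals them.
G = {0: make_graph([0, 1], [(0, 1)]), 1: make_graph([0, 1], [])}
H = {0: make_graph([0, 1], [(0, 1)]), 1: make_graph([0, 1], [])}
print(independence_product([0, 1], [0, 1], G, H))  # 2, e.g. S = {0, 1}, T = {1}
```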
For two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, let $G_1 \cup G_2$ be the union of $G_1$ and $G_2$, such that $V(G_1 \cup G_2) = V_1 \cup V_2$ and $E(G_1 \cup G_2) = E_1 \cup E_2$. Notice that the graph disjoint union ⊔ in Section 2.1 is the special case of the union ∪ in which the vertex sets of $G_1$ and $G_2$ are disjoint. For notational convenience, in the rest of this subsection we let $G = \bigcup_{a \in \mathcal{X}_A} G_a$ and $H = \bigcup_{b \in \mathcal{X}_B} H_b$. The following simple observations are now in order.
Proposition 4.
Suppose $(S, T)$ is a dual independent pair of $\mathcal{G}$. Then:
- (1)
- If , then . The equality holds by taking and T be a maximum independent set of , where .
- (2)
- .
- (3)
- S is an independent set of .
Proof.
The results follow directly from the definition of dual independent pairs. □
Lemma 3.
Let $\mathcal{G}$ be a DM-TWC and let G, H be graphs such that $G = \bigcup_{a \in \mathcal{X}_A} G_a$ and $H = \bigcup_{b \in \mathcal{X}_B} H_b$. Then:
- (1)
- (2)
- (3)
- (4)
- .
Proof.
(1) The lower bound follows from Proposition 4 (1) and the symmetry of S and T. From Proposition 4 (2), we have
yielding the upper bound.
(2) From claim (1) above, we have . The equality holds by taking S and T as the maximum independent sets of H and G respectively.
(3) From claims (1) and (2) above, we have
On the other hand, suppose is a dual independent pair. We have the following three cases: (i) If then by Proposition 4, claim (1), we have . (ii) If , similar to case (i), we have . (iii) If and , then by Proposition 4, claim (2), we obtain . Thus .
(4) is a direct consequence of claim (1) above. The lemma follows. □
By graph homomorphisms we immediately have:
Proposition 5.
If , then
Next, we shall provide an upper bound for $\alpha(\mathcal{G})$ via a generalization of the Lovász theta number [12]. Let $M$ be an arbitrary positive semi-definite matrix (i.e., $M \succeq 0$), and let $M_{ij}$ be its $(i, j)$th entry. Let J be the all-one matrix, and let I be the identity matrix. For any matrices A and B of the same dimensions, denote $\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij}$, and denote by $A^T$ the transpose of the matrix A. Now define $\vartheta(\mathcal{G})$ as the value of the following optimization program (2):
Lemma 4.
$\alpha(\mathcal{G}) \leq \vartheta(\mathcal{G})$.
Proof.
Suppose with , is a maximum dual independent pair such that . For a number m and a set S, denote . Let be an matrix such that
Notice that for any vector we have
This shows that is a positive semi-definite matrix satisfying the equality constraints in (2). Accordingly, is a feasible solution for program (2) and
implying the result. This completes the proof. □
2.5. Information-Theoretic Notations
We recall some standard information-theoretic quantities that will be used in the sequel. Let $X, Y$ be two discrete random variables taking values in the sets $\mathcal{X}, \mathcal{Y}$ according to a joint probability distribution $P_{XY}$. Let $P_X$ denote the marginal probability distribution of X, where $P_X(x) = \sum_{y \in \mathcal{Y}} P_{XY}(x, y)$, and let $P_Y$ be the marginal probability distribution of Y, defined similarly. The Shannon entropy of X is denoted by $H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x)$, with the convention $0 \log 0 = 0$. In particular, the binary entropy function is written as $h(p) = -p \log p - (1 - p) \log(1 - p)$, where $p \in [0, 1]$. The conditional entropy of X given Y is written as $H(X \mid Y) = -\sum_{x, y} P_{XY}(x, y) \log P_{X \mid Y}(x \mid y)$. The mutual information between X and Y is $I(X; Y) = H(X) - H(X \mid Y)$. The conditional mutual information of $(X; Y)$ given another random variable Z is $I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)$. The following basic properties will be used in the arguments afterwards.
Proposition 6.
(1) $H(X) \geq 0$. (Non-negativity)
(2) $H(X \mid Y) \leq H(X)$ and $H(X \mid Y, Z) \leq H(X \mid Z)$. (Conditioning reduces entropy)
(3) $H(X_1, \dots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, \dots, X_{i-1})$. (Entropy chain rule)
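These quantities are easy to evaluate numerically; a small self-contained sketch (ours, not the paper's) for a joint pmf given as a dict:

```python
import math

def entropy(p):
    """Shannon entropy (base 2) of a pmf {outcome: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint pmf {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), v in pxy.items():
        px[x] = px.get(x, 0) + v
        py[y] = py.get(y, 0) + v
    return entropy(px) + entropy(py) - entropy(pxy)

# Doubly symmetric binary source with crossover 0.1: I(X;Y) = 1 - h(0.1).
pxy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(mutual_information(pxy))  # ~0.531
```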
3. Outer Bounds
In this section, we provide single-letter outer bounds for the non-adaptive zero-error capacity region of the DM-TWC. First in Section 3.1, we present two simple outer bounds, one based on Shannon’s vanishing-error non-adaptive capacity region and the other on a two-way analogue of the linear programming bound for point-to-point channels. Next in Section 3.2, we combine the two bounds given in Section 3.1 and obtain an outer bound that is generally better than both. Finally, in Section 3.3 we derive another single-letter outer bound via the asymptotic spectra of graphs.
3.1. Simple Bounds
It is trivial to see that Shannon’s vanishing-error non-adaptive capacity region of the DM-TWC ([1], Theorem 3) contains its zero-error counterpart. First recall Shannon’s bound in [1].
Lemma 5
([1]). The vanishing-error non-adaptive capacity region of a DM-TWC is the convex hull of the set
$$\bigcup \left\{ (R_A, R_B) : R_A \leq I(X_A; Y_B \mid X_B),\ R_B \leq I(X_B; Y_A \mid X_A) \right\},$$
where the union is taken over all product input probability distributions $P_{X_A} P_{X_B}$.
Together with Proposition 2, this immediately yields the following outer bound.
Lemma 6.
is contained in
where
The first intersection is taken over all DM-TWCs with the same adjacency as the given channel, and the maximum is taken over all product input probability distributions.
Remark 1.
We now proceed to obtain a combinatorial outer bound. Recall that a dual clique pair of a DM-TWC is a pair of subsets $S \subseteq \mathcal{X}_A$ and $T \subseteq \mathcal{X}_B$ such that S is a clique in each $H_b$ for $b \in T$, and T is a clique in each $G_a$ for $a \in S$. In the sequel, we adopt the convention that .
Lemma 7.
Proof.
Let be a uniquely decodable codebook pair of length n. We will show that:
by induction on n, where the leading constant is independent of n.
Indeed, for the base case , one could take subsets , such that for any distinct and distinct , we have and . Clearly, and (7) follows by taking sufficiently large.
Assume that (7) holds for every length smaller than n, and let us proceed to prove it for length n. Suppose is a uniquely decodable codebook pair of length n. For a vector , let denote its projection onto all coordinates other than i. For each coordinate and each , , let
be the projections of each codebook obtained by fixing the ith coordinate. Define the distributions induced by these projections over and respectively to be
Furthermore, for any two subsets and , define the codebooks induced by the unions over S and T of the respective projected codebooks, to be
Note that if is a dual clique pair such that and , then the unions in (10) are disjoint, as otherwise this would contradict the assumption that is uniquely decodable. Hence
and also, for any it must hold that is a uniquely decodable codebook pair of length . Combining (8), (9) and (11) gives
By the inductive hypothesis, we obtain
where the second inequality follows from the definition of in (6). This completes the proof. □
The following is a trivial corollary of Lemmas 6 and 7.
Corollary 1.
is contained in
where
3.2. An Improved Bound
We now provide a single-letter outer bound, in which the order of the minimum and the maximum in (14) is swapped. This generally yields a tighter outer bound due to the max–min inequality. In fact, our bound can be seen as a generalization of the one obtained by Holzman and Körner for the binary multiplying channel [13], in which case the max–min is indeed strictly tighter than the min–max.
Theorem 2.
is contained in
where
The first intersection is taken over all DM-TWCs with the same adjacency as the given channel, and the maximum is taken over all product input probability distributions.
Proof.
The intersection over all follows from Proposition 2. Hence without loss of generality, we prove that for , each achievable rate pair satisfies , where .
To that end, for each uniquely decodable codebook pair of length n, we will show that:
by induction on n, where the leading constant is independent of n. The base case $n = 1$ follows in the same way as the base case in the proof of Lemma 7. Assume that (17) holds for all lengths smaller than n, and let us prove it also holds for length n. Suppose that is a uniquely decodable codebook pair of length n. Following the same steps (8)–(12) as in the argument of Lemma 7, we also have:
Now, if there exists a dual clique pair and a coordinate such that
then (18) implies
where the inequality follows from the inductive hypothesis and (19). Therefore, we conclude that (17) holds under condition (19).
Assume now that condition (19) is not satisfied, that is,
Let and be codewords chosen from and respectively, uniformly at random, and let , be the corresponding channel outputs. Since is a uniquely decodable codebook pair of length n, it must be that:
On the other hand, we have:
where (23) follows from the entropy chain rule and the memorylessness of the channel, and (24) follows from the fact that conditioning reduces entropy. Similarly,
Combining (20)–(26), we obtain
where and are defined in (4) and (6), respectively, and the maximum is taken over all product input probability distributions such that , following condition (20). This completes the proof. □
We remark that Theorem 2 immediately implies, in particular, the following upper bound on the zero-error capacity of the point-to-point discrete memoryless channel.
Corollary 2.
The zero-error capacity of the discrete memoryless channel is upper bounded by
The outer minimum is taken over all channels having the same confusion graph as the given channel, the outer maximum is taken over all input distributions, and the inner maximum is taken over all cliques C of the confusion graph of the channel.
As it turns out, the upper bound in Corollary 2 coincides with the linear programming bound on the zero-error capacity of a point-to-point discrete memoryless channel in [2]. Namely,
for any point-to-point discrete memoryless channel . This fact was originally conjectured by Shannon [2] and later proved by Ahlswede [22]. In other words, this means that in the point-to-point case, Corollary 1 yields exactly the same bound as Theorem 2. However, this is not the case in general for the DM-TWC. For example, recall that Holzman and Körner [13] derived the bound in Theorem 2 in the special case of the (deterministic) binary multiplying channel (using ) and numerically showed that it is strictly better than what can be obtained from Corollary 1. Next we give another example showing that Theorem 2 outperforms Corollary 1 for a noisy (i.e., non-deterministic) DM-TWC as well.
Example 1.
Let , and the conditional probability distribution be given by the following table, where $\epsilon \in (0, 1)$:

|  | 00 | 01 | 10 | 11 | 20 | 21 |
|---|---|---|---|---|---|---|
| 00 | 1 | 1 | 0 | 0 | 0 | 0 |
| 01 | 0 | 0 | 0 | 0 | 1 | 0 |
| 10 | 0 | 0 | 0 | $\epsilon$ | 0 | 1 |
| 11 | 0 | 0 | 1 | $1 - \epsilon$ | 0 | 0 |

Corollary 1 gives the upper bound
where
and . In contrast, Theorem 2 yields a tighter upper bound of
3.3. An Outer Bound via Shannon Capacity of a Graph
Based on Lemma 3 and the Shannon capacity of a graph, we immediately have the following bound.
Lemma 8.
It is worth noting that the above bound can be tight, in the sense that when all and , it is easily verified that . However, the bound in Lemma 8 is not tight in general. Later in Section 5, we will improve the bound of Lemma 8 for certain scenarios and show that the improved bound (Theorem 5) can outperform Theorem 2 (see Example 3), and can be achieved in special cases (see Theorem 7).
4. Inner Bounds
In this section, we present two inner bounds for the non-adaptive zero-error capacity region of the DM-TWC, one based on random coding and the other on linear codes.
4.1. Random Coding
The random coding argument for the DM-TWC is standard and generalizes a known bound of Shannon for the one-way case [2]. To obtain the random coding inner bound, we need the following lemma from [1].
Lemma 9
([1]). Let X be a random variable taking values in $\mathcal{X}$, and let $f_1, \dots, f_k$ be a collection of nonnegative functions on $\mathcal{X}$. Then there exists $x \in \mathcal{X}$ such that $f_i(x) \leq k \cdot \mathbb{E}[f_i(X)]$ for all $1 \leq i \leq k$.
Theorem 3.
contains the region:
where the union is taken over all input distributions $P_{X_A}$, $P_{X_B}$.
Proof.
We randomly draw a codebook pair such that (resp. ) consists of (resp. ) statistically independent words, where each word is generated i.i.d. according to a probability distribution (resp. ). A word is called bad if there exist two words that are either equal or adjacent in . For any particular words and coordinate , the probability that in is upper bounded by:
Since all the coordinates are independent, the probability that in is at most:
Denote by the number of 2-subsets such that in . Then,
where the first inequality is by Markov's inequality, and the second inequality follows from (33) and the linearity of expectation. Similarly, a word is called bad if there exist two words that are equal or adjacent in , and we have
Let , be the number of bad words in and respectively. Then, we have:
By Lemma 9, there exists a pair such that
Remove all the bad words in and respectively, yielding a codebook pair such that:
It is readily seen that is a uniquely decodable codebook pair. □
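The expurgation step of the proof is easy to imitate numerically. A toy run (our experiment, not the paper's) on Blackwell's multiplying channel: draw i.i.d. codebooks, delete every word under which two counterpart codewords are equal or adjacent, and the surviving pair is uniquely decodable by Proposition 1:

```python
import random
from itertools import combinations
import networkx as nx

def make_graph(nodes, edges):
    g = nx.Graph()
    g.add_nodes_from(nodes)
    g.add_edges_from(edges)
    return g

# Confusion graphs of Blackwell's multiplying channel.
G = {0: make_graph([0, 1], [(0, 1)]), 1: make_graph([0, 1], [])}
H = {0: make_graph([0, 1], [(0, 1)]), 1: make_graph([0, 1], [])}

def collide(graphs, steer, w, w2):
    return all(u == v or graphs[s].has_edge(u, v)
               for s, u, v in zip(steer, w, w2))

random.seed(1)
n, M = 8, 8
CA = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(M)]
CB = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(M)]

# Expurgate bad words with respect to the original counterpart codebooks.
CA2 = [xa for xa in CA
       if not any(collide(G, xa, u, v) for u, v in combinations(CB, 2))]
CB2 = [xb for xb in CB
       if not any(collide(H, xb, u, v) for u, v in combinations(CA, 2))]
print(len(CA2), len(CB2))  # the surviving pair is uniquely decodable
```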
4.2. Linear Codes
In this subsection, we present a construction of uniquely decodable codes via linear codes, which generalizes a known result for the binary multiplying channel [15]. Let us first introduce some notation. Suppose D is a set of letters, are vectors of length n, and is a collection of vectors of length n. Let:
denote the collection of indices where . For let denote the vector obtained from by projecting onto the coordinates in I, and denote
Let $\mathcal{G}$ be a DM-TWC. We say that $a \in \mathcal{X}_A$ is a detecting symbol if $x_B \not\sim x'_B$ in $G_a$ for any distinct $x_B, x'_B \in \mathcal{X}_B$, that is, if $G_a$ is edgeless. A detecting symbol $b \in \mathcal{X}_B$ is defined analogously. Let $D_A$ and $D_B$ denote the sets of all detecting symbols in $\mathcal{X}_A$ and $\mathcal{X}_B$, respectively. A vector $x_A^n \in \mathcal{X}_A^n$ is called a detecting vector for a codebook $\mathcal{C}_B \subseteq \mathcal{X}_B^n$ if any two distinct words of $\mathcal{C}_B$ have distinct projections onto the coordinate set $\{i : x_{A,i} \in D_A\}$.
Similarly, a vector $x_B^n \in \mathcal{X}_B^n$ is a detecting vector for $\mathcal{C}_A \subseteq \mathcal{X}_A^n$ if any two distinct words of $\mathcal{C}_A$ have distinct projections onto $\{i : x_{B,i} \in D_B\}$.
The following claim is immediate.
Proposition 7.
Let $\mathcal{C}_A \subseteq \mathcal{X}_A^n$, $\mathcal{C}_B \subseteq \mathcal{X}_B^n$. If each $x_A^n \in \mathcal{C}_A$ is a detecting vector for $\mathcal{C}_B$ and each $x_B^n \in \mathcal{C}_B$ is a detecting vector for $\mathcal{C}_A$, then $(\mathcal{C}_A, \mathcal{C}_B)$ is a uniquely decodable codebook pair.
Proposition 7 provides a sufficient condition for unique decodability, which is not necessary in general (see Example 2). Nevertheless, this sufficient condition furnishes us with a way of constructing uniquely decodable codes by employing linear codes.
Example 2.
Suppose that , such that , , and , , . Let and . It is easy to verify that is a uniquely decodable codebook pair. However, and , implying that is not a detecting vector for .
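Under our reading of the (partially reconstructed) definitions above, the sufficient condition of Proposition 7 can be verified mechanically; the following sketch and its toy input are illustrative assumptions, not the paper's code:

```python
def is_detecting_vector(vec, codebook, detecting):
    """vec detects `codebook` if any two codewords have distinct projections
    onto the coordinates where vec carries a detecting symbol."""
    idx = [i for i, s in enumerate(vec) if s in detecting]
    proj = [tuple(w[i] for i in idx) for w in codebook]
    return len(set(proj)) == len(codebook)

# In Blackwell's multiplying channel the symbol 1 is detecting: a user
# sending 1 observes the other user's input directly.
CB = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_detecting_vector((1, 1, 0), CB, {1}))  # True: projections 01, 10, 11
```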
Assume that $|\mathcal{X}_A| = q_A$ and $|\mathcal{X}_B| = q_B$, where $q_A, q_B$ are prime powers, and let us think of the alphabets as $\mathbb{F}_{q_A}$ and $\mathbb{F}_{q_B}$, respectively. The following theorem gives an inner bound on the capacity region, which is a generalization of Tolhuizen's construction for Blackwell's multiplying channel [15].
Theorem 4.
Let be a DM-TWC with input alphabet sizes , , where , are prime powers. If and contain and detecting symbols respectively, then contains the region
where $h(\cdot)$ is the binary entropy function.
To prove this theorem, we need the following lemma. The case where and was proved in ([15], Theorem 3); Lemma 10 follows from a similar argument.
Lemma 10.
Let q, be prime powers, be positive integers such that , and with cardinality . Then there exists a pair satisfying that:
- (1)
- is a q-ary linear code;
- (2)
- such that
- (3)
- for each , we have and .
Proof.
Let A be a matrix of full rank over ; then is the q-ary linear code generated by A. Recall that for every , as in (40). Denote:
Let denote the submatrix of A with columns indexed by . It is easy to see that is equivalent to . Denote:
and let us proceed by double counting the cardinality of .
On the one hand, the number of vectors such that is . For each such , there are corresponding matrices such that , where is the number of invertible matrices over , see ([15], Lemma 3). Hence, we have:
On the other hand, the number of matrices is . By (44) and the pigeonhole principle, there exist a matrix and a corresponding code such that . Letting , the lemma follows. □
Proof of Theorem 4.
For , let us identify with , and let the respective sets of all detecting symbols be with .
To prove the existence of a uniquely decodable codebook pair based on Proposition 7, we first use Lemma 10 to find two "one-sided" uniquely decodable linear codebook pairs, and then combine them into the desired codebook pair by employing their cosets in and .
First, letting , , and in Lemma 10, we have a pair satisfying that is a -ary linear code and such that
Similarly, letting , , and in Lemma 10, we have a pair satisfying that is a -ary linear code and such that
The property in Lemma 10 implies that each is a detecting vector for for . Note that if is a coset of , then each is also a detecting vector for .
Now we are going to combine the two pairs and . Since has cosets, by the pigeonhole principle there exists a coset of such that:
We note that for any DM-TWC, one may exploit only a subset of the input symbols to meet the requirements in Theorem 4. Hence, we in fact have the following more general bound.
Corollary 3.
Let be a DM-TWC with input alphabets , . Then contains the region:
where the first union is taken over all , such that and are prime powers, and contain and detecting symbols, respectively.
Notice that the region (43) depends on the numbers , of symbols being used and on the corresponding numbers of detecting symbols. It is thus possible that using only a smaller subset of channel inputs yields higher achievable rates (under our linear coding strategy) than those obtained by using larger subsets. For the channel of Example 1, Corollary 3 shows that a lower bound on the maximum sum-rate is , which is better than the random coding lower bound .
5. Certain Types of DM-TWC
In this section, we consider the DM-TWC in the scenario that the communication in one direction is stable (in particular, noiseless). First we briefly review the probabilistic refinement of the Shannon capacity of a graph in Section 5.1. Then in Section 5.2, we provide an outer bound on the zero-error capacity region via the asymptotic spectrum of graphs. In Section 5.3, we present explicit constructions that attain our outer bound in certain special cases.
5.1. Probabilistic Refinement of the Shannon Capacity of a Graph
We first recall some basic notions and results from the method of types. Let $x^n = (x_1, \dots, x_n) \in \mathcal{X}^n$ be a sequence and $N(a \mid x^n)$ be the number of times that $a \in \mathcal{X}$ appears in $x^n$. The type $P_{x^n}$ of $x^n$ is the relative proportion of each symbol in $x^n$, that is, $P_{x^n}(a) = \frac{1}{n} N(a \mid x^n)$ for all $a \in \mathcal{X}$. Let $\mathcal{P}_n(\mathcal{X})$ denote the collection of all possible types of sequences of length n. For every $P \in \mathcal{P}_n(\mathcal{X})$, the type class of P is the set of sequences of type P, that is, $T_P^n = \{x^n \in \mathcal{X}^n : P_{x^n} = P\}$. The ϵ-typical set of P is
$$T_{[P]_\epsilon}^n = \left\{ x^n \in \mathcal{X}^n : \left| P_{x^n}(a) - P(a) \right| \leq \epsilon \ \text{for all } a \in \mathcal{X} \right\}.$$
The joint type $P_{x^n, y^n}$ of a pair of sequences $(x^n, y^n)$ is the relative proportion of occurrences of each pair of symbols, that is, $P_{x^n, y^n}(a, b) = \frac{1}{n} N(a, b \mid x^n, y^n)$ for all $a \in \mathcal{X}$ and $b \in \mathcal{Y}$. By Bayes' rule, the conditional type is defined as
$$P_{y^n \mid x^n}(b \mid a) = \frac{P_{x^n, y^n}(a, b)}{P_{x^n}(a)}.$$
Lemma 11
([23]). $|\mathcal{P}_n(\mathcal{X})| \leq (n + 1)^{|\mathcal{X}|}$.
Lemma 12
([23]). For every $P \in \mathcal{P}_n(\mathcal{X})$, we have $(n + 1)^{-|\mathcal{X}|}\, 2^{n H(P)} \leq |T_P^n| \leq 2^{n H(P)}$.
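A quick numerical illustration of Lemmas 11 and 12 (standard method-of-types facts, not paper-specific): for a binary alphabet there are only $n + 1$ types, while a single type class is exponentially large:

```python
from collections import Counter
from itertools import product
from math import comb

n = 12
types = Counter(tuple(sorted(Counter(s).items()))
                for s in product((0, 1), repeat=n))
print(len(types))                           # n + 1 = 13 types (cf. Lemma 11)
print(types[((0, 8), (1, 4))], comb(n, 4))  # 495 495: type class of P = (2/3, 1/3)
```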
In [24], Csiszár and Körner introduced the probabilistic refinement of the Shannon capacity of a graph, imposing that the independent set consists of sequences of the (asymptotically) same type. Let denote the subgraph of induced by . The Shannon capacity of graph G relative to P is defined as
Let denote the subgraph of induced by . Then, it is readily seen that:
For each , define
If , then according to Lemma 12, we have
for any . Very recently, Vrana [25] proved the following results on .
Lemma 13
([25]). The limit in (48) exists and
- (1)
- ;
- (2)
- for .
According to Lemma 11, it is easily seen that:
Here, we would like to mention that the probabilistic refinement of the Lovász theta number was introduced and investigated by Marton in [26] via a non-asymptotic formula, and the probabilistic refinement of the fractional clique cover number was studied in relation to the graph entropy in [27].
5.2. An Outer Bound via the Asymptotic Spectrum of Graphs
In this subsection, we derive an outer bound for the case when all are the same, namely, for all .
Theorem 5.
is contained in the region
Proof.
Suppose that is a uniquely decodable codebook pair of length n. For any and , let denote the joint type of the pair and
By Lemma 11, there are at most different joint types over . Thus by the pigeonhole principle, there exists one joint type such that:
Notice that for each , we have:
Now we are going to upper bound the cardinality of . Let (resp. ) denote the collection of (resp. ) that appears in , that is, there exists (resp. ) such that . Then we immediately have
Let us now turn to upper bound the cardinalities of and . Since is uniquely decodable, by Proposition 1, for any it must hold that is an independent set of . Accordingly,
Also, for , we notice that is an independent set of with type . To be precise, we have:
Therefore we have:
where (54) follows from (50); (55) follows from the fact that , are fixed when n tends to infinity; (56) follows from (51); (57) follows from (52) and (53); (58) follows from Theorem 1 that for any graph G; (59) follows from Theorem 1 that each is multiplicative with respect to the strong product; and (60) follows from Theorem 1 and Lemma 13.
This completes the proof. □
In particular, considering the DM-TWC such that , , and is a general graph, we have the following result.
Theorem 6.
is contained in the region
Proof.
Recall that: and . According to Theorem 5, we have:
where the last equality is achieved by taking and . □
We remark that Theorem 6 (and hence also Theorem 5) can outperform Theorem 2; see the following example.
Example 3.
Consider the channel of Theorem 6 where G is the Pentagon graph $C_5$. It is well known from [2,12] that $\Theta(C_5) = \frac{1}{2} \log 5$. Then by Theorem 6 we have an upper bound on the sum-rate , while Theorem 2 only gives an upper bound .
5.3. Explicit Constructions
In this subsection, we present explicit constructions of uniquely decodable codebook pairs which could attain the outer bound of Theorem 6 in certain special cases.
Theorem 7.
Let m be a prime power, and be a disjoint union of s cliques. Then .
Proof.
First by Theorem 6, we have an upper bound on the sum-capacity given by
Next, we consider the lower bound. Notice that . We can reformulate the channel accordingly as:
where the first corresponds to a channel with input alphabets and ; and the second is with input alphabets and . Together with Lemma 1, we have:
On the one hand, it is easy to see that:
since this is a clean channel over which Alice and Bob can always communicate without error. On the other hand, by Lemma 10, we obtain:
In fact, letting , and in Lemma 10, we have a pair satisfying that is an m-ary linear code and such that
Now let and . Then it is easy to see that is a uniquely decodable codebook pair with respect to the channel . The corresponding sum-rate is
Taking , we obtain a lower bound on the best possible sum-rate, that is, (64). □
6. Concluding Remarks
In this paper, we investigated the non-adaptive zero-error capacity region of the DM-TWC and provided several single-letter inner and outer bounds, some of which coincide in certain special cases. Determining the exact zero-error capacity region of a general DM-TWC remains an open problem, and clearly a difficult one, since it includes the notorious Shannon capacity of a graph as a special case. Despite this inherent difficulty, the problem is richer than the graph capacity setting, and we believe it deserves further study in order to obtain tighter bounds and smarter constructions.
One appealing direction is to extend Lovász's semi-definite relaxation approach in order to obtain tighter outer bounds, mimicking the graph capacity case. This, however, does not seem to be a simple task. In particular, one may ask whether the natural quantity defined in (2), which upper-bounds the one-shot zero-error sum-capacity, is sub-multiplicative with respect to the graph strong product, in which case it would also serve as an upper bound for the zero-error sum-capacity. This is, however, not evident, in part since the problem (2) is not a semi-definite program. We have also considered other variations of the program (2). In particular, we have attempted to modify the non-linear constraints to be of a linear form for some suitable symmetric matrix A. We have also looked at some variants of the orthonormal representation. For example, we considered the case where each graph vertex is labeled by a unit vector, such that whenever two vertices are nonadjacent, the projections of their vectors onto the subspace spanned by the vectors in some set F are orthogonal. However, proving sub-multiplicativity in any of these settings has so far resisted our best efforts.
It would also be of much interest to consider the adaptive zero-error capacity of the DM-TWC. Allowing Alice and Bob to adapt their transmissions on the fly can in general enlarge the zero-error capacity region. As a simple example, note that a point-to-point channel with noiseless feedback is a special case of the DM-TWC (where Bob has no information to send). In [2], Shannon explicitly derived the zero-error capacity with feedback for the point-to-point channel, and pointed out that for the channel corresponding to the Pentagon graph this capacity is given by $\log \frac{5}{2}$. This is strictly larger than the zero-error capacity without feedback, $\frac{1}{2} \log 5$, which can be thought of in this case as the non-adaptive zero-error capacity of the channel. Exploring the differences between the adaptive and non-adaptive zero-error capacity regions of a general DM-TWC remains a challenging direction for future work.
Author Contributions
Conceptualization, Y.G. and O.S.; methodology, Y.G. and O.S.; investigation, Y.G. and O.S.; writing—original draft preparation, Y.G. and O.S.; writing—review and editing, Y.G. and O.S. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by an ERC grant no. 639573, ISF grant no. 1495/18, and JSPS grant no. 21K13830.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
We would like to thank Sihuang Hu and Lele Wang for some helpful discussions on the generalization of Lovász theta number.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Shannon, C.E. Two-way communication channels. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, Oakland, CA, USA, 20 June–30 July 1961; pp. 611–644.
- Shannon, C.E. The zero error capacity of a noisy channel. IRE Trans. Inf. Theory 1956, 2, 8–19.
- Han, T. A general coding scheme for the two-way channel. IEEE Trans. Inf. Theory 1984, 30, 35–44.
- Hekstra, A.P.; Willems, F.J. Dependence balance bounds for single-output two-way channels. IEEE Trans. Inf. Theory 1989, 35, 44–53.
- Zhang, Z.; Berger, T.; Schalkwijk, J. New outer bounds to capacity regions of two-way channels. IEEE Trans. Inf. Theory 1986, 32, 383–386.
- Weng, J.; Song, L.; Alajaji, F.; Linder, T. Capacity of two-way channels with symmetry properties. IEEE Trans. Inf. Theory 2019, 65, 6290–6313.
- Weng, J.; Song, L.; Alajaji, F.; Linder, T. Sufficient conditions for the tightness of Shannon’s capacity bounds for two-way channels. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 1410–1414.
- Sabag, O.; Permuter, H.H. An achievable rate region for the two-way channel with common output. In Proceedings of the 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2–5 October 2018; pp. 527–531.
- Schalkwijk, J. The binary multiplying channel—A coding scheme that operates beyond Shannon’s inner bound region. IEEE Trans. Inf. Theory 1982, 28, 107–110.
- Schalkwijk, J. On an extension of an achievable rate region for the binary multiplying channel. IEEE Trans. Inf. Theory 1983, 29, 445–448.
- Haemers, W. On some problems of Lovász concerning the Shannon capacity of a graph. IEEE Trans. Inf. Theory 1979, 25, 231–232.
- Lovász, L. On the Shannon capacity of a graph. IEEE Trans. Inf. Theory 1979, 25, 1–7.
- Holzman, R.; Körner, J. Cancellative pairs of families of sets. Eur. J. Combin. 1995, 16, 263–266.
- Janzer, B. A new upper bound for cancellative pairs. Electron. J. Combin. 2018, 25, 2–13.
- Tolhuizen, L.M. New rate pairs in the zero-error capacity region of the binary multiplying channel without feedback. IEEE Trans. Inf. Theory 2000, 46, 1043–1046.
- Gu, Y.; Shayevitz, O. On the non-adaptive zero-error capacity of the discrete memoryless two-way channel. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 3107–3111.
- Zuiddam, J. The asymptotic spectrum of graphs and the Shannon capacity. Combinatorica 2019, 39, 1173–1184.
- Cubitt, T.; Mancinska, L.; Roberson, D.E.; Severini, S.; Stahlke, D.; Winter, A. Bounds on entanglement-assisted source-channel coding via the Lovász theta number and its variants. IEEE Trans. Inf. Theory 2014, 60, 7330–7344.
- Blasiak, A. A Graph-Theoretic Approach to Network Coding. Ph.D. Thesis, Cornell University, Ithaca, NY, USA, 2013.
- Bukh, B.; Cox, C. On a fractional version of Haemers’ bound. IEEE Trans. Inf. Theory 2019, 65, 3340–3348.
- Alon, N. The Shannon capacity of a union. Combinatorica 1998, 18, 301–310.
- Ahlswede, R. Channels with arbitrarily varying channel probability functions in the presence of noiseless feedback. Z. Wahrscheinlichkeitstheorie Verwandte Geb. 1973, 25, 239–252.
- Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 1981.
- Csiszár, I.; Körner, J. On the capacity of the arbitrarily varying channel for maximum probability of error. Z. Wahrscheinlichkeitstheorie Verwandte Geb. 1981, 57, 87–101.
- Vrana, P. Probabilistic refinement of the asymptotic spectrum of graphs. Combinatorica 2021.
- Marton, K. On the Shannon capacity of probabilistic graphs. J. Comb. Theory Ser. B 1993, 57, 183–195.
- Körner, J. Coding of an information source having ambiguous alphabet and the entropy of graphs. In Proceedings of the 6th Prague Conference on Information Theory, Prague, Czech Republic, 1 January 1973; pp. 411–425.