Two-Party Zero-Error Function Computation with Asymmetric Priors

Basak Guler; Aylin Yener; Prithwish Basu; Ananthram Swami

doi:10.3390/e19120635

¹ The Pennsylvania State University, University Park, PA 16802, USA

² Raytheon BBN Technologies, Cambridge, MA 02138, USA

³ Army Research Laboratory, Adelphi, MD 20783, USA

* Author to whom correspondence should be addressed.

Entropy 2017, 19(12), 635; https://doi.org/10.3390/e19120635

This article belongs to the Special Issue Network Information Theory


Abstract

We consider a two-party network where each party wishes to compute a function of two correlated sources. Each source is observed by one of the parties. The true joint distribution of the sources is known to one party. The other party, on the other hand, assumes a distribution for which the set of source pairs that have a positive probability is only a subset of those that may appear in the true distribution. In that sense, this party has only partial information about the true distribution from which the sources are generated. We study the impact of this asymmetry on the worst-case message length for zero-error function computation by identifying the conditions under which reconciling the missing information prior to communication is better than not reconciling it and instead using an interactive protocol that ensures zero-error communication without reconciliation. Accordingly, we provide upper and lower bounds on the minimum worst-case message length for the communication strategies with and without reconciliation. By specializing the proposed model to certain distribution classes, we show that partially reconciling the true distribution by allowing a certain degree of ambiguity can perform better than both the strategies with perfect reconciliation and the strategies that do not start with an explicit reconciliation step. As such, our results demonstrate a tradeoff between the reconciliation and communication rates, and show that the worst-case message length results from the interplay between the two factors.

Keywords:

data compression; function computation; partial information; characteristic graphs

1. Introduction

Consider a scenario in which two parties make a query over distributed correlated databases. Each party observes data from one database, whereas the query has to be evaluated over the data observed by both users, and its answer is needed at each party. Suppose that one party knows all data combinations that may lead to an answer to some query, whereas the other party is missing some of these combinations. The parties are allowed to communicate with each other. The goal is to find the minimum amount of communication required so that both parties can retrieve the correct answer for any query. We model this scenario as interactive communication in which two parties interact to compute a function of two correlated discrete memoryless sources. Each source is observed by one party. One party knows the true joint distribution of the sources, whereas the other party is missing some source pairs that may occur with positive probability and assumes another distribution in which these missing pairs have zero probability. Communication takes place in multiple interactive rounds, at the end of which a function of the two correlated sources has to be computed at both parties with zero error. We study the impact of this partial knowledge about the true distribution on the worst-case message length.

In a function computation scenario, one party observes a random variable $X$, whereas the other party observes a random variable $Y$, where each realization of $(X, Y)$ is generated from some probability distribution $p_{XY}$. The two parties wish to compute a function $f(X, Y)$ by exchanging a number of messages in multiple rounds. Conventionally, the true distribution from which the sources are generated is available as common knowledge to both parties. This work extends this framework to the scenario in which the true distribution of the sources is available at one of the communicating parties only, while the distribution assumed at the other party has missing information compared to the true distribution. That is, the second party has only partial knowledge about the source pairs that are realized with positive probability according to the true distribution.

In order to identify the impact of partial information on the worst-case message length, we consider three interactive communication protocols. In the first protocol, the two parties reconcile the partial information so that the second party learns the true joint distribution, and then utilize the true distribution for function computation. The reconciliation stage transforms the problem into the conventional zero-error function computation problem. Although this is a natural approach in that it ensures that both sides agree on the true distribution, this protocol requires additional bits to be transmitted between the two parties for reconciling the distribution information, which, in turn, may increase the overall message length. The second protocol provides an alternative interaction strategy in which the two parties do not reconcile the true distribution, but instead use a function computation strategy that allows error-free computation under the distribution uncertainty. In doing so, this protocol avoids the reconciliation cost. The message length for the function computation part, however, may be larger than that of the previous scheme. The last protocol captures a trade-off between the first two by allowing the two parties to partially reconcile the distributions. In this protocol, each party learns the true distribution up to a class of distributions. The function computation step then ensures error-free computation under any distribution within the reconciled class. By doing so, we create different levels of common knowledge about the distribution to investigate the relation between the cost of various degrees of partial reconciliation and the resulting compression performance.

By leveraging the proposed interaction protocols, we identify the conditions under which reconciling the partial information is better or worse than not reconciling the distributions, i.e., than using a zero-error encoding scheme with a possibly increased message length. Accordingly, we develop upper and lower bounds on the worst-case zero-error message length for computing the function at both parties under different reconciliation and communication strategies. Our results demonstrate that reconciling the partial information, although it often reduces the communication cost, may or may not reduce the overall worst-case message length. In effect, the worst-case message length results from an interplay between reconciliation and communication costs. As such, partial reconciliation of the true distribution is sometimes strictly better than the remaining two interaction strategies.

Related Work

For the setting where both parties know the true joint distribution of the sources, interactive communication strategies have been studied in [1] to enable both sides to learn the source observed by the other party with zero error. Reference [2] has considered the impact of the number of interaction rounds on the worst-case message length, and has provided upper and lower bounds on the worst-case message length. The optimal zero-error communication strategy for minimizing the worst-case message length has remained an open problem, even when the communicating parties know the exact true distribution of the sources. The zero-error communication problem has also been considered for communicating semantic information [3,4]. Our work is also related to the field of communication complexity, which studies the minimum amount of communication required to compute a function of two sources [5]. In the context of the direct-sum problem, it was shown in [6] that computing multiple instances of a function can reduce the minimum amount of communication required per instance. The main distinction between the communication complexity approaches and the setups from [1,2] is that the models from [1,2] emphasize utilizing the source distribution, and in particular its support set, to reduce the amount of communication, which is also referred to as the computation of a partial function ([7], Section 4.7).

In addition to the zero-error setup, interactive communication has also been considered for computing a function at one of the communicating parties with vanishing error probability [8]. Subsequently, interactive communication has been considered for computing a function of two sources simultaneously at both parties with vanishing error probability [9]. The two-party scenario has been extended to a multi-terminal function computation setup in [10], in which each party observes an independent source and broadcasts its message to all the nodes in the network. A related study in [11] investigates the role of side information when interactively communicating a source known by one party to another with vanishing error. Interactive communication has also been leveraged in [11] for one-way recovery of a source known by one party at the other side with vanishing error in the presence of side information.

This work is also related to zero-error communication strategies in non-interactive data compression scenarios. In particular, we leverage graphical representations of the confusable source and distribution terms, which are reminiscent of characteristic graphs introduced in [12] to study the zero-error capacity of a channel. Subsequently, characteristic graphs have been utilized for zero-error compression of a source in the presence of decoder side information [13,14]. They have been utilized to characterize graph entropy and chromatic entropy in [15,16], respectively, in [8] to characterize the rate region for the lossless computation of a function, and in [17] to obtain achievable rates for lossy function computation. Such graphical representations have also been leveraged for non-interactive set reconciliation [18]. Another relevant application is zero-error source coding with compound decoder side information considered in [19].

Many existing and emerging network applications, e.g., sensor networks, cyber-physical systems, social media, and semantic networks, facilitate interaction between multiple terminals to share information towards achieving a common objective [20,21,22,23]. As such, it is essential for such systems to mitigate the ambiguities that may result from the imperfect knowledge available at the communicating parties. The case where the communicating parties assume different prior distributions while communicating a source from one party to another has recently been considered for the non-interactive setting. In [24], communicating a source with vanishing error is considered in the presence of side information when the joint probability distributions assumed at the encoder and the decoder are different. Reference [25] has incorporated shared randomness to facilitate compression when the source distributions assumed by the two parties are different from each other. Deterministic compression strategies are investigated in [26] for the case when no shared randomness is present. In this work, we study interactive function computation with partial priors for the asymmetric scenario where the true joint distribution of the sources is available at one party only [27].

2. Problem Setup

This section introduces our two-party communication setup with asymmetric priors. The following notation is adopted in the sequel. We use $\mathcal{X}$ for a set with cardinality $|\mathcal{X}|$, and define $x^n = (x_1, \dots, x_n)$, where $x^1 = x$ [28]. The difference between defining a sequence $x^n = (x_1, \dots, x_n)$ and taking the $n$th power of a given number will be clear from context. We denote $\{0,1\}^* = \bigcup_{n=1}^{\infty} \{0,1\}^n$. The support set of a distribution $p(x, y)$ over a set $\mathcal{X} \times \mathcal{Y}$ is represented as

$$\mathrm{supp}(p) \triangleq \{(x, y) \in \mathcal{X} \times \mathcal{Y} : p(x, y) > 0\}, \tag{1}$$

and, for sequences of length $n$,

$$\mathrm{supp}(p^n) = \{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : p(x_i, y_i) > 0 \text{ for } i = 1, \dots, n\}. \tag{2}$$

The chromatic number of a graph $G$ is denoted by $\chi(G)$, and $\ell(\cdot)$ represents the length (number of bits) of a bit stream. Finally, for a bipartite graph $G = (V, U, E)$ with vertex sets $V$, $U$ and an edge set $E$, we let $\Delta_X$ and $\Delta_Y$ denote the maximum degree of any node $v \in V$ and $u \in U$, respectively.
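As a minimal sketch of Equations (1) and (2), assuming a distribution is stored as a Python dictionary mapping pairs $(x, y)$ to probabilities (an illustrative representation, not one used in the paper), the support set and membership in $\mathrm{supp}(p^n)$ can be computed as follows.

```python
from fractions import Fraction

def supp(p):
    """Support set of Equation (1): all pairs (x, y) with p(x, y) > 0."""
    return frozenset(pair for pair, prob in p.items() if prob > 0)

def in_supp_n(xs, ys, p):
    """Membership in supp(p^n), Equation (2): every coordinate pair must lie in supp(p)."""
    if len(xs) != len(ys):
        raise ValueError("x^n and y^n must have the same length")
    s = supp(p)
    return all((x, y) in s for x, y in zip(xs, ys))

# Toy distribution on {1, 2} x {1, 2} with one zero-probability pair.
p = {(1, 1): Fraction(1, 2), (1, 2): Fraction(1, 4),
     (2, 1): Fraction(1, 4), (2, 2): Fraction(0)}
print(sorted(supp(p)))               # [(1, 1), (1, 2), (2, 1)]
print(in_supp_n((1, 2), (2, 1), p))  # True
print(in_supp_n((1, 2), (2, 2), p))  # False
```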

2.1. System Model

Consider discrete memoryless correlated sources $(X, Y)$ defined over a finite set $\mathcal{X} \times \mathcal{Y}$. The sources are generated from a distribution $p(x, y) \in \mathcal{P}$, where $\mathcal{P}$ is a finite set of probability distributions. Nodes 1 and 2 observe $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$, respectively, with probability $p^n(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i)$. The distribution $p(x, y)$ is fixed over the course of $n$ time instants. We refer to $p(x, y)$ as the true distribution of sources $(X, Y)$, as it represents nature’s selection for the distribution of sources $(X, Y)$. User 1 knows the true distribution $p(x, y)$. The source distribution known to user 2, however, may be different from the true distribution. In particular, user 2 assumes a distribution $q(x, y) \in \mathcal{Q}$ such that $\mathrm{supp}(q) \subseteq \mathrm{supp}(p)$, where $\mathcal{Q}$ is a finite set. The set of distributions $\mathcal{P}$ is known by both users, but the actual selections for $p(x, y)$ and $q(x, y)$ are only known at the corresponding user. In that sense, $q(x, y)$ provides some, although incomplete, information to user 2 about $p(x, y)$.

Each of the two parties is requested to compute a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{F}$ for each term of the source sequence $(X^n, Y^n)$, which we represent as

$$f^n(x^n, y^n) \triangleq (f(x_1, y_1), \dots, f(x_n, y_n)), \tag{3}$$

where $\mathcal{F}$ is a finite set. In particular, user 1 recovers some $Z_1^n \in \mathcal{F}^n$, whereas user 2 recovers some $Z_2^n \in \mathcal{F}^n$, such that the zero-error condition

$$\Pr[f^n(X^n, Y^n) \neq Z_1^n] = \Pr[f^n(X^n, Y^n) \neq Z_2^n] = 0 \tag{4}$$

is satisfied, where the probabilities are evaluated under the true distribution $p(x, y)$. Note that, whenever $f(x, y)$ is a bijective function, Equation (4) reduces to the conventional zero-error interactive data compression problem, where each source symbol is perfectly recovered at the other party [1].
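A brute-force reading of Equations (3) and (4), assuming small alphabets so that $\mathrm{supp}(p^n)$ can be enumerated, is sketched below. Because every pair in $\mathrm{supp}(p^n)$ has positive probability, the error probabilities in Equation (4) vanish exactly when both outputs are correct on every such pair. The decoders `correct` and `constant` are hypothetical placeholders for the outputs $Z_1^n$ and $Z_2^n$, not protocols from the paper.

```python
from itertools import product

def f_n(f, xs, ys):
    """Componentwise evaluation f^n(x^n, y^n) from Equation (3)."""
    return tuple(f(x, y) for x, y in zip(xs, ys))

def is_zero_error(f, support, n, z1, z2):
    """Check Equation (4) by enumerating supp(p^n), i.e., all n-tuples of pairs in supp(p).

    z1 and z2 are hypothetical stand-ins for the outputs Z_1^n and Z_2^n of the two users,
    written as functions of (x^n, y^n) purely for the purpose of this check.
    """
    for pairs in product(sorted(support), repeat=n):
        xs = tuple(x for x, _ in pairs)
        ys = tuple(y for _, y in pairs)
        target = f_n(f, xs, ys)
        if z1(xs, ys) != target or z2(xs, ys) != target:
            return False
    return True

f = lambda x, y: (x + y) % 4
support = {(1, 1), (1, 2), (2, 1)}
correct = lambda xs, ys: f_n(f, xs, ys)    # always outputs the true function values
constant = lambda xs, ys: (0,) * len(xs)   # ignores its inputs, so it errs on some pair
print(is_zero_error(f, support, 2, correct, correct))   # True
print(is_zero_error(f, support, 2, correct, constant))  # False
```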

The two users employ an interactive communication protocol, in which they send binary strings called messages at each round. A codeword represents the sequence of messages exchanged by the two users over multiple rounds. In particular, for an $r$-round communication, the encoding function is given by some variable-length scheme $\phi : \mathcal{X}^n \times \mathcal{Y}^n \to \{0,1\}^*$, for which the codeword

$$\phi(x^n, y^n) = (\phi_1(x^n, y^n), \dots, \phi_r(x^n, y^n))$$

is the sequence of messages exchanged for the pair $(x^n, y^n) \in \mathrm{supp}(p^n)$, where $\phi_i(x^n, y^n)$ represents the messages transmitted by the two parties at round $i$, and $\phi^i(x^n, y^n) = (\phi_1(x^n, y^n), \dots, \phi_i(x^n, y^n))$ denotes the sequence of messages exchanged through the first $i$ rounds for $i \in \{1, \dots, r\}$. The encoding at each round is based only on the symbols known to the user and on the messages exchanged between the two users in the previous rounds, so that

$$\phi_i(x^n, y^n) = \left(\phi_i^X(x^n, \phi^{i-1}(x^n, y^n)),\ \phi_i^Y(y^n, \phi^{i-1}(x^n, y^n))\right), \tag{5}$$

where $\phi_i^X(x^n, \phi^{i-1}(x^n, y^n)) \in \{0,1\}^*$ and $\phi_i^Y(y^n, \phi^{i-1}(x^n, y^n)) \in \{0,1\}^*$ are the messages transmitted from users 1 and 2 at round $i$, respectively. The encoding protocol is deterministic and agreed upon by both parties in advance. Accordingly, we define

$$\phi^X(x^n, y^n) = \left(\phi_1^X(x^n), \dots, \phi_r^X(x^n, \phi^{r-1}(x^n, y^n))\right) \tag{6}$$

and

$$\phi^Y(x^n, y^n) = \left(\phi_1^Y(y^n), \dots, \phi_r^Y(y^n, \phi^{r-1}(x^n, y^n))\right) \tag{7}$$

as the sequences of messages transmitted from users 1 and 2, respectively, in $r$ rounds. Another condition is the prefix-free message property, which ensures that whenever one user sends a message, the other user knows when the message ends. This requires that, for all $(x^n, y^n), (x^n, \hat{y}^n) \in \mathrm{supp}(p^n)$ such that $\phi^{i-1}(x^n, y^n) = \phi^{i-1}(x^n, \hat{y}^n)$ for some $i \in \{2, \dots, r\}$, the message $\phi_i^Y(y^n, \phi^{i-1}(x^n, y^n))$ is not a proper prefix of $\phi_i^Y(\hat{y}^n, \phi^{i-1}(x^n, \hat{y}^n))$. The same applies to user 1 when we interchange the roles of $X$ and $Y$. In addition, we require the coordinated termination criterion to ensure that both parties know when communication ends. In particular, given some $(x^n, y^n), (x^n, \hat{y}^n) \in \mathrm{supp}(p^n)$, we require that $\phi(x^n, y^n)$ is not a proper prefix of $\phi(x^n, \hat{y}^n)$. The same condition applies when the roles of $X$ and $Y$ are interchanged. The last condition we require is the unique message property. In particular, if $(x^n, y^n), (x^n, \hat{y}^n) \in \mathrm{supp}(p^n)$, then $\phi^{i-1}(x^n, y^n) = \phi^{i-1}(x^n, \hat{y}^n)$ implies that

$$\phi_i^X(x^n, \phi^{i-1}(x^n, y^n)) = \phi_i^X(x^n, \phi^{i-1}(x^n, \hat{y}^n)).$$

The same applies when the roles of $X$ and $Y$ are interchanged. Null transmissions are allowed at any round.
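Both the prefix-free message property and the coordinated termination criterion reduce to checking that no bit string in a collection is a proper prefix of another. A minimal helper, assuming messages are represented as Python strings over “0” and “1” (an illustrative choice), is:

```python
def is_proper_prefix(a, b):
    """True if bit string a is a proper prefix of bit string b."""
    return len(a) < len(b) and b.startswith(a)

def is_prefix_free(messages):
    """True if no message in the collection is a proper prefix of another.

    For the prefix-free message property, the collection would hold the round-i messages
    of user 2 over all y^n that share x^n and the history phi^{i-1}; for coordinated
    termination, it would hold the full codewords phi(x^n, y^n) for a fixed x^n.
    Identical messages are allowed, since a string is not a proper prefix of itself.
    """
    msgs = list(messages)
    return not any(is_proper_prefix(a, b) for a in msgs for b in msgs)

print(is_prefix_free(["0", "10", "11"]))  # True
print(is_prefix_free(["0", "01", "11"]))  # False: "0" is a proper prefix of "01"
print(is_prefix_free(["1", "1"]))         # True: identical messages are not proper prefixes
```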

The worst-case codeword length for mapping $\phi$ is given by

$$l_\phi^{(n)} = \max_{(x^n, y^n) \in \mathrm{supp}(p^n)} \frac{1}{n}\, \ell(\phi(x^n, y^n)) \ \text{bits/symbol}, \tag{8}$$

where $\ell(\cdot)$ is the number of bits in a bit stream. The optimal worst-case codeword length is given by

$$l^{(n)} = \min_{\phi} l_\phi^{(n)}. \tag{9}$$
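Equation (8) can be evaluated directly once the codeword of every pair in $\mathrm{supp}(p^n)$ is tabulated. The sketch below assumes such a table (a hypothetical mapping from $(x^n, y^n)$ to the concatenation of all round messages) and returns the normalized worst-case length; Equation (9) would further require minimizing over all valid protocols, which this sketch does not attempt.

```python
from fractions import Fraction

def worst_case_length(codewords, n):
    """l_phi^(n) from Equation (8): the largest codeword length over supp(p^n), divided by n."""
    return max(Fraction(len(c), n) for c in codewords.values())

# Hypothetical codeword table for n = 2: keys are (x^n, y^n), values are the
# concatenated messages of all rounds.
codewords = {((1, 1), (1, 2)): "101", ((1, 2), (1, 1)): "0"}
print(worst_case_length(codewords, 2))  # 3/2 bits/symbol
```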

The zero-error condition in Equation (4) ensures that, for any given function, the worst-case codeword length of the optimal communication protocol is the same for any two distributions $p, p' \in \mathcal{P}$ with $\mathrm{supp}(p) = \mathrm{supp}(p')$. We utilize this property to design interactive protocols based on the graphical structures described next. It is useful to note that the results throughout the paper continue to hold when the parties know only the supports of the distributions $p(x, y)$ and $q(x, y)$. For each $p(x, y) \in \mathcal{P}$, we define a bipartite graph $G_p = (\mathcal{X}, \mathcal{Y}, E_p)$ with vertex sets $\mathcal{X}$, $\mathcal{Y}$ and an edge set $E_p$. An edge $(x, y) \in E_p$ exists if and only if $p(x, y) > 0$.

Observe that $G_p = G_{p'}$ for any $p(x, y), p'(x, y) \in \mathcal{P}$ with $\mathrm{supp}(p) = \mathrm{supp}(p')$. One can therefore partition $\mathcal{P}$ into groups of distributions that have the same support set, such that the distributions in each group map to a unique bipartite graph. We represent the set of resulting bipartite graphs by $\mathcal{G}$, and denote each element $G \in \mathcal{G}$ by $G = (\mathcal{X}, \mathcal{Y}, E_G)$. The bipartite graph structure used for partitioning the distributions in $\mathcal{P}$ is related to the notion of ergodic decomposition from [29], in that each bipartite graph represents a class of distributions with the same ergodic decomposition. For each $G \in \mathcal{G}$, we denote

$$S_G^n = \{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : (x_i, y_i) \in E_G,\ i = 1, \dots, n\}, \tag{10}$$

and note that for any distribution $p(x, y) \in \mathcal{P}$ whose support set is represented by the bipartite graph $G$, one has $S_G^n = \mathrm{supp}(p^n)$.
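Grouping the distributions in $\mathcal{P}$ by support, as described above, amounts to grouping them by edge set. The sketch below assumes each distribution is a dictionary of pair probabilities keyed by an illustrative name; each distinct key of the result plays the role of one graph $G \in \mathcal{G}$.

```python
from collections import defaultdict
from fractions import Fraction

def edge_set(p):
    """Edge set E_p of the bipartite graph G_p: all (x, y) with p(x, y) > 0."""
    return frozenset(pair for pair, prob in p.items() if prob > 0)

def partition_by_support(P):
    """Map each distinct edge set (one graph G) to the names of the distributions sharing it."""
    groups = defaultdict(list)
    for name, p in P.items():
        groups[edge_set(p)].append(name)
    return dict(groups)

P = {
    "a": {(1, 1): Fraction(1, 2), (2, 2): Fraction(1, 2)},
    "b": {(1, 1): Fraction(1, 4), (2, 2): Fraction(3, 4)},   # same support as "a"
    "c": {(1, 1): Fraction(1, 3), (1, 2): Fraction(2, 3)},
}
groups = partition_by_support(P)
print(len(groups))                                          # 2 distinct bipartite graphs
print(sorted(sorted(names) for names in groups.values()))   # [['a', 'b'], ['c']]
```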

Given $G \in \mathcal{G}$, we define the following sets. For each $x^n \in \mathcal{X}^n$, we define an ambiguity set

$$I_{X,G}(x^n) = \{f^n(x^n, y^n) \in \mathcal{F}^n : (x_i, y_i) \in E_G,\ y_i \in \mathcal{Y},\ i = 1, \dots, n\}, \tag{11}$$

where each element is a sequence of function values, and $\lambda_G(x^n) \triangleq |I_{X,G}(x^n)|$ denotes the number of distinct sequences of function values. Similarly, for each $y^n \in \mathcal{Y}^n$, we define an ambiguity set

$$I_{Y,G}(y^n) = \{f^n(x^n, y^n) \in \mathcal{F}^n : (x_i, y_i) \in E_G,\ x_i \in \mathcal{X},\ i = 1, \dots, n\}, \tag{12}$$

with $\mu_G(y^n) \triangleq |I_{Y,G}(y^n)|$. Next, we let

$$\lambda_G \triangleq \max_{x \in \mathcal{X}} \lambda_G(x), \tag{13}$$

and note that $\max_{x^n \in \mathcal{X}^n} \lambda_G(x^n) = (\lambda_G)^n$, since $I_{X,G}(x^n)$ is the Cartesian product $I_{X,G}(x_1) \times \dots \times I_{X,G}(x_n)$. Similarly, we define

$$\mu_G \triangleq \max_{y \in \mathcal{Y}} \mu_G(y), \tag{14}$$

and note that $\max_{y^n \in \mathcal{Y}^n} \mu_G(y^n) = (\mu_G)^n$. We denote the maximum vertex degrees for graph $G$ by

$$\Delta_X \triangleq \max_{x \in \mathcal{X}} |\{y \in \mathcal{Y} : (x, y) \in E_G\}|, \quad \Delta_Y \triangleq \max_{y \in \mathcal{Y}} |\{x \in \mathcal{X} : (x, y) \in E_G\}|. \tag{15}$$

Lastly, using Equations (11) and (12), for each $(x^n, y^n) \in S_G^n$ we define

$$I_G(x^n, y^n) = I_{X,G}(x^n) \cup I_{Y,G}(y^n). \tag{16}$$
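The ambiguity sets in Equations (11), (12) and (16) can be enumerated by brute force for small alphabets. In the sketch below, the graph is given as a set of edges, and the toy graph, the function and the helper names are illustrative. It computes $I_{X,G}(x^n)$ and $I_{Y,G}(y^n)$ and checks the identity $\max_{x^n} \lambda_G(x^n) = (\lambda_G)^n$ for $n = 2$.

```python
from itertools import product

def ambiguity_x(E, f, xs):
    """I_{X,G}(x^n), Equation (11): all f^n(x^n, y^n) with every (x_i, y_i) an edge."""
    choices = [[y for (a, y) in E if a == x] for x in xs]
    return {tuple(f(x, y) for x, y in zip(xs, ys)) for ys in product(*choices)}

def ambiguity_y(E, f, ys):
    """I_{Y,G}(y^n), Equation (12): all f^n(x^n, y^n) with every (x_i, y_i) an edge."""
    choices = [[x for (x, b) in E if b == y] for y in ys]
    return {tuple(f(x, y) for x, y in zip(xs, ys)) for xs in product(*choices)}

def joint_ambiguity(E, f, xs, ys):
    """I_G(x^n, y^n), Equation (16): union of the two ambiguity sets."""
    return ambiguity_x(E, f, xs) | ambiguity_y(E, f, ys)

# Toy graph and function (illustrative only).
E = {(1, 1), (1, 2), (1, 3), (2, 1)}
f = lambda x, y: x * y
X = sorted({x for x, _ in E})
lam = max(len(ambiguity_x(E, f, (x,))) for x in X)                     # lambda_G, Equation (13)
lam2 = max(len(ambiguity_x(E, f, xs)) for xs in product(X, repeat=2))  # max over x^2
print(lam, lam2)                                  # 3 9
print(sorted(joint_ambiguity(E, f, (1,), (1,))))  # [(1,), (2,), (3,)]
```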

An illustrative example of the bipartite graph is given in Figure 1 for the function $f(x, y) = (x + y) \bmod 4$ and the probability distribution

$$p(x, y) = \begin{cases} \frac{1}{|\mathcal{X}| + 2} & \text{if } x = 1 \text{ or } y = 3, \\ 0 & \text{otherwise}, \end{cases} \tag{17}$$

over the finite sets $\mathcal{X} = \{1, \dots, 5\}$ and $\mathcal{Y} = \{1, \dots, 3\}$.

Figure 1. Bipartite graph representation of the probability distribution from Equation (17). Edge labels represent the function values $f(x, y) = (x + y) \bmod 4$. Note that the maximum vertex degrees are $\Delta_X = 3$ for $x \in \mathcal{X}$ and $\Delta_Y = 5$ for $y \in \mathcal{Y}$, whereas $\lambda_G = 3$ and $\mu_G = 4$.
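The values reported for Figure 1 can be double-checked numerically. The sketch below, an illustrative check that assumes exact fractions for the probabilities, builds the support of Equation (17), confirms that the probabilities sum to one, and computes $\Delta_X$, $\Delta_Y$, $\lambda_G$ and $\mu_G$ for $f(x, y) = (x + y) \bmod 4$.

```python
from fractions import Fraction

X = range(1, 6)  # {1, ..., 5}
Y = range(1, 4)  # {1, ..., 3}
p = {(x, y): (Fraction(1, len(X) + 2) if (x == 1 or y == 3) else Fraction(0))
     for x in X for y in Y}
E = {pair for pair, prob in p.items() if prob > 0}   # edge set of G
f = lambda x, y: (x + y) % 4

assert sum(p.values()) == 1  # 7 edges, each with probability 1/7

delta_x = max(sum(1 for y in Y if (x, y) in E) for x in X)      # Equation (15)
delta_y = max(sum(1 for x in X if (x, y) in E) for y in Y)
lam = max(len({f(x, y) for y in Y if (x, y) in E}) for x in X)  # Equation (13) with n = 1
mu = max(len({f(x, y) for x in X if (x, y) in E}) for y in Y)   # Equation (14) with n = 1
print(delta_x, delta_y, lam, mu)  # 3 5 3 4
```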

Finally, we review a basic property of zero-error interactive protocols, which is key to our analysis in the sequel. The straightforward proof immediately follows, e.g., from ([1], Lemma 1, Corollary 2).

Proposition 1.

Let $[\phi_k(x^n, y^n)]_{k=1}^{r}$ be the concatenation of all $\phi_k(x^n, y^n)$ for $k = 1, \dots, r$. Then, for each $(x^n, y^n) \in S_G^n$, the set of sequences corresponding to the symbols in $I_G(x^n, y^n)$ should be prefix-free.

Proof.

The proof follows from the following observation. Suppose that, for some $(x^n, y^n) \in S_G^n$, we have $(\hat{x}^n, y^n), (x^n, \hat{y}^n) \in S_G^n$, where $[\phi_k(\hat{x}^n, y^n)]_{k=1}^{r}$ is a prefix of $[\phi_k(x^n, \hat{y}^n)]_{k=1}^{r}$. Then, from ([1], Lemma 1), we have $\phi(\hat{x}^n, y^n) = \phi(x^n, y^n) = \phi(x^n, \hat{y}^n)$. Now, if $f^n(x^n, y^n) \neq f^n(x^n, \hat{y}^n)$, then user 1 will not be able to distinguish between the two function values, as the message sequences are the same for both. Similarly, if $f^n(x^n, y^n) \neq f^n(\hat{x}^n, y^n)$, then user 2 will not be able to distinguish between the two function values. Hence, $[\phi_k(\hat{x}^n, y^n)]_{k=1}^{r}$ cannot be a prefix of $[\phi_k(x^n, \hat{y}^n)]_{k=1}^{r}$ whenever $f^n(\hat{x}^n, y^n) \neq f^n(x^n, \hat{y}^n)$. By the same argument, $[\phi_k(x^n, y^n)]_{k=1}^{r}$ cannot be a prefix of $[\phi_k(x^n, \hat{y}^n)]_{k=1}^{r}$ whenever $f^n(x^n, y^n) \neq f^n(x^n, \hat{y}^n)$, since otherwise user 1 will not be able to recover the correct function value. The same applies to user 2 when the roles of $X$ and $Y$ are interchanged. Therefore, for any given $(x^n, y^n) \in S_G^n$, we need at least $|I_G(x^n, y^n)|$ prefix-free sequences, one for each element of $I_G(x^n, y^n)$. Otherwise, one of the above three cases will occur and at least one user will not be able to distinguish the correct function value. ☐
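For $n = 1$, the necessary condition behind Proposition 1 can be checked by brute force. The sketch below assumes the protocol is summarized by a table mapping each edge $(x, y)$ to the concatenation of all its round messages (a hypothetical representation); two codewords conflict when one is a prefix of the other, equality included, as in the proof. The helper names are illustrative.

```python
def comparable(a, b):
    """True if one bit string is a prefix of the other (equality included)."""
    return a.startswith(b) or b.startswith(a)

def satisfies_prop1(E, f, codeword):
    """Necessary condition from Proposition 1, for n = 1 (illustrative brute-force check).

    E:        edge set of G, i.e., the pairs (x, y) with positive probability.
    codeword: hypothetical table mapping each edge to the concatenation of all round messages.
    For every edge (x, y), the edges sharing x or y with it are the pairs whose function
    values lie in I_G(x, y); two of them with different values must not have codewords
    where one is a prefix of the other.
    """
    for (x, y) in E:
        nbrs = [(a, b) for (a, b) in E if a == x or b == y]
        for u in nbrs:
            for v in nbrs:
                if f(*u) != f(*v) and comparable(codeword[u], codeword[v]):
                    return False
    return True

E = {(1, 1), (1, 2), (2, 1)}
f = lambda x, y: (x + y) % 4                       # f(1, 1) = 2, f(1, 2) = f(2, 1) = 3
good = {(1, 1): "0", (1, 2): "10", (2, 1): "11"}
bad = {(1, 1): "1", (1, 2): "10", (2, 1): "0"}     # "1" is a prefix of "10"
print(satisfies_prop1(E, f, good))  # True
print(satisfies_prop1(E, f, bad))   # False
```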

2.2. Motivating Example

Consider two interacting users, user 1 observing $x \in \mathcal{X} = \{1, \dots, 7\}$ and user 2 observing $y \in \mathcal{Y} = \{1, \dots, 7\}$ according to the distribution

$$p(x, y) = \begin{cases} 1/5 & \text{if } (x, y) \in \{(3, 1), (3, 2), (3, 5), (6, 5), (7, 5)\}, \\ 0 & \text{otherwise}, \end{cases} \tag{18}$$

where both users want to compute the following function of $(x, y) \in \mathcal{X} \times \mathcal{Y}$:

$$f(x, y) = \begin{cases} 0 & \text{if } x - y > 0, \\ 1 & \text{if } -1 \leq x - y \leq 0, \\ 2 & \text{otherwise}. \end{cases} \tag{19}$$

First, assume that users 1 and 2 both know the distribution $p(x, y)$; we call this the symmetric priors case. In this case, one can readily observe from Equations (18) and (19) that the function value $f(x, y) = 1$ will never occur, hence the two parties can discard that value beforehand. That is, users 1 and 2 know beforehand that they only need to distinguish between two function values: $f(x, y) = 0$, which occurs when $(x, y) \in \{(3, 1), (3, 2), (6, 5), (7, 5)\}$, and $f(x, y) = 2$, which occurs when $(x, y) = (3, 5)$. We now detail five interaction protocols. The first one is a naïve protocol where user 1 sends $x$ to user 2, and user 2 sends $y$ to user 1, after which both users can compute $f(x, y)$. To do so, users 1 and 2 need $\lceil \log 7 \rceil = 3$ bits each, i.e., a total of 6 bits. Second, consider a protocol in which user 1 sends $x$ to user 2, and user 2 calculates $f(x, y)$ and sends the result back to user 1. To do so, user 1 needs $\lceil \log 7 \rceil = 3$ bits. User 2, on the other hand, needs to send only $\log 2 = 1$ bit, since there are at most 2 possible function values. This protocol uses 4 bits in total in two rounds. The same applies to the third protocol, where we exchange the roles of users 1 and 2. Since users 1 and 2 know the support set of $p(x, y)$, i.e., the pairs $(x, y)$ for which $p(x, y) > 0$, a fourth protocol would involve sending only $\lceil \log 3 \rceil + \lceil \log 3 \rceil = 4$ bits in total, in which user 1 sends one of $x \in \{3, 6, 7\}$, whereas user 2 sends one of $y \in \{1, 2, 5\}$. Lastly, consider a different protocol where user 1 sends “0” if $x \in \{6, 7\}$, and a “1” otherwise, which is sufficient for user 2 to infer whether $f(x, y) = 0$ or $f(x, y) = 2$ depending on the $y$ he observes, since $f(x, y) = 1$ is not possible with these $(x, y)$ values. Therefore, user 2 computes $f(x, y)$ and sends the result back to user 1 by using $\log 2 = 1$ bit. This protocol requires only $\log 2 + \log 2 = 2$ bits in two rounds, and at the end both users learn $f(x, y)$. As is clear from this example, communicating all distinct pairs of symbols is not always the best strategy, and resources can be saved by using a more efficient strategy.
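The fifth protocol above can be checked exhaustively. The sketch below, whose helper names are illustrative, encodes user 1's one-bit message, lets user 2 resolve $f(x, y)$ from that bit and $y$ using the shared support of $p(x, y)$ in Equation (18), and confirms that both users recover the correct value with 2 bits for every pair in the support.

```python
SUPPORT_P1 = {(3, 1), (3, 2), (3, 5), (6, 5), (7, 5)}   # supp(p) from Equation (18)

def f(x, y):
    """Function from Equation (19)."""
    d = x - y
    if d > 0:
        return 0
    if d >= -1:        # here -1 <= x - y <= 0
        return 1
    return 2

def user1_bit(x):
    """Round 1: user 1 sends "0" if x is 6 or 7, and "1" otherwise."""
    return "0" if x in (6, 7) else "1"

def user2_decode(bit, y):
    """User 2 resolves f(x, y) from the received bit, its own y and the shared support."""
    values = {f(x, yy) for (x, yy) in SUPPORT_P1 if yy == y and user1_bit(x) == bit}
    if len(values) != 1:
        raise ValueError("protocol is not zero-error")
    return values.pop()

def user2_reply(value):
    """Round 2: one bit suffices, since only the values 0 and 2 occur under p."""
    return "0" if value == 0 else "1"

for (x, y) in sorted(SUPPORT_P1):
    b1 = user1_bit(x)
    z2 = user2_decode(b1, y)
    b2 = user2_reply(z2)
    z1 = 0 if b2 == "0" else 2      # user 1 maps the reply back to the function value
    assert z1 == z2 == f(x, y) and len(b1 + b2) == 2
print("2 bits, zero error on every pair of supp(p)")
```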

Next, consider the following variation on the example. Users 1 and 2 again wish to compute $f(x, y)$ given in Equation (19), but this time the joint distribution of the sources $p(x, y)$ is selected from a set of distributions $\mathcal{P} = \{p_1, p_2, p_3\}$, where $p_1(x, y)$ is defined as in Equation (18), and

$$p_2(x, y) = \begin{cases} 1/7 & \text{if } (x, y) \in \{(3, 2), (3, 3), (3, 4), (3, 5), (4, 5), (5, 5), (6, 5)\}, \\ 0 & \text{otherwise}, \end{cases} \tag{20}$$

and

$$p_3(x, y) = \begin{cases} 1/5 & \text{if } (x, y) \in \{(3, 1), (3, 3), (3, 5), (4, 5), (7, 5)\}, \\ 0 & \text{otherwise}. \end{cases} \tag{21}$$

As described in Section 2.1, one can represent the structure of these distributions and the corresponding function values via bipartite graphs. Such a bipartite graph for the probability distribution $p_2(x, y)$ in Equation (20) is given in Figure 2.

Figure 2. Shared bipartite graph with $n = 1$ and $\mathcal{X} = \mathcal{Y} = \{1, \dots, 7\}$ representing the distribution $p_2(x, y)$ from Equation (20). Edge labels represent the function values $f(x, y)$ defined in Equation (19). Maximum vertex degrees are $\Delta_X = \Delta_Y = 4$ for $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, whereas $\lambda_G = \mu_G = 3$.
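The degrees and ambiguity counts quoted in the caption of Figure 2 can be reproduced with a few lines of code. The sketch below (helper names are illustrative) computes $\Delta_X$, $\Delta_Y$, $\lambda_G$ and $\mu_G$ for the supports of $p_1$, $p_2$ and $p_3$. The entry for $p_2$ matches the caption, and the smaller $\lambda_G$ for $p_1$ is consistent with the 1-bit reply used in the symmetric case above.

```python
SUPPORTS = {
    "p1": {(3, 1), (3, 2), (3, 5), (6, 5), (7, 5)},                    # Equation (18)
    "p2": {(3, 2), (3, 3), (3, 4), (3, 5), (4, 5), (5, 5), (6, 5)},    # Equation (20)
    "p3": {(3, 1), (3, 3), (3, 5), (4, 5), (7, 5)},                    # Equation (21)
}

def f(x, y):
    """Function from Equation (19)."""
    d = x - y
    return 0 if d > 0 else (1 if d >= -1 else 2)

def graph_stats(E):
    """Return (Delta_X, Delta_Y, lambda_G, mu_G) for n = 1, following Equations (13) to (15).

    Vertices without edges are skipped; they would not change any of the maxima.
    """
    xs = {x for x, _ in E}
    ys = {y for _, y in E}
    nbr_y = {x: [y for (a, y) in E if a == x] for x in xs}   # neighbours of each x
    nbr_x = {y: [x for (x, b) in E if b == y] for y in ys}   # neighbours of each y
    delta_x = max(len(v) for v in nbr_y.values())
    delta_y = max(len(v) for v in nbr_x.values())
    lam = max(len({f(x, y) for y in nbr_y[x]}) for x in xs)
    mu = max(len({f(x, y) for x in nbr_x[y]}) for y in ys)
    return delta_x, delta_y, lam, mu

for name, E in SUPPORTS.items():
    print(name, graph_stats(E))
# p1 (3, 3, 2, 2)
# p2 (4, 4, 3, 3)
# p3 (3, 3, 3, 3)
```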

User 1 observes $p(x, y)$, i.e., the true distribution. User 2 knows the set $\mathcal{P}$, but not the specific choice in $\mathcal{P}$. User 2 instead observes a distribution $q(x, y)$ from a set $\mathcal{Q} = \{q_1, q_2\}$, where

$$q_1(x, y) = \begin{cases} 1/3 & \text{if } (x, y) \in \{(3, 2), (3, 5), (6, 5)\}, \\ 0 & \text{otherwise}, \end{cases} \tag{22}$$

and

$$q_2(x, y) = \begin{cases} 1/3 & \text{if } (x, y) \in \{(3, 1), (3, 5), (7, 5)\}, \\ 0 & \text{otherwise}. \end{cases} \tag{23}$$

User 1 does not know the distribution $q(x, y)$ observed at user 2. In addition, the set $\mathcal{Q}$ is unknown to both users. The only requirement we have is that this distribution be consistent with $p(x, y)$. That is, the support of $q(x, y)$ is contained in the support of $p(x, y)$, so that $q(x, y)$ does not assign a positive probability to any source pair whose probability is zero under $p(x, y)$, i.e., $\mathrm{supp}(q) \subseteq \mathrm{supp}(p)$. Note that this acts as side information, in that users 1 and 2 can infer which of the $q(x, y)$ or $p(x, y)$ distributions, respectively, are possible at the other party given their own distribution.

In order to interact in this setup, users 1 and 2 may initially agree to reconcile the distribution and then use it as in the previous case. To do so, user 1 informs user 2 of the true distribution. She assigns an index “0” if $p = p_1$, and a “1” if $p \in \{p_2, p_3\}$, and sends it to user 2 by using $\log 2 = 1$ bit. User 2 can infer the true distribution by using the received index as well as his own distribution $q$. If the received index is “0”, then it immediately follows that the true distribution is $p_1$. However, if the received index is “1”, then user 2 needs to decide between $p_2$ and $p_3$. To do so, he utilizes $q$: (i) whenever $q = q_1$, he declares that the true distribution is $p_2$, since $\mathrm{supp}(q_1) \not\subseteq \mathrm{supp}(p_3)$; (ii) whenever $q = q_2$, he decides that the true distribution is $p_3$, since in this case $\mathrm{supp}(q_2) \not\subseteq \mathrm{supp}(p_2)$. After this step, both users know the true distribution, and can compute $f(x, y)$ by exchanging no more than $\lceil \log 3 \rceil + \lceil \log 3 \rceil = 4$ bits in total, as detailed next. The case where $p_1(x, y)$ is the true distribution requires 2 bits for interaction, as noted earlier. If the true distribution is $p_2(x, y)$, user 1 can send user 2 an index “0” if $x \in \{6\}$, a “1” if $x \in \{4, 5\}$, or a “2” otherwise. User 2 can compute $f(x, y)$ and send the result back to user 1 by using at most $\lceil \log 3 \rceil = 2$ bits, since in the worst case all three function values may occur, which happens when $x = 3$. Therefore, this case requires 4 bits for communication. If instead the true distribution is $p_3(x, y)$, user 1 can send user 2 an index “0” if $x \in \{7\}$, a “1” if $x \in \{4\}$, or a “2” otherwise. User 2 can compute $f(x, y)$ and send it back to user 1 by using $\lceil \log 3 \rceil = 2$ bits, since all three function values are again possible for $x = 3$. Hence, this scheme requires 5 bits to be communicated in total, 1 bit for reconciliation and 4 bits for communication.
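User 2's decision rule in the reconciliation step can be checked exhaustively. The sketch below stores only support sets (sufficient here, since only supports matter), enumerates every pair $(p, q)$ allowed by the consistency requirement $\mathrm{supp}(q) \subseteq \mathrm{supp}(p)$, and confirms that the 1-bit index together with $q$ always identifies $p$. The names `INDEX` and `user2_infer` are illustrative.

```python
P_SUPPORTS = {
    "p1": {(3, 1), (3, 2), (3, 5), (6, 5), (7, 5)},                    # Equation (18)
    "p2": {(3, 2), (3, 3), (3, 4), (3, 5), (4, 5), (5, 5), (6, 5)},    # Equation (20)
    "p3": {(3, 1), (3, 3), (3, 5), (4, 5), (7, 5)},                    # Equation (21)
}
Q_SUPPORTS = {
    "q1": {(3, 2), (3, 5), (6, 5)},   # Equation (22)
    "q2": {(3, 1), (3, 5), (7, 5)},   # Equation (23)
}
INDEX = {"p1": 0, "p2": 1, "p3": 1}   # user 1's one-bit reconciliation message

def user2_infer(received, q_supp):
    """Keep the p's whose index matches and whose support contains supp(q)."""
    candidates = [p for p, s in P_SUPPORTS.items() if INDEX[p] == received and q_supp <= s]
    if len(candidates) != 1:
        raise ValueError("reconciliation is ambiguous")
    return candidates[0]

for p, p_supp in P_SUPPORTS.items():
    for q, q_supp in Q_SUPPORTS.items():
        if q_supp <= p_supp:   # only consistent (p, q) pairs can occur
            assert user2_infer(INDEX[p], q_supp) == p
print("the 1-bit index and q identify p in every consistent case")
```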

An alternative scheme is one in which users 1 and 2 do not reconcile the true distribution, but instead use an encoding scheme that allows error-free communication under the distribution uncertainty. To do so, user 1 sends an index “0” if $x \in \{6, 7\}$, a “1” if $x \in \{4, 5\}$, and a “2” otherwise. Describing 3 indices requires user 1 to use $\lceil \log 3 \rceil = 2$ bits. After receiving the index value, user 2 can recover $f(x, y)$ perfectly, whether the true distribution $p$ is equal to $p_1$, $p_2$, or $p_3$, and then send it to user 1 by using no more than $\lceil \log 3 \rceil = 2$ bits, since there are at most 3 distinct values of $f(x, y)$ for each $y \in \mathcal{Y}$. Both users can then learn $f(x, y)$. Not reconciling the partial information therefore takes 4 bits, which is less than the 5 bits required by the previous two-stage reconciliation-communication protocol.
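The no-reconciliation scheme can be verified in the same exhaustive way. The sketch below (helper names are illustrative) decodes over the union of the supports in $\mathcal{P}$, which ignores the side information carried by $q$ and is therefore a conservative check: if the index and $y$ determine $f(x, y)$ over the union, they do so under each of $p_1$, $p_2$ and $p_3$.

```python
import math

P_SUPPORTS = {
    "p1": {(3, 1), (3, 2), (3, 5), (6, 5), (7, 5)},                    # Equation (18)
    "p2": {(3, 2), (3, 3), (3, 4), (3, 5), (4, 5), (5, 5), (6, 5)},    # Equation (20)
    "p3": {(3, 1), (3, 3), (3, 5), (4, 5), (7, 5)},                    # Equation (21)
}
UNION = set().union(*P_SUPPORTS.values())   # every pair possible under some p in P

def f(x, y):
    """Function from Equation (19)."""
    d = x - y
    return 0 if d > 0 else (1 if d >= -1 else 2)

def user1_index(x):
    """Index sent by user 1 without any reconciliation."""
    if x in (6, 7):
        return 0
    if x in (4, 5):
        return 1
    return 2

def user2_decode(idx, y):
    """Decode over the union of supports, so the rule works for p1, p2 and p3 alike."""
    values = {f(x, yy) for (x, yy) in UNION if yy == y and user1_index(x) == idx}
    if len(values) != 1:
        raise ValueError("index does not determine f(x, y)")
    return values.pop()

for (x, y) in sorted(UNION):
    assert user2_decode(user1_index(x), y) == f(x, y)

bits = 2 * math.ceil(math.log2(3))   # 2 bits from user 1 plus 2 bits back from user 2
print(bits)  # 4
```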

3. Communication Strategies with Asymmetric Priors

In this section, we propose three strategies for zero-error communication by mitigating the ambiguities resulting from the partial information about the true distribution.

3.1. Perfect Reconciliation

For the communication model described in Section 2.1, a natural approach to tackle the partial information is to first send the missing information to user 2, so that both sides know the source pairs that may be realized with positive probability under the true distribution; this knowledge can then be utilized for communication. This setup consists of two stages. In the first stage, user 2 learns the support set of the true distribution $p(x, y)$, or equivalently the bipartite graph $G$ corresponding to $p(x, y)$, from user 1. We call this the reconciliation stage. After this stage, both parties use graph $G$ for zero-error interactive communication. We refer to this two-stage protocol as perfect reconciliation in the sequel. The worst-case message length under this setup is denoted by $l_R^{(n)}$.

For the reconciliation stage, we first partition $\mathcal{Q}$ into groups of distributions with distinct support sets, and denote by $\mathcal{B}$ the set of distinct bipartite graphs that correspond to the support sets of the distributions in $\mathcal{Q}$. This process is similar to the one described for $\mathcal{P}$ in Section 2.1. Next, we find a lower bound on the minimum number of bits required for user 2 to learn the graph $G$, i.e., all $(x, y)$ pairs that may occur with positive probability under the true distribution $p(x, y)$.

Definition 1.

(Reconciliation graph) Define a characteristic graph $R = (\mathcal{G}, E_R)$, in which each vertex represents a graph $G \in \mathcal{G}$. Recall that $\mathcal{G}$ is the set of bipartite graphs defined in Section 2.1. An edge $(G, G') \in E_R$ is defined between vertices $G$ and $G'$ if and only if there exists a $B \in \mathcal{B}$ such that $E_B \subseteq E_G$ and $E_B \subseteq E_{G'}$.

The minimum number of bits required for user 2 to perfectly learn $G$ is then $\lceil \log \chi(R) \rceil$, where $\chi(\cdot)$ denotes the chromatic number of a graph. To see this, note that in the reconciliation phase, any two adjacent vertices of the reconciliation graph have to be assigned distinct bit streams, since otherwise user 2, who only knows the graph $B$ of its own assumed distribution, would not be able to distinguish them. This requires a minimum of $\lceil \log \chi(R) \rceil$ bits to be transmitted from user 1 to user 2. It is useful to note that perfect reconciliation incurs a negligible cost for large blocklengths.
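As an illustration of Definition 1, the following Python sketch builds the reconciliation graph $R$ from toy edge sets and computes $\lceil \log \chi(R) \rceil$ by exhaustive search. The edge sets, the names `G1`, `G2`, `G3`, and the helper functions are hypothetical and chosen only to exercise the definition; exhaustive coloring is practical only for very small graphs.

```python
import math
from itertools import combinations, product

def chromatic_number(vertices, edges):
    """Exact chromatic number by exhaustive search (tiny graphs only)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            c = dict(zip(vertices, colors))
            if all(c[u] != c[v] for u, v in edges):
                return k
    return 0

def reconciliation_graph(G_class, B_class):
    """Edges of R per Definition 1: G ~ G' iff some E_B lies in both E_G and E_G'."""
    names = list(G_class)
    edges = {(g, h) for g, h in combinations(names, 2)
             if any(B <= G_class[g] and B <= G_class[h] for B in B_class)}
    return names, edges

# Hypothetical supports: B1 is compatible with both G1 and G2, B2 only with G3.
G_class = {"G1": {(0, 0), (1, 0)}, "G2": {(0, 0), (0, 1)}, "G3": {(2, 2)}}
B_class = [{(0, 0)}, {(2, 2)}]
V, E = reconciliation_graph(G_class, B_class)
chi_R = chromatic_number(V, E)
print(E, chi_R, math.ceil(math.log2(chi_R)))  # {('G1', 'G2')} 2 1
```

Here user 1 needs a single reconciliation bit: `G3` can share a color with either `G1` or `G2`, because no graph in $\mathcal{B}$ is contained in both.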

Proposition 2.

Perfect reconciliation is an asymptotically optimal strategy.

Proof.

Since the distributions $p(x, y)$ and $q(x, y)$ are fixed once chosen, reconciliation requires at most $\lceil \log |R| \rceil$ bits for any class of graphs $\mathcal{G}$, a number that does not depend on $n$. Therefore, its contribution to the codeword length per symbol is $\frac{1}{n} \lceil \log |R| \rceil$, which vanishes as $n \to \infty$. Since the communication cost without reconciling the graphs can never be lower than the cost with reconciled graphs, we conclude that reconciling the graphs first, and then using the reconciled graphs for communication, cannot perform worse asymptotically than not reconciling them. We note, however, that this statement may no longer hold if the joint distribution varies arbitrarily over the course of the $n$ symbols, since correct recovery in that case may require the graphs to be reconciled repeatedly. ☐

In the following, we demonstrate a lower bound on the worst-case message length for this two-stage reconciliation-communication protocol.

Lemma 1.

A lower bound on the worst-case message length for the two-stage reconciliation-communication protocol is

$$l_R^{(n)} \geq \frac{\lceil \log \chi(R) \rceil}{n} + \max_{G \in \mathcal{G}} \max_{(x^n, y^n) \in S_G^n} \frac{1}{n} \lceil \log |I_G(x^n, y^n)| \rceil. \tag{24}$$

Proof.

We prove Equation (24) by lower bounding the message length for the reconciliation and communication parts separately. The lower bound for the reconciliation part is the minimum number of bits to be transmitted from user 1 to user 2, as determined using Definition 1. As a result, both sides learn the support set of the true distribution $p(x, y)$. The lower bound in Equation (24) then follows from

$$\begin{aligned} l_R^{(n)} &\geq \frac{\lceil \log \chi(R) \rceil}{n} + \max_{G \in \mathcal{G}} \min_{\phi} \max_{(x^n, y^n) \in S_G^n} \frac{1}{n} \ell(\phi(x^n, y^n)) \qquad (25)\\ &\geq \frac{\lceil \log \chi(R) \rceil}{n} + \max_{G \in \mathcal{G}} \max_{(x^n, y^n) \in S_G^n} \min_{\phi} \frac{1}{n} \ell(\phi(x^n, y^n)) \qquad (26)\\ &= \frac{\lceil \log \chi(R) \rceil}{n} + \max_{G \in \mathcal{G}} \max_{(x^n, y^n) \in S_G^n} \min_{\phi} \frac{1}{n} \ell\big([\phi_k(x^n, y^n)]_{k=1}^{r}\big) \qquad (27)\\ &\geq \frac{\lceil \log \chi(R) \rceil}{n} + \max_{G \in \mathcal{G}} \max_{(x^n, y^n) \in S_G^n} \frac{1}{n} \lceil \log |I_G(x^n, y^n)| \rceil \qquad (28) \end{aligned}$$

where Equation (26) follows from the min-max inequality and Equation (28) from Proposition 1. ☐

We next demonstrate an upper bound on the minimum worst-case message length. Consider the distribution $p(x, y)$ and the corresponding bipartite graph $G \in \mathcal{G}$. Let $G_X^n = (\mathcal{X}^n, E_X^n)$ denote a characteristic graph for user 1 with vertex set $\mathcal{X}^n$, i.e., the vertices of $G_X^n$ are the $n$-tuples $x^n \in \mathcal{X}^n$. An edge $(x^n, \hat{x}^n) \in E_X^n$ exists between $x^n \in \mathcal{X}^n$ and $\hat{x}^n \in \mathcal{X}^n$ whenever there exists some $y^n \in \mathcal{Y}^n$ such that $(x^n, y^n) \in S_G^n$, $(\hat{x}^n, y^n) \in S_G^n$, and $f^n(x^n, y^n) \neq f^n(\hat{x}^n, y^n)$. Similarly, define a characteristic graph $G_Y^n = (\mathcal{Y}^n, E_Y^n)$ for user 2 whose vertices are the $n$-tuples $y^n \in \mathcal{Y}^n$. An edge $(y^n, \hat{y}^n) \in E_Y^n$ exists between $y^n \in \mathcal{Y}^n$ and $\hat{y}^n \in \mathcal{Y}^n$ whenever there exists some $x^n \in \mathcal{X}^n$ such that $(x^n, y^n) \in S_G^n$, $(x^n, \hat{y}^n) \in S_G^n$, and $f^n(x^n, y^n) \neq f^n(x^n, \hat{y}^n)$.

The characteristic graphs defined above are useful in that any valid coloring of them enables the two parties to resolve the ambiguities in distinguishing the correct function values. Figure 3 illustrates the characteristic graphs $G_X^1$ and $G_Y^1$ constructed by using $p_2(x, y)$ from Equation (20) and $f(x, y)$ from Equation (19) in the example discussed in Section 2.2. In the following, we use the notation $G_X \triangleq G_X^1$ and $G_Y \triangleq G_Y^1$.

Figure 3. Characteristic graphs (a) $G_X^1$ and (b) $G_Y^1$ constructed using distribution $p_2(x, y)$ in Equation (20) and function $f(x, y)$ in Equation (19).

Theorem 1.

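The worst-case message length for the two-stage separate reconciliation and communication strategy satisfies

$$l_R^{(n)} \leq \frac{\lceil \log \chi(R) \rceil}{n} + \max_{G \in \mathcal{G}} \frac{1}{n} \left\{ \lceil n \log \chi(G_X) \rceil + \lceil n \log \chi(G_Y) \rceil \right\}, \tag{29}$$

where $G_X$ and $G_Y$ are the characteristic graphs constructed from $G$.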

Proof.

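Consider minimum colorings of $G_X$ and $G_Y$ using $\chi(G_X)$ and $\chi(G_Y)$ colors, respectively. Then $G_X^n$ and $G_Y^n$ can be colored with at most $(\chi(G_X))^n$ and $(\chi(G_Y))^n$ colors, respectively, by assigning to each $n$-tuple the sequence of colors of its components. This coloring is valid: if $x^n$ and $\hat{x}^n$ are adjacent in $G_X^n$, then for some $y^n$ and some index $i$ we have $(x_i, y_i), (\hat{x}_i, y_i) \in E_G$ and $f(x_i, y_i) \neq f(\hat{x}_i, y_i)$, so $x_i$ and $\hat{x}_i$ are adjacent in $G_X$ and receive different colors; the same argument applies to $G_Y^n$. Hence, users 1 and 2 can simultaneously send the index of the color assigned to their observed sequences by using at most $\lceil n \log \chi(G_X) \rceil$ and $\lceil n \log \chi(G_Y) \rceil$ bits, respectively. Each user can then combine the received color index with its own observation to recover the function values correctly. ☐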
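For a finite class of supports, the right-hand side of Equation (29) can be evaluated directly from the single-letter graphs $G_X$ and $G_Y$. The following is a minimal Python sketch under stated assumptions: each support $E_G$ is a set of $(x, y)$ pairs, $f$ is an ordinary Python function, $\chi(R)$ is supplied by the caller, base-2 logarithms are used, and chromatic numbers are found by brute force. The toy class and $f(x, y) = (x + y) \bmod 3$ are hypothetical.

```python
import math
from itertools import combinations, product

def chromatic_number(vertices, edges):
    """Exact chromatic number by exhaustive search (tiny graphs only)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            c = dict(zip(vertices, colors))
            if all(c[u] != c[v] for u, v in edges):
                return k
    return 0

def char_graphs(E_G, f):
    """Single-letter G_X and G_Y built from the (non-empty) support E_G."""
    xs = sorted({x for x, _ in E_G})
    ys = sorted({y for _, y in E_G})
    EX = [(a, b) for a, b in combinations(xs, 2)
          if any((a, y) in E_G and (b, y) in E_G and f(a, y) != f(b, y) for y in ys)]
    EY = [(a, b) for a, b in combinations(ys, 2)
          if any((x, a) in E_G and (x, b) in E_G and f(x, a) != f(x, b) for x in xs)]
    return (xs, EX), (ys, EY)

def theorem1_bound(G_class, f, chi_R, n):
    """Right-hand side of (29): reconciliation term plus worst-case coloring term."""
    worst = 0
    for E_G in G_class:
        (xs, EX), (ys, EY) = char_graphs(E_G, f)
        cx, cy = chromatic_number(xs, EX), chromatic_number(ys, EY)
        worst = max(worst, math.ceil(n * math.log2(cx)) + math.ceil(n * math.log2(cy)))
    return math.ceil(math.log2(chi_R)) / n + worst / n

G_class = [{(0, 0), (1, 0), (2, 0)}, {(0, 0), (0, 1)}]
f = lambda x, y: (x + y) % 3
print(theorem1_bound(G_class, f, chi_R=2, n=1), theorem1_bound(G_class, f, chi_R=2, n=4))  # 3.0 2.0
```

Grouping $n$ symbols lets user 1 send $\lceil n \log 3 \rceil = 7$ bits instead of $n \lceil \log 3 \rceil = 8$ for the triangle $G_X$ in the first support, which is why the bound drops from 3 to 2 bits per symbol in this toy case.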

3.2. Protocols that Do Not Explicitly Start with a Reconciliation Procedure

Instead of the reconciliation-based strategy described in Section 3.1, the two users may choose not to reconcile the distributions, and instead utilize a robust communication strategy that ensures zero-error communication under any distribution in the set $\mathcal{P}$. Specifically, they can agree on a worst-case communication strategy that always ensures zero-error communication for both users. In this section, we study two specific protocols that do not explicitly start with a reconciliation procedure. We denote the worst-case message length in this setting by $l_{RF}^{(n)}$.

As an example of such a robust communication strategy, consider a scenario in which user 1 enumerates each $x^n \in \mathcal{X}^n$ by using $\lceil n \log |\mathcal{X}| \rceil$ bits, whereas user 2 enumerates each $y^n \in \mathcal{Y}^n$ by using $\lceil n \log |\mathcal{Y}| \rceil$ bits. Then, by using no more than $\lceil n \log |\mathcal{X}| \rceil + \lceil n \log |\mathcal{Y}| \rceil$ bits in total, the two parties can communicate their observed symbols with zero error under any true distribution, and evaluate $f^n(X^n, Y^n)$. In that sense, this setup does not require any additional bits for learning about the distribution from the other side, either perfectly or partially, but the message length for communicating the symbols is often higher. In the following, we derive an upper bound on the worst-case message length based on two achievable protocols that do not start with a reconciliation procedure.

The first achievable strategy we consider is based on graph coloring. Let $G_{X,\mathcal{G}} = (\mathcal{X}, E_X)$ be a characteristic graph for user 1 whose vertex set is $\mathcal{X}$. Define an edge $(x, \hat{x}) \in E_X$ between vertices $x \in \mathcal{X}$ and $\hat{x} \in \mathcal{X}$ whenever there exists some $y \in \mathcal{Y}$ such that $(x, y) \in \bigcup_{p \in \mathcal{P}} \mathrm{supp}(p)$ and $(\hat{x}, y) \in \bigcup_{p \in \mathcal{P}} \mathrm{supp}(p)$ whereas $f(x, y) \neq f(\hat{x}, y)$. Similarly, define a characteristic graph $G_{Y,\mathcal{G}} = (\mathcal{Y}, E_Y)$ for user 2 whose vertex set is $\mathcal{Y}$. Define an edge $(y, \hat{y}) \in E_Y$ between vertices $y \in \mathcal{Y}$ and $\hat{y} \in \mathcal{Y}$ whenever there exists some $x \in \mathcal{X}$ such that $(x, y) \in \mathrm{supp}(p)$ and $(x, \hat{y}) \in \mathrm{supp}(p)$ for some $p \in \mathcal{P}$ but $f(x, y) \neq f(x, \hat{y})$. Note the difference between the conditions for constructing $G_{X,\mathcal{G}}$ and $G_{Y,\mathcal{G}}$: the former is based on the union $\bigcup_{p \in \mathcal{P}}$, whereas the latter requires both pairs to lie in the support of the same $p \in \mathcal{P}$. This difference results from the fact that user 2 does not know the true distribution, and hence needs to distinguish the possible symbols across a group of distributions, whereas user 1 knows the true distribution and can utilize it to eliminate the ambiguities for correct function recovery. We note, however, that both $G_{X,\mathcal{G}}$ and $G_{Y,\mathcal{G}}$ depend on $\mathcal{G}$. Lastly, we let $\chi(G_{X,\mathcal{G}})$ and $\chi(G_{Y,\mathcal{G}})$ denote the chromatic numbers of $G_{X,\mathcal{G}}$ and $G_{Y,\mathcal{G}}$, respectively.
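The asymmetry between the two constructions can be seen on a toy class. The Python sketch below, with hypothetical supports and functions, builds $G_{X,\mathcal{G}}$ from the union of the supports and $G_{Y,\mathcal{G}}$ from pairs lying in a single support, and evaluates the right-hand side of Equation (30) (base-2 logarithms, brute-force chromatic numbers):

```python
import math
from itertools import combinations, product

def chromatic_number(vertices, edges):
    """Exact chromatic number by exhaustive search (tiny graphs only)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            c = dict(zip(vertices, colors))
            if all(c[u] != c[v] for u, v in edges):
                return k
    return 0

def graphs_without_reconciliation(supports, f):
    """Build G_{X,G} (union of supports) and G_{Y,G} (pairs within one support)."""
    union = set().union(*supports)
    xs = sorted({x for x, _ in union})
    ys = sorted({y for _, y in union})
    # User 2 does not know the true support, so x values must be separated
    # whenever they collide anywhere in the union.
    EX = [(a, b) for a, b in combinations(xs, 2)
          if any((a, y) in union and (b, y) in union and f(a, y) != f(b, y) for y in ys)]
    # User 1 knows the true support, so y values only need separating when
    # they collide inside a single support.
    EY = [(a, b) for a, b in combinations(ys, 2)
          if any((x, a) in S and (x, b) in S and f(x, a) != f(x, b)
                 for S in supports for x in xs)]
    return (xs, EX), (ys, EY)

def bound_30(supports, f):
    """ceil(log chi(G_{X,G})) + ceil(log chi(G_{Y,G})) bits per symbol."""
    (xs, EX), (ys, EY) = graphs_without_reconciliation(supports, f)
    return (math.ceil(math.log2(chromatic_number(xs, EX)))
            + math.ceil(math.log2(chromatic_number(ys, EY))))

print(bound_30([{(0, 0)}, {(0, 1)}], lambda x, y: y))                      # 0
print(bound_30([{(0, 0), (1, 0)}, {(0, 1), (1, 1)}], lambda x, y: x ^ y))  # 1
```

In the second example, user 1 needs one bit while user 2 needs none: each support contains a single $y$, so user 1 already knows $y$ from the true distribution. Building $G_{Y,\mathcal{G}}$ from the union instead would connect $y = 0$ and $y = 1$ and cost user 2 one bit.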

Then, under any true distribution $p \in \mathcal{P}$, the following communication protocol ensures zero error. Suppose user 1 observes $x^n$ and user 2 observes $y^n$ from some distribution $p^n(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i)$. For each $x_i \in \mathcal{X}$, where $i = 1, \dots, n$, user 1 sends the color of $x_i$ by using no more than $\lceil \log \chi(G_{X,\mathcal{G}}) \rceil$ bits. After this step, user 2 can recover $f_i(x_i, y_i)$ by using $y_i$ as follows. Given $y_i$, user 2 considers the set of all $x_i \in \mathcal{X}$ such that $p(x_i, y_i) > 0$ for some $p \in \mathcal{P}$. Within this set, each color represents a group of $x_i \in \mathcal{X}$ for which $f_i(x_i, y_i)$ is equal. Therefore, under any true distribution $p \in \mathcal{P}$, user 2 is able to recover the correct value $f_i(x_i, y_i)$ solely from the received color and $y_i$. Similarly, for each $y_i \in \mathcal{Y}$, user 2 sends the color of $y_i$ by using no more than $\lceil \log \chi(G_{Y,\mathcal{G}}) \rceil$ bits, after which user 1 recovers $f_i(x_i, y_i)$ by using the received color and the true distribution $p(x, y)$. Since user 1 knows the true distribution, it can distinguish any function value correctly as long as no two $y, y' \in \mathcal{Y}$ are assigned the same codeword whenever there exists $x \in \mathcal{X}$ such that $(x, y) \in \mathrm{supp}(p)$, $(x, y') \in \mathrm{supp}(p)$, and $f(x, y) \neq f(x, y')$.

We then have the following upper bound on the worst-case message length:

$$l_{RF}^{(n)} \leq \frac{1}{n} \left( n \lceil \log \chi(G_{X,\mathcal{G}}) \rceil + n \lceil \log \chi(G_{Y,\mathcal{G}}) \rceil \right) = \lceil \log \chi(G_{X,\mathcal{G}}) \rceil + \lceil \log \chi(G_{Y,\mathcal{G}}) \rceil \ \text{bits/symbol}, \tag{30}$$

where user 1 sends $n \lceil \log \chi(G_{X,\mathcal{G}}) \rceil$ bits to user 2, whereas user 2 sends $n \lceil \log \chi(G_{Y,\mathcal{G}}) \rceil$ bits to user 1. After this step, both users can recover the correct function values $f^n(x^n, y^n)$ for any source pair $(x^n, y^n) \in \mathrm{supp}(p^n)$ under any $p \in \mathcal{P}$.
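The zero-error argument for this protocol can be checked exhaustively for $n = 1$ on small examples. In the Python sketch below, `zero_error` and the colorings are hypothetical: user 2 decodes from $y$ and the union of the supports, user 1 decodes from $x$ and the true support, and the check passes only if both decoded sets reduce to the single correct value for every pair in every candidate support.

```python
def zero_error(supports, f, color_x, color_y):
    """Check one-shot decoding at both users under every candidate support."""
    union = set().union(*supports)
    for S in supports:                      # S plays the role of supp(p) for the true p
        for x, y in S:
            # User 2 knows y and the union of supports, and receives color_x[x].
            cand2 = {f(xx, y) for xx, yy in union if yy == y and color_x[xx] == color_x[x]}
            # User 1 knows x and the true support S, and receives color_y[y].
            cand1 = {f(x, yy) for xx, yy in S if xx == x and color_y[yy] == color_y[y]}
            if cand2 != {f(x, y)} or cand1 != {f(x, y)}:
                return False
    return True

def xor(x, y):
    return x ^ y

supports = [{(0, 0), (1, 0)}, {(0, 1), (1, 1)}]
print(zero_error(supports, xor, {0: 0, 1: 1}, {0: 0, 1: 0}))  # True
print(zero_error(supports, xor, {0: 0, 1: 0}, {0: 0, 1: 0}))  # False
```

The first call uses two colors for $x$ and a single color for $y$, matching $\lceil \log 2 \rceil + \lceil \log 1 \rceil = 1$ bit per symbol; merging the two $x$ colors breaks decoding at user 2.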

The second achievable strategy we consider is based on perfect hash functions. A function $h: \{1, \dots, N\} \to \{1, \dots, k\}$ is called a perfect hash function for a set $S \subseteq \{1, \dots, N\}$ if $h(x) \neq h(y)$ for all $x, y \in S$ such that $x \neq y$. Consider a family $\mathcal{H}$ of functions $h: \{1, \dots, N\} \to \{1, \dots, k\}$. If

$$M \geq s (\ln N) e^{s^2 / k} \tag{31}$$

for some $k \geq s$, then there exists a family of $|\mathcal{H}| = M$ functions such that for every $S \subseteq \{1, \dots, N\}$ with $|S| \leq s$, there exists a function $h \in \mathcal{H}$ that is injective (one-to-one) over $S$ ([30], Section III.2.3). Perfect hash functions have been shown to be useful for constructing zero-error interactive communication protocols when the true distribution of the sources is known by both parties [7]. In the following, we extend the interactive communication framework of [7] to the setting in which the true distribution is known to only one of the communicating parties.
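Existence results of this type are typically proved by a probabilistic argument, so a random family of the prescribed size is perfect with positive probability. The Python sketch below is a minimal illustration, not the construction of [30]: it draws random families of size $M = \lceil s (\ln N) e^{s^2/k} \rceil$ until one is verified to be perfect for all subsets of size at most $s$. Symbols are 0-indexed and all function names are hypothetical; exhaustive verification is only feasible for small $N$ and $s$.

```python
import itertools
import math
import random

def family_size(N, s, k):
    """Smallest integer M satisfying (31), and at least one function."""
    if k < s:
        raise ValueError("Equation (31) requires k >= s")
    return max(1, math.ceil(s * math.log(N) * math.exp(s * s / k)))

def is_perfect_family(family, N, s):
    """True if every subset of {0, ..., N-1} with at most s elements is hashed
    injectively by some h in the family (each h is a list of N values)."""
    size = min(s, N)  # injective on every size-`size` set implies all smaller sets
    for S in itertools.combinations(range(N), size):
        if not any(len({h[x] for x in S}) == size for h in family):
            return False
    return True

def random_perfect_family(N, s, k, seed=0):
    """Resample random families of size family_size(N, s, k) until one is perfect."""
    rng = random.Random(seed)
    M = family_size(N, s, k)
    while True:
        family = [[rng.randrange(k) for _ in range(N)] for _ in range(M)]
        if is_perfect_family(family, N, s):
            return family

fam = random_perfect_family(N=6, s=3, k=9)
print(len(fam), is_perfect_family(fam, 6, 3))  # 15 True
```

With $k = s^2$, as in the protocol below where $k = \lambda^{2n}$ and $s = \lambda^n$, the factor $e^{s^2/k}$ reduces to $e$, which is where the constant $e$ in Equation (36) comes from.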

Initially, we construct a graph $\bar{G}^{(n)} = (V, E)$ for user 2 with vertex set $V = \mathcal{Y}^n$, so that each vertex of the graph is an $n$-tuple $y^n \in \mathcal{Y}^n$. Define an edge $(y^n, \hat{y}^n) \in E$ between vertices $y^n \in \mathcal{Y}^n$ and $\hat{y}^n \in \mathcal{Y}^n$ if, for some $n$-tuple $x^n \in \mathcal{X}^n$, there exists some $p \in \mathcal{P}$ for which $(x_i, y_i) \in \mathrm{supp}(p)$ and $(x_i, \hat{y}_i) \in \mathrm{supp}(p)$ for all $i = 1, \dots, n$. Consider a minimum coloring of this graph and let $\chi(\bar{G}^{(n)})$ denote the minimum number of required colors, i.e., the chromatic number of $\bar{G}^{(n)}$. Any valid coloring of this graph enables user 1 to resolve the ambiguities in distinguishing the correct $n$-tuple observed by user 2, under any true distribution $p \in \mathcal{P}$.

We next define the following ambiguity set for each $x^n \in \mathcal{X}^n$,

$$I_X(x^n) \triangleq \Big\{ y^n \in \mathcal{Y}^n : (x_i, y_i) \in \bigcup_{p \in \mathcal{P}} \mathrm{supp}(p) \text{ for } i = 1, \dots, n \Big\}, \tag{32}$$

as the set of distinct $y^n$ sequences that may occur with respect to the support set $\bigcup_{p \in \mathcal{P}} \mathrm{supp}(p)$ for the given sequence $x^n$. We denote the size of the largest single-term ambiguity set by

$$\lambda \triangleq \max_{x \in \mathcal{X}} |I_X(x)|, \tag{33}$$

and note that $\max_{x^n \in \mathcal{X}^n} |I_X(x^n)| = \lambda^n$, since $I_X(x^n) = I_X(x_1) \times \cdots \times I_X(x_n)$. Lastly, we define an ambiguity set for each $y^n \in \mathcal{Y}^n$,

$$I_Y(y^n) \triangleq \Big\{ f^n(x^n, y^n) \in \mathcal{F}^n : x_i \in \mathcal{X} \text{ and } (x_i, y_i) \in \bigcup_{p \in \mathcal{P}} \mathrm{supp}(p) \text{ for } i = 1, \dots, n \Big\}, \tag{34}$$

as the set of distinct function values that may appear for the given sequence $y^n$ with respect to the support set $\bigcup_{p \in \mathcal{P}} \mathrm{supp}(p)$. We denote the size of the largest single-term ambiguity set by

$$\mu \triangleq \max_{y \in \mathcal{Y}} |I_Y(y)|, \tag{35}$$

and note that $\max_{y^n \in \mathcal{Y}^n} |I_Y(y^n)| \leq \mu^n$.

The interaction protocol is then given as follows. Applying Equation (31) with $N = \chi(\bar{G}^{(n)})$, $s = \lambda^n$, and $k = \lambda^{2n}$ (so that $e^{s^2/k} = e$, and since $\ln N \leq \log N$), there exists a family $\mathcal{H}$ of

$$|\mathcal{H}| = \left\lceil \lambda^n \left( \log \chi(\bar{G}^{(n)}) \right) e \right\rceil \tag{36}$$

functions such that $h: \{1, \dots, \chi(\bar{G}^{(n)})\} \to \{1, \dots, \lambda^{2n}\}$ for all $h \in \mathcal{H}$, and for each $S \subseteq \{1, \dots, \chi(\bar{G}^{(n)})\}$ of size $|S| \leq \lambda^n$, there exists an $h \in \mathcal{H}$ that is injective over $S$. In that sense, since $|I_X(x^n)| \leq \lambda^n$, the colors assigned to an ambiguity set $I_X(x^n)$ for some $x^n \in \mathcal{X}^n$ correspond to some such $S$. Both users initially agree on such a family of functions $\mathcal{H}$ and a minimum coloring of graph $\bar{G}^{(n)}$ with $\chi(\bar{G}^{(n)})$ colors. Suppose user 1 observes $x^n \in \mathcal{X}^n$ and user 2 observes $y^n \in \mathcal{Y}^n$. User 1 finds a function $h \in \mathcal{H}$ that is injective over the colors assigned to the vertices $y^n \in I_X(x^n)$ from Equation (32) and sends its index to user 2 by using no more than

$$\lceil \log |\mathcal{H}| \rceil = \left\lceil \log \left\lceil \lambda^n \left( \log \chi(\bar{G}^{(n)}) \right) e \right\rceil \right\rceil \tag{37}$$

bits. User 2 then evaluates the corresponding function at the color assigned to $y^n$ and sends the resulting value back to user 1 by using no more than $\lceil \log \lambda^{2n} \rceil$ bits. After this step, user 1 learns the color of $y^n$, from which it can recover $y^n$ by using the observed $x^n$. This is due to the fact that, from the definition of the ambiguity set $I_X(x^n)$ in Equation (32), every $n$-tuple $y^n \in I_X(x^n)$ for a given $x^n \in \mathcal{X}^n$ receives a different color in the minimum coloring of the graph $\bar{G}^{(n)}$. Since the selected perfect hash function is one-to-one over the colors assigned to $y^n \in I_X(x^n)$, it allows user 1 to recover the color of $y^n$ from the evaluated hash value. In the last step, user 1 evaluates the function $f^n(x^n, y^n)$ and sends it to user 2 by using no more than $\lceil \log \mu^n \rceil$ bits. In doing so, she assigns a distinct index to each sequence of function values in the ambiguity set $I_Y(y^n)$ from Equation (34). User 2 can then recover $f^n(x^n, y^n)$ by using $y^n$ and the received index. Overall, this protocol requires no more than

$$\left\lceil \log \left\lceil \lambda^n \left( \log \chi(\bar{G}^{(n)}) \right) e \right\rceil \right\rceil + \lceil 2n \log \lambda \rceil + \lceil n \log \mu \rceil \tag{38}$$

bits to be transmitted in total; therefore,

$$\begin{aligned} l_{RF}^{(n)} &\leq \frac{1}{n} \left( \left\lceil \log \left\lceil \lambda^n (\log \chi(\bar{G}^{(n)})) e \right\rceil \right\rceil + \lceil 2n \log \lambda \rceil + \lceil n \log \mu \rceil \right) \qquad (39)\\ &\leq \frac{1}{n} \left( \left\lceil \log \left( \lambda^n (\log \chi(\bar{G}^{(n)})) e + 1 \right) \right\rceil + 2n \log \lambda + 1 + n \log \mu + 1 \right) \qquad (40)\\ &\leq \frac{1}{n} \left( \log \left( \lambda^n (\log \chi(\bar{G}^{(n)})) e + 1 \right) + 2n \log \lambda + n \log \mu + 3 \right) \qquad (41)\\ &\leq \frac{1}{n} \left( \log \left( \lambda^n (\log \chi(\bar{G}^{(n)})) e \right) + \log \left( 1 + \frac{1}{\lambda^n (\log \chi(\bar{G}^{(n)})) e} \right) + 2n \log \lambda + n \log \mu + 3 \right) \qquad (42)\\ &\leq 3 \log \lambda + \log \mu + \frac{1}{n} \log \log \chi(\bar{G}^{(n)}) + \frac{4 + \log e}{n} \qquad (43)\\ &\leq 3 \log \lambda + \log \mu + \frac{1}{n} \log \left( n \log \chi(\bar{G}^{(1)}) \right) + \frac{4 + \log e}{n} \qquad (44)\\ &\leq 3 \log \lambda + \log \mu + \frac{1}{n} \log \log \chi(\bar{G}^{(1)}) + \frac{\log n}{n} + \frac{4 + \log e}{n} \qquad (45) \end{aligned}$$

where Equation (43) follows from the fact that $\frac{1}{\lambda^n (\log \chi(\bar{G}^{(n)})) e} \leq 1$, so that the second logarithm in Equation (42) is at most 1, and Equation (44) holds since $\chi(\bar{G}^{(n)}) \leq \chi^n(\bar{G}^{(1)})$. The latter is due to the fact that any coloring of the $n$th order strong product of $\bar{G}^{(1)}$ is also a valid coloring of $\bar{G}^{(n)}$, since, by construction of $\bar{G}^{(n)}$, any edge that exists in $\bar{G}^{(n)}$ also exists in the $n$th order strong product of $\bar{G}^{(1)}$. Therefore, the chromatic number of $\bar{G}^{(n)}$ is no greater than the chromatic number of the $n$th order strong product of $\bar{G}^{(1)}$, which is no greater than $\chi^n(\bar{G}^{(1)})$.
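The chain from Equation (39) through Equation (45) can be sanity-checked numerically. The Python sketch below assumes base-2 logarithms and takes the worst case $\chi(\bar{G}^{(n)}) = \chi^n(\bar{G}^{(1)})$ permitted by Equation (44); it compares the exact bit count (38) divided by $n$ with the closed-form bound (45). The function names and the parameter grid are arbitrary choices for illustration.

```python
import math

def exact_bits_38(lam, mu, chi1, n):
    """Total bits in (38), with chi(G^(n)) replaced by its upper bound chi1**n."""
    log_chi_n = n * math.log2(chi1)          # log chi(G^(n)) <= n log chi(G^(1))
    hash_index = math.ceil(math.log2(math.ceil(lam ** n * log_chi_n * math.e)))
    return hash_index + math.ceil(2 * n * math.log2(lam)) + math.ceil(n * math.log2(mu))

def bound_45(lam, mu, chi1, n):
    """Per-symbol right-hand side of (45); requires chi1 >= 2."""
    return (3 * math.log2(lam) + math.log2(mu) + math.log2(math.log2(chi1)) / n
            + math.log2(n) / n + (4 + math.log2(math.e)) / n)

print(exact_bits_38(2, 3, 4, 10) / 10, round(bound_45(2, 3, 4, 10), 2))  # 5.2 5.56
```

The gap between the two values comes mainly from the $(4 + \log e)/n$ and $\log n / n$ terms, both of which vanish as $n$ grows.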

Combining the bounds obtained from the two protocols from Equations (30) and (45), we have the following upper bound on the worst-case message length.

Proposition 3.

The worst-case message length for the two strategies that do not explicitly start with a reconciliation procedure can be upper bounded as

$$l_{RF}^{(n)} \leq \min \left\{ \lceil \log \chi(G_{X,\mathcal{G}}) \rceil + \lceil \log \chi(G_{Y,\mathcal{G}}) \rceil,\ 3 \log \lambda + \log \mu + \frac{1}{n} \log \log \chi(\bar{G}^{(1)}) + \frac{\log n}{n} + \frac{4 + \log e}{n} \right\}. \tag{46}$$

Proof.

The result follows from combining the two interaction strategies in Equations (30) and (45). ☐

3.3. Partial Reconciliation

In order to understand the impact of the level of reconciliation on the worst-case message length, we consider a third scheme, called partial reconciliation, which allows user 2 to identify the true distribution only up to a class of distributions, after which the two users use a robust worst-case communication protocol that allows for zero-error communication under any distribution within that class. In that sense, partial reconciliation allows some ambiguity in the reconciled set of distributions. Accordingly, the schemes considered in Sections 3.1 and 3.2 are special cases of the partial reconciliation scheme. We denote by $l_{PR}^{(n)}$ the per-symbol worst-case message length for a finite block of $n$ source symbols under the partial reconciliation scheme. In the following, we present two protocols for interactive communication with partial reconciliation: the first is based on coloring characteristic graphs, whereas the second is based on perfect hash functions. We then derive an upper bound on the worst-case message length with partial reconciliation.

For the first partial reconciliation protocol, consider the set $\mathcal{G}$ of bipartite graphs $G = (\mathcal{X}, \mathcal{Y}, E_G)$ constructed from the distributions $p \in \mathcal{P}$ as described in Section 2.1. Define a partition of the set $\mathcal{G}$ as $\mathcal{A} = \{\mathcal{A}_1, \mathcal{A}_2, \dots, \mathcal{A}_{|\mathcal{A}|}\}$ such that $\bigcup_{i=1}^{|\mathcal{A}|} \mathcal{A}_i = \mathcal{G}$ and $\mathcal{A}_i \cap \mathcal{A}_j = \emptyset$ for all $i \neq j$, where $\mathcal{A}_i$ is non-empty for $i \in \{1, \dots, |\mathcal{A}|\}$. Define $\bar{\mathcal{A}}$ as the set of all such partitions of $\mathcal{G}$.

Fix a partition $\mathcal{A} \in \bar{\mathcal{A}}$. For each $i \in \{1, \dots, |\mathcal{A}|\}$, define a graph $G_{X,\mathcal{A}_i} = (\mathcal{X}, E_X)$ for user 1 with vertex set $\mathcal{X}$. Define an edge $(x, \hat{x}) \in E_X$ between vertices $x \in \mathcal{X}$ and $\hat{x} \in \mathcal{X}$ if there exists some $y \in \mathcal{Y}$ such that $(x, y) \in \bigcup_{G \in \mathcal{A}_i} E_G$ and $(\hat{x}, y) \in \bigcup_{G \in \mathcal{A}_i} E_G$ whereas $f(x, y) \neq f(\hat{x}, y)$. Next, construct a graph $G_{Y,\mathcal{A}_i} = (\mathcal{Y}, E_Y)$ for user 2 with vertex set $\mathcal{Y}$. Define an edge $(y, \hat{y}) \in E_Y$ between vertices $y \in \mathcal{Y}$ and $\hat{y} \in \mathcal{Y}$ if there exists some $x \in \mathcal{X}$ such that $(x, y) \in E_G$ and $(x, \hat{y}) \in E_G$ for some $G \in \mathcal{A}_i$ but $f(x, y) \neq f(x, \hat{y})$. Let $\chi(G_{X,\mathcal{A}_i})$ and $\chi(G_{Y,\mathcal{A}_i})$ denote the chromatic numbers of $G_{X,\mathcal{A}_i}$ and $G_{Y,\mathcal{A}_i}$, respectively.

Then, under any true distribution $p \in \mathcal{P}$, the following communication protocol ensures zero error. The two users agree on a partition $\mathcal{A} \in \bar{\mathcal{A}}$ before communication starts. Suppose users 1 and 2 observe $x^n$ and $y^n$, respectively, under the true distribution $p(x, y)$, and let $G = (\mathcal{X}, \mathcal{Y}, E_G)$ denote the bipartite graph corresponding to $p(x, y)$. Initially, user 1 sends the index $i$ of the set $\mathcal{A}_i \in \mathcal{A}$ for which $G \in \mathcal{A}_i$, by using no more than $\lceil \log |\mathcal{A}| \rceil$ bits. After this step, user 1 sends the color of each symbol in $x^n$ according to a minimum coloring of graph $G_{X,\mathcal{A}_i}$, by using no more than $n \lceil \log \chi(G_{X,\mathcal{A}_i}) \rceil$ bits in total. By using the sequence of colors received from user 1, user 2 can determine the correct function values $f^n(x^n, y^n)$. In the last step, user 2 sends the color of each symbol in $y^n$ according to a minimum coloring of graph $G_{Y,\mathcal{A}_i}$, by using no more than $n \lceil \log \chi(G_{Y,\mathcal{A}_i}) \rceil$ bits. After this step, user 1 can recover the function values $f^n(x^n, y^n)$. Overall, this protocol requires no more than

$$\lceil \log |\mathcal{A}| \rceil + \max_{i \in \{1, \dots, |\mathcal{A}|\}} n \left( \lceil \log \chi(G_{X,\mathcal{A}_i}) \rceil + \lceil \log \chi(G_{Y,\mathcal{A}_i}) \rceil \right) \tag{47}$$

bits to be transmitted. Since any partition in $\bar{\mathcal{A}}$ can be used to construct the communication protocol, we conclude that the worst-case message length for partial reconciliation is bounded above by

$$l_{PR}^{(n)} \leq \min_{\mathcal{A} \in \bar{\mathcal{A}}} \left( \frac{1}{n} \lceil \log |\mathcal{A}| \rceil + \max_{i \in \{1, \dots, |\mathcal{A}|\}} \left( \lceil \log \chi(G_{X,\mathcal{A}_i}) \rceil + \lceil \log \chi(G_{Y,\mathcal{A}_i}) \rceil \right) \right). \tag{48}$$
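For a small class of supports, the minimization over partitions in Equation (48) can be carried out by brute force. The Python sketch below, for a block length $n$ supplied by the caller, enumerates every partition of the class, builds $G_{X,\mathcal{A}_i}$ from the union of the supports in each block and $G_{Y,\mathcal{A}_i}$ from pairs lying in a single support, and returns the smallest value of (48) together with a minimizing partition (as lists of indices into the class). The function names and the toy class are hypothetical, base-2 logarithms are assumed, and the number of partitions grows as the Bell numbers, so this is only practical for a handful of graphs.

```python
import math
from itertools import combinations, product

def chromatic_number(vertices, edges):
    """Exact chromatic number by exhaustive search (tiny graphs only)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            c = dict(zip(vertices, colors))
            if all(c[u] != c[v] for u, v in edges):
                return k
    return 0

def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def block_graphs(supports, f):
    """G_{X,A_i} from the union of the block; G_{Y,A_i} from single supports."""
    union = set().union(*supports)
    xs = sorted({x for x, _ in union})
    ys = sorted({y for _, y in union})
    EX = [(a, b) for a, b in combinations(xs, 2)
          if any((a, y) in union and (b, y) in union and f(a, y) != f(b, y) for y in ys)]
    EY = [(a, b) for a, b in combinations(ys, 2)
          if any((x, a) in S and (x, b) in S and f(x, a) != f(x, b)
                 for S in supports for x in xs)]
    return (xs, EX), (ys, EY)

def partial_reconciliation_bound(G_class, f, n):
    """Smallest right-hand side of (48) over all partitions, with a minimizer."""
    best = None
    for A in set_partitions(list(range(len(G_class)))):
        comm = 0
        for block in A:
            (xs, EX), (ys, EY) = block_graphs([G_class[j] for j in block], f)
            cost = (math.ceil(math.log2(chromatic_number(xs, EX)))
                    + math.ceil(math.log2(chromatic_number(ys, EY))))
            comm = max(comm, cost)
        value = math.ceil(math.log2(len(A))) / n + comm
        if best is None or value < best[0]:
            best = (value, A)
    return best

G_class = [{(0, 0), (1, 0)}, {(0, 1), (1, 1)}]
print(partial_reconciliation_bound(G_class, lambda x, y: x ^ y, n=1))  # (1.0, [[0, 1]])
```

In this toy case, the trivial partition (no reconciliation) attains 1 bit per symbol for $n = 1$, whereas the partition into singletons costs 2 bits per symbol.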

For the second partial reconciliation protocol, we again leverage the perfect hash functions from Equation (31). As in the first protocol, we define a partition of the set $\mathcal{G}$ as $\mathcal{A} = \{\mathcal{A}_1, \mathcal{A}_2, \dots, \mathcal{A}_{|\mathcal{A}|}\}$ such that $\bigcup_{i=1}^{|\mathcal{A}|} \mathcal{A}_i = \mathcal{G}$ and $\mathcal{A}_i \cap \mathcal{A}_j = \emptyset$ for all $i \neq j$. We let $\bar{\mathcal{A}}$ be the set of all such partitions of $\mathcal{G}$.

We fix a partition $\mathcal{A} \in \bar{\mathcal{A}}$ of $\mathcal{G}$. For each $i \in \{1, \dots, |\mathcal{A}|\}$, we define a graph $\bar{G}_i^{(n)} = (\mathcal{Y}^n, E)$ with vertex set $\mathcal{Y}^n$. We define an edge $(y^n, \hat{y}^n) \in E$ between two vertices $y^n \in \mathcal{Y}^n$ and $\hat{y}^n \in \mathcal{Y}^n$ if there exists some $x^n \in \mathcal{X}^n$ such that $(x_j, y_j) \in \bigcup_{G \in \mathcal{A}_i} E_G$ and $(x_j, \hat{y}_j) \in \bigcup_{G \in \mathcal{A}_i} E_G$ for $j = 1, \dots, n$. We denote the chromatic number of $\bar{G}_i^{(n)}$ by $\chi(\bar{G}_i^{(n)})$.

We define an ambiguity set for each $x^n \in \mathcal{X}^n$,

$$I_i^X(x^n) \triangleq \Big\{ y^n \in \mathcal{Y}^n : (x_j, y_j) \in \bigcup_{G \in \mathcal{A}_i} E_G \text{ for } j = 1, \dots, n \Big\}, \tag{49}$$

where the size of the largest single-term ambiguity set is given by

$$\lambda_i \triangleq \max_{x \in \mathcal{X}} |I_i^X(x)|, \tag{50}$$

and note that $\max_{x^n \in \mathcal{X}^n} |I_i^X(x^n)| \leq \lambda_i^n$. Next, we define an ambiguity set for each $y^n \in \mathcal{Y}^n$,

$$I_i^Y(y^n) \triangleq \Big\{ f^n(x^n, y^n) \in \mathcal{F}^n : x_j \in \mathcal{X} \text{ and } (x_j, y_j) \in \bigcup_{G \in \mathcal{A}_i} E_G \text{ for } j = 1, \dots, n \Big\}, \tag{51}$$

and define the size of the largest single-term ambiguity set as

$$\mu_i \triangleq \max_{y \in \mathcal{Y}} |I_i^Y(y)|, \tag{52}$$

where $\max_{y^n \in \mathcal{Y}^n} |I_i^Y(y^n)| \leq \mu_i^n$. Given $i \in \{1, \dots, |\mathcal{A}|\}$, from Equation (31), there exists a family $\mathcal{H}$ of

$$|\mathcal{H}| = \left\lceil \lambda_i^n \left( \log \chi(\bar{G}_i^{(n)}) \right) e \right\rceil \tag{53}$$

functions such that $h: \{1, \dots, \chi(\bar{G}_i^{(n)})\} \to \{1, \dots, \lambda_i^{2n}\}$ for all $h \in \mathcal{H}$, and for each $S \subseteq \{1, \dots, \chi(\bar{G}_i^{(n)})\}$ of size $|S| \leq \lambda_i^n$, there exists an $h \in \mathcal{H}$ that is injective over $S$. For each $i \in \{1, \dots, |\mathcal{A}|\}$, the two users agree on such a family of functions $\mathcal{H}$ and a coloring of graph $\bar{G}_i^{(n)}$ with $\chi(\bar{G}_i^{(n)})$ colors. Suppose user 1 observes $x^n \in \mathcal{X}^n$ and user 2 observes $y^n \in \mathcal{Y}^n$. User 1 first sends to user 2 the index $i$ of the set $\mathcal{A}_i \in \mathcal{A}$ that contains the bipartite graph of the true distribution $p$, by using no more than $\lceil \log |\mathcal{A}| \rceil$ bits. User 1 then finds a function $h \in \mathcal{H}$ that is injective over the colors of the vertices $y^n \in I_i^X(x^n)$ from Equation (49) and sends its index to user 2 by using no more than

$$\lceil \log |\mathcal{H}| \rceil = \left\lceil \log \left\lceil \lambda_i^n \left( \log \chi(\bar{G}_i^{(n)}) \right) e \right\rceil \right\rceil \tag{54}$$

bits. User 2 then evaluates the corresponding function at the color assigned to $y^n$ and sends the result back to user 1 by using no more than $\lceil \log \lambda_i^{2n} \rceil$ bits. After this step, user 1 learns the color of $y^n$, from which it recovers $y^n$ by using the observed $x^n$. User 1 then evaluates the function $f^n(x^n, y^n)$ and sends it to user 2 by using no more than $\lceil \log \mu_i^n \rceil$ bits. In doing so, she assigns a distinct index to each sequence of function values in the ambiguity set $I_i^Y(y^n)$ from Equation (51). User 2 can then recover $f^n(x^n, y^n)$ by using $y^n$ and the received index. Overall, in addition to the $\lceil \log |\mathcal{A}| \rceil$ bits used to send the index $i$, this protocol requires no more than

$$\left\lceil \log \left\lceil \lambda_i^n \left( \log \chi(\bar{G}_i^{(n)}) \right) e \right\rceil \right\rceil + \lceil 2n \log \lambda_i \rceil + \lceil n \log \mu_i \rceil \tag{55}$$

bits to be transmitted in total; therefore,

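$$\begin{aligned} l_{PR}^{(n)} &\leq \min_{\mathcal{A} \in \bar{\mathcal{A}}} \frac{1}{n} \left( \lceil \log |\mathcal{A}| \rceil + \max_{i \in \{1, \dots, |\mathcal{A}|\}} \left( \left\lceil \log \left\lceil \lambda_i^n (\log \chi(\bar{G}_i^{(n)})) e \right\rceil \right\rceil + \lceil 2n \log \lambda_i \rceil + \lceil n \log \mu_i \rceil \right) \right) \qquad (56)\\ &\leq \min_{\mathcal{A} \in \bar{\mathcal{A}}} \left( \frac{1}{n} \lceil \log |\mathcal{A}| \rceil + \max_{i \in \{1, \dots, |\mathcal{A}|\}} \left( 3 \log \lambda_i + \log \mu_i + \frac{1}{n} \log \log \chi(\bar{G}_i^{(1)}) + \frac{\log n}{n} + \frac{4 + \log e}{n} \right) \right) \qquad (57) \end{aligned}$$

where Equation (57) follows by the same steps as Equations (39) through (45), using $\chi(\bar{G}_i^{(n)}) \leq \chi^n(\bar{G}_i^{(1)})$.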

Combining the bounds obtained from the two protocols in Equations (48) and (57), we have the following upper bound on the worst-case message length with partial reconciliation:

$$\begin{aligned} l_{PR}^{(n)} \leq \min_{\mathcal{A} \in \bar{\mathcal{A}}} \Bigg( \frac{1}{n} \lceil \log |\mathcal{A}| \rceil + \min \Big\{ &\max_{i \in \{1, \dots, |\mathcal{A}|\}} \big( \lceil \log \chi(G_{X,\mathcal{A}_i}) \rceil + \lceil \log \chi(G_{Y,\mathcal{A}_i}) \rceil \big),\\ &\max_{i \in \{1, \dots, |\mathcal{A}|\}} \Big( 3 \log \lambda_i + \log \mu_i + \frac{1}{n} \log \log \chi(\bar{G}_i^{(1)}) + \frac{\log n}{n} + \frac{4 + \log e}{n} \Big) \Big\} \Bigg). \end{aligned} \tag{58}$$

Overall, partial reconciliation captures the interplay between reconciliation and communication costs. In order to understand this inherent reconciliation-communication trade-off, we next identify the cases for which reconciling the missing information is better or worse than not reconciling it. To do so, we provide sufficient conditions under which reconciliation-based strategies can outperform the strategies that do not start with a reconciliation procedure, and vice versa, and show that either strategy can outperform the other. Finally, we demonstrate that partial reconciliation can strictly outperform both.

4. Cases in which Strategies that Do Not Start with a Reconciliation Procedure Are Better than Perfect Reconciliation

In this section, we demonstrate that strategies with no explicit reconciliation step can be strictly better than perfect reconciliation.

Proposition 4.

Strategies that do not start with an explicit reconciliation procedure are better than perfect reconciliation if

$$\frac{\lceil \log \chi(R) \rceil}{n} + \max_{G \in \mathcal{G}} \max_{(x^n, y^n) \in S_G^n} \frac{1}{n} \lceil \log |I_G(x^n, y^n)| \rceil > \min \left\{ \lceil \log \chi(G_{X,\mathcal{G}}) \rceil + \lceil \log \chi(G_{Y,\mathcal{G}}) \rceil,\ 3 \log \lambda + \log \mu + \frac{1}{n} \log \log \chi(\bar{G}^{(1)}) + \frac{\log n}{n} + \frac{4 + \log e}{n} \right\}. \tag{59}$$

Proof.

The result follows from comparing the lower bound on the number of bits required for the perfect reconciliation setting from Equation (24) with the upper bound from Equation (46). ☐

Corollary 1.

Strategies with no explicit reconciliation step can strictly outperform perfect reconciliation.

Proof.

Consider a scenario in which there exists a parent distribution $p^* \in \mathcal{P}$ such that $\mathrm{supp}(p) \subseteq \mathrm{supp}(p^*)$ for all $p \in \mathcal{P}$. Then, reconciliation cannot perform better than the strategies with no explicit reconciliation step. This immediately follows from two observations: (i) any zero-error communication strategy for $p^*$ is a valid strategy with no explicit reconciliation step, since $\bigcup_{p \in \mathcal{P}} \mathrm{supp}(p) = \mathrm{supp}(p^*)$; (ii) any perfect reconciliation scheme must ensure a valid zero-error communication strategy for $p^*$, as $p^*$ may appear as the true distribution. Therefore, reconciling distributions cannot decrease the overall message length. Suppose, in addition, that there exists some $q \in \mathcal{Q}$ for which $\mathrm{supp}(q) \subseteq \mathrm{supp}(p)$ for all $p \in \mathcal{P}$. Then, taking $B$ as the bipartite graph of $q$ in Definition 1 shows that every pair of vertices of $R$ is adjacent, so $R$ is a complete graph and the reconciliation stage costs at least one bit; hence, the inequality is strict whenever $|\mathcal{P}| > 1$. ☐

We next consider the following example to elaborate on the impact of the overlap between the edges of bipartite graphs on the worst-case message length. To do so, we let $n = 1$ and investigate the following class of graphs.

Definition 2.

(Z-Graph) Consider a class of graphs $\mathcal{G}$ for which there exists a single $(x, y) \in \mathcal{X} \times \mathcal{Y}$ such that $(x, y) \in E_G$ for all $G \in \mathcal{G}$. Additionally, assume that for any $(\hat{x}, \hat{y}) \in \mathcal{X} \times \mathcal{Y}$ such that $(\hat{x}, \hat{y}) \in E_G$ for some $G \in \mathcal{G}$, either $x = \hat{x}$ or $y = \hat{y}$. In that sense, the structure of these graphs resembles a Z shape; hence, we refer to them as Z-graphs. For this class of graphs, $\lambda_G = \lambda_G(x)$ and $\mu_G = \mu_G(y)$ for any $G \in \mathcal{G}$.

Lemma 2.

Consider the class of graphs defined in Definition 2. For this class of graphs, the worst-case message length for strategies with no explicit reconciliation step satisfies

$$l_{RF}^{(1)} \leq \lceil \log \chi(G_{Y,\mathcal{G}}) \rceil + \lceil \log \mu_{\mathcal{G}} \rceil, \tag{60}$$

where $G_{Y,\mathcal{G}}$ is defined in Section 3.2 and $\mu_{\mathcal{G}} = \max_{y \in \mathcal{Y}} \left| \bigcup_{G \in \mathcal{G}} I_{Y,G}(y) \right|$, with $I_{Y,G}(y)$ as given in Equation (12).

Proof.

Consider the following encoding scheme. Group all the neighbors $x' \in \mathcal{X}$ of $y$ in $\bigcup_{G \in \mathcal{G}} G$ that lead to the same function value $f(x', y)$, and assign a distinct codeword to each of these groups. User 1 sends the corresponding codeword to user 2, which requires no more than $\lceil \log \mu_{\mathcal{G}} \rceil$ bits, after which user 2 can recover the correct function value. Next, construct the graph $G_{Y,\mathcal{G}}$ as defined in Section 3.2, find a minimum coloring of $G_{Y,\mathcal{G}}$, and assign a distinct codeword to each of the colors. User 2 then sends the corresponding codeword to user 1 by using no more than $\lceil \log \chi(G_{Y,\mathcal{G}}) \rceil$ bits. User 1 can infer the correct function value after this step, as she already knows the bipartite graph $G$ that corresponds to the true distribution, and given $x$ and $G$, each color represents a distinct function value. ☐

Example 1.

Consider the framework of Section 3.1 along with a class of Z-graphs $\mathcal{G} = \{G_{1}, G_{2}\}$ and $\mathcal{B} = \{B_{1}, B_{2}\}$. That is, $G_{1}, G_{2}, B_{1}, B_{2}$ share an edge $(x, y) \in X \times Y$ such that $(x, y) \in E_{G_{1}} \cap E_{G_{2}} \cap E_{B_{1}} \cap E_{B_{2}}$. Moreover, for any other edge $(\hat{x}, \hat{y}) \in X \times Y$, either $\hat{x} = x$ or $\hat{y} = y$. Assume that $f(\hat{x}, \hat{y})$ is distinct for each edge $(\hat{x}, \hat{y})$ in the graphs of $\mathcal{G}$. Let

ω ≜ |\{(\hat{x}, y) : (\hat{x}, y) \in E_{G_{1}} \cap E_{G_{2}}, \hat{x} \in X\}|

(61)

represent the number of common edges, i.e., the overlap, between $G_{1}$ and $G_{2}$, where $1 \leq ω \leq \min\{μ_{G_{1}}, μ_{G_{2}}\}$. Note that here the overlap between $G_{1}$ and $G_{2}$ consists only of the edges that share the endpoint $y$. We consider the following four cases for the relations between the structures of the graphs $G_{1}, G_{2}, B_{1}, B_{2}$.

$E_{B_{1}} \subseteq E_{G_{1}}$ , $E_{B_{2}} ⊈ E_{G_{1}}$ , $E_{B_{1}} ⊈ E_{G_{2}}$ , $E_{B_{2}} \subseteq E_{G_{2}}$ . In this case, no reconciliation is always better than reconciliation, because whenever user 2 observes $B_{1}$ (respectively $B_{2}$ ), he can infer that user 1 knows $G_{1}$ (respectively $G_{2}$ ).
$E_{B_{1}} \subseteq E_{G_{1}}$ , $E_{B_{2}} \subseteq E_{G_{1}}$ , $E_{B_{1}} ⊈ E_{G_{2}}$ , $E_{B_{2}} ⊈ E_{G_{2}}$ . In this case, no reconciliation is again optimal, as user 2 can infer that user 1 knows $G_{1}$ whenever he observes $B_{1}$ or $B_{2}$ .
$E_{B_{1}} ⊈ E_{G_{1}}$ , $E_{B_{2}} ⊈ E_{G_{1}}$ , $E_{B_{1}} \subseteq E_{G_{2}}$ , $E_{B_{2}} \subseteq E_{G_{2}}$ . Then, no reconciliation is again optimal, as user 2 can infer that user 1 knows $G_{2}$ whenever he observes $B_{1}$ or $B_{2}$ .
$E_{B_{1}} ⊈ E_{G_{1}}$ , $E_{B_{2}} \subseteq E_{G_{1}}$ , $E_{B_{1}} \subseteq E_{G_{2}}$ , $E_{B_{2}} \subseteq E_{G_{2}}$ . In this case, the chromatic number of the reconciliation graph is given by $χ (R) = 2$ from Definition 1. Then the worst-case message length for the perfect reconciliation scheme satisfies,

$l_{R}^{(1)} \geq 1 + \max\{⌈\log(λ_{G_{1}} + μ_{G_{1}} - 1)⌉, ⌈\log(λ_{G_{2}} + μ_{G_{2}} - 1)⌉\},$

(62)

which follows from Lemma 1 with $⌈\log χ(R)⌉ = 1$. On the other hand, we find that the worst-case message length for the no reconciliation scheme satisfies

$l_{RF}^{(1)} \leq \max\{⌈\log λ_{G_{1}}⌉, ⌈\log λ_{G_{2}}⌉\} + ⌈\log(μ_{G_{1}} + μ_{G_{2}} - ω)⌉,$

(63)

which follows from Lemma 2, with $μ_{\mathcal{G}} = μ_{G_{1}} + μ_{G_{2}} - ω$ since the neighbors of $y$ in $G_{1}$ and $G_{2}$ overlap in exactly ω nodes, together with the following coloring scheme. Suppose $\max\{λ_{G_{1}}, λ_{G_{2}}\} = λ_{G_{1}}$. Using $λ_{G_{1}}$ colors, assign each $\hat{y} \in Y$ that is connected to $x$ in $G_{1}$ a distinct color. Next, take $λ_{G_{2}} - 1$ of these colors, excluding the color assigned to node $y$, and color each $\hat{y} \neq y$ that is connected to $x$ in $G_{2}$ with a distinct color. This is a valid coloring since there are only two bipartite graphs, $G_{1}$ and $G_{2}$, corresponding to two cliques of sizes $λ_{G_{1}}$ and $λ_{G_{2}}$ in the characteristic graph, and the only common node between these two cliques is $y$. Furthermore, no edge exists across the two cliques. Hence, no reconciliation is better than perfect reconciliation whenever

$\max\{⌈\log λ_{G_{1}}⌉, ⌈\log λ_{G_{2}}⌉\} + ⌈\log(μ_{G_{1}} + μ_{G_{2}} - ω)⌉ < 1 + \max\{⌈\log(λ_{G_{1}} + μ_{G_{1}} - 1)⌉, ⌈\log(λ_{G_{2}} + μ_{G_{2}} - 1)⌉\}.$

(64)
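The two-clique coloring used for Equation (63) can be sketched as follows. The sketch assumes, as an illustrative representation, that each clique is given as a list of $y$-node labels adjacent to $x$ in $G_{1}$ or $G_{2}$, that both lists contain the shared node $y$, and that they are otherwise disjoint; the function name is hypothetical. It reuses the colors of the larger clique, so it uses $\max\{λ_{G_{1}}, λ_{G_{2}}\}$ colors in total.

```python
from typing import Dict, List


def color_two_cliques(nbrs1: List[str], nbrs2: List[str], shared: str) -> Dict[str, int]:
    """Color two cliques that meet only at `shared` with max(len) colors.

    Assumes both lists contain `shared` and are otherwise disjoint, and that
    no edge joins the two cliques (as in Example 1).
    """
    big, small = (nbrs1, nbrs2) if len(nbrs1) >= len(nbrs2) else (nbrs2, nbrs1)
    # Distinct colors 0..len(big)-1 on the larger clique (shared node included).
    coloring = {v: c for c, v in enumerate(big)}
    # Colors of the larger clique other than the one given to the shared node.
    spare = [c for v, c in coloring.items() if v != shared]
    others = [v for v in small if v != shared]
    # len(others) = len(small) - 1 <= len(spare), so every node gets a color.
    for v, c in zip(others, spare):
        coloring[v] = c
    return coloring
```

Each clique receives distinct colors, and the total number of colors equals the size of the larger clique, which is the value of $χ(G_{Y, \mathcal{G}})$ entering Equation (63).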

As an example, consider the graphs illustrated in Figure 4, for which $λ_{G_{1}} = λ_{G_{2}} = μ_{G_{1}} = μ_{G_{2}} = 2$ and $ω = μ_{G_{1}} = μ_{G_{2}}$. The corresponding characteristic graph and the coloring of $G_{Y, \mathcal{G}}$ are illustrated in Figure 5. In this case, the left-hand side of Equation (64) equals $1 + ⌈\log 2⌉ = 2$, whereas the right-hand side equals $1 + ⌈\log 3⌉ = 3$; hence, no reconciliation is strictly better than perfect reconciliation.

Figure 4. Example graphs $\mathcal{G} = \{G_{1}, G_{2}\}$ and $\mathcal{B} = \{B_{1}, B_{2}\}$, where $λ_{G_{1}} = λ_{G_{2}} = μ_{G_{1}} = μ_{G_{2}} = 2$.

Figure 5. Coloring of the characteristic graph $G_{Y, \mathcal{G}}$.

We note that the performance of a particular communication strategy relative to others depends greatly on the structure of the partial information as well as on the true probability distribution of the observed symbols. In the following section, we show that there exist cases for which reconciling the true distribution only partially can lead to a better worst-case message length than both the strategies from Section 3.1 and Section 3.2, indicating that the best communication strategy under partial information may result from a balance between reconciliation and communication costs.

5. Cases in Which Partial Reconciliation is Better

We now investigate the conditions under which partially reconciling the graph information is better than perfect reconciliation. To do so, we initially compare the perfect and partial reconciliation strategies.

Proposition 5.

Partial reconciliation is better than perfect reconciliation if

\begin{aligned} &\frac{⌈\log χ(R)⌉}{n} + \max_{G \in \mathcal{G}} \max_{(x^{n}, y^{n}) \in S_{G}^{n}} \frac{1}{n} ⌈\log |I_{G}(x^{n}, y^{n})|⌉ \\ &\quad > \min_{A \in \bar{A}} \Big( \frac{1}{n} ⌈\log |A|⌉ + \min\Big\{ \max_{i \in \{1, \dots, |A|\}} \big(⌈\log χ(G_{X, A_{i}})⌉ + ⌈\log χ(G_{Y, A_{i}})⌉\big), \\ &\qquad \max_{i \in \{1, \dots, |A|\}} \Big(3 \log λ_{i} + \log μ_{i} + \frac{1}{n} \log \log χ(\bar{G}_{i}^{(1)}) + \frac{\log n}{n} + \frac{4 + \log e}{n}\Big) \Big\} \Big). \end{aligned}

(65)

Proof.

The right-hand side of Equation (65) is an upper bound on the zero-error message length with partial reconciliation from Equation (58), whereas the left-hand side lower bounds the zero-error codeword length for perfect reconciliation via Equation (24), from which Equation (65) follows. ☐

We next show that there exist cases for which partial reconciliation strictly outperforms the strategies from Section 3.1 and Section 3.2. To do so, we let $n = 1$ and again focus on the class of graphs introduced in Definition 2. First, we present an upper bound on the worst-case message length with partial reconciliation for Z-graphs.

Lemma 3.

The worst-case message length with partial reconciliation for the class of graphs from Definition 2 can be upper bounded by

l_{PR}^{(1)} \leq \min_{A \in \bar{A}} \Big(⌈\log |A|⌉ + \max_{i \in \{1, \dots, |A|\}} \big(⌈\log χ(G_{Y, A_{i}})⌉ + ⌈\log μ_{A_{i}}⌉\big)\Big)

(66)

where the graph $G_{Y, A_{i}}$ is as defined in Section 3.3 and $μ_{A_{i}} = \max_{y \in Y} |\cup_{G \in A_{i}} I_{Y, G}(y)|$, with $I_{Y, G}(y)$ as described in Equation (12).

Proof.

To prove achievability, note that for a given partition $A$, $⌈\log |A|⌉$ bits suffice for sending the partition index, which reconciles each graph up to the group of graphs in the partition to which it is assigned. After reconciliation, zero-error communication requires no more than $\max_{i : A_{i} \in A} (⌈\log χ(G_{Y, A_{i}})⌉ + ⌈\log μ_{A_{i}}⌉)$ bits in the worst case. We show this by considering an encoding scheme that ensures zero-error communication for any graph in $A_{i}$ by using $⌈\log χ(G_{Y, A_{i}})⌉ + ⌈\log μ_{A_{i}}⌉$ bits. Group all the neighbors $x' \in X$ of $y$ in $\cup_{G \in A_{i}} G$ that lead to the same function value $f(x', y)$, and assign a single distinct codeword to each of these groups. This requires no more than $⌈\log μ_{A_{i}}⌉$ bits in total. Next, for a given group $A_{i}$, construct the graph $G_{Y, A_{i}}$ as defined in Section 3.3, find a minimum coloring of $G_{Y, A_{i}}$, and assign a distinct codeword to each of the colors, which requires no more than $⌈\log χ(G_{Y, A_{i}})⌉$ bits in total. Then, fix a partitioning $A$ of $\mathcal{G}$ and use the following communication scheme. User 1, using the partition $A$, sends the index of the group in which her graph $G$ resides. Then, users 1 and 2 use a robust communication scheme for all the graphs contained in this group. To do so, user 1 sends the codeword of the group containing her symbol $x' \in X$ by using no more than $⌈\log μ_{A_{i}}⌉$ bits, after which user 2 can recover the correct function value. Then, user 2 sends the color assigned to his symbol $y' \in Y$ using no more than $⌈\log χ(G_{Y, A_{i}})⌉$ bits, after which user 1 can learn the correct function value. ☐
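The minimization over partitions in Equation (66) can be evaluated by brute force on tiny instances. The sketch below rests on illustrative assumptions that are not the paper's construction: (i) each Z-graph is summarized by the set of $y$-nodes adjacent to the hub $x$ and the set of $x$-nodes adjacent to the hub $y$; (ii) $f$ is distinct on every edge, so $μ_{A_{i}}$ counts the distinct neighbors of $y$ in the group; and (iii) two $y$-nodes conflict in $G_{Y, A_{i}}$ exactly when both are adjacent to $x$ in the same graph of $A_{i}$, mirroring the clique argument of Example 1. All names (`lemma3_bound`, `ZGraph`, and so on) are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Iterator, List, Tuple

# Hypothetical summary of a Z-graph: (y-neighbours of hub x, x-neighbours of hub y).
ZGraph = Tuple[FrozenSet[str], FrozenSet[str]]


def clog2(v: int) -> int:
    """Ceiling of log2 for positive integers, with clog2(1) = 0."""
    return (v - 1).bit_length()


def chromatic_number(nodes: List[str], cliques: List[FrozenSet[str]]) -> int:
    """Fewest colors making every clique rainbow (brute force, tiny inputs only)."""
    for k in range(1, len(nodes) + 1):
        for colors in product(range(k), repeat=len(nodes)):
            col = dict(zip(nodes, colors))
            if all(len({col[v] for v in q}) == len(q) for q in cliques):
                return k
    return 0  # only reached when there are no nodes


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Enumerate all set partitions of `items`."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def lemma3_bound(graphs: List[ZGraph]) -> int:
    """Right-hand side of Eq. (66), minimized over every partition A (graphs non-empty)."""
    best = None
    for part in set_partitions(list(range(len(graphs)))):
        worst = 0
        for group in part:
            ys = sorted(set().union(*(graphs[i][0] for i in group)))
            chi = chromatic_number(ys, [graphs[i][0] for i in group])
            mu = len(set().union(*(graphs[i][1] for i in group)))
            worst = max(worst, clog2(chi) + clog2(mu))
        total = clog2(len(part)) + worst
        best = total if best is None else min(best, total)
    return best
```

On a three-graph instance shaped like the one used in the proof of Proposition 6 below ($λ_{G_{1}} = 2$, $μ_{G_{1}} = 8$, $λ_{G_{2}} = 1$, $μ_{G_{2}} = 16$, and one particular choice of $G_{3} \subset G_{1}$), the search returns 5, attained by $A_{1} = \{G_{1}, G_{3}\}$, $A_{2} = \{G_{2}\}$. Exhaustive enumeration is exponential in $|\mathcal{G}|$, so this is only a checking aid.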

Proposition 6.

Partial reconciliation can strictly outperform the strategies from Section 3.1 and Section 3.2.

Proof.

Consider the set of Z-graphs $\mathcal{G} = \{G_{1}, G_{2}, G_{3}\}$ and $\mathcal{B} = \{B\}$ in Figure 6. The edge sets satisfy $E_{G_{3}} \subset E_{G_{1}}$, $E_{G_{2}} \cap E_{G_{1}} = \{(x, y)\}$, and $E_{B} = \{(x, y)\}$.

Figure 6. Bipartite graphs $\mathcal{G} = \{G_{1}, G_{2}, G_{3}\}$, $\mathcal{B} = \{B\}$, and the corresponding reconciliation graph R.

Let $f(\hat{x}, \hat{y})$ be distinct for each edge $(\hat{x}, \hat{y})$ in the graphs of $\mathcal{G}$, and assume that $λ_{G_{i}} \geq 2$ for some $i \in \{1, 2\}$.

First, consider the protocol from Section 3.2. It can be shown that this protocol satisfies

l_{RF}^{(1)} \geq 1 + ⌈\log(μ_{G_{1}} + μ_{G_{2}} - 1)⌉

(67)

which results from the following observation. From Proposition 1, it follows that if $(x', y), (x'', y) \in E_{G_{i}}$ for some $i \in \{1, 2, 3\}$, then $[ϕ_{k}^{X}(x', ϕ^{k-1}(x', y))]_{k=1}^{r}$ cannot be a prefix of $[ϕ_{k}^{X}(x'', ϕ^{k-1}(x'', y))]_{k=1}^{r}$, where $[ϕ_{k}^{X}(x', ϕ^{k-1}(x', y))]_{k=1}^{r}$ is the sequence of bits sent by user 1 in $r$ rounds. Next, suppose that for some $(x', y) \in E_{G_{i}}$ and $(x'', y) \in E_{G_{j}}$ with $i \neq j$, $[ϕ_{k}^{X}(x', ϕ^{k-1}(x', y))]_{k=1}^{r}$ is a prefix of $[ϕ_{k}^{X}(x'', ϕ^{k-1}(x'', y))]_{k=1}^{r}$. Since user 2 does not know the true distribution, he cannot distinguish between $x'$ and $x''$, which causes an error since $(x', y)$ and $(x'', y)$ lead to different function values. This violates the zero-error condition. As a result, the sequences $[ϕ_{k}^{X}(\hat{x}, ϕ^{k-1}(\hat{x}, y))]_{k=1}^{r}$ must be prefix-free over all $\hat{x} \in I_{Y}(y)$, defined in Equation (34), whose size is $μ_{G_{1}} + μ_{G_{2}} - 1$ since $G_{3} \subset G_{1}$ and $G_{1}$, $G_{2}$ share only the edge $(x, y)$. Since a prefix-free set of $m$ binary sequences must contain a sequence of length at least $⌈\log m⌉$, user 1 needs to send at least $⌈\log(μ_{G_{1}} + μ_{G_{2}} - 1)⌉$ bits to user 2 in the worst case. Next, we demonstrate that user 2 needs to send at least 1 bit to user 1. Suppose that this is not true, i.e., user 2 does not send anything. Since $λ_{G_{i}} \geq 2$ for some $i \in \{1, 2\}$, user 1 will then not be able to distinguish between two distinct function values for at least one graph that may occur at user 1. Therefore, by contradiction, Equation (67) provides a lower bound for Z-graphs for the protocols considered in Section 3.2, which do not start with a reconciliation step.

Next, consider the perfect reconciliation protocol. For this scheme, we construct the reconciliation graph R as given in Figure 6, and observe that any encoding strategy that allows user 2 to distinguish the graph of user 1 requires 3 colors (distinct codewords). After this step, both users consider one of $G_{1}$, $G_{2}$, or $G_{3}$. Then,

l_{R}^{(1)} \geq ⌈\log 3⌉ + \max_{G_{i} \in \mathcal{G}} ⌈\log(λ_{G_{i}} + μ_{G_{i}} - 1)⌉

(68)

which follows from Lemma 1 with the observation that $|I_{G_{i}}(x, y)| = λ_{G_{i}} + μ_{G_{i}} - 1$ for $i \in \{1, 2, 3\}$.

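Lastly, consider the partial reconciliation protocol. In particular, consider a partial reconciliation scheme achieved by the partitioning $A = \{A_{1}, A_{2}\}$ such that $A_{1} = \{G_{1}, G_{3}\}$ and $A_{2} = \{G_{2}\}$. Since $E_{G_{3}} \subset E_{G_{1}}$, the group $A_{1}$ contributes the same terms as $G_{1}$ alone, i.e., $χ(G_{Y, A_{1}}) = λ_{G_{1}}$ and $μ_{A_{1}} = μ_{G_{1}}$, and similarly $χ(G_{Y, A_{2}}) = λ_{G_{2}}$ and $μ_{A_{2}} = μ_{G_{2}}$. Then, from Equation (66), we obtain

l_{PR}^{(1)} \leq \log 2 + \max\{⌈\log λ_{G_{1}}⌉ + ⌈\log μ_{G_{1}}⌉, ⌈\log λ_{G_{2}}⌉ + ⌈\log μ_{G_{2}}⌉\}.

(69)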

Therefore, whenever $λ_{G_{1}}, λ_{G_{2}}, μ_{G_{1}}, μ_{G_{2}}$ satisfy

\log 2 + \max\{⌈\log λ_{G_{1}}⌉ + ⌈\log μ_{G_{1}}⌉, ⌈\log λ_{G_{2}}⌉ + ⌈\log μ_{G_{2}}⌉\} < 1 + ⌈\log(μ_{G_{1}} + μ_{G_{2}} - 1)⌉,

(70)

then partial reconciliation outperforms the strategies from Section 3.2. On the other hand, whenever $λ_{G_{1}}, λ_{G_{2}}, μ_{G_{1}}, μ_{G_{2}}$ satisfy

\log 2 + \max\{⌈\log λ_{G_{1}}⌉ + ⌈\log μ_{G_{1}}⌉, ⌈\log λ_{G_{2}}⌉ + ⌈\log μ_{G_{2}}⌉\} < ⌈\log 3⌉ + \max\{⌈\log(λ_{G_{1}} + μ_{G_{1}} - 1)⌉, ⌈\log(λ_{G_{2}} + μ_{G_{2}} - 1)⌉\},

(71)

then partial reconciliation outperforms the perfect reconciliation scheme. By setting $λ_{G_{1}} = 2$, $μ_{G_{1}} = 8$, $λ_{G_{2}} = 1$, and $μ_{G_{2}} = 16$, we observe that $l_{PR}^{(1)} \leq 1 + \max\{1 + 3, 0 + 4\} = 5$, whereas $l_{R}^{(1)} \geq 2 + \max\{⌈\log 9⌉, ⌈\log 16⌉\} = 6$ and $l_{RF}^{(1)} \geq 1 + ⌈\log 23⌉ = 6$; hence, both Equations (70) and (71) are satisfied, from which Proposition 6 follows. ☐
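Conditions (70) and (71) can also be screened mechanically over many parameter values. The sketch below assumes base-2 logarithms and the structure of Figure 6 ($G_{3} \subset G_{1}$, so the $G_{3}$ term never dominates the maximum in Equation (68)); the function names are hypothetical.

```python
def clog2(v: int) -> int:
    """Ceiling of log2 for positive integers (clog2(1) = 0)."""
    return (v - 1).bit_length()


def prop6_bounds(lam1: int, mu1: int, lam2: int, mu2: int):
    """Return (PR upper bound (69), R lower bound (68), RF lower bound (67))."""
    pr_upper = 1 + max(clog2(lam1) + clog2(mu1), clog2(lam2) + clog2(mu2))
    r_lower = clog2(3) + max(clog2(lam1 + mu1 - 1), clog2(lam2 + mu2 - 1))
    rf_lower = 1 + clog2(mu1 + mu2 - 1)
    return pr_upper, r_lower, rf_lower


def partial_strictly_better(lam1: int, mu1: int, lam2: int, mu2: int) -> bool:
    """True when both Eq. (70) and Eq. (71) hold."""
    pr, r, rf = prop6_bounds(lam1, mu1, lam2, mu2)
    return pr < rf and pr < r
```

For example, `prop6_bounds(2, 8, 1, 16)` reproduces the values 5, 6, and 6 used above, while the symmetric choice $λ_{G_{i}} = μ_{G_{i}} = 2$ fails Equation (70), so partial reconciliation does not always win. Using `int.bit_length` keeps the ceiling of the logarithm exact for integer arguments.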

Therefore, under certain settings, it is strictly better to design the interaction protocols to allow the communicating parties to agree on the true source distribution only partially, than to learn it perfectly or not learn it at all, pointing to an inherent reconciliation-communication tradeoff.

6. Communication Strategies with Symmetric Priors

In this section, we let $P = Q$ and $|P| = 1$ and specialize the communication model to the conventional function computation scenario where the true distribution $p(x, y)$ of the sources is known by both users. Users thus share a common bipartite graph $G = (X, Y, E)$, which they can leverage for interactive communication. We first state a simple lower bound on the worst-case message length.

Proposition 7.

A lower bound on the worst-case message length when the true distribution is known by both parties is

l^{(n)} \geq \max_{(x^{n}, y^{n}) \in S_{G}^{n}} \frac{1}{n} ⌈\log |I_{G}(x^{n}, y^{n})|⌉.

(72)

Proof.

For the worst-case codeword length, we have

l^{(n)} = \min_{ϕ} \max_{(x^{n}, y^{n}) \in S_{G}^{n}} \frac{1}{n} ℓ(ϕ(x^{n}, y^{n}))

(73)

\geq \max_{(x^{n}, y^{n}) \in S_{G}^{n}} \min_{ϕ} \frac{1}{n} ℓ(ϕ(x^{n}, y^{n}))

(74)

\geq \max_{(x^{n}, y^{n}) \in S_{G}^{n}} \frac{1}{n} ⌈\log |I_{G}(x^{n}, y^{n})|⌉

(75)

where Equation (74) follows from the min-max inequality, whereas Equation (75) follows from Proposition 1. ☐

We next consider upper bounds on the worst-case message length for this scenario. A simple upper bound can be obtained via the graph coloring approach in Theorem 1,

l^{(n)} \leq \frac{1}{n} \{⌈n \log χ(G_{X})⌉ + ⌈n \log χ(G_{Y})⌉\},

(76)

where the characteristic graphs $G_{X}$ and $G_{Y}$ are constructed as in Theorem 1 using the bipartite graph corresponding to the true distribution $p(x, y)$. Since $\frac{1}{n}⌈n a⌉ < a + \frac{1}{n}$ for any real $a$, Equation (76) implies that

\lim_{n \to \infty} l^{(n)} \leq \log χ(G_{X}) + \log χ(G_{Y}).

(77)

The above approach may yield limited compression gains for large values of $χ(G_{X})$ and $χ(G_{Y})$, and another round of interaction may help reduce the communication rate. We next provide another upper bound that combines graph coloring and hypergraph partitioning. To do so, we first review two results from [1]. The first one is a technical result regarding hypergraph partitioning.

Lemma 4 ([1]).

Define $Γ = (V, E)$ to be a hypergraph with a vertex set of size $|V|$ and hyperedges $E_{m} \subseteq V$, $m = 1, \dots, |E|$. Assume that each hyperedge has at most κ elements, i.e., $|E_{m}| \leq κ$. Then, for any given $ϵ > 0$, there exists a constant $ρ(ϵ)$ such that for all $s \geq (\ln \sqrt{|V| |E|})^{1 + ϵ}$ with $s > 1$, a partition $V_{1}, V_{2}, \dots, V_{⌈\frac{κ}{s} ρ(ϵ)⌉}$ of $V$ can be found with $|V_{k} \cap E_{m}| < s$ for all $m = 1, \dots, |E|$ and $k = 1, \dots, ⌈\frac{κ}{s} ρ(ϵ)⌉$.
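Lemma 4 is an existence result, and the constant $ρ(ϵ)$ comes from the analysis in [1]; neither is computed here. The sketch below only illustrates the object the lemma promises: it searches for a partition with $|V_{k} \cap E_{m}| < s$ by repeated uniform random assignment of vertices to groups. This randomized search is a stand-in chosen for illustration, not the construction of [1], and the names `random_partition` and `lemma4_groups` are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, List, Optional, Sequence, Set


def lemma4_groups(kappa: int, s: float, rho: float) -> int:
    """Number of groups ceil(kappa * rho / s); rho is treated as a given number."""
    return math.ceil(kappa * rho / s)


def random_partition(
    vertices: Sequence[Hashable],
    hyperedges: List[Set[Hashable]],
    num_groups: int,
    s: float,
    max_tries: int = 1000,
    seed: int = 0,
) -> Optional[Dict[Hashable, int]]:
    """Search for a map vertex -> group with |V_k ∩ E_m| < s for all k, m.

    Each try assigns every vertex to a uniformly random group and returns the
    first assignment meeting the bound; returns None if every try fails.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        group = {v: rng.randrange(num_groups) for v in vertices}
        ok = True
        for edge in hyperedges:
            counts = [0] * num_groups
            for v in edge:
                counts[group[v]] += 1
            if max(counts) >= s:  # some group holds too many vertices of this hyperedge
                ok = False
                break
        if ok:
            return group
    return None
```

With eight vertices, four hyperedges of size four, four groups, and $s = 3$, a valid partition is found quickly; with a single group and $s = 4$, no assignment can succeed and the search returns `None`.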

We can now state the second useful result.

Lemma 5 ([1]).

The following worst-case codeword length can be achieved in three rounds for $n = 1$:

l^{(1)} \leq \log Δ_{X} + \log Δ_{Y} + (1 + ϵ) \log(\log \sqrt{|X| |Y|}) + 2 \log ρ(ϵ) + 5,

(78)

where each person makes two non-empty transmissions.

We next derive an upper bound based on Lemma 5 by increasing the number of interaction rounds and following a sequential hypergraph partitioning approach. This allows the proposed scheme to work in low-rate communication environments when parties do not mind having extra rounds of interaction.

Theorem 2.

Given a joint probability distribution $p(x, y)$, consider the corresponding bipartite graph $G = (X, Y, E)$. Consider a partition of $X^{n}$ into $⌈\frac{Δ_{Y}^{n}}{(\ln \sqrt{|X^{n}| |Y^{n}|})^{1 + ϵ}} ρ(ϵ)⌉$ groups such that for each group $X_{u}^{n}$,

|X_{u}^{n} \cap \{x^{n} : (x^{n}, y^{n}) \in S^{n}, x^{n} \in X^{n}\}| \leq (\ln \sqrt{|X^{n}| |Y^{n}|})^{1 + ϵ}, \quad \forall y^{n} \in Y^{n},

(79)

where $u = 1, \dots, ⌈\frac{\min\{Δ_{X}^{n}, Δ_{Y}^{n}\}}{(n \ln \sqrt{|X| |Y|})^{1 + ϵ}} ρ(ϵ)⌉$. Then, the worst-case codeword length with four total rounds can be bounded by using sequential hypergraph partitioning as

l^{(n)} \leq \log Δ_{X} + \log Δ_{Y} + \frac{1 + ϵ}{n} \log(\log \sqrt{γ_{n}}) + \frac{2}{n} \log ρ(ϵ) + \frac{5}{n}

(80)

where $γ_{n} = \max_{u} (|X_{u}^{n}| \times \min\{Δ_{X}^{n} |X_{u}^{n}|, |Y^{n}|\}) \leq |X|^{n} |Y|^{n}$.

Proof.

Our proof builds upon [1] as follows. In the first round, the symbols of users 1 and 2 take values in $X^{n}$ and $Y^{n}$, respectively. We assume $Δ_{X} > Δ_{Y}$ without loss of generality. Let $s_{1} = (\ln \sqrt{|X^{n}| |Y^{n}|})^{1 + ϵ}$. From Lemma 4, $X^{n}$ can be partitioned into $⌈\frac{Δ_{Y}^{n}}{s_{1}} ρ(ϵ)⌉$ groups such that for each group $X_{u}^{n}$,

|X_{u}^{n} \cap \{x^{n} : (x^{n}, y^{n}) \in S^{n}, x^{n} \in X^{n}\}| \leq s_{1}, \quad \forall y^{n} \in Y^{n},

(81)

where $u = 1, \dots, ⌈\frac{Δ_{Y}^{n}}{s_{1}} ρ(ϵ)⌉$. In this round, user 1 sends the index of the group in which her symbol resides by using no more than $⌈\log(\frac{Δ_{Y}^{n}}{s_{1}} ρ(ϵ))⌉$ bits, and user 2 makes a null transmission. Let $\hat{u}$ be the index of the group sent by user 1 in the first round. In the second round, user 2 considers the following set after receiving the index from user 1:

Y_{\hat{u}}^{n} = \{y^{n} : (x^{n}, y^{n}) \in S^{n}, x^{n} \in X_{\hat{u}}^{n}, y^{n} \in Y^{n}\}.

(82)

Note that $|Y_{\hat{u}}^{n}| \leq \min\{Δ_{X}^{n} |X_{\hat{u}}^{n}|, |Y^{n}|\}$. Next, consider a hypergraph $Γ = (V, E)$ with the vertex set $V = Y_{\hat{u}}^{n}$, and define a hyperedge for each $x^{n} \in X_{\hat{u}}^{n}$ as follows:

E_{x^{n}} = \{y^{n} : (x^{n}, y^{n}) \in S^{n}, y^{n} \in Y_{\hat{u}}^{n}\},

(83)

where $|E| = |X_{\hat{u}}^{n}|$ and $|E_{x^{n}}| \leq Δ_{X}^{n}$ for $x^{n} \in X_{\hat{u}}^{n}$. User 1 can also determine this set by using the group index of her symbol and the relations between the function values of both parties. Let

s_{2\hat{u}} = (\ln \sqrt{|V| |E|})^{1 + ϵ} = (\ln \sqrt{|Y_{\hat{u}}^{n}| |X_{\hat{u}}^{n}|})^{1 + ϵ} \leq s_{1}.

(84)

User 2 then partitions $Y_{\hat{u}}^{n}$ into $⌈\frac{Δ_{X}^{n}}{s_{2\hat{u}}} ρ(ϵ)⌉$ groups and sends the group index of his symbol, which requires no more than $⌈\log(\frac{Δ_{X}^{n}}{s_{2\hat{u}}} ρ(ϵ))⌉$ bits. After receiving the group index, user 1 has to decide among at most $s_{2\hat{u}}$ possible symbols from user 2.

In the third round, the symbols are restricted to a subset of $X^{n} \times Y^{n}$ with at most $s_{1}$ possible symbols from user 1 for each symbol from user 2, and at most $s_{2\hat{u}}$ possible symbols from user 2 for each symbol of user 1. Then, by using ([1], Lemma 2), one can find that no more than $⌈\log s_{1}⌉ + 2⌈\log s_{2\hat{u}}⌉$ bits are required.

This scheme requires four rounds of interaction in total; each person makes two non-empty transmissions. The total number of bits required in the worst case satisfies

l^{(n)} \leq \frac{1}{n} \max_{u} \Big(⌈\log(\frac{Δ_{Y}^{n}}{s_{1}} ρ(ϵ))⌉ + ⌈\log(\frac{Δ_{X}^{n}}{s_{2u}} ρ(ϵ))⌉ + ⌈\log s_{1}⌉ + 2⌈\log s_{2u}⌉\Big)

(85)

\leq \frac{1}{n} \max_{u} \big(\log Δ_{Y}^{n} + \log Δ_{X}^{n} + \log s_{2u} + 2 \log ρ(ϵ) + 5\big)

(86)

\leq \log Δ_{X} + \log Δ_{Y} + \frac{1}{n} \big((1 + ϵ) \log(\log \sqrt{γ_{n}}) + 2 \log ρ(ϵ) + 5\big),

(87)

where Equation (86) uses $⌈a⌉ < a + 1$ for each ceiling in Equation (85), counting $2⌈\log s_{2u}⌉$ twice, so that the $\log s_{1}$ terms cancel and at most 5 extra bits arise, and where $γ_{n} = \max_{u} (|X_{u}^{n}| |Y_{u}^{n}|) \leq |X|^{n} |Y|^{n}$. ☐
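To make the bit accounting of Equations (85) and (86) concrete, the following sketch evaluates the per-round bit counts of the four-round scheme for a single group $u$ and compares their sum with the right-hand side of Equation (86) before normalization by $n$. It assumes base-2 logarithms for bit counts, natural logarithms inside $s_{1}$ and $s_{2u}$ as in Lemma 4, a numeric placeholder for $ρ(ϵ)$ (which the text leaves unspecified), and parameters for which each group count $Δ^{n} ρ(ϵ)/s$ is at least 1. The function name and arguments are hypothetical.

```python
import math


def theorem2_round_bits(n, delta_x, delta_y, size_x, size_y, xu, yu, eps, rho):
    """Per-round bit counts of the scheme in the proof of Theorem 2.

    size_x, size_y: single-letter alphabet sizes |X| and |Y|;
    xu, yu: sizes of the groups X_u^n and Y_u^n;
    rho: placeholder value for the constant rho(eps) of Lemma 4.
    Returns ((round-1 bits, round-2 bits, round-3 bits), right-hand side of Eq. (86) times n).
    """
    s1 = (n * math.log(math.sqrt(size_x * size_y))) ** (1 + eps)
    s2 = math.log(math.sqrt(xu * yu)) ** (1 + eps)
    groups1 = math.ceil(delta_y ** n * rho / s1)  # round 1: user 1's group index
    groups2 = math.ceil(delta_x ** n * rho / s2)  # round 2: user 2's group index
    r1 = math.ceil(math.log2(groups1))
    r2 = math.ceil(math.log2(groups2))
    r3 = math.ceil(math.log2(s1)) + 2 * math.ceil(math.log2(s2))  # via ([1], Lemma 2)
    bound86 = (n * math.log2(delta_y) + n * math.log2(delta_x)
               + math.log2(s2) + 2 * math.log2(rho) + 5)
    return (r1, r2, r3), bound86
```

With $n = 4$, $|X| = |Y| = 8$, $Δ_{X} = 3$, $Δ_{Y} = 2$, $|X_{u}^{n}| = |Y_{u}^{n}| = 32$, $ϵ = 0.5$, and a placeholder $ρ(ϵ) = 2$, the three stages use 1, 5, and 11 bits, i.e., 17 bits in total, below the value of about 20.0 bits given by Equation (86). Dividing by $n$ gives the per-symbol rate.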

Corollary 2.

In the limit of large block lengths, the upper bound of Theorem 2 satisfies

\lim_{n \to \infty} l^{(n)} \leq \log Δ_{X} + \log Δ_{Y}.

(88)

Proof.

As $|X_{u}^{n}| \leq |X^{n}|$ and $|Y_{u}^{n}| \leq |Y^{n}|$ for all groups $u$ of the partitions of $X^{n}$ and $Y^{n}$, we have $γ_{n} \leq |X|^{n} |Y|^{n}$, so that $\log(\log \sqrt{γ_{n}}) \leq \log n + \log(\log \sqrt{|X| |Y|})$. Hence, from Theorem 2,

\begin{aligned} \lim_{n \to \infty} l^{(n)} &\leq \lim_{n \to \infty} \Big(\log Δ_{X} + \log Δ_{Y} + \frac{1 + ϵ}{n} \log(\log \sqrt{|X| |Y|}) + (1 + ϵ) \frac{\log n}{n} + \frac{2}{n} \log ρ(ϵ) + \frac{5}{n}\Big) \\ &= \log Δ_{X} + \log Δ_{Y}. \end{aligned}

(89)

☐
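The vanishing of the overhead terms in Equation (89) can be illustrated numerically. The sketch below, assuming base-2 logarithms throughout and a placeholder value for the unspecified constant $ρ(ϵ)$, computes the gap between the right-hand side of Equation (89) and $\log Δ_{X} + \log Δ_{Y}$ at block length $n$; the function name is hypothetical.

```python
import math


def theorem2_overhead(n: int, size_x: int, size_y: int, eps: float, rho: float) -> float:
    """Gap between the bound inside Eq. (89) and log Δ_X + log Δ_Y at block length n.

    Uses gamma_n <= |X|^n |Y|^n, i.e. log log sqrt(gamma_n) <= log n + log log sqrt(|X||Y|).
    Requires |X||Y| > 1; rho stands in for the unspecified constant rho(eps).
    """
    loglog = math.log2(math.log2(math.sqrt(size_x * size_y)))
    return ((1 + eps) * (loglog + math.log2(n)) + 2 * math.log2(rho) + 5) / n
```

For $|X| = |Y| = 8$, $ϵ = 0.5$, and $ρ(ϵ) = 2$, the gap is roughly 1.4 bits per symbol at $n = 10$ and falls below 0.01 bits per symbol by $n = 10^{4}$, decaying as $O(\log n / n)$.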

Lemma 5 and Theorem 2 apply the hypergraph partitioning technique to the bipartite graph of the joint distribution $p(x, y)$, but provide achievable rates by first performing source reconstruction at both ends, after which both users can compute the correct function value. The next theorem takes the function values into account while constructing the hypergraph partitioning algorithm, through the use of characteristic graphs.

Consider any valid coloring of the characteristic graphs $G_{X}^{n}$ and $G_{Y}^{n}$ defined in Section 3.1. By using its own symbol, each user can recover the correct function value upon receiving the color from the other user. The problem now reduces to sharing the colors between the two parties correctly, for which we apply sequential hypergraph partitioning to the colors of the graphs $G_{X}^{n}$ and $G_{Y}^{n}$.

Theorem 3.

Define a coloring $α : G_{X}^{n} \to C_{α}$ for $G_{X}^{n}$ with $|C_{α}|$ colors, and a coloring $β : G_{Y}^{n} \to C_{β}$ for $G_{Y}^{n}$ with $|C_{β}|$ colors. Let $c(x^{n})$ and $c(y^{n})$ denote the colors assigned to $x^{n}$ and $y^{n}$ by the colorings α and β, respectively. Define the ambiguity set for color $c_{X} \in C_{α}$ as

J_{X}(c_{X}) ≜ \{c_{Y} \in C_{β} : (x^{n}, y^{n}) \in S^{n}, c(x^{n}) = c_{X}, c(y^{n}) = c_{Y}\}

(90)

with the size bound $Δ_{Xα}^{(n)} ≜ \max_{c_{X} \in C_{α}} |J_{X}(c_{X})|$, and the ambiguity set for color $c_{Y} \in C_{β}$ as

J_{Y}(c_{Y}) ≜ \{c_{X} \in C_{α} : (x^{n}, y^{n}) \in S^{n}, c(x^{n}) = c_{X}, c(y^{n}) = c_{Y}\}

(91)

with the size bound $Δ_{Yβ}^{(n)} ≜ \max_{c_{Y} \in C_{β}} |J_{Y}(c_{Y})|$. Consider a partition of $C_{α}$ into $⌈\frac{\min\{Δ_{Xα}^{(n)}, Δ_{Yβ}^{(n)}\}}{(\ln \sqrt{|C_{α}| |C_{β}|})^{1 + ϵ}} ρ(ϵ)⌉$ groups such that for each group $C_{αu}$,

|C_{αu} \cap J_{Y}(c_{Y})| \leq (\ln \sqrt{|C_{α}| |C_{β}|})^{1 + ϵ}, \quad \forall c_{Y} \in C_{β},

(92)

and

C_{βu} ≜ \{c_{Y} \in C_{β} : (x^{n}, y^{n}) \in S^{n}, c(x^{n}) = c_{X} \in C_{αu}, c(y^{n}) = c_{Y}\},

(93)

where $u = 1, \dots, ⌈\frac{\min\{Δ_{Xα}^{(n)}, Δ_{Yβ}^{(n)}\}}{(\ln \sqrt{|C_{α}| |C_{β}|})^{1 + ϵ}} ρ(ϵ)⌉$. Then, the worst-case message length can be upper bounded as

l^{(n)} \leq \min_{α, β} \Big(\frac{\log Δ_{Xα}^{(n)}}{n} + \frac{\log Δ_{Yβ}^{(n)}}{n} + \frac{1 + ϵ}{n} \log(\log \sqrt{γ_{α, β}}) + \frac{2}{n} \log ρ(ϵ) + \frac{5}{n}\Big),

(94)

where $γ_{α, β} = \max_{u} (|C_{αu}| |C_{βu}|)$.

Proof.

Assume $Δ_{Xα}^{(n)} > Δ_{Yβ}^{(n)}$ without loss of generality. Choose $s_{1} = (\ln \sqrt{|C_{α}| |C_{β}|})^{1 + ϵ}$. Partition $C_{α}$ into $⌈\frac{Δ_{Yβ}^{(n)}}{s_{1}} ρ(ϵ)⌉$ groups such that in each group the number of colors from any ambiguity set is no greater than $s_{1}$. Hence, for any $c_{Y} \in C_{β}$,

|C_{αu} \cap \{c_{X} : (x^{n}, y^{n}) \in S^{n}, c(x^{n}) = c_{X} \in C_{α}, c(y^{n}) = c_{Y}\}| \leq s_{1},

(95)

for $u = 1, \dots, ⌈\frac{Δ_{Yβ}^{(n)}}{s_{1}} ρ(ϵ)⌉$. In the first round, user 1 sends the index of the group in which the color of her symbol lies. This requires at most $⌈\log(\frac{Δ_{Yβ}^{(n)}}{s_{1}} ρ(ϵ))⌉$ bits, whereas user 2 makes an empty transmission. Denote by $\hat{u}$ the index of the group sent by user 1. In the second round, upon receiving $\hat{u}$ from user 1, user 2 considers the set $C_{β\hat{u}}$ given by Equation (93) with $u = \hat{u}$, where $|C_{β\hat{u}}| \leq \min\{\tilde{λ}_{\max} |C_{α\hat{u}}|, |C_{β}|\}$. Define a hypergraph $Γ = (V, E)$ with the vertex set $V = C_{β\hat{u}}$ and a hyperedge

E_{c_{X}} = \{c_{Y} : (x^{n}, y^{n}) \in S^{n}, c(y^{n}) = c_{Y} \in C_{β\hat{u}}, c(x^{n}) = c_{X}\}

for each $c_{X} \in C_{α\hat{u}}$, such that $|E| = |C_{α\hat{u}}|$ and $|E_{c_{X}}| \leq Δ_{Xα}^{(n)}$ for every $c_{X} \in C_{α\hat{u}}$. Define

s_{2\hat{u}} = (\ln \sqrt{|V| |E|})^{1 + ϵ} = (\ln \sqrt{|C_{β\hat{u}}| |C_{α\hat{u}}|})^{1 + ϵ} \leq s_{1}

and partition $C_{β\hat{u}}$ into $⌈\frac{Δ_{Xα}^{(n)}}{s_{2\hat{u}}} ρ(ϵ)⌉$ groups, so that user 2 can send the index of the group containing the color of his symbol with at most $⌈\log(\frac{Δ_{Xα}^{(n)}}{s_{2\hat{u}}} ρ(ϵ))⌉$ bits. Upon receiving the index, user 1 can reduce the number of possible colors from user 2 to at most $s_{2\hat{u}}$. Colors in the third round are restricted to a subset of $C_{α} \times C_{β}$ such that for every color from user 1 (user 2), there exist at most $s_{2\hat{u}}$ ($s_{1}$) possible colors from user 2 (user 1). Then, Equation (94) follows from ([1], Lemma 2) by the same bit accounting as in Equations (85)–(87), and by minimizing over all valid colorings α and β. ☐

It can be observed from Equation (94) that different colorings yield different codeword lengths, since they lead to different numbers of colors and different ambiguity set sizes. In general, there exists a trade-off between the ambiguity set sizes and the number of colors: using a smaller number of colors may in turn increase the ambiguity set sizes. The exact nature of the bound depends on graphical structures such as degree and connectivity; however, any valid coloring allows error-free recovery. For instance, assigning a distinct color to each element of $X^{n}$ and $Y^{n}$ is a valid coloring scheme. If one restricts oneself to such colorings, the coding scheme of Theorem 3 reduces to that of Theorem 2; hence, the bound in Theorem 3 generalizes the achievable protocols in Theorem 2.
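The trade-off between the number of colors and the ambiguity set sizes can be seen by computing $Δ_{Xα}^{(n)}$ and $Δ_{Yβ}^{(n)}$ from Equations (90) and (91) directly. The sketch below assumes the support $S^{n}$ is given as a list of symbol pairs and the colorings as Python dictionaries; it does not check that the colorings are valid for $G_{X}^{n}$ and $G_{Y}^{n}$, which would require the function $f$, and the name `ambiguity_bounds` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def ambiguity_bounds(
    support: Iterable[Tuple[Hashable, Hashable]],
    alpha: Dict[Hashable, Hashable],
    beta: Dict[Hashable, Hashable],
) -> Tuple[int, int]:
    """Return (Delta_X_alpha, Delta_Y_beta) as defined around Eqs. (90)-(91).

    support: non-empty collection of pairs (x, y) with positive probability;
    alpha, beta: color assignments for the symbols of users 1 and 2.
    Validity of the colorings is assumed, not checked.
    """
    jx: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    jy: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for x, y in support:
        jx[alpha[x]].add(beta[y])  # J_X(c_X): colors of y co-occurring with color c_X
        jy[beta[y]].add(alpha[x])  # J_Y(c_Y): colors of x co-occurring with color c_Y
    return max(map(len, jx.values())), max(map(len, jy.values()))
```

With identity colorings, the two values reduce to the maximum degrees of the bipartite graph, matching the reduction of Theorem 3 to Theorem 2. On a four-pair support, merging both $x$-symbols into one color (assuming, for illustration, that this merge is valid) shrinks $|C_{α}|$ from 2 to 1 but raises $Δ_{Xα}$ from 2 to 3, while $Δ_{Yβ}$ drops from 2 to 1.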

7. Conclusions

In this paper, we have considered a communication scenario in which two parties interact to compute a function of two correlated sources with zero error. The prior distribution available at one of the communicating parties is possibly different from the true distribution of the sources. In this setting, we have studied the impact of reconciling the missing information about the true distribution prior to communication on the worst-case message length. We have identified sufficient conditions under which reconciling the partial information is better or worse than not reconciling it but instead using a robust communication protocol that ensures zero-error recovery despite the asymmetry in the knowledge of the distribution. Accordingly, we have provided upper and lower bounds on the worst-case message length for computing multiple descriptions of the given function. Our results point to an inherent reconciliation-communication tradeoff, in that an increased reconciliation cost often leads to a lower communication cost. A number of interesting future directions remain. In this paper, we do not consider additional strategies that exploit information that may be revealed by the function realizations on the support set; developing interaction strategies that leverage this information is one such direction. A second one is finding the optimal joint reconciliation-communication strategy in general, together with the study of alternative upper bounds that take into account the specific structure of the function and input distributions. Another interesting direction is to model the case where knowledge asymmetry is due to one party having superfluous information.

Acknowledgments

This research was sponsored by the U.S. Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053 (the ARL Network Science CTA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Earlier versions of this work have partially appeared at the IEEE GlobalSIP Symposium on Network Theory, December 2013, IEEE Data Compression Conference (DCC’14), March 2014, and IEEE Data Compression Conference (DCC’16), March 2016. This document does not contain technology or technical data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.

Author Contributions

The ideas in this work were formed by the discussions between Basak Guler and Aylin Yener with Prithwish Basu and Ananthram Swami. Basak Guler is the main author of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

El Gamal, A.; Orlitsky, A. Interactive data compression. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS’84), West Palm Beach, FL, USA, 24–26 October 1984; pp. 100–108. [Google Scholar]
Orlitsky, A. Worst-case interactive communication I: Two messages are almost optimal. IEEE Trans. Inf. Theory 1990, 36, 1111–1126. [Google Scholar] [CrossRef]
Guler, B.; Yener, A.; Basu, P. A study of semantic data compression. In Proceedings of the IEEE Global Conference on Signal and Information Processing (GlobalSIP’13), Austin, TX, USA, 3–5 December 2013; pp. 887–890. [Google Scholar]
Guler, B.; Yener, A. Compressing semantic information with varying priorities. In Proceedings of the IEEE Data Compression Conference (DCC’14), Snowbird, UT, USA, 26–28 March 2014; pp. 213–222. [Google Scholar]
Yao, A.C. Some complexity questions related to distributed computing. In Proceedings of the 11th Annual ACM Symposium on Theory of Computing (STOC’79), Atlanta, GA, USA, 30 April–2 May 1979; pp. 209–213. [Google Scholar]
Feder, T.; Kushilevitz, E.; Naor, M.; Nisan, N. Amortized communication complexity. SIAM J. Comput. 1995, 24, 736–750. [Google Scholar] [CrossRef]
Kushilevitz, E.; Nisan, N. Communication Complexity; Cambridge University Press: Cambridge, UK, 1997. [Google Scholar]
Orlitsky, A.; Roche, J.R. Coding for computing. IEEE Trans. Inf. Theory 2001, 47, 903–917. [Google Scholar] [CrossRef]
Ma, N.; Ishwar, P. Some results on distributed source coding for interactive function Computation. IEEE Trans. Inf. Theory 2011, 57, 6180–6195. [Google Scholar] [CrossRef]
Ma, N.; Ishwar, P.; Gupta, P. Interactive source coding for function computation in collocated networks. IEEE Trans. Inf. Theory 2012, 58, 4289–4305. [Google Scholar] [CrossRef]
Yang, E.H.; He, D.K. Interactive encoding and decoding for one way learning: Near lossless recovery with side information at the decoder. IEEE Trans. Inf. Theory 2010, 56, 1808–1824. [Google Scholar] [CrossRef]
Shannon, C. The zero error capacity of a noisy channel. IRE Trans. Inf. Theory 1956, 2, 8–19. [Google Scholar] [CrossRef]
Witsenhausen, H.S. The zero-error side information problem and chromatic numbers. IEEE Trans. Inf. Theory 1976, 22, 592–593. [Google Scholar] [CrossRef]
Simonyi, G. On Witsenhausen’s zero-error rate for multiple sources. IEEE Trans. Inf. Theory 2003, 49, 3258–3260. [Google Scholar] [CrossRef]
Körner, J. Coding of an information source having ambiguous alphabet and the entropy of graphs. In Proceedings of the Sixth Prague Conference on Information Theory, Prague, Czech Republic, 19–25 September 1973; pp. 411–425. [Google Scholar]
Alon, N.; Orlitsky, A. Source coding and graph entropies. IEEE Trans. Inf. Theory 1995, 42, 1329–1339. [Google Scholar] [CrossRef]
Doshi, V.; Shah, D.; Médard, M.; Effros, M. Functional compression through graph coloring. IEEE Trans. Inf. Theory 2010, 56, 3901–3917.
Minsky, Y.; Trachtenberg, A.; Zippel, R. Set reconciliation with nearly optimal communication complexity. IEEE Trans. Inf. Theory 2003, 49, 2213–2218.
Nayak, J.; Rose, K. Graph capacities and zero-error transmission over compound channels. IEEE Trans. Inf. Theory 2005, 51, 4374–4378.
Berners-Lee, T.; Hendler, J.; Lassila, O. The semantic Web. Sci. Am. 2001, 284, 28–37.
Sheth, A.; Bertram, C.; Avant, D.; Hammond, B.; Kochut, K.; Warke, Y. Managing semantic content for the Web. IEEE Internet Comput. 2002, 6, 80–87.
Lee, E.A. Cyber physical systems: Design challenges. In Proceedings of the IEEE International Symposium on Object Oriented Real-Time Distributed Computing (ISORC’08), Orlando, FL, USA, 5–7 May 2008; pp. 363–369.
Sheth, A.; Henson, C.; Sahoo, S. Semantic sensor Web. IEEE Internet Comput. 2008, 12, 78–83.
Chen, J.; He, D.K.; Jagmohan, A. On the duality between Slepian–Wolf coding and channel coding under mismatched decoding. IEEE Trans. Inf. Theory 2009, 55, 4006–4018.
Juba, B.; Kalai, A.T.; Khanna, S.; Sudan, M. Compression without a common prior: An information-theoretic justification for ambiguity in language. In Proceedings of the Second Symposium on Innovations in Computer Science (ICS 2011), Beijing, China, 7–9 January 2011.
Haramaty, E.; Sudan, M. Deterministic compression with uncertain priors. Algorithmica 2016, 76, 630–653.
Guler, B.; Yener, A.; MolavianJazi, E.; Basu, P.; Swami, A.; Andersen, C. Interactive Function Compression with Asymmetric Priors. In Proceedings of the IEEE Data Compression Conference (DCC’16), Snowbird, UT, USA, 30 March–1 April 2016; pp. 379–388.
Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2012.
Gács, P.; Körner, J. Common information is far less than mutual information. Probl. Control Inf. Theory 1973, 2, 149–162.
Mehlhorn, K. Data Structures and Algorithms 1: Sorting and Searching; Springer Science & Business Media: Berlin, Germany, 2013.

Figure 1. Bipartite graph representation of the probability distribution from Equation (17). Edge labels represent the function values $f(x, y) = (x + y) \bmod 4$. Note that the maximum vertex degrees are $\Delta_X = 3$ for $x \in X$ and $\Delta_Y = 5$ for $y \in Y$, whereas $\lambda_G = 3$ and $\mu_G = 4$.

Figure 2. Shared bipartite graph with $n = 1$ and $X = Y = \{1, \dots, 7\}$, representing the distribution $p_2(x, y)$ from Equation (20). Edge labels represent the function values $f(x, y)$ defined in Equation (19). The maximum vertex degrees are $\Delta_X = \Delta_Y = 4$ for $x \in X$ and $y \in Y$, whereas $\lambda_G = \mu_G = 3$.

Figure 3. Characteristic graphs (a) $G_X^1$ and (b) $G_Y^1$ constructed using the distribution $p_2(x, y)$ in Equation (20) and the function $f(x, y)$ in Equation (19).

Figure 4. Example graphs $G = \{G_1, G_2\}$ and $B = \{B_1, B_2\}$, where $\lambda_{G_1} = \lambda_{G_2} = \mu_{G_1} = \mu_{G_2} = 2$.

Figure 5. Coloring of the characteristic graph $G_{Y,G}$.

Figure 6. Bipartite graphs $G = \{G_1, G_2, G_3\}$ and $B = \{B\}$, and the corresponding reconciliation graph $R$.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

