Side Information Design in Zero-Error Coding for Computing

We investigate the zero-error coding for computing problem with encoder side information. An encoder observes a source $X$ and is furnished with side information $g(Y)$. It communicates with a decoder that possesses side information $Y$ and aims to retrieve $f(X,Y)$ with zero probability of error, where $f$ and $g$ are deterministic functions. In previous work, we determined a condition that yields an analytic expression for the optimal rate $R^*(g)$; in particular, it covers the case where $P_{X,Y}$ has full support. In this article, we review this result and study the side information design problem, which consists of finding the best trade-offs between the quality of the encoder's side information $g(Y)$ and $R^*(g)$. We construct two greedy algorithms, based on partition refining and coarsening, that give an achievable set of points in the side information design problem. One of them runs in polynomial time.


Introduction

Zero-Error Coding for Computing
The problem of Figure 1 is a zero-error setting that relates to Orlitsky and Roche's coding for computing problem from [1]. This coding problem appears in video compression [2,3], where $X^n$ models a set of images known at the encoder. The decoder does not always want to retrieve each whole image. Instead, the decoder receives, for each image $X_t$, $t \le n$, a request $Y_t$ to retrieve the information $f(X_t, Y_t)$. This information can, for instance, be a detection (cat, dog, car, bike) or a scene recognition (street, city, mountain, etc.). The encoder does not know the decoder's exact request but has prior information about it (e.g., the type of request), which is modeled by $(g(Y_t))_{t \le n}$. This problem also relates to the zero-error Slepian-Wolf open problem, which corresponds to the special case where $g$ is constant and $f(X,Y) = X$.
Figure 1. Zero-error coding for computing with side information at the encoder.
Schemes similar to the one depicted in Figure 1 have already been studied, but they differ from ours in two ways. First, they consider that no side information is available at the encoder. Second, and more importantly, they consider different coding constraints: the lossless case is studied by Orlitsky and Roche in [1], the lossy case by Yamamoto in [4], and the zero-error "unrestricted inputs" case by Shayevitz in [5]. The latter results can be used as bounds for the problem depicted in Figure 1, but do not exactly characterize its optimal rate. Numerous extensions of this problem have been studied recently. The distributed context, for instance, has an additional encoder that encodes $Y$ before transmitting it to the decoder. Achievability schemes have been proposed for this setting by Krithivasan and Pradhan in [6] using abelian groups; by Basu et al. in [7] using hypergraphs, for the case with a maximum distortion criterion; and by Malak and Médard in [8] using hyperplane separations, for the continuous lossless case.
Another related context is the network setting, where a function of the source random variables from the source nodes has to be retrieved at the sink node of a given network. For tree networks, the feasible rate region is characterized by Feizi and Médard in [9] for networks of depth one, and by Sefidgaran and Tchamkerten in [10] under a Markov source distribution hypothesis. In [11], Ravi and Dey consider a bidirectional relay with zero-error "unrestricted inputs" and characterize the rate region for a specific class of functions. In [12], Guang et al. study zero-error function computation on acyclic networks with limited capacities and give an inner bound based on network cut-sets. For both the distributed and network settings, the zero-error coding for computing problem with encoder side information remains open.
In a previous work [13], we determined a condition that we called "pairwise shared side information" such that, if it is satisfied, the optimal rate $R^*(g)$ has a single-letter expression. This covers many cases of interest, in particular the case where $P_{X,Y}$ has full support, for any functions $f$, $g$. For the sake of completeness, we review this result. Moreover, we propose an alternative and more interpretable expression for this pairwise shared side information condition. More precisely, we show that the instances where the "pairwise shared side information" condition is satisfied correspond to the worst possible optimal rates in an auxiliary zero-error Slepian-Wolf problem.

Encoder's Side Information Design
In the zero-error coding for computing problem with encoder side information, it can be observed that a "coarse" encoder side information (e.g., $g$ constant) yields a high optimal rate $R^*(g)$, whereas a "fine" encoder side information (e.g., $g = \mathrm{Id}$) yields a low optimal rate $R^*(g)$. The side information design problem consists of determining the best trade-offs between the optimal rate $R^*(g)$ and the quality of the encoder's side information, which is measured by its entropy $H(g(Y))$. This entropy corresponds to the optimal rate of a zero-error code that transmits the quantized version $g(Y)$ of $Y$. The best trade-offs correspond to the Pareto front of the achievable set, i.e., the corner points that cannot be obtained by time sharing between other coding strategies. In short, we aim at determining the Pareto front of the convex hull of the achievable pairs $(H(g(Y)), R^*(g))$.
In this article, we propose a greedy algorithm that gives an achievable set of points in the side information design problem when $P_{X,Y}$ has full support. Studying our problem under this hypothesis is interesting because, unlike in the Slepian-Wolf problem, it does not necessarily correspond to a worst-case scenario. Recall, indeed, that when $P_{X,Y}$ has full support, the Slepian-Wolf encoder does not benefit from the side information available at the decoder and needs to send $X$. In our problem instead, if the retrieval function is $f(X,Y) = Y$, then since the decoder already has access to $Y$, no information needs to be sent by the encoder and the optimal rate is 0. Finally, the proposed algorithm relies on our "pairwise shared side information" result, which gives the optimal rate for all functions $g$, and performs a greedy partition coarsening when choosing the next achievable point. Moreover, it runs in polynomial time.
This paper is organized as follows. In Section 2, we formally present the zero-error coding for computing problem and the encoder's side information design problem. In Section 3, we give our theoretic results on the zero-error coding for computing problem, including the "pairwise shared side information" condition. In Section 4, we present our greedy algorithms for the encoder's side information design problem.

Formal Presentation of the Problem
We denote sequences by $x^n = (x_1, \ldots, x_n)$. The set of probability distributions over $\mathcal{X}$ is denoted by $\Delta(\mathcal{X})$. The distribution of $X$ is denoted by $P_X \in \Delta(\mathcal{X})$ and its support is denoted by $\operatorname{supp} P_X$. Given the sequence length $n \in \mathbb{N}^\star$, we denote by $\Delta_n(\mathcal{X}) \subset \Delta(\mathcal{X})$ the set of empirical distributions of sequences from $\mathcal{X}^n$. We denote by $\{0,1\}^*$ the set of binary words. The collection of subsets of a set $\mathcal{Y}$ is denoted by $\mathcal{P}(\mathcal{Y})$.
Definition 1. The zero-error source-coding problem of Figure 1 is described by the following:
- Four finite sets $\mathcal{U}$, $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$ and a source distribution $P_{X,Y} \in \Delta(\mathcal{X} \times \mathcal{Y})$.
- For all $n \in \mathbb{N}^\star$, $(X^n, Y^n)$ is the random sequence of $n$ copies of $(X,Y)$, drawn in an i.i.d. fashion using $P_{X,Y}$.
- Two deterministic functions $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{U}$ and $g : \mathcal{Y} \to \mathcal{Z}$.
- An encoder that knows $X^n$ and $(g(Y_t))_{t \le n}$ and sends binary strings over a noiseless channel to a decoder that knows $Y^n$ and wants to retrieve $(f(X_t, Y_t))_{t \le n}$ without error.
A coding scheme in this setting is described by the following:
- A time horizon $n \in \mathbb{N}^\star$, an encoding function $\phi_e : \mathcal{X}^n \times \mathcal{Z}^n \to \{0,1\}^*$, and a decoding function $\phi_d : \{0,1\}^* \times \mathcal{Y}^n \to \mathcal{U}^n$.
- The rate is the average length of the codeword per source symbol, i.e., $R = \frac{1}{n} \mathbb{E}\big[\ell\big(\phi_e(X^n, (g(Y_t))_{t \le n})\big)\big]$, where $\ell$ denotes the codeword length function.
- $(n, \phi_e, \phi_d)$ must satisfy the zero-error property: $\mathbb{P}\big(\phi_d\big(\phi_e(X^n, (g(Y_t))_{t \le n}), Y^n\big) \neq (f(X_t, Y_t))_{t \le n}\big) = 0$.
The minimal rate under the zero-error constraint is defined by $R^*(g) = \inf \{R \mid (n, \phi_e, \phi_d) \text{ satisfies the zero-error property}\}$.
The definition of the Pareto front that we give below is adapted to the encoder's side information design problem and allows us to describe the best trade-off between the quality of the encoder's side information and the rate needed to compute the function $f(X,Y)$ at the decoder. In other works, the definition of a Pareto front may differ depending on the minimization/maximization problem considered and on the number of variables to be optimized.

Definition 2 (Pareto front). Let $S \subset \mathbb{R}^2_+$ be a set; the Pareto front of $S$ is defined by
$$\operatorname{Pareto}(S) = \{s \in S \mid \forall s' \in S, \ (s'_1 \le s_1 \text{ and } s'_2 \le s_2) \implies s' = s\}.$$

Definition 3. The side information design problem in Figure 1 consists of determining the Pareto front of the achievable pairs $(H(g(Y)), R^*(g))$:
$$\operatorname{Pareto}\Big(\operatorname{Conv}\big\{(H(g(Y)), R^*(g)) \mid g : \mathcal{Y} \to \mathcal{Z}\big\}\Big), \quad (6)$$
where $\operatorname{Conv}$ denotes the convex hull.
In our zero-error setup, all alphabets are finite. Therefore, the Pareto front of the convex hull in (6) is computed on a finite set of points, which correspond to the best trade-offs for the encoder's side information.
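Since the achievable set is finite, these best trade-offs can be computed by discarding dominated points. The following Python sketch (ours, not from the paper) computes the Pareto front of a finite set of pairs, where both coordinates are minimized as in Definition 2; the function name and the toy pairs are illustrative.

```python
def pareto_front(points):
    """Keep the points of `points` that no other point dominates, where
    (h2, r2) dominates (h1, r1) if h2 <= h1 and r2 <= r1 (both coordinates
    are minimized) and the two points differ."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return sorted(set(front))

# Toy achievable pairs (H(g(Y)), R*(g)); the numbers are illustrative only.
achievable = [(0.0, 2.0), (1.0, 1.2), (1.0, 1.5), (1.6, 0.9), (2.0, 0.9)]
print(pareto_front(achievable))  # [(0.0, 2.0), (1.0, 1.2), (1.6, 0.9)]
```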

Theoretic Results
Determining the optimal rate in the zero-error coding for computing problem, with or without encoder side information, is an open problem. In a previous contribution [13], we determined a condition that, when satisfied, yields an analytic expression for the optimal rate. Interestingly, this condition is general, as it does not depend on the function $f$ to be retrieved at the decoder.

General Case
We first build the characteristic graph $G^{[n]}$, which is a probabilistic graph that captures the zero-error encoding constraints for a given number $n$ of source uses. It differs from the graphs used in [5], as we do not need a Cartesian representation of these graphs to study the optimal rates. Furthermore, it has a vertex for each possible realization of $(X^n, (g(Y_t))_{t \le n})$ known at the encoder, instead of $X^n$ as in the zero-error Slepian-Wolf problem [14].

Definition 4 (Characteristic graph $G^{[n]}$). The characteristic graph $G^{[n]}$ is defined by the following:
- the set of vertices is the set of realizations $(x^n, z^n)$ of $(X^n, (g(Y_t))_{t \le n})$ with positive probability, endowed with the distribution of $(X^n, (g(Y_t))_{t \le n})$;
- $(x^n, z^n)$ and $(x'^n, z'^n)$ are adjacent if $z^n = z'^n$ and there exists $y^n \in g^{-1}(z^n)$ with $P_{X^n,Y^n}(x^n, y^n)\, P_{X^n,Y^n}(x'^n, y^n) > 0$ such that $f(x^n, y^n) \neq f(x'^n, y^n)$.

The characteristic graph $G^{[n]}$ is designed with the same core idea as in [15]: $(x^n, z^n)$ and $(x'^n, z'^n)$ are adjacent if there exists a side information sequence $y^n$ compatible with the observation of the encoder (i.e., $z^n = z'^n$ and $y^n \in g^{-1}(z^n)$) such that $f(x^n, y^n) \neq f(x'^n, y^n)$. In order to prevent erroneous decodings, the encoder must map adjacent pairs of sequences to different codewords; hence the use of graph colorings, defined below.

Definition 5 (Independent set, coloring). Let $G = (V, \mathcal{E})$ be a graph. A subset $S \subseteq V$ is an independent set if $xx' \notin \mathcal{E}$ for all $x \neq x' \in S$. Let $\mathcal{C}$ be a finite set (the set of colors); a mapping $c : V \to \mathcal{C}$ is a coloring if $c^{-1}(i)$ is an independent subset for all $i \in \mathcal{C}$.
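To make Definition 4 concrete, here is a minimal Python sketch (ours) of the one-shot graph $G^{[1]}$. Following the compatibility requirement that also appears in Definition 10, we assume that the shared symbol $y$ must have positive joint probability with both $x$ and $x'$; the names `characteristic_graph_1shot`, `P`, `f`, `g` are ours.

```python
from itertools import combinations

def characteristic_graph_1shot(P, f, g, Xs, Ys):
    """One-shot characteristic graph G^[1] (a sketch for n = 1).
    P[(x, y)]: joint source distribution; f(x, y): function to compute;
    g(y): encoder side information."""
    vertices = sorted({(x, g(y)) for (x, y), p in P.items() if p > 0})
    edges = set()
    for (x, z), (x2, z2) in combinations(vertices, 2):
        if z != z2:
            continue  # different g-outputs are distinguished for free
        # look for a compatible y that makes the function outputs differ
        if any(g(y) == z and P.get((x, y), 0) > 0 and P.get((x2, y), 0) > 0
               and f(x, y) != f(x2, y) for y in Ys):
            edges.add(frozenset(((x, z), (x2, z2))))
    return vertices, edges

# Toy instance: X = Y = {0, 1, 2}, f = equality test, g = parity of Y.
Xs, Ys = [0, 1, 2], [0, 1, 2]
P = {(x, y): 1 / 9 for x in Xs for y in Ys}  # full support
V, E = characteristic_graph_1shot(P, lambda x, y: int(x == y),
                                  lambda y: y % 2, Xs, Ys)
```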

Definition 6 (Chromatic entropy $H_\chi$). For a probabilistic graph $G = (V, \mathcal{E}, P_V)$, the chromatic entropy of $G$ is defined by $H_\chi(G) = \min \{H(c(V)) \mid c \text{ is a coloring of } G\}$.

The chromatic entropy of $G^{[n]}$ gives the best rate of $n$-shot zero-error encoding functions, as in [14].
Theorem 1 (Optimal rate). The optimal rate is written as follows:
$$R^*(g) = \lim_{n \to \infty} \frac{1}{n} H_\chi(G^{[n]}).$$
Proof. By construction, the following holds: for all encoding functions $\phi_e$, $\phi_e$ is a coloring of $G^{[n]}$ if and only if there exists a decoding function $\phi_d$ such that $(n, \phi_e, \phi_d)$ satisfies the zero-error property. Thus, the best achievable rate is written as follows:
$$R^*(g) = \inf_{n \in \mathbb{N}^\star} \frac{1}{n} \min_{\phi_e \text{ coloring of } G^{[n]}} H\big(\phi_e\big(X^n, (g(Y_t))_{t \le n}\big)\big) \quad (11)$$
$$= \lim_{n \to \infty} \frac{1}{n} H_\chi(G^{[n]}), \quad (12)$$
where (12) comes from Fekete's Lemma and from the definition of $H_\chi$.
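For small graphs, $H_\chi$ can be evaluated by brute force: up to a relabeling of the colors, a coloring is a partition of the vertices into independent sets. A Python sketch (ours; exponential in $|V|$, for illustration only):

```python
from math import log2

def set_partitions(items):
    """Yield all partitions of the list `items` into blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):           # put `first` in an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield part + [[first]]               # or open a new block

def chromatic_entropy(vertices, edges, P_V):
    """Brute-force H_chi(G) = min over colorings c of H(c(V)).
    `edges` is a set of frozensets {u, v}."""
    best = float("inf")
    for part in set_partitions(list(vertices)):
        # a coloring corresponds to a partition into independent sets
        if any(frozenset((u, v)) in edges for block in part
               for i, u in enumerate(block) for v in block[i + 1:]):
            continue
        probs = [sum(P_V[v] for v in block) for block in part]
        best = min(best, -sum(p * log2(p) for p in probs if p > 0))
    return best

# Path 1-2-3: the optimal coloring groups the non-adjacent vertices 1 and 3.
V, E = [1, 2, 3], {frozenset((1, 2)), frozenset((2, 3))}
print(chromatic_entropy(V, E, {1: 0.25, 2: 0.5, 3: 0.25}))  # 1.0
```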
A general single-letter expression for $R^*(g)$ is not known, due to the lack of intrinsic structure of $G^{[n]}$. In Section 3.2, we introduce a hypothesis that gives structure to $G^{[n]}$ and allows us to derive a single-letter expression for $R^*(g)$.

Pairwise Shared Side Information
Definition 7. The distribution $P_{X,Y}$ and the function $g$ satisfy the "pairwise shared side information" condition if
$$\forall z \in \operatorname{Im}(g), \ \forall x, x' \in \mathcal{X}, \ \exists y \in g^{-1}(z), \quad P_{X,Y}(x, y) \, P_{X,Y}(x', y) > 0, \quad (13)$$
where $\operatorname{Im}(g)$ is the image of the function $g$. This means that for every output $z$ of $g$, every pair $(x, x')$ "shares" at least one side information symbol $y \in g^{-1}(z)$.
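Condition (13) is easy to check directly from the joint distribution, in $O(|\mathcal{X}|^2 |\mathcal{Y}|)$ operations. A minimal Python sketch (ours), assuming `P` is a dict keyed by the pairs $(x, y)$:

```python
def pairwise_shared_side_info(P, g, Xs, Ys):
    """Check condition (13): for every z in Im(g) and every pair (x, x'),
    some y in g^{-1}(z) satisfies P(x, y) P(x', y) > 0."""
    for z in {g(y) for y in Ys}:
        ys = [y for y in Ys if g(y) == z]
        for x in Xs:
            for x2 in Xs:
                if not any(P.get((x, y), 0) > 0 and P.get((x2, y), 0) > 0
                           for y in ys):
                    return False
    return True
```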
Note that any full-support distribution $P_{X,Y}$ satisfies the "pairwise shared side information" hypothesis. In Theorem 2, we give an interpretation of the "pairwise shared side information" condition in terms of the optimal rate of an auxiliary zero-error Slepian-Wolf problem.
Theorem 2. The tuple $(P_{X,Y}, g)$ satisfies the "pairwise shared side information" condition (13) if and only if, in the case $f(X,Y) = X$, the optimal rate is $R^*(g) = H(X \mid g(Y))$ and, for all $z \in \operatorname{Im}(g)$, $P_{X \mid g(Y)=z}$ has full support.
The proof of Theorem 2 is given in Appendix A.1.

Definition 8 (AND, OR product). Let $G_1 = (V_1, \mathcal{E}_1, P_{V_1})$ and $G_2 = (V_2, \mathcal{E}_2, P_{V_2})$ be two probabilistic graphs; their AND (resp. OR) product, denoted by $G_1 \wedge G_2$ (resp. $G_1 \vee G_2$), is defined by the following: $V_1 \times V_2$ as the set of vertices; $P_{V_1} P_{V_2}$ as the probability distribution on the vertices; and $(v_1, v_2)$, $(v'_1, v'_2)$ are adjacent if $v_1 v'_1 \in \mathcal{E}_1$ and (resp. or) $v_2 v'_2 \in \mathcal{E}_2$, with the convention that all vertices are self-adjacent. We denote by $G_1^{\wedge n}$ (resp. $G_1^{\vee n}$) the $n$-th AND (resp. OR) power.
AND and OR powers significantly differ in terms of the existence of a single-letter expression for the associated asymptotic chromatic entropy. Indeed, in the zero-error Slepian-Wolf problem in [14], the optimal rate $\lim_{n \to \infty} \frac{1}{n} H_\chi(G^{\wedge n})$, which relies on an AND power, does not have a single-letter expression. Instead, closed-form expressions for OR powers of graphs exist. More precisely, as recalled in Proposition 1, $\lim_{n \to \infty} \frac{1}{n} H_\chi(G^{\vee n})$ admits a single-letter expression called the Körner graph entropy, introduced in [16] and defined below. This observation is key for us to derive a single-letter expression for our problem. More precisely, by using a convex combination of Körner graph entropies, we provide a single-letter expression in Theorem 3 for the optimal rate $R^*(g)$.

Definition 9 (Körner graph entropy $H_\kappa$). For all $G = (V, \mathcal{E}, P_V)$, let $\Gamma(G)$ be the collection of independent sets of vertices in $G$. The Körner graph entropy of $G$ is defined by
$$H_\kappa(G) = \min_{V \in W} I(V; W),$$
where the minimum is taken over all distributions $P_{W|V} \in \Delta(\mathcal{W})^V$, with $\mathcal{W} = \Gamma(G)$ and the constraint that the random vertex $V$ belongs to the random set $W$ with probability one.
Below, we recall that the limit of the normalized chromatic entropy of the OR powers of a graph admits a closed-form expression given by the Körner graph entropy $H_\kappa$. Moreover, the Körner graph entropy of an OR product of graphs is simply the sum of the individual Körner graph entropies.
Proposition 1 (Properties of $H_\kappa$, Theorem 5 in [14]). For all probabilistic graphs $G$ and $G'$,
$$\lim_{n \to \infty} \frac{1}{n} H_\chi(G^{\vee n}) = H_\kappa(G), \qquad H_\kappa(G \vee G') = H_\kappa(G) + H_\kappa(G').$$

Definition 10 (Auxiliary graph $G^f_z$). For all $z \in \mathcal{Z}$, we define the auxiliary graph $G^f_z$ by the following:
- $\mathcal{X}$ as the set of vertices, with distribution $P_{X \mid g(Y)=z}$;
- $x$, $x'$ are adjacent if $f(x, y) \neq f(x', y)$ for some $y \in g^{-1}(z) \cap \operatorname{supp} P_{Y \mid X=x} \cap \operatorname{supp} P_{Y \mid X=x'}$.
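The auxiliary graphs $G^f_z$ are straightforward to build from the joint distribution. A Python sketch (ours), with the same dict conventions as above; it assumes that $z$ has positive probability under $g(Y)$:

```python
from itertools import combinations

def auxiliary_graph(P, f, g, Xs, Ys, z):
    """Auxiliary graph G^f_z of Definition 10 (a sketch): vertex set X with
    distribution P_{X | g(Y) = z}; x, x' adjacent if f(x, y) != f(x', y) for
    some y in g^{-1}(z) with positive probability jointly with x and x'."""
    ys = [y for y in Ys if g(y) == z]
    p_z = sum(P.get((x, y), 0) for x in Xs for y in ys)  # assumed > 0
    P_X_given_z = {x: sum(P.get((x, y), 0) for y in ys) / p_z for x in Xs}
    edges = {frozenset((x, x2))
             for x, x2 in combinations(Xs, 2)
             if any(P.get((x, y), 0) > 0 and P.get((x2, y), 0) > 0
                    and f(x, y) != f(x2, y) for y in ys)}
    return P_X_given_z, edges
```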
Theorem 3 (Pairwise shared side information). If $P_{X,Y}$ and $g$ satisfy (13), the optimal rate is written as follows:
$$R^*(g) = \sum_{z \in \operatorname{Im}(g)} P_{g(Y)}(z) \, H_\kappa(G^f_z).$$
The proof is in Appendix A.2; the key point is the particular structure of $G^{[n]}$: a disjoint union of OR products.

Remark 1. The "pairwise shared side information" assumption (13) implies that the adjacency condition (7) is satisfied, which makes $G^{[n]}$ a disjoint union of OR products. Moreover, Körner graph entropies appear in the final expression for $R^*(g)$, even though $G^{[n]}$ is not an $n$-th OR power.

Now, consider the case where $P_{X,Y}$ has full support. This is a sufficient condition for (13) to hold. The optimal rate in this setting is derived from Theorem 3, which leads to the analytic expression in Theorem 4.
Theorem 4 (Optimal rate when $P_{X,Y}$ has full support). When $P_{X,Y}$ has full support, the optimal rate is written as follows:
$$R^*(g) = H\big(j(X, g(Y)) \mid g(Y)\big),$$
where the function $j$ returns a word in $\mathcal{U}^*$, defined by $j(x, z) = (f(x, y))_{y \in g^{-1}(z)}$.
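The following Python sketch evaluates this expression; it is based on our reconstruction of Theorem 4 above, so the exact statement should be checked against [13]. Under full support, every $y \in g^{-1}(z)$ is compatible with every $x$, so $G^f_z$ is complete multipartite with parts given by the level sets of $j(\cdot, z)$, and $H_\kappa(G^f_z) = H(j(X,z) \mid g(Y)=z)$.

```python
from math import log2

def optimal_rate_full_support(P, f, g, Xs, Ys):
    """R*(g) = H(j(X, g(Y)) | g(Y)) with j(x, z) = (f(x, y))_{y in g^{-1}(z)}
    (our reconstruction of Theorem 4); assumes P has full support."""
    rate = 0.0
    for z in {g(y) for y in Ys}:
        ys = [y for y in Ys if g(y) == z]
        p_z = sum(P[(x, y)] for x in Xs for y in ys)
        # distribution of the word j(X, z) under P_{X | g(Y) = z}
        dist = {}
        for x in Xs:
            word = tuple(f(x, y) for y in ys)
            dist[word] = dist.get(word, 0.0) + sum(P[(x, y)] for y in ys) / p_z
        rate += p_z * (-sum(p * log2(p) for p in dist.values() if p > 0))
    return rate
```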

Example
In this example, the "pairwise shared side information" assumption is satisfied, and $R^*(g)$ is strictly less than the rate of a conditional Huffman coding of $X$ knowing $g(Y)$, and also strictly less than the optimal rate without exploiting $g(Y)$ at the encoder.
Table: an example of $P_{X,Y}$ and $g$ that satisfies (13), along with the outcomes $f(X,Y)$. The elements outside $\operatorname{supp} P_{X,Y}$ are denoted by $*$.
On the other hand, $H_\kappa(G^f_1)$ is evaluated with $V_1 \sim P_{X \mid g(Y)=1}$, using the collection of independent sets $\Gamma(G^f_1)$. The rate that we would obtain by transmitting $X$ knowing $g(Y)$ at both the encoder and the decoder with a conditional Huffman algorithm is denoted by $R_{\mathrm{Huffman}}$. The rate that we would obtain without exploiting $g(Y)$ at the encoder is $R_{\mathrm{No}\,g} = H(X) \approx 1.985$, because of the different function outputs generated by $Y = 4$ and $Y = 5$.
In this example, we have $R^*(g) < R_{\mathrm{Huffman}}$ and $R^*(g) < R_{\mathrm{No}\,g}$. This illustrates the impact of the side information at the encoder in this setting, as we can observe a large gap between the optimal rate $R^*(g)$ and $R_{\mathrm{No}\,g}$.

Preliminary Results on Partitions
In order to optimize the function $g$ in the encoder's side information, we propose an equivalent characterization of the function $g$ in the form of a partition of the set $\mathcal{Y}$. The equivalence is shown in Proposition 2 below.

Proposition 2. For all $g : \mathcal{Y} \to \mathcal{Z}$, the collection of subsets $(g^{-1}(z))_{z \in \operatorname{Im}(g)}$ is a partition of $\mathcal{Y}$. Conversely, if $\mathcal{A} \subset \mathcal{P}(\mathcal{Y})$ is a partition of $\mathcal{Y}$, then there exists a mapping $g_\mathcal{A} : \mathcal{Y} \to \mathcal{Z}$ such that $\mathcal{A} = (g_\mathcal{A}^{-1}(z))_{z \in \operatorname{Im}(g_\mathcal{A})}$.

Proof. The direct part results directly from the fact that $g$ is a function. For the converse part, we take $\mathcal{Z}$ such that $|\mathcal{Z}| = |\mathcal{A}|$, we index $\mathcal{A} = (A_z)_{z \in \mathcal{Z}}$, and we define $g_\mathcal{A} : \mathcal{Y} \to \mathcal{Z}$ by $g_\mathcal{A}(y) = z$, where $z \in \mathcal{Z}$ is the unique index such that $y \in A_z$. The property $\forall z \in \operatorname{Im}(g_\mathcal{A}), \exists A_z \in \mathcal{A}, A_z = g_\mathcal{A}^{-1}(z)$ is therefore satisfied.
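The correspondence of Proposition 2 translates directly into code. A minimal Python sketch (ours), representing a partition as a list of sets and $g_\mathcal{A}$ as a dict mapping each $y$ to its block index:

```python
def partition_to_function(partition):
    """Build g_A from a partition A of Y (Proposition 2, converse part):
    g_A(y) is the index of the unique block of A containing y."""
    return {y: z for z, block in enumerate(partition) for y in block}

def function_to_partition(g, Ys):
    """Build the partition (g^{-1}(z))_{z in Im(g)} from g (direct part)."""
    blocks = {}
    for y in Ys:
        blocks.setdefault(g(y), set()).add(y)
    return list(blocks.values())

print(partition_to_function([{1, 2}, {3}]))  # {1: 0, 2: 0, 3: 1}
```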
Now, let us define coarser and finer partitions, together with the corresponding notions of merging and splitting. These operations on partitions are the core idea of our greedy algorithms; as shown in Proposition 2, the partitions of $\mathcal{Y}$ correspond to functions $g : \mathcal{Y} \to \mathcal{Z}$ for the encoder's side information. Therefore, obtaining a partition from another amounts to finding another function $g : \mathcal{Y} \to \mathcal{Z}$ for the encoder's side information.

Definition 11 (Coarser, Finer). Let $\mathcal{A}, \mathcal{B} \subset \mathcal{P}(\mathcal{Y})$ be two partitions of the finite set $\mathcal{Y}$. We say that $\mathcal{A}$ is coarser than $\mathcal{B}$ if every element of $\mathcal{B}$ is included in an element of $\mathcal{A}$. If so, we also say that $\mathcal{B}$ is finer than $\mathcal{A}$.
We also define the sets of partitions $\operatorname{Merge}(\mathcal{A})$ (resp. $\operatorname{Split}(\mathcal{A})$), which correspond to all partitions that can be obtained by merging two elements of $\mathcal{A}$ into one (resp. by splitting one element of $\mathcal{A}$ into two nonempty parts); a sketch enumerating both sets is given below.

Proposition 3. If $\mathcal{A}$ is coarser (resp. finer) than $\mathcal{B}$, then $\mathcal{A}$ can be obtained from $\mathcal{B}$ by performing a finite number of mergings (resp. splittings).
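A Python sketch (ours) that enumerates both sets, as announced above; $\operatorname{Merge}(\mathcal{A})$ has $\binom{|\mathcal{A}|}{2}$ elements, while splitting a block $B$ yields $2^{|B|-1} - 1$ candidate partitions.

```python
from itertools import combinations

def merge_set(partition):
    """All partitions obtained by merging two blocks of `partition`."""
    out = []
    for i, j in combinations(range(len(partition)), 2):
        rest = [b for k, b in enumerate(partition) if k not in (i, j)]
        out.append(rest + [partition[i] | partition[j]])
    return out

def split_set(partition):
    """All partitions obtained by splitting one block into two non-empty
    parts; the smallest element is pinned to the first part so that each
    split is generated exactly once."""
    out = []
    for i, block in enumerate(partition):
        items = sorted(block)
        rest = [b for k, b in enumerate(partition) if k != i]
        for r in range(len(items)):
            for sub in combinations(items[1:], r):
                part1 = {items[0], *sub}
                part2 = block - part1
                if part2:
                    out.append(rest + [part1, part2])
    return out
```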

Greedy Algorithms Based on Partition Coarsening and Refining
In this section, we assume that $P_{X,Y}$ has full support.
From Proposition 2, we know that determining the Pareto front by a brute-force approach would at least require enumerating the partitions of $\mathcal{Y}$; the complexity of this approach is therefore exponential in $|\mathcal{Y}|$. In the following, we describe the greedy Algorithms 1 and 2, which give an achievable set for the encoder's side information design problem; one of them has polynomial complexity. We then give an example where the Pareto front coincides with the boundary of the convex hull of the achievable rate region obtained by both greedy algorithms.

Algorithm 1 (greedy coarsening) maintains a list Front of partitions. Starting from the finest partition of $\mathcal{Y}$ (i.e., $g_\mathcal{A} = \mathrm{Id}$), it repeatedly updates $\mathcal{A} \leftarrow \operatorname{argmax}$, over the mergings $\mathcal{B}$ of $\mathcal{A}$, of the slope between $(H(g_\mathcal{B}(Y)), R^*(g_\mathcal{B}))$ and $(H(g_\mathcal{A}(Y)), R^*(g_\mathcal{A}))$, and stores Front[i] $\leftarrow \mathcal{A}$ at each iteration $i$. Algorithm 2 (greedy refining) proceeds symmetrically: starting from the coarsest partition (i.e., $g_\mathcal{A}$ constant), it minimizes over the splittings $\mathcal{B}$ of $\mathcal{A}$ the corresponding slope. In these algorithms, argmin (resp. argmax) means any minimizer (resp. maximizer) of the specified quantity; and the function $g_\mathcal{A} : \mathcal{Y} \to \mathcal{Z}$ is a function for the encoder's side information corresponding to the partition $\mathcal{A}$, whose existence is given by Proposition 2.
The coarsening (resp. refining) algorithm starts by computing its first achievable point $(H(g_\mathcal{A}(Y)), R^*(g_\mathcal{A}))$ with $\mathcal{A}$ being the finest (resp. coarsest) partition: it evaluates $R^*(g_\mathcal{A})$, with $g_\mathcal{A} = \mathrm{Id}$ (resp. $g_\mathcal{A}$ constant), and $H(g_\mathcal{A}(Y))$. Then, at each iteration, the next achievable point is computed by using a merging (resp. splitting) of the current partition $\mathcal{A}$: the next partition is a coarser (resp. finer) partition chosen from $\operatorname{Merge}(\mathcal{A})$ (resp. $\operatorname{Split}(\mathcal{A})$), following a greedy approach. Since we want to achieve good trade-offs between $H(g_\mathcal{A}(Y))$ and $R^*(g_\mathcal{A})$, we want to decrease $H(g_\mathcal{A}(Y))$ (resp. $R^*(g_\mathcal{A})$) as much as possible while increasing the other quantity as little as possible. We do so by maximizing over $\mathcal{B} \in \operatorname{Merge}(\mathcal{A})$ (resp. minimizing over $\mathcal{B} \in \operatorname{Split}(\mathcal{A})$) the negative ratio
$$\frac{R^*(g_\mathcal{B}) - R^*(g_\mathcal{A})}{H(g_\mathcal{B}(Y)) - H(g_\mathcal{A}(Y))};$$
hence the use of slope maximization (resp. minimization) in the algorithms. At the end, the set of achievable points computed by the algorithm is returned.
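A Python sketch of the greedy coarsening algorithm, following the description above (ours, not the paper's listing). The callable `rate` is an assumed black box that returns $R^*(g_\mathcal{A})$ for a partition $\mathcal{A}$, e.g., computed via Theorem 4 in the full-support case; the marginal `P_Y` is assumed positive everywhere, so every merging strictly decreases $H(g_\mathcal{A}(Y))$ and the slopes are well defined.

```python
from math import log2

def greedy_coarsening(Ys, P_Y, rate):
    """Start from the finest partition (g_A = Id) and, at each step, move to
    the merging of A that maximizes the slope between the candidate point
    (H(g_B(Y)), R*(g_B)) and the current point (H(g_A(Y)), R*(g_A))."""
    def entropy(A):  # H(g_A(Y))
        probs = [sum(P_Y[y] for y in block) for block in A]
        return -sum(p * log2(p) for p in probs if p > 0)

    def mergings(A):  # all partitions obtained by merging two blocks of A
        return [[b for k, b in enumerate(A) if k not in (i, j)] + [A[i] | A[j]]
                for i in range(len(A)) for j in range(i + 1, len(A))]

    A = [{y} for y in Ys]                     # finest partition: g_A = Id
    points = [(entropy(A), rate(A))]
    while len(A) > 1:                         # stop at the coarsest partition
        h, r = points[-1]
        # merging decreases the entropy, so the slope below is <= 0;
        # maximizing it costs the least rate increase per bit of entropy saved
        A = max(mergings(A), key=lambda B: (rate(B) - r) / (entropy(B) - h))
        points.append((entropy(A), rate(A)))
    return points
```

Replacing `mergings` by splittings, `max` by `min`, and the finest initial partition by the coarsest one yields the refining variant (Algorithm 2). Note that $|\operatorname{Merge}(\mathcal{A})|$ is quadratic in $|\mathcal{A}|$ whereas $|\operatorname{Split}(\mathcal{A})|$ can be exponential in the block sizes, which is consistent with only one of the two algorithms running in polynomial time.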
In Figure 3, we show the rate pairs associated with all possible partitions of $\mathcal{Y}$: a point corresponds to a partition of $\mathcal{Y}$, and its position gives the associated pair $(H(g(Y)), R^*(g))$. Two points are linked if their corresponding partitions $\mathcal{A}$, $\mathcal{B}$ satisfy $\mathcal{A} \in \operatorname{Merge}(\mathcal{B})$ or $\mathcal{A} \in \operatorname{Split}(\mathcal{B})$. The obtained graph is the Hasse diagram of the partial order "coarser than". Note that, due to symmetries in the chosen example, several points associated with different partitions may overlap. In Figure 4 (resp. Figure 5), we give an illustration of the trajectory of the greedy coarsening (resp. refining) algorithm.

Figure 4. An illustration of the trajectory of the coarsening greedy algorithm (blue), with the Pareto front of the achievable rates (dashed red).

Figure 5. An illustration of the trajectory of the refining greedy algorithm (green), with the Pareto front of the achievable rates (dashed red).

Figures 3-5 are obtained with the same problem data.