Article

Rate-Distortion Analysis of Distributed Indirect Source Coding

College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310007, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(8), 844; https://doi.org/10.3390/e27080844
Submission received: 7 July 2025 / Revised: 1 August 2025 / Accepted: 6 August 2025 / Published: 8 August 2025
(This article belongs to the Special Issue Semantic Information Theory)

Abstract

Motivated by task-oriented semantic communication and distributed learning systems, this paper studies a distributed indirect source coding problem where M correlated sources are independently encoded for a central decoder. The decoder has access to correlated side information in addition to the messages received from the encoders and aims to recover a latent random variable under a given distortion constraint rather than recovering the sources themselves. We characterize the exact rate-distortion function for the case where the sources are conditionally independent given the side information. Furthermore, we develop a distributed Blahut–Arimoto (BA) algorithm to numerically compute the rate-distortion function. Numerical examples are provided to demonstrate the effectiveness of the proposed approach in calculating the rate-distortion region.

1. Introduction

Consider the multiterminal source coding setup shown in Figure 1. Let $(T, X_1, \ldots, X_M, Y) \sim p(t, x_1, \ldots, x_M, y)$ be a discrete memoryless source (DMS) taking values in the finite alphabet $\mathcal{T} \times \mathcal{X}_1 \times \cdots \times \mathcal{X}_M \times \mathcal{Y}$ according to a fixed and known probability distribution $p(t, x_1, \ldots, x_M, y)$. In this setup, encoder $m$, $m \in \mathcal{M} := \{1, \ldots, M\}$, has the local observation $X_m^n := (X_{m,1}, \ldots, X_{m,n})$. The encoders independently encode their observations into binary sequences at rates $R_1, \ldots, R_M$ bits per input symbol, respectively. The decoder, which has side information $Y^n = (Y_1, \ldots, Y_n)$, aims to recover task-oriented latent information $T^n := (T_1, \ldots, T_n)$ that is correlated with $(X_1^n, \ldots, X_M^n)$ but is not observed directly by any of the encoders. We are interested in the lossy reconstruction of $T^n$, with the average distortion measured by $\mathbb{E}\big[\frac{1}{n}\sum_{i=1}^n d(T_i, \hat{T}_i)\big]$ for some prescribed single-letter distortion measure $d(\cdot,\cdot)$. A formal $(2^{nR_1}, \ldots, 2^{nR_M}, n)$ rate-distortion code for this setup consists of the following:
  • $M$ independent encoders, where encoder $m \in \mathcal{M}$ assigns an index $s_m(x_m^n) \in \{1, \ldots, 2^{nR_m}\}$ to each sequence $x_m^n \in \mathcal{X}_m^n$;
  • A decoder that produces an estimate $\hat{t}^n(s_1, \ldots, s_M, y^n) \in \hat{\mathcal{T}}^n$ for each index tuple $(s_1, \ldots, s_M)$ and side information $y^n \in \mathcal{Y}^n$.
A rate tuple $(R_1, \ldots, R_M)$ is said to be achievable with the distortion measure $d(\cdot,\cdot)$ and the distortion value $D$ if there exists a sequence of $(2^{nR_1}, \ldots, 2^{nR_M}, n)$ codes satisfying
$\limsup_{n \to \infty} \mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^n d(T_i, \hat{T}_i)\Big] \le D. \qquad (1)$
The rate-distortion region $\mathcal{R}^*_{X_1,\ldots,X_M|Y}(D)$ for this distributed source coding problem is the closure of the set of all achievable rate tuples $(R_1, \ldots, R_M)$ that permit the reconstruction of the latent variable $T^n$ within the average distortion constraint $D$.
The problem illustrated in Figure 1 is motivated by semantic/task-oriented communication and distributed learning problems. In semantic/task-oriented communication [1], the decoder only needs to reconstruct some task-oriented information implied by the sources. For instance, it might extract hidden features from a scene captured by multiple cameras positioned at various angles. Here, $T_i$ may also be a deterministic function of the source samples $(X_{1,i}, \ldots, X_{M,i})$, in which case the problem reduces to lossy distributed function computation [2,3,4]. A similar problem also arises in distributed training. Consider $Y^n$ as the global model available at the server at a given iteration of a federated learning process and $(X_1^n, \ldots, X_M^n)$ as the clients' correlated versions of this model after downlink transmission and local training. The server aims to recover the updated global model, $T^n$, based on the messages received from all $M$ clients. It is often assumed that the global model is transmitted to the clients intact, but in practical scenarios where downlink communication is limited, the clients may receive noisy or compressed versions of the global model [5,6,7].
For the case of $M = 1$, the problem reduces to remote compression in a point-to-point scenario with side information available at the decoder. In [8,9], the authors studied this problem without correlated side information at the receiver, motivated by semantic communication. This problem is known in the literature as the remote rate-distortion problem [10,11], and the rate-distortion trade-off is fully characterized in the general case. The authors of [8] studied this trade-off in detail for specific source distributions. Similarly, the authors of [12] characterized the remote rate-distortion trade-off when correlated side information is available at both the encoder and the decoder. Our problem for $M = 1$ can be solved by combining the remote rate-distortion problem with the classical Wyner–Ziv rate-distortion function [13,14].
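For reference, the single-encoder ($M = 1$) trade-off obtained by this combination can be written compactly as follows; this is our paraphrase of the known result, and it also agrees with Theorem 3 below specialized to a single source:
$R(D) = \min \; I(X;W) - I(W;Y),$
where the minimum is over all test channels $p(w \mid x)$ (so that $(T,Y) - X - W$ forms a Markov chain) and decoding functions $g$ satisfying $\mathbb{E}\big[d\big(T, g(W,Y)\big)\big] \le D$.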
The rate-distortion region for the multiterminal version of the remote rate-distortion problem considered here remains open. Wagner et al. developed an improved outer bound for a general multiterminal source model [15]. Lim et al. proposed an achievable rate region for the distributed lossy computation problem without giving a conclusive rate-distortion characterization [16]. Ku et al. considered a special case in which the sources are independent and derived a single-letter expression for the rate-distortion region [20]. Gastpar [18] considered the lossy compression of the source sequences in the presence of side information at the receiver and characterized the rate-distortion region for the special case in which the $X_i$ are conditionally independent given the side information.
To provide a performance reference for practical coding schemes, it is necessary not only to characterize the exact theoretical expression for the rate-distortion region but also to calculate the rate-distortion region for a given distribution and a specific distortion metric. In the traditional single-source direct scenario, determining the rate-distortion function involves solving a convex optimization problem, which can be addressed using the globally convergent iterative Blahut–Arimoto algorithm, as discussed in [17]. In this paper, we are interested in computing the rate-distortion region $\mathcal{R}^*_{X_1,\ldots,X_M|Y}(D)$ for the general distributed coding problem. We pay particular attention to the special case in which the sources are conditionally independent given the side information, which is motivated by the aforementioned examples. For brevity of presentation, we set $M = 3$ in this paper, with the understanding that the results can be readily extended to an arbitrary number of sources. To numerically compute the rate-distortion region, we introduce a distributed iterative optimization framework that generalizes the classical Blahut–Arimoto (BA) algorithm. While the standard BA algorithm is designed for single-source point-to-point settings, our framework extends its alternating minimization structure to a distributed scenario with multiple encoders, indirect source reconstruction, and decoder-side side information. This extension enables the computation of rate-distortion regions in settings that are significantly more general than those considered in the existing literature.
In Section 2, we derive an achievable region $\mathcal{R}_a(D) \subseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$. In Section 3, we determine a general outer bound $\mathcal{R}_o(D) \supseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$. In Section 4, we show that the two regions coincide, and that the region is optimal, when the sources $(X_1, X_2, X_3)$ are conditionally independent given the side information $Y$. In Section 5, we develop an alternating minimization framework to calculate the rate-distortion region by generalizing the Blahut–Arimoto (BA) algorithm. In Section 6, we demonstrate the effectiveness of the proposed framework through numerical examples.

2. An Achievable Rate Region

In this section, we introduce an achievable rate region $\mathcal{R}_a(D)$, which is contained within the target rate-distortion region, i.e., $\mathcal{R}_a(D) \subseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$.
Theorem 1.
$\mathcal{R}_a(D) \subseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$, where $\mathcal{R}_a(D)$ is the set of all rate tuples $(R_1, R_2, R_3)$ such that there exists a tuple $(W_1, W_2, W_3)$ of discrete random variables with $p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\, p(w_2|x_2)\, p(w_3|x_3)\, p(x_1,x_2,x_3,y)$, for which the following conditions are satisfied:
$R_1 \ge I(X_1;W_1) - I(W_1;W_2,W_3,Y)$ (2a)
$R_2 \ge I(X_2;W_2) - I(W_2;W_1,W_3,Y)$ (2b)
$R_3 \ge I(X_3;W_3) - I(W_3;W_1,W_2,Y)$ (2c)
$R_1 + R_2 \ge I(X_1;W_1) + I(X_2;W_2) - I(W_1;W_2,W_3,Y) - I(W_2;W_1,W_3,Y) + I(W_1;W_2\,|\,W_3,Y)$ (2d)
$R_1 + R_3 \ge I(X_1;W_1) + I(X_3;W_3) - I(W_1;W_2,W_3,Y) - I(W_3;W_1,W_2,Y) + I(W_1;W_3\,|\,W_2,Y)$ (2e)
$R_2 + R_3 \ge I(X_2;W_2) + I(X_3;W_3) - I(W_2;W_1,W_3,Y) - I(W_3;W_1,W_2,Y) + I(W_2;W_3\,|\,W_1,Y)$ (2f)
$R_1 + R_2 + R_3 \ge I(X_1;W_1) + I(X_2;W_2) + I(X_3;W_3) - I(W_1;W_2,W_3,Y) - I(W_2;W_1,W_3,Y) - I(W_3;W_1,W_2,Y) + I(W_1;W_2\,|\,W_3,Y) + I(W_1,W_2;W_3\,|\,Y),$ (2g)
and there exists a decoding function $g(\cdot)$ such that
$\mathbb{E}\big[d\big(T, g(W_1,W_2,W_3,Y)\big)\big] \le D. \qquad (3)$
The auxiliary random variables $W_1$, $W_2$ and $W_3$ serve as intermediate variables in the encoding process and are introduced to improve compression efficiency. Ideally, $W_1$, $W_2$ and $W_3$ should carry information about the sources $X_1$, $X_2$ and $X_3$, respectively, while overlapping as little as possible with the side information $Y$, so as to avoid redundancy and fully exploit the decoder-side knowledge; this helps minimize the required transmission rates. The rigorous proof of Theorem 1 is provided in Appendix A.
Corollary 1.
The conditions (2) of Theorem 1 can be expressed equivalently as
$R_1 \ge I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y)$ (4a)
$R_2 \ge I(X_1,X_2,X_3;\,W_2\,|\,W_1,W_3,Y)$ (4b)
$R_3 \ge I(X_1,X_2,X_3;\,W_3\,|\,W_1,W_2,Y)$ (4c)
$R_1 + R_2 \ge I(X_1,X_2,X_3;\,W_1,W_2\,|\,W_3,Y)$ (4d)
$R_1 + R_3 \ge I(X_1,X_2,X_3;\,W_1,W_3\,|\,W_2,Y)$ (4e)
$R_2 + R_3 \ge I(X_1,X_2,X_3;\,W_2,W_3\,|\,W_1,Y)$ (4f)
$R_1 + R_2 + R_3 \ge I(X_1,X_2,X_3;\,W_1,W_2,W_3\,|\,Y).$ (4g)
Proof. 
First, we prove that $R_1 \ge I(X_1;W_1) - I(W_1;W_2,W_3,Y)$ is equivalent to $R_1 \ge I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y)$. The bound of (4a) can be written as
$I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y) = I(X_1;W_1\,|\,W_2,W_3,Y) + I(X_2,X_3;\,W_1\,|\,X_1,W_2,W_3,Y), \qquad (5)$
where $I(X_2,X_3;\,W_1\,|\,X_1,W_2,W_3,Y) = 0$ because $(X_2,X_3,Y)$ is conditionally independent of $W_1$ given $X_1$. For the first term on the right-hand side of (5), we have
$I(X_1;W_1\,|\,W_2,W_3,Y) + I(W_1;W_2,W_3,Y) = I(W_1;\,X_1,W_2,W_3,Y) = I(W_1;X_1) + I(W_2,W_3,Y;\,W_1\,|\,X_1), \qquad (6)$
where $I(W_2,W_3,Y;\,W_1\,|\,X_1) = 0$ because $(W_2,W_3,Y)$ is conditionally independent of $W_1$ given $X_1$. Then, we have
$I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y) = I(X_1;W_1\,|\,W_2,W_3,Y) = I(W_1;X_1) - I(W_1;W_2,W_3,Y) \le R_1. \qquad (7)$
This completes the proof of (4a); (4b) and (4c) can be proved in the same way.
Next, we prove (4d). The bound on the sum rate $R_1 + R_2$ can be written as
$I(X_1,X_2,X_3;\,W_1,W_2\,|\,W_3,Y) = I(X_1;\,W_1,W_2\,|\,W_3,Y) + I(X_2,X_3;\,W_1,W_2\,|\,W_3,Y,X_1)$ (8a)
$= I(X_1;W_2\,|\,W_3,Y) + I(X_1;W_1\,|\,W_2,W_3,Y) + I(X_2,X_3;\,W_1\,|\,W_3,Y,X_1) + I(X_2,X_3;\,W_2\,|\,W_1,W_3,Y,X_1),$ (8b)
where (8a) and (8b) follow from the chain rule of mutual information; here $I(X_1;W_1\,|\,W_2,W_3,Y) = I(X_1;W_1) - I(W_1;W_2,W_3,Y) \le R_1$ by (7), and $I(X_2,X_3;\,W_1\,|\,W_3,Y,X_1) = 0$ because $(X_2,X_3)$ is conditionally independent of $W_1$ given $X_1$. For the last term in (8b), we have
$I(X_2,X_3;\,W_2\,|\,W_1,W_3,Y,X_1) + I(X_1;W_2\,|\,W_1,W_3,Y)$
$= I(X_1,X_2,X_3;\,W_2\,|\,W_1,W_3,Y)$
$= I(X_2,X_3;\,W_2\,|\,W_1,W_3,Y) + I(X_1;W_2\,|\,W_1,W_3,Y,X_2,X_3)$
$= I(X_2;W_2\,|\,W_1,W_3,Y) + I(X_1,X_3;\,W_2\,|\,W_1,W_3,Y,X_2), \qquad (9)$
where $I(X_1;W_2\,|\,W_1,W_3,Y,X_2,X_3) = 0$ because $X_1$ is conditionally independent of $W_2$ given $X_2$, $I(X_1,X_3;\,W_2\,|\,W_1,W_3,Y,X_2) = 0$ because $(X_1,X_3)$ is conditionally independent of $W_2$ given $X_2$, and $I(X_2;W_2\,|\,W_1,W_3,Y) = I(X_2;W_2) - I(W_2;W_1,W_3,Y) \le R_2$ by the same argument as in (6). Thus, the last term in (8b) can be written as
$I(X_2,X_3;\,W_2\,|\,W_1,W_3,Y,X_1) = I(X_2;W_2\,|\,W_1,W_3,Y) - I(X_1;W_2\,|\,W_1,W_3,Y). \qquad (10)$
For the last term on the right-hand side of (10), we have
$I(X_1;W_2\,|\,W_1,W_3,Y) + I(W_1;W_2\,|\,W_3,Y) = I(X_1,W_1;\,W_2\,|\,W_3,Y) = I(X_1;W_2\,|\,W_3,Y) + I(W_1;W_2\,|\,X_1,W_3,Y), \qquad (11)$
where $I(W_1;W_2\,|\,X_1,W_3,Y) = 0$ because $W_1$ is conditionally independent of $(W_2,W_3,Y)$ given $X_1$; thus, the last term in (10) can be written as
$I(X_1;W_2\,|\,W_1,W_3,Y) = I(X_1;W_2\,|\,W_3,Y) - I(W_1;W_2\,|\,W_3,Y). \qquad (12)$
Combining (8), (9), (10) and (12), we have
$R_1 + R_2 \ge I(X_1,X_2,X_3;\,W_1,W_2\,|\,W_3,Y) = I(X_1;W_1) + I(X_2;W_2) - I(W_1;W_2,W_3,Y) - I(W_2;W_1,W_3,Y) + I(W_1;W_2\,|\,W_3,Y). \qquad (13)$
The remaining sum-rate bounds in (4) can be proved in the same way and are omitted here. □

3. A General Outer Bound

In this section, we derive a region $\mathcal{R}_o(D)$ that contains the target rate-distortion region, i.e., $\mathcal{R}_o(D) \supseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$.
Theorem 2.
$\mathcal{R}_o(D) \supseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$, where $\mathcal{R}_o(D)$ is the set of all rate triples $(R_1,R_2,R_3)$ such that there exists a triple $(W_1,W_2,W_3)$ of discrete random variables with $p(w_1|x_1,x_2,x_3,y) = p(w_1|x_1)$, $p(w_2|x_1,x_2,x_3,y) = p(w_2|x_2)$ and $p(w_3|x_1,x_2,x_3,y) = p(w_3|x_3)$, for which the following conditions are satisfied:
$R_1 \ge I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y)$
$R_2 \ge I(X_1,X_2,X_3;\,W_2\,|\,W_1,W_3,Y)$
$R_3 \ge I(X_1,X_2,X_3;\,W_3\,|\,W_1,W_2,Y)$
$R_1 + R_2 \ge I(X_1,X_2,X_3;\,W_1,W_2\,|\,W_3,Y)$
$R_1 + R_3 \ge I(X_1,X_2,X_3;\,W_1,W_3\,|\,W_2,Y)$
$R_2 + R_3 \ge I(X_1,X_2,X_3;\,W_2,W_3\,|\,W_1,Y)$
$R_1 + R_2 + R_3 \ge I(X_1,X_2,X_3;\,W_1,W_2,W_3\,|\,Y),$ (14)
and there exists a decoding function $g(\cdot)$ such that
$\mathbb{E}\big[d\big(T, g(W_1,W_2,W_3,Y)\big)\big] \le D. \qquad (15)$
The rigorous proof of Theorem 2 is provided in Appendix B.
While the expressions of the inner bound (4) and the outer bound (14) are the same, the two regions do not necessarily coincide, because the constraint $p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1,x_2,x_3,y)$ in Theorem 1 limits the freedom in choosing the auxiliary random variables $(W_1,W_2,W_3)$ compared with the constraints in Theorem 2. In the next section, we demonstrate that, when the sources are conditionally independent given the side information, the additional freedom in choosing the auxiliary random variables $(W_1,W_2,W_3)$ in Theorem 2 cannot lower the value of the rate-distortion functions.

4. Conclusive Rate-Distortion Results

Corollary 2.
If $X_1, X_2, X_3$ are conditionally independent given the side information $Y$, then $\mathcal{R}_a(D) \subseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$, where $\mathcal{R}_a(D)$ is the set of all rate triples $(R_1,R_2,R_3)$ such that there exists a triple $(W_1,W_2,W_3)$ of random variables with $p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1|y)\,p(x_2|y)\,p(x_3|y)\,p(y)$, for which the following conditions are satisfied:
$R_1 \ge I(X_1;W_1) - I(W_1;Y)$
$R_2 \ge I(X_2;W_2) - I(W_2;Y)$
$R_3 \ge I(X_3;W_3) - I(W_3;Y),$ (16)
and there exists a decoding function $g(\cdot)$ such that
$\mathbb{E}\big[d\big(T, g(W_1,W_2,W_3,Y)\big)\big] \le D. \qquad (17)$
Proof. 
Since the joint distribution can be written as
$p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1,w_2,w_3\,|\,x_1,x_2,x_3,y)\, p(x_1,x_2,x_3\,|\,y)\, p(y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1|y)\,p(x_2|y)\,p(x_3|y)\,p(y), \qquad (18)$
the term $I(W_1;W_2\,|\,W_3,Y)$ in the sum-rate bound (2d) is 0, because $W_1$ and $W_2$ are conditionally independent given $(W_3,Y)$. Similarly, the terms $I(W_1;W_3\,|\,W_2,Y)$, $I(W_2;W_3\,|\,W_1,Y)$ and $I(W_1;W_2\,|\,W_3,Y) + I(W_1,W_2;W_3\,|\,Y)$ in the sum-rate bounds (2e)–(2g) are all 0. Therefore, each sum-rate bound reduces to the sum of the corresponding individual-rate bounds and can hence be omitted. Meanwhile, the term $I(W_1;W_2,W_3,Y)$ in the individual bound (2a) can be written as
$I(W_1;\,W_2,W_3,Y) = I(W_1;Y) + I(W_2,W_3;\,W_1\,|\,Y) = I(W_1;Y). \qquad (19)$
Similarly, we have
$I(W_2;\,W_1,W_3,Y) = I(W_2;Y) + I(W_1,W_3;\,W_2\,|\,Y) = I(W_2;Y),$
$I(W_3;\,W_1,W_2,Y) = I(W_3;Y) + I(W_1,W_2;\,W_3\,|\,Y) = I(W_3;Y). \qquad (20)$
This completes the proof of Corollary 2. □
Corollary 3.
If $X_1, X_2, X_3$ are conditionally independent given the side information $Y$, then $\mathcal{R}_o(D) \subseteq \tilde{\mathcal{R}}_o(D)$, and hence $\tilde{\mathcal{R}}_o(D) \supseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$, where $\tilde{\mathcal{R}}_o(D)$ is the set of all rate triples $(R_1,R_2,R_3)$ such that there exists a triple $(W_1,W_2,W_3)$ of discrete random variables with $p(w_1|x_1,x_2,x_3,y) = p(w_1|x_1)$, $p(w_2|x_1,x_2,x_3,y) = p(w_2|x_2)$ and $p(w_3|x_1,x_2,x_3,y) = p(w_3|x_3)$, for which the following conditions are satisfied:
$R_1 \ge I(X_1;W_1) - I(W_1;Y)$
$R_2 \ge I(X_2;W_2) - I(W_2;Y)$
$R_3 \ge I(X_3;W_3) - I(W_3;Y),$ (21)
and there exists a decoding function $g(\cdot)$ such that
$\mathbb{E}\big[d\big(T, g(W_1,W_2,W_3,Y)\big)\big] \le D. \qquad (22)$
Proof. 
First, we can enlarge the region $\mathcal{R}_o(D)$ by omitting the sum-rate bounds in (14). Then, the individual rate bounds in (14) can be relaxed as
$R_1 \ge I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y) = I(X_1;W_1\,|\,W_2,W_3,Y) + I(X_2,X_3;\,W_1\,|\,W_2,W_3,Y,X_1) \ge I(X_1;W_1\,|\,W_2,W_3,Y) = I(X_1;\,W_1,W_2,W_3\,|\,Y) - I(X_1;\,W_2,W_3\,|\,Y). \qquad (23)$
According to the conditional independence relations, we have $I(X_1;\,W_2,W_3\,|\,Y) = 0$, and then we have
$R_1 \ge I(X_1;\,W_1,W_2,W_3\,|\,Y)$
$= I(X_1;W_1\,|\,Y) + I(X_1;\,W_2,W_3\,|\,Y,W_1)$ (24a)
$= I(X_1,Y;\,W_1) - I(W_1;Y)$ (24b)
$= I(X_1;W_1) - I(W_1;Y),$ (24c)
where (24a) follows from the condition that $X_1, X_2, X_3$ are conditionally independent given the side information $Y$, which makes $I(X_1;\,W_2,W_3\,|\,Y,W_1) = 0$; (24b) follows from the chain rule of mutual information; and (24c) follows from the Markov chain $Y - X_1 - W_1$. The same derivation applies to $R_2$ and $R_3$, which proves Corollary 3. □
Theorem 3.
If $X_1, X_2, X_3$ are conditionally independent given the side information $Y$, then
$\mathcal{R}_a(D) = \mathcal{R}_o(D) = \mathcal{R}^*_{X_1,X_2,X_3|Y}(D). \qquad (25)$
Proof. 
We note that the only difference between $\mathcal{R}_a(D)$ and $\tilde{\mathcal{R}}_o(D)$ lies in the degrees of freedom available when choosing the auxiliary random variables $(W_1,W_2,W_3)$, and all of the mutual information functions in (16) and (21) depend only on the marginal distributions of $(X_1,W_1,Y)$, $(X_2,W_2,Y)$ and $(X_3,W_3,Y)$. Choose any rate triple $(R_1,R_2,R_3)$ together with an auxiliary random variable triple $(W_1,W_2,W_3)$ meeting the conditions of Corollary 3. Then, we construct auxiliary random variables $(W_1',W_2',W_3')$ such that
$p_{W_1'|X_1}(w_1\,|\,x_1) = \sum_{w_2,x_2,w_3,x_3} p(w_1,w_2,w_3\,|\,x_1,x_2,x_3)\, p(x_2,x_3\,|\,x_1),$
$p_{W_2'|X_2}(w_2\,|\,x_2) = \sum_{w_1,x_1,w_3,x_3} p(w_1,w_2,w_3\,|\,x_1,x_2,x_3)\, p(x_1,x_3\,|\,x_2),$
$p_{W_3'|X_3}(w_3\,|\,x_3) = \sum_{w_1,x_1,w_2,x_2} p(w_1,w_2,w_3\,|\,x_1,x_2,x_3)\, p(x_1,x_2\,|\,x_3). \qquad (26)$
The joint distribution
$p(w_1',w_2',w_3',x_1,x_2,x_3,y) = p_{W_1'|X_1}(w_1'\,|\,x_1)\, p_{W_2'|X_2}(w_2'\,|\,x_2)\, p_{W_3'|X_3}(w_3'\,|\,x_3)\, p(x_1|y)\, p(x_2|y)\, p(x_3|y)\, p(y) \qquad (27)$
has the same marginal distributions on $(X_1,W_1,Y)$, $(X_2,W_2,Y)$ and $(X_3,W_3,Y)$ as the original triple. Therefore, the additional freedom in choosing the auxiliary random variables $(W_1,W_2,W_3)$ in Corollary 3 cannot lower the value of the rate-distortion functions. This proves Theorem 3. The arguments leading to Theorem 3 indicate that the result extends to the scenario with $M$ sources:
$R_i \ge I(X_i;W_i) - I(W_i;Y) \quad \text{for all } i \in \{1,\ldots,M\}. \qquad (28)$

5. Iterative Optimization Framework Based on BA Algorithm

In this section, we present the iterative optimization framework for calculating the rate-distortion region. Starting from the standard Lagrange multiplier method, the problem of calculating the rate-distortion region $\mathcal{R}^*_{X_1,\ldots,X_M|Y}(D)$ is equivalent to minimizing
$\sum_{i \in \mathcal{M} := \{1,\ldots,M\}} \big[ I(W_i;X_i) - I(W_i;Y) \big] + \lambda\big(\mathbb{E}[d(T,\hat{T})] - D\big). \qquad (29)$
By the definition of mutual information, we can rewrite (29) (dropping the constant $-\lambda D$) as
$L_\lambda(\mathbf{Q},\mathbf{q},q'') = \sum_{i \in \mathcal{M}} \sum_{y,x_i,w_i} p(y,x_i)\, q_i(w_i|x_i) \log\frac{q_i(w_i|x_i)}{Q_i(w_i|y)} + \lambda \sum_{\mathbf{w},\mathbf{x},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q''(\hat{t}\,|\,y,\mathbf{w}) \prod_{i\in\mathcal{M}} q_i(w_i|x_i), \qquad (30)$
where $\mathbf{x} = (x_1,\ldots,x_M)$, $\mathbf{w} = (w_1,\ldots,w_M)$, and $\mathbf{Q}$, $\mathbf{q}$, $q''$ represent the distributions that are iteratively updated: $\mathbf{Q}$ collects the conditional distributions of the auxiliary variables given $Y$, i.e., $\mathbf{Q} = [\,Q_i(w_i|y)\,]_{w_i\in\mathcal{W}_i,\, y\in\mathcal{Y},\, i\in\mathcal{M}}$; $\mathbf{q}$ collects the conditional distributions of the auxiliary variables given the sources, $\mathbf{q} = [\,q_i(w_i|x_i)\,]_{w_i\in\mathcal{W}_i,\, x_i\in\mathcal{X}_i,\, i\in\mathcal{M}}$; and $q''$ is the conditional distribution of the reconstruction $\hat{T}$ given $Y$ and the auxiliary variables, $q''(\hat{t}\,|\,y,w_1,\ldots,w_M)$.
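To make the notation of (30) concrete, the following minimal Python/NumPy sketch shows one way to store $\mathbf{Q}$, $\mathbf{q}$ and $q''$ as arrays and to evaluate $L_\lambda$ for $M = 2$. The alphabet sizes, the randomly generated joint distribution and distortion matrix, and all variable names are illustrative placeholders rather than quantities from the paper.

```python
import numpy as np

# Toy M = 2 instance: binary alphabets for X_i, Y, W_i; 4-letter reconstruction alphabet.
nX = nY = nW = 2
nT = 4
rng = np.random.default_rng(1)

# A generic joint source p(t, x1, x2, y) and distortion d(t, t_hat), both random here.
p_txxy = rng.random((nT, nX, nX, nY)); p_txxy /= p_txxy.sum()
d = rng.random((nT, nT))

# The three families of distributions appearing in (30):
q1 = rng.dirichlet(np.ones(nW), size=nX)             # q_1(w1 | x1)
q2 = rng.dirichlet(np.ones(nW), size=nX)             # q_2(w2 | x2)
Q1 = rng.dirichlet(np.ones(nW), size=nY)             # Q_1(w1 | y)
Q2 = rng.dirichlet(np.ones(nW), size=nY)             # Q_2(w2 | y)
qd = rng.dirichlet(np.ones(nT), size=(nW, nW, nY))   # q''(t_hat | w1, w2, y)

def lagrangian(lam):
    """Evaluate L_lambda of Eq. (30) for the stored distributions (natural logarithms)."""
    p_x1y = p_txxy.sum(axis=(0, 2))                  # p(x1, y)
    p_x2y = p_txxy.sum(axis=(0, 1))                  # p(x2, y)
    rate = (np.einsum('ay,aw,ayw->', p_x1y, q1, np.log(q1[:, None, :] / Q1[None])) +
            np.einsum('by,bv,byv->', p_x2y, q2, np.log(q2[:, None, :] / Q2[None])))
    dist = np.einsum('taby,aw,bv,wvys,ts->', p_txxy, q1, q2, qd, d)
    return rate + lam * dist

print(lagrangian(lam=2.0))
```

The rate part is the first double sum of (30), written here as an expectation of a log-likelihood ratio, and the distortion part is the second sum; sweeping the multiplier $\lambda$ trades off between the two.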
Lemma 1
(Optimization of $\mathbf{Q}$). For fixed $\mathbf{Q}_{\setminus m}$, $\mathbf{q}$ and $q''$, the Lagrangian $L_\lambda(\mathbf{Q},\mathbf{q},q'')$ is minimized by
$Q_m^*(w_m\,|\,y) = \frac{\sum_{x_m} p(y,x_m)\, q_m(w_m\,|\,x_m)}{\sum_{x_m,w_m} p(y,x_m)\, q_m(w_m\,|\,x_m)}, \qquad (31)$
where $\mathbf{Q}_{\setminus m} = [\,Q_i(w_i|y)\,]_{w_i\in\mathcal{W}_i,\, y\in\mathcal{Y},\, i\in\mathcal{M}\setminus\{m\}}$.
Proof. 
For any $Q_m$,
$L_\lambda(Q_m^*, \mathbf{Q}_{\setminus m}, \mathbf{q}, q'') - L_\lambda(Q_m, \mathbf{Q}_{\setminus m}, \mathbf{q}, q'') = \sum_{y,x_m,w_m} p(y,x_m)\, q_m(w_m|x_m) \log\frac{q_m(w_m|x_m)}{Q_m^*(w_m|y)} - \sum_{y,x_m,w_m} p(y,x_m)\, q_m(w_m|x_m) \log\frac{q_m(w_m|x_m)}{Q_m(w_m|y)}$
$= \sum_{y,x_m,w_m} p(y,x_m)\, q_m(w_m|x_m) \log\frac{Q_m(w_m|y)}{Q_m^*(w_m|y)} \overset{(a)}{\le} \sum_{y,x_m,w_m} p(y,x_m)\, q_m(w_m|x_m) \Big(\frac{Q_m(w_m|y)}{Q_m^*(w_m|y)} - 1\Big) = 0, \qquad (32)$
where (a) follows from the inequality $\log z \le z - 1$, with equality if and only if $Q_m = Q_m^*$. This completes the proof of Lemma 1. □
Lemma 2
(Optimization of $\mathbf{q}$). For fixed $\mathbf{Q}$, $\mathbf{q}_{\setminus m}$ and $q''$, the Lagrangian $L_\lambda(\mathbf{Q},\mathbf{q},q'')$ is minimized by
$q_m^*(w_m\,|\,x_m) = \frac{\exp\Big(\sum_y p(y|x_m)\log Q_m(w_m|y) - \lambda\, d_m(x_m,w_m)\Big)}{\sum_{w_m}\exp\Big(\sum_y p(y|x_m)\log Q_m(w_m|y) - \lambda\, d_m(x_m,w_m)\Big)}, \qquad (33)$
where $d_m(x_m,w_m) := \sum_{\mathbf{w}_{\setminus m},\mathbf{x}_{\setminus m},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x}_{\setminus m},y\,|\,x_m)\, q''(\hat{t}\,|\,y,\mathbf{w})\, q_{\setminus m}(\mathbf{w}_{\setminus m}\,|\,\mathbf{x}_{\setminus m})$ is the expected distortion conditioned on $X_m = x_m$ and $W_m = w_m$, with $q_{\setminus m}$ defined in (39) below,
and the resulting minimum is
$L_\lambda(\mathbf{Q}, q_m^*, \mathbf{q}_{\setminus m}, q'') = \sum_{i\ne m}\sum_{y,x_i,w_i} p(y,x_i)\, q_i(w_i|x_i)\log\frac{q_i(w_i|x_i)}{Q_i(w_i|y)} - \sum_{x_m} p(x_m)\log\sum_{w_m}\exp\Big(\sum_y p(y|x_m)\log Q_m(w_m|y) - \lambda\, d_m(x_m,w_m)\Big). \qquad (34)$
Proof. 
For fixed $\mathbf{Q}$, $\mathbf{q}_{\setminus m}$ and $q''$, the Lagrangian $L_\lambda(\mathbf{Q},\mathbf{q},q'')$ is minimized by $q_m^*(w_m|x_m)$ if and only if the following Kuhn–Tucker (KT) conditions are satisfied:
$\frac{\partial L_\lambda}{\partial q_m(w_m|x_m)}\bigg|_{q_m^*} = \gamma(x_m), \quad \text{if } q_m^*(w_m|x_m) > 0, \qquad (35)$
and
$\frac{\partial L_\lambda}{\partial q_m(w_m|x_m)}\bigg|_{q_m^*} \ge \gamma(x_m), \quad \text{if } q_m^*(w_m|x_m) = 0, \qquad (36)$
where $\gamma(x_m)$ is the multiplier associated with the normalization constraint $\sum_{w_m} q_m(w_m|x_m) = 1$. Since
$\frac{\partial L_\lambda}{\partial q_m(w_m|x_m)} = \sum_{y} p(x_m,y)\Big[\log\frac{q_m(w_m|x_m)}{Q_m(w_m|y)} + 1\Big] + \lambda \sum_{\mathbf{w}_{\setminus m},\mathbf{x}_{\setminus m},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q''(\hat{t}\,|\,y,\mathbf{w})\, q_{\setminus m}(\mathbf{w}_{\setminus m}\,|\,\mathbf{x}_{\setminus m}), \qquad (37)$
the first KT condition (35) becomes
$\tilde{\gamma}(x_m) = p(x_m)\log q_m(w_m|x_m) - \sum_y p(x_m,y)\log Q_m(w_m|y) + \lambda \sum_{\mathbf{w}_{\setminus m},\mathbf{x}_{\setminus m},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q''(\hat{t}\,|\,y,\mathbf{w})\, q_{\setminus m}(\mathbf{w}_{\setminus m}\,|\,\mathbf{x}_{\setminus m}), \qquad (38)$
where $\tilde{\gamma}(x_m) := \gamma(x_m) - p(x_m)$ and
$q_{\setminus m}(\mathbf{w}_{\setminus m}\,|\,\mathbf{x}_{\setminus m}) = \prod_{i\in\mathcal{M}\setminus\{m\}} q_i(w_i|x_i). \qquad (39)$
Solving (38) for $q_m(w_m|x_m)$, we have
$q_m(w_m|x_m) = \exp\!\Big(\frac{\tilde{\gamma}(x_m)}{p(x_m)}\Big)\, \exp\!\Big(\sum_y p(y|x_m)\log Q_m(w_m|y) - \lambda\, d_m(x_m,w_m)\Big), \qquad (40)$
where we used $p(t,\mathbf{x},y) = p(x_m)\, p(t,\mathbf{x}_{\setminus m},y\,|\,x_m)$. Then, (33) is obtained by normalizing $q_m(w_m|x_m)$ over $w_m$. □
Lemma 3
(Optimization of $q''$). For fixed $\mathbf{Q}$ and $\mathbf{q}$, the Lagrangian $L_\lambda(\mathbf{Q},\mathbf{q},q'')$ is minimized by the Bayes detector that minimizes the posterior expected distortion,
$q''^{*}(\hat{t}\,|\,\mathbf{w},y) = \begin{cases} \alpha(\mathbf{w},y), & \hat{t} \in \arg\min_{\hat{t}'\in\hat{\mathcal{T}}} \mathbb{E}\big[d(T,\hat{t}')\,\big|\,\mathbf{W}=\mathbf{w}, Y=y\big], \\ 0, & \text{otherwise}, \end{cases} \qquad (41)$
where $\alpha(\mathbf{w},y)$ is selected to guarantee
$\sum_{\hat{t}} q''(\hat{t}\,|\,\mathbf{w},y) = 1, \qquad (42)$
and $\mathbb{E}[d(T,\hat{t})\,|\,\mathbf{W}=\mathbf{w},Y=y]$ is proportional, with a factor that does not depend on $\hat{t}$, to
$\sum_{\mathbf{x},t} d(t,\hat{t})\, p(t,\mathbf{x},y) \prod_{i\in\mathcal{M}} q_i(w_i|x_i). \qquad (43)$
Proof. 
We note that the only term in the Lagrangian $L_\lambda$ that depends on $q''(\hat{t}\,|\,y,\mathbf{w})$ is the expected distortion, which is linear in $q''$ and is therefore minimized by placing all of the conditional probability mass on reconstructions $\hat{t}$ with minimal posterior expected distortion, i.e., by a Bayes detector. □
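As a tiny, self-contained illustration of Lemma 3 (with an assumed toy posterior and distortion matrix rather than quantities from the paper), the Bayes detector simply places all probability mass on the reconstruction with the smallest posterior expected distortion:

```python
import numpy as np

rng = np.random.default_rng(0)
nT = nThat = 4
post = rng.dirichlet(np.ones(nT))     # posterior weight on t for one fixed (w, y), cf. Eq. (43)
d = rng.random((nT, nThat))           # distortion d(t, t_hat)

exp_dist = post @ d                   # posterior expected distortion of each candidate t_hat
q_dd = np.zeros(nThat)
q_dd[np.argmin(exp_dist)] = 1.0       # q''(. | w, y): point mass on the minimizing t_hat
print(exp_dist, q_dd)
```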
Based on Lemmas 1–3, we obtain an iterative algorithm for computing the rate-distortion region of the distributed indirect source coding problem with decoder side information. For a given $i \in \mathcal{M}$, we iterate between the following (44) and (45) to alternately update $Q_i$ and $q_i$:
$Q_i^{(\ell,k_i)}(w_i\,|\,y) = \frac{\sum_{x_i} p(y,x_i)\, q_i^{(\ell,k_i)}(w_i\,|\,x_i)}{\sum_{x_i,w_i} p(y,x_i)\, q_i^{(\ell,k_i)}(w_i\,|\,x_i)}, \qquad (44)$
$q_i^{(\ell,k_i+1)}(w_i\,|\,x_i) = \frac{\exp\Big(\sum_y p(y|x_i)\log Q_i^{(\ell,k_i)}(w_i|y) - \lambda\, d_i^{(\ell)}(x_i,w_i)\Big)}{\sum_{w_i}\exp\Big(\sum_y p(y|x_i)\log Q_i^{(\ell,k_i)}(w_i|y) - \lambda\, d_i^{(\ell)}(x_i,w_i)\Big)}, \qquad (45)$
where $\ell$ indexes the outer sweeps, $k_i$ indexes the inner iterations for encoder $i$, and $d_i^{(\ell)}(x_i,w_i)$ is the conditional expected distortion of (33) evaluated with the current decoder $q''^{(\ell,i)}$ and the current distributions of the other encoders. The iterations continue until the Lagrangian $L_\lambda$ converges, and the associated limit is
$q_i^{(\ell,*)}(w_i\,|\,x_i) = \lim_{k_i\to\infty} q_i^{(\ell,k_i)}(w_i\,|\,x_i) =: q_i^{(\ell+1,1)}(w_i\,|\,x_i). \qquad (46)$
Then, we update $q''(\hat{t}\,|\,\mathbf{w},y)$ according to
$q''^{(\ell',i')}(\hat{t}\,|\,\mathbf{w},y) = \begin{cases} \alpha^{(\ell',i')}(\mathbf{w},y), & \hat{t} \in \arg\min_{\hat{t}'\in\hat{\mathcal{T}}} J^{(\ell,i)}(\hat{t}',\mathbf{w},y), \\ 0, & \text{otherwise}, \end{cases} \qquad (47)$
with $i' = i+1$ and $\ell' = \ell$ if $i < M$, and $i' = 1$ and $\ell' = \ell+1$ if $i = M$, and where $\alpha^{(\ell',i')}(\mathbf{w},y)$ is selected to guarantee
$\sum_{\hat{t}} q''^{(\ell',i')}(\hat{t}\,|\,\mathbf{w},y) = 1, \qquad (48)$
and
$J^{(\ell,i)}(\hat{t},\mathbf{w},y) = \sum_{\mathbf{x},t} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q_i^{(\ell,*)}(w_i\,|\,x_i) \prod_{j\in\mathcal{M}\setminus\{i\}} q_j^{(\ell,k_j)}(w_j\,|\,x_j), \qquad (49)$
with $q_j^{(\ell,k_j)}$ denoting the most recent update of encoder $j$. Next, we repeat the process for the next encoder, i.e., $i \leftarrow i+1$ if $i < M$, and $i \leftarrow 1$, $\ell \leftarrow \ell+1$ if $i = M$.
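The following is a minimal, self-contained Python/NumPy sketch of this alternating-minimization scheme, specialized to the two-source binary example of Section 6 ($T = (X_1, X_2)$, per-component Hamming distortion, $X_i$ the output of a BSC with crossover probability $p_i$ driven by $Y$). The Lagrange multiplier value, the number of sweeps, the random initialization, and the simplified cyclic update order (one pass of (44)–(45) per encoder per sweep instead of an inner loop run to convergence) are our own illustrative choices; only the update rules themselves follow Lemmas 1–3.

```python
import numpy as np

p1 = p2 = 0.3                              # BSC crossover probabilities of the example
lam = 5.0                                  # Lagrange multiplier (slope of the trade-off curve)
nX = nY = nW = 2                           # binary sources, side information and auxiliaries
nT = 4                                     # t_hat = (x1_hat, x2_hat), indexed as 2*x1_hat + x2_hat

# Joint source p(x1, x2, y): Y uniform (as implied by Eq. (51)), X_i = BSC(p_i) output driven by Y
p_y = np.full(nY, 0.5)
bsc = lambda p: np.array([[1 - p, p], [p, 1 - p]])           # bsc(p)[y, x] = p(x | y)
P = np.einsum('y,ya,yb->aby', p_y, bsc(p1), bsc(p2))         # P[x1, x2, y]
p_x1y, p_x2y = P.sum(axis=1), P.sum(axis=0)                  # p(x1, y), p(x2, y)

# Distortion d(t, t_hat) with t = (x1, x2): Hamming distortion on each component
d = np.array([[[(x1 != t // 2) + (x2 != t % 2) for t in range(nT)]
               for x2 in range(nX)] for x1 in range(nX)], dtype=float)

rng = np.random.default_rng(0)
q1 = rng.dirichlet(np.ones(nW), size=nX)   # q_1(w1 | x1), indexed [x1, w1]
q2 = rng.dirichlet(np.ones(nW), size=nX)   # q_2(w2 | x2), indexed [x2, w2]
qd = np.full((nW, nW, nY, nT), 1.0 / nT)   # decoder q''(t_hat | w1, w2, y)

def update_Q(q, p_xy):
    """Lemma 1: Q(w | y) = sum_x p(x | y) q(w | x)."""
    return np.einsum('xy,xw->yw', p_xy / p_xy.sum(axis=0, keepdims=True), q)

def update_q1(Q1):
    """Lemma 2 for encoder 1: exponential update followed by per-x1 normalization."""
    log_term = np.einsum('ay,yw->aw', p_x1y / p_x1y.sum(1, keepdims=True), np.log(Q1))
    dist = np.einsum('aby,bv,wvyt,abt->aw', P, q2, qd, d) / p_x1y.sum(1)[:, None]
    new = np.exp(log_term - lam * dist) + 1e-30
    return new / new.sum(axis=1, keepdims=True)

def update_q2(Q2):
    """Lemma 2 for encoder 2 (the symmetric update)."""
    log_term = np.einsum('by,yv->bv', p_x2y / p_x2y.sum(1, keepdims=True), np.log(Q2))
    dist = np.einsum('aby,aw,wvyt,abt->bv', P, q1, qd, d) / p_x2y.sum(1)[:, None]
    new = np.exp(log_term - lam * dist) + 1e-30
    return new / new.sum(axis=1, keepdims=True)

def update_qd():
    """Lemma 3: Bayes detector, a point mass on the reconstruction minimizing posterior distortion."""
    exp_d = np.einsum('aby,aw,bv,abt->wvyt', P, q1, q2, d)
    new = np.zeros_like(qd)
    np.put_along_axis(new, exp_d.argmin(axis=-1)[..., None], 1.0, axis=-1)
    return new

def rate_bits(q, Q, p_xy):
    """I(X_i; W_i) - I(W_i; Y) in bits, i.e. E[log q_i(W_i|X_i)/Q_i(W_i|Y)] (cf. Eq. (53))."""
    return np.einsum('xy,xw,xyw->', p_xy, q, np.log(q[:, None, :] / Q[None, :, :])) / np.log(2)

for _ in range(300):                       # simple cyclic sweeps of the updates of Lemmas 1-3
    Q1 = update_Q(q1, p_x1y); q1 = update_q1(Q1)
    Q2 = update_Q(q2, p_x2y); q2 = update_q2(Q2)
    qd = update_qd()

Q1, Q2 = update_Q(q1, p_x1y), update_Q(q2, p_x2y)            # final decoder-side marginals
D = np.einsum('aby,aw,bv,wvyt,abt->', P, q1, q2, qd, d)      # achieved expected distortion
print(f"R1 = {rate_bits(q1, Q1, p_x1y):.3f} bits, R2 = {rate_bits(q2, Q2, p_x2y):.3f} bits, E[d] = {D:.3f}")
```

Sweeping $\lambda$ over a range of values and recording the resulting $(R_1, R_2, \mathbb{E}[d])$ triples traces out points of the rate-distortion region shown in Figures 2 and 3; as discussed next, several random initializations can be used, keeping the smallest converged Lagrangian.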
Convergence analysis:
The algorithm employs an alternating minimization approach that produces Lagrangian values that are monotonically non-increasing and bounded below, thereby generating a convergent sequence of Lagrangian values. Since the rate component of the Lagrangian is convex, and the sum of convex functions remains convex, the Lagrangian is also convex whenever the expected distortion
$\sum_{\mathbf{w},\mathbf{x},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q''(\hat{t}\,|\,y,\mathbf{w}) \prod_{i\in\mathcal{M}} q_i(w_i\,|\,x_i) \qquad (50)$
is a convex function of the optimization variables, in which case the proposed iterative optimization framework attains the global minimum [19]. We note, however, that the expected distortion (50) contains a product of the optimization variables; it is therefore not linear in them, and the Lagrangian may exhibit non-convex behavior. Even when the problem is non-convex, the authors of [20] demonstrate that a BA-based iterative algorithm, initialized randomly and followed by selecting the minimum Lagrangian among all converged values, can still provide highly effective information-theoretic inner bounds for the rate-distortion region, serving as a benchmark for practical quantization schemes.

6. Numerical Examples

In this section, we provide an example to illustrate the proposed iterative algorithm for computing the rate-distortion region of a distributed indirect source coding problem with decoder side information. As in the problem considered in this paper, distributed edge devices compress their observations $\{X_1,\ldots,X_M\}$ and transmit them to a central server (CEO). The central server then aims to recover the indirect information $T$ from the received data, utilizing the side information $Y$. For ease of demonstration, we consider a simple case where $M = 2$ and the sources are binary, i.e., $\mathcal{X}_1 = \mathcal{X}_2 = \mathcal{Y} = \{0,1\}$. The joint distributions, denoted by $Q(x_1,y)$ and $Q(x_2,y)$, are given by
$Q(x_1,y) = \frac{1-p_1}{2}\,\delta_{x_1,y} + \frac{p_1}{2}\,\big(1-\delta_{x_1,y}\big), \qquad Q(x_2,y) = \frac{1-p_2}{2}\,\delta_{x_2,y} + \frac{p_2}{2}\,\big(1-\delta_{x_2,y}\big), \qquad (51)$
where the Kronecker delta $\delta_{x,y}$ equals 1 when $x = y$ and 0 otherwise. Equivalently, we can view $Y$ as the common input to two binary symmetric channels (BSCs) with crossover probabilities $p_1$ and $p_2$, respectively, where $0 \le p_i \le \frac{1}{2}$, $i\in\{1,2\}$, and $X_1$ and $X_2$ as the corresponding channel outputs. In this example, we set $p_1 = p_2 = 0.3$.
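The following short snippet is a sanity check, written by us rather than taken from the paper, that the two descriptions agree: building $Q(x_i,y)$ directly from (51) gives the same table as taking $Y$ uniform (which is what the $1/2$ factors in (51) imply, since the $Y$-marginal is $(1-p_i)/2 + p_i/2 = 1/2$) and passing it through a BSC with crossover probability $p_i$.

```python
import numpy as np

p1 = p2 = 0.3
delta = np.eye(2)                                        # Kronecker delta over {0, 1}
# Eq. (51): Q(x_i, y) = (1 - p_i)/2 * delta(x_i, y) + p_i/2 * (1 - delta(x_i, y))
Q_x1y = (1 - p1) / 2 * delta + p1 / 2 * (1 - delta)
Q_x2y = (1 - p2) / 2 * delta + p2 / 2 * (1 - delta)

# Equivalent view: Y ~ Bernoulli(1/2), X_i = output of a BSC(p_i) driven by Y.
p_y = np.full(2, 0.5)
bsc = lambda p: np.array([[1 - p, p], [p, 1 - p]])       # bsc(p)[y, x] = p(x | y)
assert np.allclose(Q_x1y, p_y[:, None] * bsc(p1))
assert np.allclose(Q_x2y, p_y[:, None] * bsc(p2))
```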
We also assume that the information of interest is directly the combination of the two distributed sources, i.e., $T = (X_1, X_2)$. The distortion measure is given by $d(t,\hat{t}) = d(x_1,\hat{x}_1) + d(x_2,\hat{x}_2)$, where
$d(x_i,\hat{x}_i) = \begin{cases} 0, & x_i = \hat{x}_i, \\ 1, & x_i \ne \hat{x}_i. \end{cases} \qquad (52)$
By applying the proposed iterative optimization framework, we obtain the optimal transition probability distributions $Q_i(w_i|y)$, $q_i(w_i|x_i)$ and $q''(\hat{t}\,|\,\mathbf{w},y)$ that meet a given distortion constraint $D$ on $d(t,\hat{t})$, and the corresponding minimum rate $R_i$ can be calculated as
$R_i = \sum_{y,x_i,w_i} p(y,x_i)\, q_i^*(w_i\,|\,x_i)\, \log\frac{q_i^*(w_i\,|\,x_i)}{Q_i^*(w_i\,|\,y)}. \qquad (53)$
The contour plot of the rate-distortion region for this scenario is presented in Figure 2, while Figure 3 displays a surface plot of the rate-distortion region. We note that when $M = 1$, the considered problem reduces to the traditional point-to-point Wyner–Ziv problem. In Figure 4, we compare the rate-distortion results computed using the proposed approach with the theoretical result of Wyner and Ziv [13]. The two rate-distortion curves coincide, demonstrating the effectiveness of the proposed iterative approach for calculating the rate-distortion function.

7. Conclusions

This paper explored a variant of the rate-distortion problem motivated by semantic communication and distributed learning systems, where correlated sources are independently encoded for a central decoder to reconstruct the indirect source of interest. In addition to receiving messages from the encoders, the decoder has access to correlated side information and aims to reconstruct the indirect source under a specified distortion constraint. We derived the exact rate-distortion function for the case where the sources are conditionally independent given the side information. Furthermore, we introduced a distributed iterative optimization framework based on the Blahut–Arimoto (BA) algorithm to numerically compute the rate-distortion function. A numerical example has been provided to demonstrate the effectiveness of the proposed approach.

Author Contributions

Conceptualization, Q.Y.; Methodology, J.T.; Validation, J.T.; Formal analysis, J.T.; Investigation, Q.Y.; Resources, Q.Y.; Writing—original draft, J.T.; Writing—review & editing, Q.Y.; Visualization, J.T.; Project administration, Q.Y.; Funding acquisition, Q.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the NSFC under Grants No. 62293481 and No. 62201505, and in part by the SUTD-ZJU IDEA Grant (SUTD-ZJU (VP) 202102).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Here, we provide the rigorous proof of Theorem 1.
Lemma A1
(Extended Markov Lemma). Let
$p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1,x_2,x_3,y). \qquad (A1)$
For a fixed $(x_1^n, x_2^n, x_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}$, let $W_1^n$, $W_2^n$ and $W_3^n$ be drawn according to $\prod_{i=1}^n p(w_1|x_{1,i})$, $\prod_{i=1}^n p(w_2|x_{2,i})$ and $\prod_{i=1}^n p(w_3|x_{3,i})$, respectively. Then
$\lim_{n\to\infty} \Pr\big\{(W_1^n, W_2^n, W_3^n, x_1^n, x_2^n, x_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} = 1. \qquad (A2)$
Proof. 
According to (A1), we have the Markov chains $W_1 - X_1 - (X_2,X_3,Y)$, $W_2 - X_2 - (X_1,X_3,Y,W_1)$ and $W_3 - X_3 - (X_1,X_2,Y,W_1,W_2)$. We apply the Markov lemma three times. First, since $W_1^n$ is drawn according to $p(w_1|x_1)$, we have $(x_1^n, W_1^n) \in \mathcal{A}_\epsilon^{*(n)}$ with high probability for sufficiently large $n$, and the Markov chain $W_1 - X_1 - (X_2,X_3,Y)$ then implies that $(W_1^n, x_1^n, x_2^n, x_3^n, y^n)$ is jointly typical with high probability. Applying the Markov lemma again with the chain $W_2 - X_2 - (X_1,X_3,Y,W_1)$ shows that $(W_1^n, W_2^n, x_1^n, x_2^n, x_3^n, y^n)$ is jointly typical with high probability, and a final application with the chain $W_3 - X_3 - (X_1,X_2,Y,W_1,W_2)$ shows that $(W_1^n, W_2^n, W_3^n, x_1^n, x_2^n, x_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}$ with high probability. □
Proof of Theorem 1. 
For $m = 1, 2, 3$, fix $p(w_m|x_m)$ and $g(W_1,W_2,W_3,Y)$ such that the distortion constraint $\mathbb{E}[d(T,\hat{T})] \le D$ is satisfied, and calculate $p(w_m) = \sum_{x_m} p(x_m)\, p(w_m|x_m)$.
Generation of codebooks: Generate $2^{nR_m'}$ i.i.d. codewords $w_m^n(s_m) \sim \prod_{i=1}^n p(w_{m,i})$, and index them by $s_m \in \{1,2,\ldots,2^{nR_m'}\}$. Provide $2^{nR_m}$ random bins with indices $t_m \in \{1,2,\ldots,2^{nR_m}\}$. Randomly assign each codeword $w_m^n(s_m)$ to one of the $2^{nR_m}$ bins according to a uniform distribution, and let $\mathcal{B}_m(t_m)$ denote the set of codeword indices $s_m$ assigned to bin index $t_m$.
Encoding: Given a source sequence $X_m^n$, encoder $m$ looks for a codeword $W_m^n(s_m)$ such that $(X_m^n, W_m^n(s_m)) \in \mathcal{A}_\epsilon^{*(n)}$. The encoder sends the index of the bin $t_m$ to which $s_m$ belongs.
Decoding: The decoder looks for a triple $(W_1^n(s_1), W_2^n(s_2), W_3^n(s_3))$ such that $s_m \in \mathcal{B}_m(t_m)$ for $m = 1,2,3$ and $(W_1^n(s_1), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}$. If the decoder finds a unique triple $(s_1, s_2, s_3)$, it then calculates $\hat{T}^n$, where $\hat{T}_i = g(W_{1,i}, W_{2,i}, W_{3,i}, Y_i)$.
Analysis of the probability of error:
1. An encoder cannot find a codeword $W_m^n(s_m)$ such that $(X_m^n, W_m^n(s_m)) \in \mathcal{A}_\epsilon^{*(n)}$. The probability of this event is small if
$R_m' > I(X_m; W_m). \qquad (A3)$
2. The pairs of sequences satisfy $(X_1^n, W_1^n(s_1)) \in \mathcal{A}_\epsilon^{*(n)}$, $(X_2^n, W_2^n(s_2)) \in \mathcal{A}_\epsilon^{*(n)}$ and $(X_3^n, W_3^n(s_3)) \in \mathcal{A}_\epsilon^{*(n)}$, but the codewords $(W_1^n(s_1), W_2^n(s_2), W_3^n(s_3))$ are not jointly typical with the side-information sequence $Y^n$, i.e., $(W_1^n(s_1), W_2^n(s_2), W_3^n(s_3), Y^n) \notin \mathcal{A}_\epsilon^{*(n)}$. We have assumed that
$p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1,x_2,x_3,y). \qquad (A4)$
Hence, by the extended Markov lemma (Lemma A1), the probability of this event goes to zero if $n$ is large enough.
3. There exists another codeword index with the same bin index that is jointly typical with the side-information sequence. Let the correct codeword indices be denoted by $s_1$, $s_2$ and $s_3$. We first consider the situation where only the codeword index $s_1$ is in error. The probability that a randomly chosen $W_1^n(s_1')$ is jointly typical with $(W_2^n(s_2), W_3^n(s_3), Y^n)$ can be bounded as
$\Pr\big\{(W_1^n(s_1'), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{-n\left(I(W_1;W_2,W_3,Y) - 3\epsilon\right)}. \qquad (A5)$
The probability of this error event is bounded by the number of codewords in the bin $t_1$ times the probability of joint typicality:
$\Pr\big\{\exists\, s_1' \in \mathcal{B}_1(t_1),\, s_1' \ne s_1 : (W_1^n(s_1'), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le \sum_{s_1' \ne s_1,\, s_1' \in \mathcal{B}_1(t_1)} \Pr\big\{(W_1^n(s_1'), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{n(R_1'-R_1)}\, 2^{-n\left(I(W_1;W_2,W_3,Y) - 3\epsilon\right)}. \qquad (A6)$
Similarly, the probability that only the codeword index $s_2$ or only $s_3$ is in error can be bounded by
$\Pr\big\{\exists\, s_2' \in \mathcal{B}_2(t_2),\, s_2' \ne s_2 : (W_1^n(s_1), W_2^n(s_2'), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{n(R_2'-R_2)}\, 2^{-n\left(I(W_2;W_1,W_3,Y) - 3\epsilon\right)},$
$\Pr\big\{\exists\, s_3' \in \mathcal{B}_3(t_3),\, s_3' \ne s_3 : (W_1^n(s_1), W_2^n(s_2), W_3^n(s_3'), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{n(R_3'-R_3)}\, 2^{-n\left(I(W_3;W_1,W_2,Y) - 3\epsilon\right)}. \qquad (A7)$
We then consider the case where two of the three codeword indices are in error. The probability that randomly chosen $W_1^n(s_1')$ and $W_2^n(s_2')$ are jointly typical with $(W_3^n(s_3), Y^n)$ can be bounded as
$\Pr\big\{(W_1^n(s_1'), W_2^n(s_2'), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} = \sum_{(w_1^n, w_2^n, w_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}} p(w_1^n)\, p(w_2^n)\, p(w_3^n, y^n) \le 2^{-n\left(I(W_1;W_2,W_3,Y) + I(W_2;W_1,W_3,Y) - I(W_1;W_2|W_3,Y) - 4\epsilon\right)}. \qquad (A8)$
Hence, the probability of this error event can be bounded as
$\Pr\big\{\exists\, s_1' \in \mathcal{B}_1(t_1), s_1' \ne s_1,\ s_2' \in \mathcal{B}_2(t_2), s_2' \ne s_2 : (W_1^n(s_1'), W_2^n(s_2'), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{n(R_1'-R_1+R_2'-R_2)}\, 2^{-n\left(I(W_1;W_2,W_3,Y) + I(W_2;W_1,W_3,Y) - I(W_1;W_2|W_3,Y) - 4\epsilon\right)}. \qquad (A9)$
Similarly, we can obtain the probabilities that the codeword index pairs $(s_1, s_3)$ or $(s_2, s_3)$ are in error, which we omit here.
For the case where all the codeword indices $s_1$, $s_2$ and $s_3$ are in error, the probability that randomly chosen $W_1^n(s_1')$, $W_2^n(s_2')$ and $W_3^n(s_3')$ are jointly typical with $Y^n$ can be bounded as
$\Pr\big\{(W_1^n(s_1'), W_2^n(s_2'), W_3^n(s_3'), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} = \sum_{(w_1^n, w_2^n, w_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}} p(w_1^n)\, p(w_2^n)\, p(w_3^n)\, p(y^n) \le 2^{-n\left(I(W_1;W_2,W_3,Y) + I(W_2;W_1,W_3,Y) + I(W_3;W_1,W_2,Y) - I(W_1;W_2|W_3,Y) - I(W_1,W_2;W_3|Y) - 5\epsilon\right)}. \qquad (A10)$
The probabilities of the above error events go to 0 when
$R_1' - R_1 \le I(W_1;W_2,W_3,Y),$
$R_1' - R_1 + R_2' - R_2 + R_3' - R_3 \le I(W_1;W_2,W_3,Y) + I(W_2;W_1,W_3,Y) + I(W_3;W_1,W_2,Y) - I(W_1;W_2\,|\,W_3,Y) - I(W_1,W_2;W_3\,|\,Y), \qquad (A11)$
together with the analogous conditions for the remaining single- and double-index error events.
Therefore, (2) can be obtained by combining (A3) and (A11) and eliminating the auxiliary rates $R_m'$.
If $(s_1, s_2, s_3)$ are correctly decoded, we have $(X_1^n, X_2^n, X_3^n, W_1^n(s_1), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}$. Therefore, the empirical joint distribution is close to the distribution $p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1,x_2,x_3,y)$ that achieves distortion $D$. □

Appendix B

Here, we provide the rigorous proof of Theorem 2. Consider a series of encoders $f_m: \mathcal{X}_m^n \to \{1,\ldots,2^{nR_m}\}$, $m = 1,2,3$, and a decoder $g: \prod_{m\in\mathcal{M}}\{1,\ldots,2^{nR_m}\} \times \mathcal{Y}^n \to \hat{\mathcal{T}}^n$ that achieve the given distortion $D$. We can derive the following inequalities:
$nR_1 \ge H\big(f_1(X_1^n)\big)$
$\ge H\big(f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n\big)$ (A12a)
$\ge H\big(f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n\big) - H\big(f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n, X_1^n, X_2^n, X_3^n\big)$
$= I\big(X_1^n, X_2^n, X_3^n;\, f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n\big)$ (A12b)
$= \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big)$ (A12c)
$= \sum_{i=1}^n \Big[ H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big) - H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,f_1(X_1^n), f_2(X_2^n), f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big) \Big]$ (A12d)
$= \sum_{i=1}^n \Big[ H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,W_{2,i}, W_{3,i}, Y_i\big) - H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,W_{1,i}, W_{2,i}, W_{3,i}, Y_i\big) \Big]$ (A12e)
$= \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, W_{1,i}\,\big|\,W_{2,i}, W_{3,i}, Y_i\big),$
where (A12a) follows from the fact that conditioning reduces entropy, (A12b) and (A12d) are obtained by the definition of conditional mutual information, and (A12c) is the chain rule of mutual information. In (A12e), we let $W_{m,i} = \big(f_m(X_m^n), X_1^{i-1}, X_2^{i-1}, X_3^{i-1}, Y^{i-1}, Y_{i+1}^n\big)$ for $m = 1,2,3$, so that the conditioning sets in (A12d) and (A12e) coincide. Similarly, we have
$nR_2 \ge \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, W_{2,i}\,\big|\,W_{1,i}, W_{3,i}, Y_i\big), \qquad nR_3 \ge \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, W_{3,i}\,\big|\,W_{1,i}, W_{2,i}, Y_i\big). \qquad (A13)$
For the sum rate, we have
$n(R_1 + R_2) \ge H\big(f_1(X_1^n), f_2(X_2^n)\big)$
$\ge H\big(f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n\big)$ (A14a)
$\ge H\big(f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n\big) - H\big(f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n, X_1^n, X_2^n, X_3^n\big)$
$= I\big(X_1^n, X_2^n, X_3^n;\, f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n\big)$ (A14b)
$= \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big)$ (A14c)
$= \sum_{i=1}^n \Big[ H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big) - H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,f_1(X_1^n), f_2(X_2^n), f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big) \Big]$ (A14d)
$= \sum_{i=1}^n \Big[ H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,W_{3,i}, Y_i\big) - H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,W_{1,i}, W_{2,i}, W_{3,i}, Y_i\big) \Big]$ (A14e)
$= \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, W_{1,i}, W_{2,i}\,\big|\,W_{3,i}, Y_i\big),$ (A14f)
where (A14a) follows from the fact that conditioning reduces entropy, (A14b) and (A14d) are obtained by the definition of conditional mutual information, (A14c) is the chain rule of mutual information, and (A14e) again uses the definitions of $W_{1,i}$, $W_{2,i}$ and $W_{3,i}$ given above. The remaining sum-rate bounds follow in the same way.

References

  1. Han, T.; Yang, Q.; Shi, Z.; He, S.; Zhang, Z. Semantic-preserved communication system for highly efficient speech transmission. IEEE J. Sel. Areas Commun. 2022, 41, 245–259.
  2. Adikari, T.; Draper, S. Two-terminal source coding with common sum reconstruction. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1420–1424.
  3. Korner, J.; Marton, K. How to encode the modulo-two sum of binary sources (corresp.). IEEE Trans. Inf. Theory 1979, 25, 219–221.
  4. Pastore, A.; Lim, S.H.; Feng, C.; Nazer, B.; Gastpar, M. Distributed Lossy Computation with Structured Codes: From Discrete to Continuous Sources. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1681–1686.
  5. Amiri, M.M.; Gunduz, D.; Kulkarni, S.R.; Poor, H.V. Federated Learning with Quantized Global Model Updates. arXiv 2020, arXiv:2006.10672.
  6. Gruntkowska, K.; Tyurin, A.; Richtárik, P. Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity. arXiv 2024, arXiv:2402.06412.
  7. Amiri, M.M.; Gündüz, D.; Kulkarni, S.R.; Poor, H.V. Convergence of Federated Learning Over a Noisy Downlink. IEEE Trans. Wirel. Commun. 2022, 21, 1422–1437.
  8. Stavrou, P.A.; Kountouris, M. The Role of Fidelity in Goal-Oriented Semantic Communication: A Rate Distortion Approach. IEEE Trans. Commun. 2023, 71, 3918–3931.
  9. Liu, J.; Zhang, W.; Poor, H.V. A rate-distortion framework for characterizing semantic information. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Virtual, 12–20 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2894–2899.
  10. Dobrushin, R.; Tsybakov, B. Information transmission with additional noise. IRE Trans. Inf. Theory 1962, 8, 293–304.
  11. Wolf, J.; Ziv, J. Transmission of noisy information to a noisy receiver with minimum distortion. IEEE Trans. Inf. Theory 1970, 16, 406–411.
  12. Guo, T.; Wang, Y.; Han, J.; Wu, H.; Bai, B.; Han, W. Semantic compression with side information: A rate-distortion perspective. arXiv 2022, arXiv:2208.06094.
  13. Wyner, A.; Ziv, J. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory 1976, 22, 1–10.
  14. Slepian, D.; Wolf, J. Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory 1973, 19, 471–480.
  15. Wagner, A.B.; Anantharam, V. An improved outer bound for multiterminal source coding. IEEE Trans. Inf. Theory 2008, 54, 1919–1937.
  16. Lim, S.H.; Feng, C.; Pastore, A.; Nazer, B.; Gastpar, M. Towards an algebraic network information theory: Distributed lossy computation of linear functions. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1827–1831.
  17. Cheng, S.; Stankovic, V.; Xiong, Z. Computing the channel capacity and rate-distortion function with two-sided state information. IEEE Trans. Inf. Theory 2005, 51, 4418–4425.
  18. Gastpar, M. The Wyner-Ziv problem with multiple sources. IEEE Trans. Inf. Theory 2004, 50, 2762–2768.
  19. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 2006.
  20. Ku, G.; Ren, J.; Walsh, J.M. Computing the rate distortion region for the CEO problem with independent sources. IEEE Trans. Signal Process. 2014, 63, 567–575.
Figure 1. Distributed remote compression of a latent variable with $M$ correlated sources at distributed transmitters and side information at the receiver.
Figure 2. Contour plot of the rate-distortion region with two distributed binary sources $\{X_1, X_2\}$, where the labels on the contours represent the distortion values $D$ on $d(t,\hat{t})$.
Figure 3. Surface plot of the rate-distortion region.
Figure 4. The rate-distortion function for the case when $M = 1$, i.e., the Wyner–Ziv problem.