Article

Rate-Distortion Analysis of Distributed Indirect Source Coding

College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310007, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(8), 844; https://doi.org/10.3390/e27080844
Submission received: 7 July 2025 / Revised: 1 August 2025 / Accepted: 6 August 2025 / Published: 8 August 2025
(This article belongs to the Special Issue Semantic Information Theory)

Abstract

Motivated by task-oriented semantic communication and distributed learning systems, this paper studies a distributed indirect source coding problem where M correlated sources are independently encoded for a central decoder. The decoder has access to correlated side information in addition to the messages received from the encoders and aims to recover a latent random variable under a given distortion constraint rather than recovering the sources themselves. We characterize the exact rate-distortion function for the case where the sources are conditionally independent given the side information. Furthermore, we develop a distributed Blahut–Arimoto (BA) algorithm to numerically compute the rate-distortion function. Numerical examples are provided to demonstrate the effectiveness of the proposed approach in calculating the rate-distortion region.

1. Introduction

Consider the multiterminal source coding setup shown in Figure 1. Let $(T, X_1, \ldots, X_M, Y) \sim p(t, x_1, \ldots, x_M, y)$ be a discrete memoryless source (DMS) taking values in the finite alphabet $\mathcal{T} \times \mathcal{X}_1 \times \cdots \times \mathcal{X}_M \times \mathcal{Y}$ according to a fixed and known probability distribution $p(t, x_1, \ldots, x_M, y)$. In this setup, encoder $m$, $m \in \mathcal{M} := \{1, \ldots, M\}$, has the local observation $X_m^n := (X_{m,1}, \ldots, X_{m,n})$. The encoders independently encode their observations into binary sequences at rates $R_1, \ldots, R_M$ bits per input symbol, respectively. The decoder, which has side information $Y^n = (Y_1, \ldots, Y_n)$, aims to recover task-oriented latent information $T^n := (T_1, \ldots, T_n)$ that is correlated with $(X_1^n, \ldots, X_M^n)$ but is not observed directly by any of the encoders. We are interested in the lossy reconstruction of $T^n$, with the average distortion measured by $\mathbb{E}\big[\frac{1}{n}\sum_{i=1}^n d(T_i, \hat{T}_i)\big]$ for some prescribed single-letter distortion measure $d(\cdot,\cdot)$. A formal $(2^{nR_1}, \ldots, 2^{nR_M}, n)$ rate-distortion code for this setup consists of the following:
  • $M$ independent encoders, where encoder $m \in \mathcal{M}$ assigns an index $s_m(x_m^n) \in \{1, \ldots, 2^{nR_m}\}$ to each sequence $x_m^n \in \mathcal{X}_m^n$;
  • A decoder that produces an estimate $\hat{t}^n(s_1, \ldots, s_M, y^n) \in \hat{\mathcal{T}}^n$ for each index tuple $(s_1, \ldots, s_M)$ and side information $y^n \in \mathcal{Y}^n$.
A rate tuple $(R_1, \ldots, R_M)$ is said to be achievable with the distortion measure $d(\cdot,\cdot)$ and the distortion value $D$ if there exists a sequence of $(2^{nR_1}, \ldots, 2^{nR_M}, n)$ codes satisfying
$\limsup_{n \to \infty} \mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^n d(T_i, \hat{T}_i)\Big] \le D. \qquad (1)$
The rate-distortion region $\mathcal{R}^*_{X_1,\ldots,X_M|Y}(D)$ for this distributed source coding problem is the closure of the set of all achievable rate tuples $(R_1, \ldots, R_M)$ that permit the reconstruction of the latent variable $T^n$ within the average distortion constraint $D$.
The problem illustrated in Figure 1 is motivated by semantic/task-oriented communication and distributed learning problems. In semantic/task-oriented communication [1], the decoder only needs to reconstruct some task-oriented information implied by the sources. For instance, it might extract hidden features from a scene captured by multiple cameras positioned at various angles. Here, $T_i$ may also be a deterministic function of the source samples $(X_{1,i}, \ldots, X_{M,i})$, in which case the problem reduces to lossy distributed function computation [2,3,4]. A similar problem also arises in distributed training. Consider $Y^n$ as the global model available at the server at a given iteration of a federated learning process and $(X_1^n, \ldots, X_M^n)$ as the clients' correlated versions of this model after downlink transmission and local training. The server aims to recover the updated global model, $T^n$, based on the messages received from all $M$ clients. It is often assumed that the global model is transmitted to the clients intact, but in practical scenarios where downlink communication is limited, the clients may receive noisy or compressed versions of the global model [5,6,7].
For the case of $M = 1$, the problem reduces to remote compression in a point-to-point scenario with side information available at the decoder. In [8,9], the authors studied this problem without correlated side information at the receiver, motivated by semantic communication. This problem is known in the literature as the remote rate-distortion problem [10,11], and the rate-distortion trade-off is fully characterized in the general case. The authors of [8] studied this trade-off in detail for specific source distributions. Similarly, the authors of [12] characterized the remote rate-distortion trade-off when correlated side information is available at both the encoder and the decoder. Our problem for $M = 1$ can be solved by combining the remote rate-distortion problem with the classical Wyner–Ziv rate-distortion function [13,14].
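For reference, the single-encoder ($M = 1$) trade-off obtained by this combination can be written compactly as follows; this is our paraphrase of the known result, and it also agrees with Theorem 3 below specialized to a single source:
$R(D) = \min \; I(X;W) - I(W;Y),$
where the minimum is over all test channels $p(w \mid x)$ (so that $(T,Y) - X - W$ forms a Markov chain) and decoding functions $g$ satisfying $\mathbb{E}\big[d\big(T, g(W,Y)\big)\big] \le D$.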
The rate-distortion region for the multiterminal version of the remote rate-distortion problem considered here remains open. Wagner et al. developed an improved outer bound for a general multiterminal source model [15]. Lim et al. proposed an achievable rate region for the distributed lossy computation problem without giving a conclusive rate-distortion characterization [16]. Ku et al. considered a special case in which the sources are independent and derived a single-letter expression for the rate-distortion region [20]. Gastpar [18] considered the lossy compression of the source sequences in the presence of side information at the receiver and characterized the rate-distortion region for the special case in which the $X_i$ are conditionally independent given the side information.
To provide a performance reference for practical coding schemes, it is necessary not only to characterize the exact theoretical expression for the rate-distortion region but also to calculate the rate-distortion region for a given distribution and a specific distortion metric. In the traditional single-source direct scenario, determining the rate-distortion function involves solving a convex optimization problem, which can be addressed using the globally convergent iterative Blahut–Arimoto algorithm, as discussed in [17]. In this paper, we are interested in computing the rate-distortion region $\mathcal{R}^*_{X_1,\ldots,X_M|Y}(D)$ for the general distributed coding problem. We pay particular attention to the special case in which the sources are conditionally independent given the side information, which is motivated by the aforementioned examples. For brevity of presentation, we set $M = 3$ in this paper, with the understanding that the results can be readily extended to an arbitrary number of sources. To numerically compute the rate-distortion region, we introduce a distributed iterative optimization framework that generalizes the classical Blahut–Arimoto (BA) algorithm. While the standard BA algorithm is designed for single-source point-to-point settings, our framework extends its alternating minimization structure to a distributed scenario with multiple encoders, indirect source reconstruction, and decoder-side side information. This extension enables the computation of rate-distortion regions in settings that are significantly more general than those considered in the existing literature.
In Section 2, we derive an achievable region $\mathcal{R}_a(D) \subseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$. In Section 3, we determine a general outer bound $\mathcal{R}_o(D) \supseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$. In Section 4, we show that the two regions coincide, and that the region is optimal, when the sources $(X_1, X_2, X_3)$ are conditionally independent given the side information $Y$. In Section 5, we develop an alternating minimization framework to calculate the rate-distortion region by generalizing the Blahut–Arimoto (BA) algorithm. In Section 6, we demonstrate the effectiveness of the proposed framework through numerical examples.

2. An Achievable Rate Region

In this section, we introduce an achievable rate region $\mathcal{R}_a(D)$, which is contained within the target rate-distortion region, i.e., $\mathcal{R}_a(D) \subseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$.
Theorem 1.
$\mathcal{R}_a(D) \subseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$, where $\mathcal{R}_a(D)$ is the set of all rate tuples $(R_1, R_2, R_3)$ such that there exists a tuple $(W_1, W_2, W_3)$ of discrete random variables with $p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\, p(w_2|x_2)\, p(w_3|x_3)\, p(x_1,x_2,x_3,y)$, for which the following conditions are satisfied:
$R_1 \ge I(X_1;W_1) - I(W_1;W_2,W_3,Y)$ (2a)
$R_2 \ge I(X_2;W_2) - I(W_2;W_1,W_3,Y)$ (2b)
$R_3 \ge I(X_3;W_3) - I(W_3;W_1,W_2,Y)$ (2c)
$R_1 + R_2 \ge I(X_1;W_1) + I(X_2;W_2) - I(W_1;W_2,W_3,Y) - I(W_2;W_1,W_3,Y) + I(W_1;W_2\,|\,W_3,Y)$ (2d)
$R_1 + R_3 \ge I(X_1;W_1) + I(X_3;W_3) - I(W_1;W_2,W_3,Y) - I(W_3;W_1,W_2,Y) + I(W_1;W_3\,|\,W_2,Y)$ (2e)
$R_2 + R_3 \ge I(X_2;W_2) + I(X_3;W_3) - I(W_2;W_1,W_3,Y) - I(W_3;W_1,W_2,Y) + I(W_2;W_3\,|\,W_1,Y)$ (2f)
$R_1 + R_2 + R_3 \ge I(X_1;W_1) + I(X_2;W_2) + I(X_3;W_3) - I(W_1;W_2,W_3,Y) - I(W_2;W_1,W_3,Y) - I(W_3;W_1,W_2,Y) + I(W_1;W_2\,|\,W_3,Y) + I(W_1,W_2;W_3\,|\,Y),$ (2g)
and there exists a decoding function $g(\cdot)$ such that
$\mathbb{E}\big[d\big(T, g(W_1,W_2,W_3,Y)\big)\big] \le D. \qquad (3)$
The auxiliary random variables $W_1$, $W_2$ and $W_3$ serve as intermediate variables in the encoding process and are introduced to improve compression efficiency. Ideally, $W_1$, $W_2$ and $W_3$ should carry information about the sources $X_1$, $X_2$ and $X_3$, respectively, while overlapping as little as possible with the side information $Y$, so as to avoid redundancy and fully exploit the decoder-side knowledge; this helps minimize the required transmission rates. The rigorous proof of Theorem 1 is provided in Appendix A.
Corollary 1.
The conditions (2) of Theorem 1 can be expressed equivalently as
$R_1 \ge I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y)$ (4a)
$R_2 \ge I(X_1,X_2,X_3;\,W_2\,|\,W_1,W_3,Y)$ (4b)
$R_3 \ge I(X_1,X_2,X_3;\,W_3\,|\,W_1,W_2,Y)$ (4c)
$R_1 + R_2 \ge I(X_1,X_2,X_3;\,W_1,W_2\,|\,W_3,Y)$ (4d)
$R_1 + R_3 \ge I(X_1,X_2,X_3;\,W_1,W_3\,|\,W_2,Y)$ (4e)
$R_2 + R_3 \ge I(X_1,X_2,X_3;\,W_2,W_3\,|\,W_1,Y)$ (4f)
$R_1 + R_2 + R_3 \ge I(X_1,X_2,X_3;\,W_1,W_2,W_3\,|\,Y).$ (4g)
Proof. 
First, we prove that $R_1 \ge I(X_1;W_1) - I(W_1;W_2,W_3,Y)$ is equivalent to $R_1 \ge I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y)$. The bound of (4a) can be written as
$I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y) = I(X_1;W_1\,|\,W_2,W_3,Y) + I(X_2,X_3;\,W_1\,|\,X_1,W_2,W_3,Y), \qquad (5)$
where $I(X_2,X_3;\,W_1\,|\,X_1,W_2,W_3,Y) = 0$ because $(X_2,X_3,Y)$ is conditionally independent of $W_1$ given $X_1$. For the first term on the right-hand side of (5), we have
$I(X_1;W_1\,|\,W_2,W_3,Y) + I(W_1;W_2,W_3,Y) = I(W_1;\,X_1,W_2,W_3,Y) = I(W_1;X_1) + I(W_2,W_3,Y;\,W_1\,|\,X_1), \qquad (6)$
where $I(W_2,W_3,Y;\,W_1\,|\,X_1) = 0$ because $(W_2,W_3,Y)$ is conditionally independent of $W_1$ given $X_1$. Then, we have
$I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y) = I(X_1;W_1\,|\,W_2,W_3,Y) = I(W_1;X_1) - I(W_1;W_2,W_3,Y) \le R_1. \qquad (7)$
This completes the proof of (4a); (4b) and (4c) can be proved in the same way.
Next, we prove (4d). The bound on the sum rate $R_1 + R_2$ can be written as
$I(X_1,X_2,X_3;\,W_1,W_2\,|\,W_3,Y) = I(X_1;\,W_1,W_2\,|\,W_3,Y) + I(X_2,X_3;\,W_1,W_2\,|\,W_3,Y,X_1)$ (8a)
$= I(X_1;W_2\,|\,W_3,Y) + I(X_1;W_1\,|\,W_2,W_3,Y) + I(X_2,X_3;\,W_1\,|\,W_3,Y,X_1) + I(X_2,X_3;\,W_2\,|\,W_1,W_3,Y,X_1),$ (8b)
where (8a) and (8b) follow from the chain rule of mutual information; here $I(X_1;W_1\,|\,W_2,W_3,Y) = I(X_1;W_1) - I(W_1;W_2,W_3,Y) \le R_1$ by (7), and $I(X_2,X_3;\,W_1\,|\,W_3,Y,X_1) = 0$ because $(X_2,X_3)$ is conditionally independent of $W_1$ given $X_1$. For the last term in (8b), we have
$I(X_2,X_3;\,W_2\,|\,W_1,W_3,Y,X_1) + I(X_1;W_2\,|\,W_1,W_3,Y)$
$= I(X_1,X_2,X_3;\,W_2\,|\,W_1,W_3,Y)$
$= I(X_2,X_3;\,W_2\,|\,W_1,W_3,Y) + I(X_1;W_2\,|\,W_1,W_3,Y,X_2,X_3)$
$= I(X_2;W_2\,|\,W_1,W_3,Y) + I(X_1,X_3;\,W_2\,|\,W_1,W_3,Y,X_2), \qquad (9)$
where $I(X_1;W_2\,|\,W_1,W_3,Y,X_2,X_3) = 0$ because $X_1$ is conditionally independent of $W_2$ given $X_2$, $I(X_1,X_3;\,W_2\,|\,W_1,W_3,Y,X_2) = 0$ because $(X_1,X_3)$ is conditionally independent of $W_2$ given $X_2$, and $I(X_2;W_2\,|\,W_1,W_3,Y) = I(X_2;W_2) - I(W_2;W_1,W_3,Y) \le R_2$ by the same argument as in (6). Thus, the last term in (8b) can be written as
$I(X_2,X_3;\,W_2\,|\,W_1,W_3,Y,X_1) = I(X_2;W_2\,|\,W_1,W_3,Y) - I(X_1;W_2\,|\,W_1,W_3,Y). \qquad (10)$
For the last term on the right-hand side of (10), we have
$I(X_1;W_2\,|\,W_1,W_3,Y) + I(W_1;W_2\,|\,W_3,Y) = I(X_1,W_1;\,W_2\,|\,W_3,Y) = I(X_1;W_2\,|\,W_3,Y) + I(W_1;W_2\,|\,X_1,W_3,Y), \qquad (11)$
where $I(W_1;W_2\,|\,X_1,W_3,Y) = 0$ because $W_1$ is conditionally independent of $(W_2,W_3,Y)$ given $X_1$; thus, the last term in (10) can be written as
$I(X_1;W_2\,|\,W_1,W_3,Y) = I(X_1;W_2\,|\,W_3,Y) - I(W_1;W_2\,|\,W_3,Y). \qquad (12)$
Combining (8), (9), (10) and (12), we have
$R_1 + R_2 \ge I(X_1,X_2,X_3;\,W_1,W_2\,|\,W_3,Y) = I(X_1;W_1) + I(X_2;W_2) - I(W_1;W_2,W_3,Y) - I(W_2;W_1,W_3,Y) + I(W_1;W_2\,|\,W_3,Y). \qquad (13)$
The remaining sum-rate bounds in (4) can be proved in the same way and are omitted here. □

3. A General Outer Bound

In this section, we derive a region $\mathcal{R}_o(D)$ that contains the target rate-distortion region, i.e., $\mathcal{R}_o(D) \supseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$.
Theorem 2.
$\mathcal{R}_o(D) \supseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$, where $\mathcal{R}_o(D)$ is the set of all rate triples $(R_1,R_2,R_3)$ such that there exists a triple $(W_1,W_2,W_3)$ of discrete random variables with $p(w_1|x_1,x_2,x_3,y) = p(w_1|x_1)$, $p(w_2|x_1,x_2,x_3,y) = p(w_2|x_2)$ and $p(w_3|x_1,x_2,x_3,y) = p(w_3|x_3)$, for which the following conditions are satisfied:
$R_1 \ge I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y)$
$R_2 \ge I(X_1,X_2,X_3;\,W_2\,|\,W_1,W_3,Y)$
$R_3 \ge I(X_1,X_2,X_3;\,W_3\,|\,W_1,W_2,Y)$
$R_1 + R_2 \ge I(X_1,X_2,X_3;\,W_1,W_2\,|\,W_3,Y)$
$R_1 + R_3 \ge I(X_1,X_2,X_3;\,W_1,W_3\,|\,W_2,Y)$
$R_2 + R_3 \ge I(X_1,X_2,X_3;\,W_2,W_3\,|\,W_1,Y)$
$R_1 + R_2 + R_3 \ge I(X_1,X_2,X_3;\,W_1,W_2,W_3\,|\,Y),$ (14)
and there exists a decoding function $g(\cdot)$ such that
$\mathbb{E}\big[d\big(T, g(W_1,W_2,W_3,Y)\big)\big] \le D. \qquad (15)$
The rigorous proof of Theorem 2 is provided in Appendix B.
While the expressions of the inner bound (4) and the outer bound (14) are the same, the two regions do not necessarily coincide, because the constraint $p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1,x_2,x_3,y)$ in Theorem 1 limits the freedom in choosing the auxiliary random variables $(W_1,W_2,W_3)$ compared with the constraints in Theorem 2. In the next section, we demonstrate that, when the sources are conditionally independent given the side information, the additional freedom in choosing the auxiliary random variables $(W_1,W_2,W_3)$ in Theorem 2 cannot lower the value of the rate-distortion functions.

4. Conclusive Rate-Distortion Results

Corollary 2.
If $X_1, X_2, X_3$ are conditionally independent given the side information $Y$, then $\mathcal{R}_a(D) \subseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$, where $\mathcal{R}_a(D)$ is the set of all rate triples $(R_1,R_2,R_3)$ such that there exists a triple $(W_1,W_2,W_3)$ of random variables with $p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1|y)\,p(x_2|y)\,p(x_3|y)\,p(y)$, for which the following conditions are satisfied:
$R_1 \ge I(X_1;W_1) - I(W_1;Y)$
$R_2 \ge I(X_2;W_2) - I(W_2;Y)$
$R_3 \ge I(X_3;W_3) - I(W_3;Y),$ (16)
and there exists a decoding function $g(\cdot)$ such that
$\mathbb{E}\big[d\big(T, g(W_1,W_2,W_3,Y)\big)\big] \le D. \qquad (17)$
Proof. 
Since the joint distribution can be written as
$p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1,w_2,w_3\,|\,x_1,x_2,x_3,y)\, p(x_1,x_2,x_3\,|\,y)\, p(y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1|y)\,p(x_2|y)\,p(x_3|y)\,p(y), \qquad (18)$
the term $I(W_1;W_2\,|\,W_3,Y)$ in the sum-rate bound (2d) is 0, because $W_1$ and $W_2$ are conditionally independent given $(W_3,Y)$. Similarly, the terms $I(W_1;W_3\,|\,W_2,Y)$, $I(W_2;W_3\,|\,W_1,Y)$ and $I(W_1;W_2\,|\,W_3,Y) + I(W_1,W_2;W_3\,|\,Y)$ in the sum-rate bounds (2e)–(2g) are all 0. Therefore, each sum-rate bound reduces to the sum of the corresponding individual-rate bounds and can hence be omitted. Meanwhile, the term $I(W_1;W_2,W_3,Y)$ in the individual bound (2a) can be written as
$I(W_1;\,W_2,W_3,Y) = I(W_1;Y) + I(W_2,W_3;\,W_1\,|\,Y) = I(W_1;Y). \qquad (19)$
Similarly, we have
$I(W_2;\,W_1,W_3,Y) = I(W_2;Y) + I(W_1,W_3;\,W_2\,|\,Y) = I(W_2;Y),$
$I(W_3;\,W_1,W_2,Y) = I(W_3;Y) + I(W_1,W_2;\,W_3\,|\,Y) = I(W_3;Y). \qquad (20)$
This completes the proof of Corollary 2. □
Corollary 3.
If $X_1, X_2, X_3$ are conditionally independent given the side information $Y$, then $\mathcal{R}_o(D) \subseteq \tilde{\mathcal{R}}_o(D)$, and hence $\tilde{\mathcal{R}}_o(D) \supseteq \mathcal{R}^*_{X_1,X_2,X_3|Y}(D)$, where $\tilde{\mathcal{R}}_o(D)$ is the set of all rate triples $(R_1,R_2,R_3)$ such that there exists a triple $(W_1,W_2,W_3)$ of discrete random variables with $p(w_1|x_1,x_2,x_3,y) = p(w_1|x_1)$, $p(w_2|x_1,x_2,x_3,y) = p(w_2|x_2)$ and $p(w_3|x_1,x_2,x_3,y) = p(w_3|x_3)$, for which the following conditions are satisfied:
$R_1 \ge I(X_1;W_1) - I(W_1;Y)$
$R_2 \ge I(X_2;W_2) - I(W_2;Y)$
$R_3 \ge I(X_3;W_3) - I(W_3;Y),$ (21)
and there exists a decoding function $g(\cdot)$ such that
$\mathbb{E}\big[d\big(T, g(W_1,W_2,W_3,Y)\big)\big] \le D. \qquad (22)$
Proof. 
First, we can enlarge the region $\mathcal{R}_o(D)$ by omitting the sum-rate bounds in (14). Then, the individual rate bounds in (14) can be relaxed as
$R_1 \ge I(X_1,X_2,X_3;\,W_1\,|\,W_2,W_3,Y) = I(X_1;W_1\,|\,W_2,W_3,Y) + I(X_2,X_3;\,W_1\,|\,W_2,W_3,Y,X_1) \ge I(X_1;W_1\,|\,W_2,W_3,Y) = I(X_1;\,W_1,W_2,W_3\,|\,Y) - I(X_1;\,W_2,W_3\,|\,Y). \qquad (23)$
According to the conditional independence relations, we have $I(X_1;\,W_2,W_3\,|\,Y) = 0$, and then we have
$R_1 \ge I(X_1;\,W_1,W_2,W_3\,|\,Y)$
$= I(X_1;W_1\,|\,Y) + I(X_1;\,W_2,W_3\,|\,Y,W_1)$ (24a)
$= I(X_1,Y;\,W_1) - I(W_1;Y)$ (24b)
$= I(X_1;W_1) - I(W_1;Y),$ (24c)
where (24a) follows from the condition that $X_1, X_2, X_3$ are conditionally independent given the side information $Y$, which makes $I(X_1;\,W_2,W_3\,|\,Y,W_1) = 0$; (24b) follows from the chain rule of mutual information; and (24c) follows from the Markov chain $Y - X_1 - W_1$. The same derivation applies to $R_2$ and $R_3$, which proves Corollary 3. □
Theorem 3.
If $X_1, X_2, X_3$ are conditionally independent given the side information $Y$, then
$\mathcal{R}_a(D) = \mathcal{R}_o(D) = \mathcal{R}^*_{X_1,X_2,X_3|Y}(D). \qquad (25)$
Proof. 
We note that the only difference between $\mathcal{R}_a(D)$ and $\tilde{\mathcal{R}}_o(D)$ lies in the degrees of freedom available when choosing the auxiliary random variables $(W_1,W_2,W_3)$, and all of the mutual information functions in (16) and (21) depend only on the marginal distributions of $(X_1,W_1,Y)$, $(X_2,W_2,Y)$ and $(X_3,W_3,Y)$. Choose any rate triple $(R_1,R_2,R_3)$ together with an auxiliary random variable triple $(W_1,W_2,W_3)$ meeting the conditions of Corollary 3. Then, we construct auxiliary random variables $(W_1',W_2',W_3')$ such that
$p_{W_1'|X_1}(w_1\,|\,x_1) = \sum_{w_2,x_2,w_3,x_3} p(w_1,w_2,w_3\,|\,x_1,x_2,x_3)\, p(x_2,x_3\,|\,x_1),$
$p_{W_2'|X_2}(w_2\,|\,x_2) = \sum_{w_1,x_1,w_3,x_3} p(w_1,w_2,w_3\,|\,x_1,x_2,x_3)\, p(x_1,x_3\,|\,x_2),$
$p_{W_3'|X_3}(w_3\,|\,x_3) = \sum_{w_1,x_1,w_2,x_2} p(w_1,w_2,w_3\,|\,x_1,x_2,x_3)\, p(x_1,x_2\,|\,x_3). \qquad (26)$
The joint distribution
$p(w_1',w_2',w_3',x_1,x_2,x_3,y) = p_{W_1'|X_1}(w_1'\,|\,x_1)\, p_{W_2'|X_2}(w_2'\,|\,x_2)\, p_{W_3'|X_3}(w_3'\,|\,x_3)\, p(x_1|y)\, p(x_2|y)\, p(x_3|y)\, p(y) \qquad (27)$
has the same marginal distributions on $(X_1,W_1,Y)$, $(X_2,W_2,Y)$ and $(X_3,W_3,Y)$ as the original triple. Therefore, the additional freedom in choosing the auxiliary random variables $(W_1,W_2,W_3)$ in Corollary 3 cannot lower the value of the rate-distortion functions. This proves Theorem 3. The arguments leading to Theorem 3 indicate that the result extends to the scenario with $M$ sources:
$R_i \ge I(X_i;W_i) - I(W_i;Y) \quad \text{for all } i \in \{1,\ldots,M\}. \qquad (28)$

5. Iterative Optimization Framework Based on BA Algorithm

In this section, we present the iterative optimization framework for calculating the rate-distortion region. Starting from the standard Lagrange multiplier method, the problem of calculating the rate-distortion region $\mathcal{R}^*_{X_1,\ldots,X_M|Y}(D)$ is equivalent to minimizing
$\sum_{i \in \mathcal{M} := \{1,\ldots,M\}} \big[ I(W_i;X_i) - I(W_i;Y) \big] + \lambda\big(\mathbb{E}[d(T,\hat{T})] - D\big). \qquad (29)$
By the definition of mutual information, we can rewrite (29) (dropping the constant $-\lambda D$) as
$L_\lambda(\mathbf{Q},\mathbf{q},q'') = \sum_{i \in \mathcal{M}} \sum_{y,x_i,w_i} p(y,x_i)\, q_i(w_i|x_i) \log\frac{q_i(w_i|x_i)}{Q_i(w_i|y)} + \lambda \sum_{\mathbf{w},\mathbf{x},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q''(\hat{t}\,|\,y,\mathbf{w}) \prod_{i\in\mathcal{M}} q_i(w_i|x_i), \qquad (30)$
where $\mathbf{x} = (x_1,\ldots,x_M)$, $\mathbf{w} = (w_1,\ldots,w_M)$, and $\mathbf{Q}$, $\mathbf{q}$, $q''$ represent the distributions that are iteratively updated: $\mathbf{Q}$ collects the conditional distributions of the auxiliary variables given $Y$, i.e., $\mathbf{Q} = [\,Q_i(w_i|y)\,]_{w_i\in\mathcal{W}_i,\, y\in\mathcal{Y},\, i\in\mathcal{M}}$; $\mathbf{q}$ collects the conditional distributions of the auxiliary variables given the sources, $\mathbf{q} = [\,q_i(w_i|x_i)\,]_{w_i\in\mathcal{W}_i,\, x_i\in\mathcal{X}_i,\, i\in\mathcal{M}}$; and $q''$ is the conditional distribution of the reconstruction $\hat{T}$ given $Y$ and the auxiliary variables, $q''(\hat{t}\,|\,y,w_1,\ldots,w_M)$.
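To make the notation of (30) concrete, the following minimal Python/NumPy sketch shows one way to store $\mathbf{Q}$, $\mathbf{q}$ and $q''$ as arrays and to evaluate $L_\lambda$ for $M = 2$. The alphabet sizes, the randomly generated joint distribution and distortion matrix, and all variable names are illustrative placeholders rather than quantities from the paper.

```python
import numpy as np

# Toy M = 2 instance: binary alphabets for X_i, Y, W_i; 4-letter reconstruction alphabet.
nX = nY = nW = 2
nT = 4
rng = np.random.default_rng(1)

# A generic joint source p(t, x1, x2, y) and distortion d(t, t_hat), both random here.
p_txxy = rng.random((nT, nX, nX, nY)); p_txxy /= p_txxy.sum()
d = rng.random((nT, nT))

# The three families of distributions appearing in (30):
q1 = rng.dirichlet(np.ones(nW), size=nX)             # q_1(w1 | x1)
q2 = rng.dirichlet(np.ones(nW), size=nX)             # q_2(w2 | x2)
Q1 = rng.dirichlet(np.ones(nW), size=nY)             # Q_1(w1 | y)
Q2 = rng.dirichlet(np.ones(nW), size=nY)             # Q_2(w2 | y)
qd = rng.dirichlet(np.ones(nT), size=(nW, nW, nY))   # q''(t_hat | w1, w2, y)

def lagrangian(lam):
    """Evaluate L_lambda of Eq. (30) for the stored distributions (natural logarithms)."""
    p_x1y = p_txxy.sum(axis=(0, 2))                  # p(x1, y)
    p_x2y = p_txxy.sum(axis=(0, 1))                  # p(x2, y)
    rate = (np.einsum('ay,aw,ayw->', p_x1y, q1, np.log(q1[:, None, :] / Q1[None])) +
            np.einsum('by,bv,byv->', p_x2y, q2, np.log(q2[:, None, :] / Q2[None])))
    dist = np.einsum('taby,aw,bv,wvys,ts->', p_txxy, q1, q2, qd, d)
    return rate + lam * dist

print(lagrangian(lam=2.0))
```

The rate part is the first double sum of (30), written here as an expectation of a log-likelihood ratio, and the distortion part is the second sum; sweeping the multiplier $\lambda$ trades off between the two.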
Lemma 1
(Optimization of $\mathbf{Q}$). For fixed $\mathbf{Q}_{\setminus m}$, $\mathbf{q}$ and $q''$, the Lagrangian $L_\lambda(\mathbf{Q},\mathbf{q},q'')$ is minimized by
$Q_m^*(w_m\,|\,y) = \frac{\sum_{x_m} p(y,x_m)\, q_m(w_m\,|\,x_m)}{\sum_{x_m,w_m} p(y,x_m)\, q_m(w_m\,|\,x_m)}, \qquad (31)$
where $\mathbf{Q}_{\setminus m} = [\,Q_i(w_i|y)\,]_{w_i\in\mathcal{W}_i,\, y\in\mathcal{Y},\, i\in\mathcal{M}\setminus\{m\}}$.
Proof. 
For any $Q_m$,
$L_\lambda(Q_m^*, \mathbf{Q}_{\setminus m}, \mathbf{q}, q'') - L_\lambda(Q_m, \mathbf{Q}_{\setminus m}, \mathbf{q}, q'') = \sum_{y,x_m,w_m} p(y,x_m)\, q_m(w_m|x_m) \log\frac{q_m(w_m|x_m)}{Q_m^*(w_m|y)} - \sum_{y,x_m,w_m} p(y,x_m)\, q_m(w_m|x_m) \log\frac{q_m(w_m|x_m)}{Q_m(w_m|y)}$
$= \sum_{y,x_m,w_m} p(y,x_m)\, q_m(w_m|x_m) \log\frac{Q_m(w_m|y)}{Q_m^*(w_m|y)} \overset{(a)}{\le} \sum_{y,x_m,w_m} p(y,x_m)\, q_m(w_m|x_m) \Big(\frac{Q_m(w_m|y)}{Q_m^*(w_m|y)} - 1\Big) = 0, \qquad (32)$
where (a) follows from the inequality $\log z \le z - 1$, with equality if and only if $Q_m = Q_m^*$. This completes the proof of Lemma 1. □
Lemma 2
(Optimization of $\mathbf{q}$). For fixed $\mathbf{Q}$, $\mathbf{q}_{\setminus m}$ and $q''$, the Lagrangian $L_\lambda(\mathbf{Q},\mathbf{q},q'')$ is minimized by
$q_m^*(w_m\,|\,x_m) = \frac{\exp\Big(\sum_y p(y|x_m)\log Q_m(w_m|y) - \lambda\, d_m(x_m,w_m)\Big)}{\sum_{w_m}\exp\Big(\sum_y p(y|x_m)\log Q_m(w_m|y) - \lambda\, d_m(x_m,w_m)\Big)}, \qquad (33)$
where $d_m(x_m,w_m) := \sum_{\mathbf{w}_{\setminus m},\mathbf{x}_{\setminus m},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x}_{\setminus m},y\,|\,x_m)\, q''(\hat{t}\,|\,y,\mathbf{w})\, q_{\setminus m}(\mathbf{w}_{\setminus m}\,|\,\mathbf{x}_{\setminus m})$ is the expected distortion conditioned on $X_m = x_m$ and $W_m = w_m$, with $q_{\setminus m}$ defined in (39) below,
and the resulting minimum is
$L_\lambda(\mathbf{Q}, q_m^*, \mathbf{q}_{\setminus m}, q'') = \sum_{i\ne m}\sum_{y,x_i,w_i} p(y,x_i)\, q_i(w_i|x_i)\log\frac{q_i(w_i|x_i)}{Q_i(w_i|y)} - \sum_{x_m} p(x_m)\log\sum_{w_m}\exp\Big(\sum_y p(y|x_m)\log Q_m(w_m|y) - \lambda\, d_m(x_m,w_m)\Big). \qquad (34)$
Proof. 
For fixed $\mathbf{Q}$, $\mathbf{q}_{\setminus m}$ and $q''$, the Lagrangian $L_\lambda(\mathbf{Q},\mathbf{q},q'')$ is minimized by $q_m^*(w_m|x_m)$ if and only if the following Kuhn–Tucker (KT) conditions are satisfied:
$\frac{\partial L_\lambda}{\partial q_m(w_m|x_m)}\bigg|_{q_m^*} = \gamma(x_m), \quad \text{if } q_m^*(w_m|x_m) > 0, \qquad (35)$
and
$\frac{\partial L_\lambda}{\partial q_m(w_m|x_m)}\bigg|_{q_m^*} \ge \gamma(x_m), \quad \text{if } q_m^*(w_m|x_m) = 0, \qquad (36)$
where $\gamma(x_m)$ is the multiplier associated with the normalization constraint $\sum_{w_m} q_m(w_m|x_m) = 1$. Since
$\frac{\partial L_\lambda}{\partial q_m(w_m|x_m)} = \sum_{y} p(x_m,y)\Big[\log\frac{q_m(w_m|x_m)}{Q_m(w_m|y)} + 1\Big] + \lambda \sum_{\mathbf{w}_{\setminus m},\mathbf{x}_{\setminus m},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q''(\hat{t}\,|\,y,\mathbf{w})\, q_{\setminus m}(\mathbf{w}_{\setminus m}\,|\,\mathbf{x}_{\setminus m}), \qquad (37)$
the first KT condition (35) becomes
$\tilde{\gamma}(x_m) = p(x_m)\log q_m(w_m|x_m) - \sum_y p(x_m,y)\log Q_m(w_m|y) + \lambda \sum_{\mathbf{w}_{\setminus m},\mathbf{x}_{\setminus m},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q''(\hat{t}\,|\,y,\mathbf{w})\, q_{\setminus m}(\mathbf{w}_{\setminus m}\,|\,\mathbf{x}_{\setminus m}), \qquad (38)$
where $\tilde{\gamma}(x_m) := \gamma(x_m) - p(x_m)$ and
$q_{\setminus m}(\mathbf{w}_{\setminus m}\,|\,\mathbf{x}_{\setminus m}) = \prod_{i\in\mathcal{M}\setminus\{m\}} q_i(w_i|x_i). \qquad (39)$
Solving (38) for $q_m(w_m|x_m)$, we have
$q_m(w_m|x_m) = \exp\!\Big(\frac{\tilde{\gamma}(x_m)}{p(x_m)}\Big)\, \exp\!\Big(\sum_y p(y|x_m)\log Q_m(w_m|y) - \lambda\, d_m(x_m,w_m)\Big), \qquad (40)$
where we used $p(t,\mathbf{x},y) = p(x_m)\, p(t,\mathbf{x}_{\setminus m},y\,|\,x_m)$. Then, (33) is obtained by normalizing $q_m(w_m|x_m)$ over $w_m$. □
Lemma 3
(Optimization of $q''$). For fixed $\mathbf{Q}$ and $\mathbf{q}$, the Lagrangian $L_\lambda(\mathbf{Q},\mathbf{q},q'')$ is minimized by the Bayes detector that minimizes the posterior expected distortion,
$q''^{*}(\hat{t}\,|\,\mathbf{w},y) = \begin{cases} \alpha(\mathbf{w},y), & \hat{t} \in \arg\min_{\hat{t}'\in\hat{\mathcal{T}}} \mathbb{E}\big[d(T,\hat{t}')\,\big|\,\mathbf{W}=\mathbf{w}, Y=y\big], \\ 0, & \text{otherwise}, \end{cases} \qquad (41)$
where $\alpha(\mathbf{w},y)$ is selected to guarantee
$\sum_{\hat{t}} q''(\hat{t}\,|\,\mathbf{w},y) = 1, \qquad (42)$
and $\mathbb{E}[d(T,\hat{t})\,|\,\mathbf{W}=\mathbf{w},Y=y]$ is proportional, with a factor that does not depend on $\hat{t}$, to
$\sum_{\mathbf{x},t} d(t,\hat{t})\, p(t,\mathbf{x},y) \prod_{i\in\mathcal{M}} q_i(w_i|x_i). \qquad (43)$
Proof. 
We note that the only term in the Lagrangian $L_\lambda$ that depends on $q''(\hat{t}\,|\,y,\mathbf{w})$ is the expected distortion, which is linear in $q''$ and is therefore minimized by placing all of the conditional probability mass on reconstructions $\hat{t}$ with minimal posterior expected distortion, i.e., by a Bayes detector. □
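As a tiny, self-contained illustration of Lemma 3 (with an assumed toy posterior and distortion matrix rather than quantities from the paper), the Bayes detector simply places all probability mass on the reconstruction with the smallest posterior expected distortion:

```python
import numpy as np

rng = np.random.default_rng(0)
nT = nThat = 4
post = rng.dirichlet(np.ones(nT))     # posterior weight on t for one fixed (w, y), cf. Eq. (43)
d = rng.random((nT, nThat))           # distortion d(t, t_hat)

exp_dist = post @ d                   # posterior expected distortion of each candidate t_hat
q_dd = np.zeros(nThat)
q_dd[np.argmin(exp_dist)] = 1.0       # q''(. | w, y): point mass on the minimizing t_hat
print(exp_dist, q_dd)
```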
Based on Lemmas 1–3, we obtain an iterative algorithm for computing the rate-distortion region of the distributed indirect source coding problem with decoder side information. For a given $i \in \mathcal{M}$, we iterate between the following (44) and (45) to alternately update $Q_i$ and $q_i$:
$Q_i^{(\ell,k_i)}(w_i\,|\,y) = \frac{\sum_{x_i} p(y,x_i)\, q_i^{(\ell,k_i)}(w_i\,|\,x_i)}{\sum_{x_i,w_i} p(y,x_i)\, q_i^{(\ell,k_i)}(w_i\,|\,x_i)}, \qquad (44)$
$q_i^{(\ell,k_i+1)}(w_i\,|\,x_i) = \frac{\exp\Big(\sum_y p(y|x_i)\log Q_i^{(\ell,k_i)}(w_i|y) - \lambda\, d_i^{(\ell)}(x_i,w_i)\Big)}{\sum_{w_i}\exp\Big(\sum_y p(y|x_i)\log Q_i^{(\ell,k_i)}(w_i|y) - \lambda\, d_i^{(\ell)}(x_i,w_i)\Big)}, \qquad (45)$
where $\ell$ indexes the outer sweeps, $k_i$ indexes the inner iterations for encoder $i$, and $d_i^{(\ell)}(x_i,w_i)$ is the conditional expected distortion of (33) evaluated with the current decoder $q''^{(\ell,i)}$ and the current distributions of the other encoders. The iterations continue until the Lagrangian $L_\lambda$ converges, and the associated limit is
$q_i^{(\ell,*)}(w_i\,|\,x_i) = \lim_{k_i\to\infty} q_i^{(\ell,k_i)}(w_i\,|\,x_i) =: q_i^{(\ell+1,1)}(w_i\,|\,x_i). \qquad (46)$
Then, we update $q''(\hat{t}\,|\,\mathbf{w},y)$ according to
$q''^{(\ell',i')}(\hat{t}\,|\,\mathbf{w},y) = \begin{cases} \alpha^{(\ell',i')}(\mathbf{w},y), & \hat{t} \in \arg\min_{\hat{t}'\in\hat{\mathcal{T}}} J^{(\ell,i)}(\hat{t}',\mathbf{w},y), \\ 0, & \text{otherwise}, \end{cases} \qquad (47)$
with $i' = i+1$ and $\ell' = \ell$ if $i < M$, and $i' = 1$ and $\ell' = \ell+1$ if $i = M$, and where $\alpha^{(\ell',i')}(\mathbf{w},y)$ is selected to guarantee
$\sum_{\hat{t}} q''^{(\ell',i')}(\hat{t}\,|\,\mathbf{w},y) = 1, \qquad (48)$
and
$J^{(\ell,i)}(\hat{t},\mathbf{w},y) = \sum_{\mathbf{x},t} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q_i^{(\ell,*)}(w_i\,|\,x_i) \prod_{j\in\mathcal{M}\setminus\{i\}} q_j^{(\ell,k_j)}(w_j\,|\,x_j), \qquad (49)$
with $q_j^{(\ell,k_j)}$ denoting the most recent update of encoder $j$. Next, we repeat the process for the next encoder, i.e., $i \leftarrow i+1$ if $i < M$, and $i \leftarrow 1$, $\ell \leftarrow \ell+1$ if $i = M$.
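The following is a minimal, self-contained Python/NumPy sketch of this alternating-minimization scheme, specialized to the two-source binary example of Section 6 ($T = (X_1, X_2)$, per-component Hamming distortion, $X_i$ the output of a BSC with crossover probability $p_i$ driven by $Y$). The Lagrange multiplier value, the number of sweeps, the random initialization, and the simplified cyclic update order (one pass of (44)–(45) per encoder per sweep instead of an inner loop run to convergence) are our own illustrative choices; only the update rules themselves follow Lemmas 1–3.

```python
import numpy as np

p1 = p2 = 0.3                              # BSC crossover probabilities of the example
lam = 5.0                                  # Lagrange multiplier (slope of the trade-off curve)
nX = nY = nW = 2                           # binary sources, side information and auxiliaries
nT = 4                                     # t_hat = (x1_hat, x2_hat), indexed as 2*x1_hat + x2_hat

# Joint source p(x1, x2, y): Y uniform (as implied by Eq. (51)), X_i = BSC(p_i) output driven by Y
p_y = np.full(nY, 0.5)
bsc = lambda p: np.array([[1 - p, p], [p, 1 - p]])           # bsc(p)[y, x] = p(x | y)
P = np.einsum('y,ya,yb->aby', p_y, bsc(p1), bsc(p2))         # P[x1, x2, y]
p_x1y, p_x2y = P.sum(axis=1), P.sum(axis=0)                  # p(x1, y), p(x2, y)

# Distortion d(t, t_hat) with t = (x1, x2): Hamming distortion on each component
d = np.array([[[(x1 != t // 2) + (x2 != t % 2) for t in range(nT)]
               for x2 in range(nX)] for x1 in range(nX)], dtype=float)

rng = np.random.default_rng(0)
q1 = rng.dirichlet(np.ones(nW), size=nX)   # q_1(w1 | x1), indexed [x1, w1]
q2 = rng.dirichlet(np.ones(nW), size=nX)   # q_2(w2 | x2), indexed [x2, w2]
qd = np.full((nW, nW, nY, nT), 1.0 / nT)   # decoder q''(t_hat | w1, w2, y)

def update_Q(q, p_xy):
    """Lemma 1: Q(w | y) = sum_x p(x | y) q(w | x)."""
    return np.einsum('xy,xw->yw', p_xy / p_xy.sum(axis=0, keepdims=True), q)

def update_q1(Q1):
    """Lemma 2 for encoder 1: exponential update followed by per-x1 normalization."""
    log_term = np.einsum('ay,yw->aw', p_x1y / p_x1y.sum(1, keepdims=True), np.log(Q1))
    dist = np.einsum('aby,bv,wvyt,abt->aw', P, q2, qd, d) / p_x1y.sum(1)[:, None]
    new = np.exp(log_term - lam * dist) + 1e-30
    return new / new.sum(axis=1, keepdims=True)

def update_q2(Q2):
    """Lemma 2 for encoder 2 (the symmetric update)."""
    log_term = np.einsum('by,yv->bv', p_x2y / p_x2y.sum(1, keepdims=True), np.log(Q2))
    dist = np.einsum('aby,aw,wvyt,abt->bv', P, q1, qd, d) / p_x2y.sum(1)[:, None]
    new = np.exp(log_term - lam * dist) + 1e-30
    return new / new.sum(axis=1, keepdims=True)

def update_qd():
    """Lemma 3: Bayes detector, a point mass on the reconstruction minimizing posterior distortion."""
    exp_d = np.einsum('aby,aw,bv,abt->wvyt', P, q1, q2, d)
    new = np.zeros_like(qd)
    np.put_along_axis(new, exp_d.argmin(axis=-1)[..., None], 1.0, axis=-1)
    return new

def rate_bits(q, Q, p_xy):
    """I(X_i; W_i) - I(W_i; Y) in bits, i.e. E[log q_i(W_i|X_i)/Q_i(W_i|Y)] (cf. Eq. (53))."""
    return np.einsum('xy,xw,xyw->', p_xy, q, np.log(q[:, None, :] / Q[None, :, :])) / np.log(2)

for _ in range(300):                       # simple cyclic sweeps of the updates of Lemmas 1-3
    Q1 = update_Q(q1, p_x1y); q1 = update_q1(Q1)
    Q2 = update_Q(q2, p_x2y); q2 = update_q2(Q2)
    qd = update_qd()

Q1, Q2 = update_Q(q1, p_x1y), update_Q(q2, p_x2y)            # final decoder-side marginals
D = np.einsum('aby,aw,bv,wvyt,abt->', P, q1, q2, qd, d)      # achieved expected distortion
print(f"R1 = {rate_bits(q1, Q1, p_x1y):.3f} bits, R2 = {rate_bits(q2, Q2, p_x2y):.3f} bits, E[d] = {D:.3f}")
```

Sweeping $\lambda$ over a range of values and recording the resulting $(R_1, R_2, \mathbb{E}[d])$ triples traces out points of the rate-distortion region shown in Figures 2 and 3; as discussed next, several random initializations can be used, keeping the smallest converged Lagrangian.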
Convergence analysis:
The algorithm employs an alternating minimization approach that produces Lagrangian values that are monotonically non-increasing and bounded below, thereby generating a convergent sequence of Lagrangian values. Since the rate component of the Lagrangian is convex, and the sum of convex functions remains convex, the Lagrangian is also convex whenever the expected distortion
$\sum_{\mathbf{w},\mathbf{x},t,\hat{t},y} d(t,\hat{t})\, p(t,\mathbf{x},y)\, q''(\hat{t}\,|\,y,\mathbf{w}) \prod_{i\in\mathcal{M}} q_i(w_i\,|\,x_i) \qquad (50)$
is a convex function of the optimization variables, in which case the proposed iterative optimization framework attains the global minimum [19]. We note, however, that the expected distortion (50) contains a product of the optimization variables; it is therefore not linear in them, and the Lagrangian may exhibit non-convex behavior. Even when the problem is non-convex, the authors of [20] demonstrate that a BA-based iterative algorithm, initialized randomly and followed by selecting the minimum Lagrangian among all converged values, can still provide highly effective information-theoretic inner bounds for the rate-distortion region, serving as a benchmark for practical quantization schemes.

6. Numerical Examples

In this section, we provide an example to illustrate the proposed iterative algorithm for computing the rate-distortion region of a distributed indirect source coding problem with decoder side information. As in the problem considered in this paper, distributed edge devices compress their observations $\{X_1,\ldots,X_M\}$ and transmit them to a central server (CEO). The central server then aims to recover the indirect information $T$ from the received data, utilizing the side information $Y$. For ease of demonstration, we consider a simple case where $M = 2$ and the sources are binary, i.e., $\mathcal{X}_1 = \mathcal{X}_2 = \mathcal{Y} = \{0,1\}$. The joint distributions, denoted by $Q(x_1,y)$ and $Q(x_2,y)$, are given by
$Q(x_1,y) = \frac{1-p_1}{2}\,\delta_{x_1,y} + \frac{p_1}{2}\,\big(1-\delta_{x_1,y}\big), \qquad Q(x_2,y) = \frac{1-p_2}{2}\,\delta_{x_2,y} + \frac{p_2}{2}\,\big(1-\delta_{x_2,y}\big), \qquad (51)$
where the Kronecker delta $\delta_{x,y}$ equals 1 when $x = y$ and 0 otherwise. Equivalently, we can view $Y$ as the common input to two binary symmetric channels (BSCs) with crossover probabilities $p_1$ and $p_2$, respectively, where $0 \le p_i \le \frac{1}{2}$, $i\in\{1,2\}$, and $X_1$ and $X_2$ as the corresponding channel outputs. In this example, we set $p_1 = p_2 = 0.3$.
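The following short snippet is a sanity check, written by us rather than taken from the paper, that the two descriptions agree: building $Q(x_i,y)$ directly from (51) gives the same table as taking $Y$ uniform (which is what the $1/2$ factors in (51) imply, since the $Y$-marginal is $(1-p_i)/2 + p_i/2 = 1/2$) and passing it through a BSC with crossover probability $p_i$.

```python
import numpy as np

p1 = p2 = 0.3
delta = np.eye(2)                                        # Kronecker delta over {0, 1}
# Eq. (51): Q(x_i, y) = (1 - p_i)/2 * delta(x_i, y) + p_i/2 * (1 - delta(x_i, y))
Q_x1y = (1 - p1) / 2 * delta + p1 / 2 * (1 - delta)
Q_x2y = (1 - p2) / 2 * delta + p2 / 2 * (1 - delta)

# Equivalent view: Y ~ Bernoulli(1/2), X_i = output of a BSC(p_i) driven by Y.
p_y = np.full(2, 0.5)
bsc = lambda p: np.array([[1 - p, p], [p, 1 - p]])       # bsc(p)[y, x] = p(x | y)
assert np.allclose(Q_x1y, p_y[:, None] * bsc(p1))
assert np.allclose(Q_x2y, p_y[:, None] * bsc(p2))
```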
We also assume that the information of interest is directly the combination of the two distributed sources, i.e., $T = (X_1, X_2)$. The distortion measure is given by $d(t,\hat{t}) = d(x_1,\hat{x}_1) + d(x_2,\hat{x}_2)$, where
$d(x_i,\hat{x}_i) = \begin{cases} 0, & x_i = \hat{x}_i, \\ 1, & x_i \ne \hat{x}_i. \end{cases} \qquad (52)$
By applying the proposed iterative optimization framework, we obtain the optimal transition probability distributions $Q_i(w_i|y)$, $q_i(w_i|x_i)$ and $q''(\hat{t}\,|\,\mathbf{w},y)$ that meet a given distortion constraint $D$ on $d(t,\hat{t})$, and the corresponding minimum rate $R_i$ can be calculated as
$R_i = \sum_{y,x_i,w_i} p(y,x_i)\, q_i^*(w_i\,|\,x_i)\, \log\frac{q_i^*(w_i\,|\,x_i)}{Q_i^*(w_i\,|\,y)}. \qquad (53)$
The contour plot of the rate-distortion region for this scenario is presented in Figure 2, while Figure 3 displays a surface plot of the rate-distortion region. We note that when $M = 1$, the considered problem reduces to the traditional point-to-point Wyner–Ziv problem. In Figure 4, we compare the rate-distortion results computed using the proposed approach with the theoretical result of Wyner and Ziv [13]. The two rate-distortion curves coincide, demonstrating the effectiveness of the proposed iterative approach for calculating the rate-distortion function.

7. Conclusions

This paper explored a variant of the rate-distortion problem motivated by semantic communication and distributed learning systems, where correlated sources are independently encoded for a central decoder to reconstruct the indirect source of interest. In addition to receiving messages from the encoders, the decoder has access to correlated side information and aims to reconstruct the indirect source under a specified distortion constraint. We derived the exact rate-distortion function for the case where the sources are conditionally independent given the side information. Furthermore, we introduced a distributed iterative optimization framework based on the Blahut–Arimoto (BA) algorithm to numerically compute the rate-distortion function. A numerical example has been provided to demonstrate the effectiveness of the proposed approach.

Author Contributions

Conceptualization, Q.Y.; Methodology, J.T.; Validation, J.T.; Formal analysis, J.T.; Investigation, Q.Y.; Resources, Q.Y.; Writing—original draft, J.T.; Writing—review & editing, Q.Y.; Visualization, J.T.; Project administration, Q.Y.; Funding acquisition, Q.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the NSFC under Grants No. 62293481 and No. 62201505, and in part by the SUTD-ZJU IDEA Grant (SUTD-ZJU (VP) 202102).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Here, we provide the rigorous proof of Theorem 1.
Lemma A1
(Extended Markov Lemma). Let
$p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1,x_2,x_3,y). \qquad (A1)$
For a fixed $(x_1^n, x_2^n, x_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}$, let $W_1^n$, $W_2^n$ and $W_3^n$ be drawn according to $\prod_{i=1}^n p(w_1|x_{1,i})$, $\prod_{i=1}^n p(w_2|x_{2,i})$ and $\prod_{i=1}^n p(w_3|x_{3,i})$, respectively. Then
$\lim_{n\to\infty} \Pr\big\{(W_1^n, W_2^n, W_3^n, x_1^n, x_2^n, x_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} = 1. \qquad (A2)$
Proof. 
According to (A1), we have the Markov chains $W_1 - X_1 - (X_2,X_3,Y)$, $W_2 - X_2 - (X_1,X_3,Y,W_1)$ and $W_3 - X_3 - (X_1,X_2,Y,W_1,W_2)$. We apply the Markov lemma three times. First, since $W_1^n$ is drawn according to $p(w_1|x_1)$, we have $(x_1^n, W_1^n) \in \mathcal{A}_\epsilon^{*(n)}$ with high probability for sufficiently large $n$, and the Markov chain $W_1 - X_1 - (X_2,X_3,Y)$ then implies that $(W_1^n, x_1^n, x_2^n, x_3^n, y^n)$ is jointly typical with high probability. Applying the Markov lemma again with the chain $W_2 - X_2 - (X_1,X_3,Y,W_1)$ shows that $(W_1^n, W_2^n, x_1^n, x_2^n, x_3^n, y^n)$ is jointly typical with high probability, and a final application with the chain $W_3 - X_3 - (X_1,X_2,Y,W_1,W_2)$ shows that $(W_1^n, W_2^n, W_3^n, x_1^n, x_2^n, x_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}$ with high probability. □
Proof of Theorem 1. 
For $m = 1, 2, 3$, fix $p(w_m|x_m)$ and $g(W_1,W_2,W_3,Y)$ such that the distortion constraint $\mathbb{E}[d(T,\hat{T})] \le D$ is satisfied, and calculate $p(w_m) = \sum_{x_m} p(x_m)\, p(w_m|x_m)$.
Generation of codebooks: Generate $2^{nR_m'}$ i.i.d. codewords $w_m^n(s_m) \sim \prod_{i=1}^n p(w_{m,i})$, and index them by $s_m \in \{1,2,\ldots,2^{nR_m'}\}$. Provide $2^{nR_m}$ random bins with indices $t_m \in \{1,2,\ldots,2^{nR_m}\}$. Randomly assign each codeword $w_m^n(s_m)$ to one of the $2^{nR_m}$ bins according to a uniform distribution, and let $\mathcal{B}_m(t_m)$ denote the set of codeword indices $s_m$ assigned to bin index $t_m$.
Encoding: Given a source sequence $X_m^n$, encoder $m$ looks for a codeword $W_m^n(s_m)$ such that $(X_m^n, W_m^n(s_m)) \in \mathcal{A}_\epsilon^{*(n)}$. The encoder sends the index of the bin $t_m$ to which $s_m$ belongs.
Decoding: The decoder looks for a triple $(W_1^n(s_1), W_2^n(s_2), W_3^n(s_3))$ such that $s_m \in \mathcal{B}_m(t_m)$ for $m = 1,2,3$ and $(W_1^n(s_1), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}$. If the decoder finds a unique triple $(s_1, s_2, s_3)$, it then calculates $\hat{T}^n$, where $\hat{T}_i = g(W_{1,i}, W_{2,i}, W_{3,i}, Y_i)$.
Analysis of the probability of error:
1. An encoder cannot find a codeword $W_m^n(s_m)$ such that $(X_m^n, W_m^n(s_m)) \in \mathcal{A}_\epsilon^{*(n)}$. The probability of this event is small if
$R_m' > I(X_m; W_m). \qquad (A3)$
2. The pairs of sequences satisfy $(X_1^n, W_1^n(s_1)) \in \mathcal{A}_\epsilon^{*(n)}$, $(X_2^n, W_2^n(s_2)) \in \mathcal{A}_\epsilon^{*(n)}$ and $(X_3^n, W_3^n(s_3)) \in \mathcal{A}_\epsilon^{*(n)}$, but the codewords $(W_1^n(s_1), W_2^n(s_2), W_3^n(s_3))$ are not jointly typical with the side-information sequence $Y^n$, i.e., $(W_1^n(s_1), W_2^n(s_2), W_3^n(s_3), Y^n) \notin \mathcal{A}_\epsilon^{*(n)}$. We have assumed that
$p(w_1,w_2,w_3,x_1,x_2,x_3,y) = p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1,x_2,x_3,y). \qquad (A4)$
Hence, by the extended Markov lemma (Lemma A1), the probability of this event goes to zero if $n$ is large enough.
3. There exists another codeword index with the same bin index that is jointly typical with the side-information sequence. Let the correct codeword indices be denoted by $s_1$, $s_2$ and $s_3$. We first consider the situation where only the codeword index $s_1$ is in error. The probability that a randomly chosen $W_1^n(s_1')$ is jointly typical with $(W_2^n(s_2), W_3^n(s_3), Y^n)$ can be bounded as
$\Pr\big\{(W_1^n(s_1'), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{-n\left(I(W_1;W_2,W_3,Y) - 3\epsilon\right)}. \qquad (A5)$
The probability of this error event is bounded by the number of codewords in the bin $t_1$ times the probability of joint typicality:
$\Pr\big\{\exists\, s_1' \in \mathcal{B}_1(t_1),\, s_1' \ne s_1 : (W_1^n(s_1'), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le \sum_{s_1' \ne s_1,\, s_1' \in \mathcal{B}_1(t_1)} \Pr\big\{(W_1^n(s_1'), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{n(R_1'-R_1)}\, 2^{-n\left(I(W_1;W_2,W_3,Y) - 3\epsilon\right)}. \qquad (A6)$
Similarly, the probability that only the codeword index $s_2$ or only $s_3$ is in error can be bounded by
$\Pr\big\{\exists\, s_2' \in \mathcal{B}_2(t_2),\, s_2' \ne s_2 : (W_1^n(s_1), W_2^n(s_2'), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{n(R_2'-R_2)}\, 2^{-n\left(I(W_2;W_1,W_3,Y) - 3\epsilon\right)},$
$\Pr\big\{\exists\, s_3' \in \mathcal{B}_3(t_3),\, s_3' \ne s_3 : (W_1^n(s_1), W_2^n(s_2), W_3^n(s_3'), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{n(R_3'-R_3)}\, 2^{-n\left(I(W_3;W_1,W_2,Y) - 3\epsilon\right)}. \qquad (A7)$
We then consider the case where two of the three codeword indices are in error. The probability that randomly chosen $W_1^n(s_1')$ and $W_2^n(s_2')$ are jointly typical with $(W_3^n(s_3), Y^n)$ can be bounded as
$\Pr\big\{(W_1^n(s_1'), W_2^n(s_2'), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} = \sum_{(w_1^n, w_2^n, w_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}} p(w_1^n)\, p(w_2^n)\, p(w_3^n, y^n) \le 2^{-n\left(I(W_1;W_2,W_3,Y) + I(W_2;W_1,W_3,Y) - I(W_1;W_2|W_3,Y) - 4\epsilon\right)}. \qquad (A8)$
Hence, the probability of this error event can be bounded as
$\Pr\big\{\exists\, s_1' \in \mathcal{B}_1(t_1), s_1' \ne s_1,\ s_2' \in \mathcal{B}_2(t_2), s_2' \ne s_2 : (W_1^n(s_1'), W_2^n(s_2'), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} \le 2^{n(R_1'-R_1+R_2'-R_2)}\, 2^{-n\left(I(W_1;W_2,W_3,Y) + I(W_2;W_1,W_3,Y) - I(W_1;W_2|W_3,Y) - 4\epsilon\right)}. \qquad (A9)$
Similarly, we can obtain the probabilities that the codeword index pairs $(s_1, s_3)$ or $(s_2, s_3)$ are in error, which we omit here.
For the case where all the codeword indices $s_1$, $s_2$ and $s_3$ are in error, the probability that randomly chosen $W_1^n(s_1')$, $W_2^n(s_2')$ and $W_3^n(s_3')$ are jointly typical with $Y^n$ can be bounded as
$\Pr\big\{(W_1^n(s_1'), W_2^n(s_2'), W_3^n(s_3'), Y^n) \in \mathcal{A}_\epsilon^{*(n)}\big\} = \sum_{(w_1^n, w_2^n, w_3^n, y^n) \in \mathcal{A}_\epsilon^{*(n)}} p(w_1^n)\, p(w_2^n)\, p(w_3^n)\, p(y^n) \le 2^{-n\left(I(W_1;W_2,W_3,Y) + I(W_2;W_1,W_3,Y) + I(W_3;W_1,W_2,Y) - I(W_1;W_2|W_3,Y) - I(W_1,W_2;W_3|Y) - 5\epsilon\right)}. \qquad (A10)$
The probabilities of the above error events go to 0 when
$R_1' - R_1 \le I(W_1;W_2,W_3,Y),$
$R_1' - R_1 + R_2' - R_2 + R_3' - R_3 \le I(W_1;W_2,W_3,Y) + I(W_2;W_1,W_3,Y) + I(W_3;W_1,W_2,Y) - I(W_1;W_2\,|\,W_3,Y) - I(W_1,W_2;W_3\,|\,Y), \qquad (A11)$
together with the analogous conditions for the remaining single- and double-index error events.
Therefore, (2) can be obtained by combining (A3) and (A11) and eliminating the auxiliary rates $R_m'$.
If $(s_1, s_2, s_3)$ are correctly decoded, we have $(X_1^n, X_2^n, X_3^n, W_1^n(s_1), W_2^n(s_2), W_3^n(s_3), Y^n) \in \mathcal{A}_\epsilon^{*(n)}$. Therefore, the empirical joint distribution is close to the distribution $p(w_1|x_1)\,p(w_2|x_2)\,p(w_3|x_3)\,p(x_1,x_2,x_3,y)$ that achieves distortion $D$. □

Appendix B

Here, we provide the rigorous proof of Theorem 2. Consider a series of encoders $f_m: \mathcal{X}_m^n \to \{1,\ldots,2^{nR_m}\}$, $m = 1,2,3$, and a decoder $g: \prod_{m\in\mathcal{M}}\{1,\ldots,2^{nR_m}\} \times \mathcal{Y}^n \to \hat{\mathcal{T}}^n$ that achieve the given distortion $D$. We can derive the following inequalities:
$nR_1 \ge H\big(f_1(X_1^n)\big)$
$\ge H\big(f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n\big)$ (A12a)
$\ge H\big(f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n\big) - H\big(f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n, X_1^n, X_2^n, X_3^n\big)$
$= I\big(X_1^n, X_2^n, X_3^n;\, f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n\big)$ (A12b)
$= \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, f_1(X_1^n)\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big)$ (A12c)
$= \sum_{i=1}^n \Big[ H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,f_2(X_2^n), f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big) - H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,f_1(X_1^n), f_2(X_2^n), f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big) \Big]$ (A12d)
$= \sum_{i=1}^n \Big[ H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,W_{2,i}, W_{3,i}, Y_i\big) - H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,W_{1,i}, W_{2,i}, W_{3,i}, Y_i\big) \Big]$ (A12e)
$= \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, W_{1,i}\,\big|\,W_{2,i}, W_{3,i}, Y_i\big),$
where (A12a) follows from the fact that conditioning reduces entropy, (A12b) and (A12d) are obtained by the definition of conditional mutual information, and (A12c) is the chain rule of mutual information. In (A12e), we let $W_{m,i} = \big(f_m(X_m^n), X_1^{i-1}, X_2^{i-1}, X_3^{i-1}, Y^{i-1}, Y_{i+1}^n\big)$ for $m = 1,2,3$, so that the conditioning sets in (A12d) and (A12e) coincide. Similarly, we have
$nR_2 \ge \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, W_{2,i}\,\big|\,W_{1,i}, W_{3,i}, Y_i\big), \qquad nR_3 \ge \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, W_{3,i}\,\big|\,W_{1,i}, W_{2,i}, Y_i\big). \qquad (A13)$
For the sum rate, we have
$n(R_1 + R_2) \ge H\big(f_1(X_1^n), f_2(X_2^n)\big)$
$\ge H\big(f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n\big)$ (A14a)
$\ge H\big(f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n\big) - H\big(f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n, X_1^n, X_2^n, X_3^n\big)$
$= I\big(X_1^n, X_2^n, X_3^n;\, f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n\big)$ (A14b)
$= \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, f_1(X_1^n), f_2(X_2^n)\,\big|\,f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big)$ (A14c)
$= \sum_{i=1}^n \Big[ H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big) - H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,f_1(X_1^n), f_2(X_2^n), f_3(X_3^n), Y^n, X_1^{i-1}, X_2^{i-1}, X_3^{i-1}\big) \Big]$ (A14d)
$= \sum_{i=1}^n \Big[ H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,W_{3,i}, Y_i\big) - H\big(X_{1,i}, X_{2,i}, X_{3,i}\,\big|\,W_{1,i}, W_{2,i}, W_{3,i}, Y_i\big) \Big]$ (A14e)
$= \sum_{i=1}^n I\big(X_{1,i}, X_{2,i}, X_{3,i};\, W_{1,i}, W_{2,i}\,\big|\,W_{3,i}, Y_i\big),$ (A14f)
where (A14a) follows from the fact that conditioning reduces entropy, (A14b) and (A14d) are obtained by the definition of conditional mutual information, (A14c) is the chain rule of mutual information, and (A14e) again uses the definitions of $W_{1,i}$, $W_{2,i}$ and $W_{3,i}$ given above. The remaining sum-rate bounds follow in the same way.

References

  1. Han, T.; Yang, Q.; Shi, Z.; He, S.; Zhang, Z. Semantic-preserved communication system for highly efficient speech transmission. IEEE J. Sel. Areas Commun. 2022, 41, 245–259.
  2. Adikari, T.; Draper, S. Two-terminal source coding with common sum reconstruction. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1420–1424.
  3. Korner, J.; Marton, K. How to encode the modulo-two sum of binary sources (corresp.). IEEE Trans. Inf. Theory 1979, 25, 219–221.
  4. Pastore, A.; Lim, S.H.; Feng, C.; Nazer, B.; Gastpar, M. Distributed Lossy Computation with Structured Codes: From Discrete to Continuous Sources. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1681–1686.
  5. Amiri, M.M.; Gunduz, D.; Kulkarni, S.R.; Poor, H.V. Federated Learning with Quantized Global Model Updates. arXiv 2020, arXiv:2006.10672.
  6. Gruntkowska, K.; Tyurin, A.; Richtárik, P. Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity. arXiv 2024, arXiv:2402.06412.
  7. Amiri, M.M.; Gündüz, D.; Kulkarni, S.R.; Poor, H.V. Convergence of Federated Learning Over a Noisy Downlink. IEEE Trans. Wirel. Commun. 2022, 21, 1422–1437.
  8. Stavrou, P.A.; Kountouris, M. The Role of Fidelity in Goal-Oriented Semantic Communication: A Rate Distortion Approach. IEEE Trans. Commun. 2023, 71, 3918–3931.
  9. Liu, J.; Zhang, W.; Poor, H.V. A rate-distortion framework for characterizing semantic information. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Virtual, 12–20 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2894–2899.
  10. Dobrushin, R.; Tsybakov, B. Information transmission with additional noise. IRE Trans. Inf. Theory 1962, 8, 293–304.
  11. Wolf, J.; Ziv, J. Transmission of noisy information to a noisy receiver with minimum distortion. IEEE Trans. Inf. Theory 1970, 16, 406–411.
  12. Guo, T.; Wang, Y.; Han, J.; Wu, H.; Bai, B.; Han, W. Semantic compression with side information: A rate-distortion perspective. arXiv 2022, arXiv:2208.06094.
  13. Wyner, A.; Ziv, J. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory 1976, 22, 1–10.
  14. Slepian, D.; Wolf, J. Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory 1973, 19, 471–480.
  15. Wagner, A.B.; Anantharam, V. An improved outer bound for multiterminal source coding. IEEE Trans. Inf. Theory 2008, 54, 1919–1937.
  16. Lim, S.H.; Feng, C.; Pastore, A.; Nazer, B.; Gastpar, M. Towards an algebraic network information theory: Distributed lossy computation of linear functions. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1827–1831.
  17. Cheng, S.; Stankovic, V.; Xiong, Z. Computing the channel capacity and rate-distortion function with two-sided state information. IEEE Trans. Inf. Theory 2005, 51, 4418–4425.
  18. Gastpar, M. The Wyner-Ziv problem with multiple sources. IEEE Trans. Inf. Theory 2004, 50, 2762–2768.
  19. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 2006.
  20. Ku, G.; Ren, J.; Walsh, J.M. Computing the rate distortion region for the CEO problem with independent sources. IEEE Trans. Signal Process. 2014, 63, 567–575.
Figure 1. Distributed remote compression of a latent variable with $M$ correlated sources at distributed transmitters and side information at the receiver.
Figure 2. Contour plot of the rate-distortion region with two distributed binary sources $\{X_1, X_2\}$, where the labels on the contours represent the distortion values $D$ on $d(t,\hat{t})$.
Figure 3. Surface plot of the rate-distortion region.
Figure 4. The rate-distortion function for the case when $M = 1$, i.e., the Wyner–Ziv problem.