4.1. Proof of Theorem 1
Converse: Assume that the pair 
 is 
D-achievable. For 
 with the convention 
, define the RV’s
        
 The rate 
R is lower bounded as follows
        
Here,  follows since  is memoryless, and  follows since  forms a Markov chain.
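Since the steps in the chain above are the standard ones, a generic sketch may help; the symbols $W$ (helper's message), $X^n$ (memoryless source), and $U_i \triangleq (W, X^{i-1})$ are placeholders chosen here for illustration and need not match the paper's notation:

```latex
\begin{align*}
nR \;\ge\; H(W) \;\ge\; I(X^n; W)
  &= \sum_{i=1}^{n} I(X_i; W \mid X^{i-1})                 \\
  &\overset{(a)}{=} \sum_{i=1}^{n}
     \bigl[ H(X_i) - H(X_i \mid W, X^{i-1}) \bigr]         \\
  &= \sum_{i=1}^{n} I(X_i; U_i),
  \qquad U_i \triangleq (W, X^{i-1}),
\end{align*}
```

where $(a)$ uses the memoryless property $H(X_i \mid X^{i-1}) = H(X_i)$, and the Markov-chain step of the text is what permits dropping any further conditioning.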
We may now lower bound 
 as follows
        
        where 
 follows since 
 is a deterministic function of 
.
The sum-rate can be lower bounded as follows
        
        where 
 follows by equality (
37) and inequality (
38).
Draw 
J uniformly from 
 independently of 
, and define the RV’s 
, 
, 
, 
, and 
. Using 
J, we may express (
37) as follows
        
        and we may express (
38) as follows
        
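The role of $J$ in these two identities is the usual single-letterization; a hedged sketch in placeholder notation (writing $U_i$ for the per-letter auxiliary variable) reads:

```latex
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n} I(X_i; U_i)
  &= I(X_J; U_J \mid J)    \\
  &= I(X_J; U_J, J)        \\
  &= I(X; U),
\end{align*}
```

where the second equality holds because $J$ is independent of $X_J$ (the source is memoryless with identical marginals, so $I(X_J; J) = 0$), and the single-letter variables are $X \triangleq X_J$ and $U \triangleq (U_J, J)$.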
With regard to the expected distortion, we may write
        
Step 
 is justified as follows: Since 
 are deterministic functions of 
        is a Markov chain and, given 
, the tuple 
 is independent of 
. As a consequence of that, Lemma 1 in ([
12] Section II.B) guarantees the existence of a reconstruction 
 which dominates 
 in the sense that
        
This observation, interpreted as the “data processing inequality” for estimation, has already been made in ([12] Lemma 1).
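The cited lemma can be paraphrased as follows (a hedged restatement in placeholder notation, not the paper's own): if $X - U - V$ forms a Markov chain, then any reconstruction based on $V$ is dominated by one based on $U$,

```latex
\inf_{\phi}\; \mathbb{E}\, d\bigl(X, \phi(V)\bigr)
  \;\ge\;
\inf_{\psi}\; \mathbb{E}\, d\bigl(X, \psi(U)\bigr),
```

since $\mathbb{E}[d(X,\phi(V))] = \mathbb{E}_U\!\bigl[\mathbb{E}[d(X,\phi(V)) \mid U]\bigr]$ and, given $U$, the pair $(X, V)$ is conditionally independent, so the inner expectation is minimized by a function of $U$ alone.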
By (
1) and the memoryless property of the sequence 
 one can verify the Markov relation 
 which implies the Markov relation 
. Similarly, the definitions of 
 and 
 and the memoryless property of 
 imply that, conditioned on 
, 
 forms a Markov chain.
Thus, conditioned on 
U, 
 forms a Markov chain, hence
        
        where 
 denotes the conditional law induced by the marginal law 
. The combination of (
40)–(
42) and (
44), together with the latter Markov relations, establishes the converse.
We shall now obtain an alternative characterization for the lower bound (
38). For a law
        
        and its induced conditional law 
, let
        
        and let 
 denote the lower convex envelope of 
.
Define
        
        then by ([
13] Section III.C, Lemma 1) 
 is convex.
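For reference, the lower convex envelope introduced above is the standard object (generic notation; $f$ and $D$ are placeholders):

```latex
\check{f}(D) \;\triangleq\;
\inf\Bigl\{\, \lambda f(D_1) + (1-\lambda) f(D_2) \;:\;
  \lambda \in [0,1],\ \lambda D_1 + (1-\lambda) D_2 \le D \,\Bigr\},
```

i.e., the largest convex function lying below $f$. In one dimension, every point of the envelope is realized by time sharing at most two operating points of $f$, which is what underlies the “at most two scalar quantizers” interpretation given below.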
The integrand on the RHS of (
46) is the entropy of the scalar quantizer of 
 conditioned on 
 where 
 is defined in (36c). Now, conditioned on 
, Lemma 1 in ([
12] Section II.B) ensures that
        
Consequently, we may lower bound the RHS of (
46) as follows
        
        where in 
 we used the definition of 
Q and in 
 its convexity. The lower bound (
47) may be interpreted as follows. Fix 
 and consider, for each 
, time sharing of at most two scalar quantizers for the “source” 
 attaining a distortion level 
. The optimal helper time shares side-information-dependent scalar quantizers of 
 (at most two per each side-information symbol 
), while the reconstruction at the decoder is a function of 
.
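As a concrete, purely illustrative sketch of this time-sharing step, the following Python snippet computes, from a finite list of hypothetical scalar-quantizer operating points (distortion, rate), the smallest rate achievable at a target distortion by mixing at most two of them; the function name and the sample points are assumptions for illustration, not objects from the paper.

```python
from itertools import combinations_with_replacement

def best_time_sharing(points, d_target):
    """Smallest rate achievable at distortion <= d_target by time sharing
    at most two operating points, each given as a (distortion, rate) pair.
    This evaluates the lower convex envelope of the operating points."""
    best = float("inf")
    for (d1, r1), (d2, r2) in combinations_with_replacement(points, 2):
        lo, hi = min(d1, d2), max(d1, d2)
        if lo > d_target:
            continue  # neither point (nor any mixture) meets the target
        if hi <= d_target:
            rate = min(r1, r2)  # both feasible alone: take the cheaper one
        else:
            # weight lam on point 1 chosen so the average distortion
            # equals the target: lam*d1 + (1 - lam)*d2 == d_target
            lam = (d_target - d2) / (d1 - d2)
            rate = lam * r1 + (1 - lam) * r2
        best = min(best, rate)
    return best

# Hypothetical operating points (distortion, rate) of three scalar quantizers.
quantizers = [(1.0, 0.0), (0.25, 1.0), (0.0, 2.0)]
```

For instance, at target distortion 0.5 the optimum mixes the first two points with weights 1/3 and 2/3, yielding rate 2/3.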
Direct: To establish the achievability of 
, consider the following codebook construction. The codebook 
 is obtained by drawing the 
n-length sequences 
 independently of 
. (For the definition of 
—the set of 
-strongly typical 
n-sequences corresponding to a marginal law 
 and a few properties of these sequences see [
14,
15,
16]).
 Given the source sequence x,  is defined as follows.
        
- If  the encoder searches for the first sequence  in  such that (s.t.)  and sets . 
- If , or  s.t. , an encoding error is declared. 
Given , the helper forms the sequence of descriptions  that is sent causally to the decoder.
Decoding: Given  as well as the sub-sequence , the decoder forms the reconstruction sequence .
Since 
 are jointly typical (because 
 is memoryless), the Markov lemma guarantees that, for large 
n, with high probability 
 are also jointly typical. Since 
 forms a Markov chain, by the Markov lemma, with high probability, 
 as well as 
 are jointly typical. Thus, with high probability, 
 are jointly typical, hence the distortion constraint (
10) is fulfilled for large 
n. That the sequence 
 can be described at a conditional entropy rate satisfying (12b)–(
47) can be established along similar lines as the proof of the direct part of Theorem 2 in ([
13] Section III.C). Finally, standard error probability analysis verifies that, with high probability, 
 are jointly typical as long as (
12a) holds.
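The standard error-probability analysis alluded to here rests on the covering lemma: with roughly $2^{nR}$ independently drawn codewords, each jointly typical with the source sequence with probability about $2^{-nI}$, the probability that none matches is at most $(1-2^{-nI})^{2^{nR}} \le \exp(-2^{n(R-I)})$, which vanishes doubly exponentially whenever $R > I$. A small numerical sketch (the function name and parameters are illustrative assumptions):

```python
import math

def covering_failure_bound(n, rate, mutual_info):
    """Upper bound on the probability that none of 2**(n*rate) independently
    drawn codewords is jointly typical with the source block, when each one
    is typical with probability about 2**(-n*mutual_info):
        (1 - 2**(-n*I))**(2**(n*R)) <= exp(-2**(n*(R - I))).
    The bound vanishes (doubly exponentially) whenever rate > mutual_info."""
    return math.exp(-(2.0 ** (n * (rate - mutual_info))))
```

Even a modest rate margin makes the bound negligible at practical block lengths, which is why the excess-rate condition (12a) suffices.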
  4.2. Proof of Lemma 1
If 
, then
        
        where 
 and 
 are independent and the pair 
 is equal in distribution to the pair 
. Thus, the unitary transformation 
 preserves the Gaussian nature of the channel and factors according to (see (
53) ahead)
        
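A hedged sketch of this step, assuming the unitary transformation is the usual rotation employed in Gaussian converse arguments (placeholder symbols): given two independent copies $(X_1, Y_1)$ and $(X_2, Y_2)$ of a jointly Gaussian pair, set

```latex
X_{+} = \frac{X_1 + X_2}{\sqrt{2}}, \qquad
X_{-} = \frac{X_1 - X_2}{\sqrt{2}}, \qquad
Y_{\pm} = \frac{Y_1 \pm Y_2}{\sqrt{2}};
```

unitarity preserves joint Gaussianity and the covariance structure, so $(X_{+}, Y_{+})$ and $(X_{-}, Y_{-})$ are independent and each is equal in distribution to $(X_1, Y_1)$, matching the factorization described around (53).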
To show (
22a), consider the sequence of identities
        
Moreover, to show (22b), consider the sequence of identities
        
Starting with (
50), consider the difference
        
        under a law of the form
        
        i.e., that 
 forms a Markov chain (see also ([
9] Section VI.A, Lemma 4)).
We distinguish between the two cases:
- (1) In case that , the non-negativity of mutual information implies that the expression (52) is non-negative, hence the inequalities (63) ahead hold for any . 
- (2) In case that , we prove first that, for the set of laws which are feasible for the optimization problem (20), the expression (52) is strictly positive. 
Observe that with the choice of 
 and 
 where 
 and 
 are two independent copies of 
 such that 
 and 
 i.e., 
, we have
        
Thus, the unitary transformation 
 picks a pair of independent copies of a law 
 and “creates” a pair of laws, 
 and 
, which factor jointly according to (
53) with
        
        hence are symmetric w.r.t. the inputs 
 and 
. We shall denote the latter set of laws by 
 and, as shown above, 
.
Remark 2. Suppose that  with ; then, for both laws  and , we have  and . Since the image of the map is a convex set, a standard dimensionality reduction argument can be used to establish the existence of a law , where  is supported on a finite set and achieves any point in the image of the map (55) (see ([9] Section IV, Remark 2)).

By rate-distortion theory, the constraint 
 with both 
U and 
V non-void implies that
        
Now, for any 
 as per (
49), conditioned on 
, the random variable 
V is dependent on 
 i.e., 
. As a consequence of that, since the mutual information 
 is strictly positive and, conditioned on 
, 
 forms a Markov chain
        
Since a law of the form (
53) dictates
        
        the combination of (
56)–(
58) yields that
        
Thus, the conditional mutual information  is non-increasing under the conditioning on .
By the symmetry of the pair of laws 
 and 
 induced by 
, conditioned on 
, the random variable 
V is dependent on 
, hence
        
However, with the latter factorization, if indeed 
, i.e.,
        
        then, since the conditional mutual information 
 is non-increasing under the conditioning on 
, it follows that 
, which is in contradiction with (
59). Consequently, for 
, 
, thus establishing that the expression (52) is positive.
Consequently, there exists some 
 such that, for any 
The combination of (
50) and (
62) implies that, for any 
,
        
Here, 
 follows by (
50) and (
62), and 
 is true, since the set of laws over which the infimum on the RHS of (
63) is evaluated is not empty. Indeed, the choice of 
 and 
 where 
 and 
 are two independent copies of 
 such that 
 and 
 i.e., 
 (hence it is feasible for 
 in (
19)), satisfies (
54), hence it belongs to the feasible set.
On the other hand, since both mappings 
 and 
 are invertible, with 
Here,  follows since, conditioned on ,  and  are independent, and  follows since, conditioned on ,  and  are independent.
When 
 and 
 where 
 and 
 are two independent copies of 
, such that 
 and 
, then 
 forms a Markov chain. Thus, 
 and the RHS of (
64) becomes 
 thus establishing that,
        
        where the inequality follows, since the infimum on the LHS is taken over a larger set than that on the RHS. The combination of (
63) and (
65) establishes (
22a) for 
.
Now, returning to (
51), consider the difference
        
        under the law (
53) i.e., that 
 forms a Markov chain. By rate-distortion theory, the constraint 
 with both 
U and 
V non-void implies that
        
Moreover, an argument similar to that leading to the conclusion that (
52) is non-negative establishes that (
66) is non-negative.
Consequently, there exists some 
 such that for any 
In addition, the combination of (
51) and (
68) implies that, for any 
,
        
The combination of (
69) and (
65) establishes (22b) for 
.