*Information* **2016**, *7*(1), 15; doi:10.3390/info7010015

Article

Information Extraction Under Privacy Constraints †

Department of Mathematics and Statistics, Queen’s University, Kingston, Canada

^{†} Parts of the results in this paper were presented at the 52nd Allerton Conference on Communications, Control and Computing [1] and the 14th Canadian Workshop on Information Theory [2].

^{*} Author to whom correspondence should be addressed.

Academic Editors: Mikael Skoglund, Lars K. Rasmussen and Tobias Oechtering

Received: 1 November 2015 / Accepted: 3 March 2016 / Published: 10 March 2016

## Abstract


A privacy-constrained information extraction problem is considered where for a pair of correlated discrete random variables $(X,Y)$ governed by a given joint distribution, an agent observes Y and wants to convey to a potentially public user as much information about Y as possible while limiting the amount of information revealed about X. To this end, the so-called rate-privacy function is investigated to quantify the maximal amount of information (measured in terms of mutual information) that can be extracted from Y under a privacy constraint between X and the extracted information, where privacy is measured using either mutual information or maximal correlation. Properties of the rate-privacy function are analyzed and its information-theoretic and estimation-theoretic interpretations are presented for both the mutual information and maximal correlation privacy measures. It is also shown that the rate-privacy function admits a closed-form expression for a large family of joint distributions of $(X,Y)$. Finally, the rate-privacy function under the mutual information privacy measure is considered for the case where $(X,Y)$ has a joint probability density function by studying the problem where the extracted information is a uniform quantization of Y corrupted by additive Gaussian noise. The asymptotic behavior of the rate-privacy function is studied as the quantization resolution grows without bound and it is observed that not all of the properties of the rate-privacy function carry over from the discrete to the continuous case.

Keywords: data privacy; equivocation; rate-privacy function; information theory; minimum mean-squared error estimation; additive channels; mutual information; maximal correlation

## 1. Introduction

With the emergence of user-customized services, there is an increasing desire to balance the need to share data against the need to protect sensitive and private information. For example, individuals who join a social network are asked to provide information about themselves which might compromise their privacy. However, they agree to do so, to some extent, in order to benefit from customized services such as recommendations and personalized searches. As another example, a participatory technology for estimating road traffic requires each individual to provide her start and destination points as well as the travel time. However, most participating individuals prefer to provide somewhat distorted or false information to protect their privacy. Furthermore, suppose a software company wants to gather statistical information on how people use its software. Since many users might have used the software to handle personal or sensitive information (for example, a browser for anonymous web surfing or financial management software), they may not want to share their data with the company. On the other hand, the company cannot legally collect the raw data either, so it needs to entice its users. In all these situations, a tradeoff between utility gain and privacy breach is required, and the question is how to achieve this tradeoff. For example, how can a company collect high-quality aggregate information about users while strongly guaranteeing to its users that it is not storing user-specific information?

To deal with such privacy considerations, Warner [3] proposed the randomized response model in which each individual user randomizes her own data using a local randomizer (i.e., a noisy channel) before sharing the data with an untrusted data collector to be aggregated. As opposed to conditional security, see, e.g., [4,5,6], the randomized response model assumes that the adversary can have unlimited computational power and thus it provides unconditional privacy. This model, in which the control of private data remains in the users’ hands, has been extensively studied since Warner. As a special case of the randomized response model, Duchi et al. [7], inspired by the well-known privacy guarantee called differential privacy introduced by Dwork et al. [8,9,10], introduced locally differential privacy (LDP). Given a random variable $X\in \mathcal{X}$, another random variable $Z\in \mathcal{Z}$ is said to be the ε-LDP version of X if there exists a channel $Q:X\to Z$ such that $\frac{Q(B|x)}{Q(B|{x}^{\prime})}\le \exp(\epsilon)$ for all measurable $B\subset \mathcal{Z}$ and all $x,{x}^{\prime}\in \mathcal{X}$. The channel Q is then called an ε-LDP mechanism. Using Jensen’s inequality, it is straightforward to see that any ε-LDP mechanism leaks at most ε bits of private information, i.e., the mutual information between X and Z satisfies $I(X;Z)\le \epsilon $.
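As a concrete illustration of the ε-LDP definition above, the following Python sketch builds a binary randomized-response channel in the spirit of Warner's model, measures its LDP level, and checks that the leakage $I(X;Z)$ does not exceed ε. The function names, the value ε = 0.5, and the input distribution are illustrative assumptions, not taken from the paper; mutual information is computed in nats so that it matches the natural exponential in the definition.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Z) in nats for input pmf p_x and channel[x][z] = P(Z=z | X=x)."""
    n_z = len(channel[0])
    p_z = [sum(p_x[x] * channel[x][z] for x in range(len(p_x))) for z in range(n_z)]
    mi = 0.0
    for x, px in enumerate(p_x):
        for z in range(n_z):
            joint = px * channel[x][z]
            if joint > 0:
                mi += joint * math.log(channel[x][z] / p_z[z])
    return mi

def randomized_response(eps):
    """Binary randomized response: report the true bit w.p. e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return [[keep, 1.0 - keep], [1.0 - keep, keep]]

def ldp_level(channel):
    """Smallest eps with Q(z|x) <= e^eps * Q(z|x') for all z, x, x'.

    For a finite output alphabet it is enough to check single symbols z.
    Assumes every channel entry is strictly positive.
    """
    worst = 0.0
    for z in range(len(channel[0])):
        column = [row[z] for row in channel]
        worst = max(worst, math.log(max(column) / min(column)))
    return worst

eps = 0.5                      # hypothetical privacy level
Q = randomized_response(eps)
p_x = [0.3, 0.7]               # hypothetical distribution of the private bit
leak = mutual_information(p_x, Q)
print(f"LDP level = {ldp_level(Q):.4f}, leakage = {leak:.4f} nats")
```

The check on single output symbols suffices here because the likelihood ratio of any event is bounded by the largest ratio over its elements.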

There have been numerous studies on the tradeoff between privacy and utility for different examples of randomized response models with different choices of utility and privacy measures. For instance, Duchi et al. [7] studied the optimal ε-LDP mechanism $\mathcal{M}:X\to Z$ which minimizes the risk of estimation of a parameter θ related to ${P}_{X}$. Kairouz et al. [11] studied an optimal ε-LDP mechanism in the sense of mutual information, where an individual would like to release an ε-LDP version Z of X that preserves as much information about X as possible. Calmon et al. [12] proposed a novel privacy measure (which includes maximal correlation and chi-square correlation) between X and Z and studied the optimal privacy mechanism (according to their privacy measure) which minimizes the error probability $Pr(\widehat{X}\left(Z\right)\ne X)$ for any estimator $\widehat{X}:Z\to X$.

In all the above examples of randomized response models, given a private source, denoted by X, the mechanism generates Z which can be publicly displayed without breaching the desired privacy level. However, in a more realistic model of privacy, we can assume that for any given private data X, nature generates Y, via a fixed channel ${P}_{Y|X}$. Now we aim to release a public display Z of Y such that the amount of information in Y is preserved as much as possible while Z satisfies a privacy constraint with respect to X. Consider two communicating agents Alice and Bob. Alice collects all her measurements from an observation into a random variable Y and ultimately wants to reveal this information to Bob in order to receive a payoff. However, she is worried about her private data, represented by X, which is correlated with Y. For instance, X might represent her precise location and Y a measurement of the traffic load of a route she has taken. She wants to reveal these measurements to an online road monitoring system to receive some utility. However, she does not want to reveal too much information about her exact location. In such situations, the utility is measured with respect to Y and privacy is measured with respect to X. The question raised in this situation then concerns the maximum payoff Alice can get from Bob (by revealing Z to him) without compromising her privacy. Hence, it is of interest to characterize such competing objectives in the form of a quantitative tradeoff. Such a characterization provides a controllable balance between utility and privacy.

This model of privacy first appears in Yamamoto’s work [13] in which the rate-distortion-equivocation function is defined as the tradeoff between a distortion-based utility and privacy. Recently, Sankar et al. [14], using the quantize-and-bin scheme [15], generalized Yamamoto’s model to study privacy in databases from an information-theoretic point of view. Calmon and Fawaz [16] and Monedero et al. [17] also independently used distortion and mutual information for utility and privacy, respectively, to define a privacy-distortion function which resembles the classical rate-distortion function. More recently, Makhdoumi et al. [18] proposed to use mutual information for both utility and privacy measures and defined the privacy funnel as the corresponding privacy-utility tradeoff, given by

$${t}_{R}(X;Y):=\underset{\begin{array}{c}{P}_{Z|Y}:\phantom{\rule{3.33333pt}{0ex}}X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z\\ I(Y;Z)\ge R\end{array}}{\min}I(X;Z) \tag{1}$$

where $X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z$ denotes that $X,Y$ and Z form a Markov chain in this order. Leveraging well-known algorithms for the information bottleneck problem [19], they provided a locally optimal greedy algorithm to evaluate ${t}_{R}(X;Y)$. Asoodeh et al. [1], independently, defined the rate-privacy function, ${g}_{\epsilon}(X;Y)$, as the maximum achievable $I(Y;Z)$ such that Z satisfies $I(X;Z)\le \epsilon $, which is a dual representation of the privacy funnel (1), and showed that for discrete X and Y, ${g}_{0}(X;Y)>0$ if and only if X is weakly independent of Y (cf. Definition 9). Recently, Calmon et al. [20] proved an equivalent result for ${t}_{R}(X;Y)$ using a different approach. They also obtained lower and upper bounds for ${t}_{R}(X;Y)$ which can be easily translated to bounds for ${g}_{\epsilon}(X;Y)$ (cf. Lemma 1). In this paper, we develop further properties of ${g}_{\epsilon}(X;Y)$ and also determine necessary and sufficient conditions on ${P}_{XY}$, satisfying some symmetry conditions, for ${g}_{\epsilon}(X;Y)$ to achieve its upper and lower bounds.

The problem treated in this paper can also be contrasted with the better-studied concept of secrecy following the pioneering work of Wyner [21]. While in secrecy problems the aim is to keep information secret only from wiretappers, in privacy problems the aim is to keep the private information (not necessarily all the information) secret from everyone including the intended receiver.

#### 1.1. Our Model and Main Contributions

Using mutual information as a measure of both utility and privacy, we formulate the corresponding privacy-utility tradeoff for discrete random variables X and Y via the rate-privacy function, ${g}_{\epsilon}(X;Y)$, in which the mutual information between Y and the displayed data (i.e., the mechanism’s output), Z, is maximized over all channels ${P}_{Z|Y}$ such that the mutual information between Z and X is no larger than a given ε. We also formulate a similar rate-privacy function ${\widehat{g}}_{\epsilon}(X;Y)$ where the privacy is measured in terms of the squared maximal correlation, ${\rho}_{m}^{2}$, between X and Z. In studying ${g}_{\epsilon}(X;Y)$ and ${\widehat{g}}_{\epsilon}(X;Y)$, any channel $Q:Y\to Z$ that satisfies $I(X;Z)\le \epsilon $ or ${\rho}_{m}^{2}(X;Z)\le \epsilon $, respectively, preserves the desired level of privacy and is hence called a privacy filter. Interpreting $I(Y;Z)$ as the number of bits that a privacy filter can reveal about Y without compromising privacy, we present the rate-privacy function as a formulation of the problem of maximal privacy-constrained information extraction from Y.

We remark that using maximal correlation as a privacy measure is by no means new as it appears in other works, see, e.g., [22,23] and [12] for different utility functions. We do not put any likelihood constraints on the privacy filters as opposed to the definition of LDP. In fact, the optimal privacy filters that we obtain in this work induce channels ${P}_{Z|X}$ that do not satisfy the LDP property.

The quantity ${g}_{\epsilon}(X;Y)$ is related to a notion of the reverse strong data processing inequality as follows. Given a joint distribution ${P}_{XY}$, the strong data processing coefficient was introduced in [24,25] as the smallest $s(X;Y)\le 1$ such that $I(X;Z)\le s(X;Y)I(Y;Z)$ for all ${P}_{Z|Y}$ satisfying the Markov condition $X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z$. In the rate-privacy function, we instead seek an upper bound on the maximum achievable rate at which Y can display information, $I(Y;Z)$, while meeting the privacy constraint $I(X;Z)\le \epsilon $. The connection between the rate-privacy function and the strong data processing inequality is further studied in [20] to mirror all the results of [25] in the context of privacy.

The contributions of this work are as follows:

- We study lower and upper bounds of ${g}_{\epsilon}(X;Y)$. The lower bound, in particular, establishes a multiplicative bound on $I(Y;Z)$ for any optimal privacy filter. Specifically, we show that for a given $(X,Y)$ and $\epsilon >0$ there exists a channel $Q:Y\to Z$ such that $I(X;Z)\le \epsilon $ and $$I(Y;Z)\ge \lambda (X;Y)\epsilon $$
- We propose an information-theoretic setting in which ${g}_{\epsilon}(X;Y)$ appears as a natural upper bound for the achievable rate in the so-called "dependence dilution" coding problem. Specifically, we examine the joint-encoder version of an amplification-masking tradeoff, a setting recently introduced by Courtade [26], and we show that the dual of ${g}_{\epsilon}(X;Y)$ upper bounds the masking rate. We also present an estimation-theoretic motivation for the privacy measure ${\rho}_{m}^{2}(X;Z)\le \epsilon $. In fact, by imposing ${\rho}_{m}^{2}(X;Z)\le \epsilon $, we require that an adversary who observes Z cannot efficiently estimate $f(X)$, for any function f. This is reminiscent of semantic security [27] in the cryptography community. An encryption mechanism is said to be semantically secure if the adversary’s advantage for correctly guessing any function of the private data given an observation of the mechanism’s output (i.e., the ciphertext) is required to be negligible. This, in fact, justifies the use of maximal correlation as a measure of privacy. The use of mutual information as a privacy measure can also be justified using Fano’s inequality. Note that $I(X;Z)\le \epsilon $ can be shown to imply that $\Pr(\widehat{X}(Z)\ne X)\ge \frac{H(X)-1-\epsilon}{\log |\mathcal{X}|}$ and hence the probability that the adversary guesses X incorrectly is lower-bounded.
- We also study the rate of increase ${g}_{0}^{\prime}(X;Y)$ of ${g}_{\epsilon}(X;Y)$ at $\epsilon =0$ and show that this rate can characterize the behavior of ${g}_{\epsilon}(X;Y)$ for any $\epsilon \ge 0$ provided that ${g}_{0}(X;Y)=0$. This again has connections with the results of [25]. Letting $$\Gamma (R):=\underset{\genfrac{}{}{0pt}{}{{P}_{Z|Y}:\phantom{\rule{3.33333pt}{0ex}}X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z}{I(Y;Z)\le R}}{\max}I(X;Z)$$
- Finally, we generalize the rate-privacy function to the continuous case where X and Y are both continuous and show that some of the properties of ${g}_{\epsilon}(X;Y)$ in the discrete case do not carry over to the continuous case. In particular, we assume that the privacy filter belongs to a family of additive noise channels followed by an M-level uniform scalar quantizer and give asymptotic bounds as $M\to \infty $ for the rate-privacy function.

#### 1.2. Organization

The rest of the paper is organized as follows. In Section 2, we define and study the rate-privacy function for discrete random variables for two different privacy measures, which, respectively, lead to the information-theoretic and estimation-theoretic interpretations of the rate-privacy function. In Section 3, we provide such interpretations for the rate-privacy function in terms of quantities from information and estimation theory. Having obtained lower and upper bounds of the rate-privacy function, in Section 4 we determine the conditions on ${P}_{XY}$ such that these bounds are tight. The rate-privacy function is then generalized and studied in Section 5 for continuous random variables.

## 2. Utility-Privacy Measures: Definitions and Properties

Consider two random variables X and Y, defined over finite alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, with a fixed joint distribution ${P}_{XY}$. Let X represent the private data and let Y be the observable data, correlated with X and generated by the channel ${P}_{Y|X}$ predefined by nature, which we call the observation channel. Suppose there exists a channel ${P}_{Z|Y}$ such that Z, the displayed data made available to public users, has limited dependence with X. Such a channel is called the privacy filter. This setup is shown in Figure 1. The objective is then to find a privacy filter which gives rise to the highest dependence between Y and Z. To make this goal precise, one needs to specify a measure for both utility (dependence between Y and Z) and also privacy (dependence between X and Z).

#### 2.1. Mutual Information as Privacy Measure

Adopting mutual information as a measure of both privacy and utility, we are interested in characterizing the following quantity, which we call the rate-privacy function (since mutual information is adopted for utility, the privacy-utility tradeoff characterizes the optimal rate for a given privacy level, where rate indicates the precision of the displayed data Z with respect to the observable data Y for a privacy filter, which suggests the name),

$${g}_{\epsilon}(X;Y):=\underset{{P}_{Z|Y}\in {\mathcal{D}}_{\epsilon}(P)}{\sup}I(Y;Z) \tag{3}$$

where $(X,Y)$ has fixed distribution ${P}_{XY}=P$ and

$${\mathcal{D}}_{\epsilon}(P):=\{{P}_{Z|Y}:\phantom{\rule{3.33333pt}{0ex}}X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z,\phantom{\rule{3.33333pt}{0ex}}I(X;Z)\le \epsilon \}$$

(here $X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z$ means that $X,Y,$ and Z form a Markov chain in this order). Equivalently, we call ${g}_{\epsilon}(X;Y)$ the privacy-constrained information extraction function, as Z can be thought of as the extracted information from Y under privacy constraint $I(X;Z)\le \epsilon $.

Note that since $I(Y;Z)$ is a convex function of ${P}_{Z|Y}$ and furthermore the constraint set ${\mathcal{D}}_{\epsilon}(P)$ is convex, [28, Theorem 32.2] implies that we can restrict ${\mathcal{D}}_{\epsilon}(P)$ in (3) to $\{{P}_{Z|Y}:\phantom{\rule{3.33333pt}{0ex}}X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z,\phantom{\rule{3.33333pt}{0ex}}I(X;Z)=\epsilon \}$ whenever $\epsilon \le I(X;Y)$. Note also that, for finite $\mathcal{X}$ and $\mathcal{Y}$, the map ${P}_{Z|Y}\mapsto I(Y;Z)$ is continuous and ${\mathcal{D}}_{\epsilon}(P)$ is compact, and hence the supremum in (3) is indeed a maximum. In this case, using the Support Lemma [29], one can readily show that it suffices that the random variable Z is supported on an alphabet $\mathcal{Z}$ with cardinality $|\mathcal{Z}|\le |\mathcal{Y}|+1$. Note further that by the Markov condition $X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z$, we can always restrict $\epsilon \ge 0$ to only $0\le \epsilon <I(X;Y)$, because $I(X;Z)\le I(X;Y)$ and hence for $\epsilon \ge I(X;Y)$ the privacy constraint is removed and thus by setting $Z=Y$, we obtain ${g}_{\epsilon}(X;Y)=H(Y)$.
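To make the definition of ${\mathcal{D}}_{\epsilon}(P)$ concrete, the following Python sketch evaluates the leakage $I(X;Z)$ and the rate $I(Y;Z)$ of a given privacy filter and tests membership in ${\mathcal{D}}_{\epsilon}(P)$. The joint distribution (a doubly symmetric binary source with crossover 0.1), the binary symmetric filter with crossover 0.2, and all function names are illustrative assumptions; the sketch only evaluates a candidate filter and does not solve the optimization in (3).

```python
import math

def mutual_info(joint):
    """Mutual information (bits) of a joint pmf given as a nested list."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return sum(p * math.log2(p / (rows[i] * cols[j]))
               for i, r in enumerate(joint)
               for j, p in enumerate(r) if p > 0)

def push_through(joint_ay, filt):
    """Joint pmf of (A, Z) when A -o- Y -o- Z and filt[y][z] = P(Z=z | Y=y)."""
    n_y, n_z = len(filt), len(filt[0])
    return [[sum(row[y] * filt[y][z] for y in range(n_y)) for z in range(n_z)]
            for row in joint_ay]

def leakage_and_rate(p_xy, filt):
    """Return (I(X;Z), I(Y;Z)) for joint pmf p_xy[x][y] and a filter on Y."""
    n_y = len(filt)
    p_y = [sum(col) for col in zip(*p_xy)]
    # Joint pmf of (Y, Y): diagonal matrix carrying the marginal of Y.
    p_yy = [[p_y[y] if y == k else 0.0 for k in range(n_y)] for y in range(n_y)]
    return mutual_info(push_through(p_xy, filt)), mutual_info(push_through(p_yy, filt))

def is_privacy_filter(p_xy, filt, eps):
    """Membership test for D_eps(P): the Markov chain holds by construction."""
    leak, _ = leakage_and_rate(p_xy, filt)
    return leak <= eps + 1e-12

# Hypothetical example: X uniform on {0,1}, Y = X flipped with probability 0.1.
p_xy = [[0.45, 0.05],
        [0.05, 0.45]]
bsc = [[0.8, 0.2],
       [0.2, 0.8]]          # candidate filter: flip Y with probability 0.2
identity = [[1.0, 0.0],
            [0.0, 1.0]]     # Z = Y, no privacy protection

leak, rate = leakage_and_rate(p_xy, bsc)
full_leak, full_rate = leakage_and_rate(p_xy, identity)
print(f"BSC filter: I(X;Z) = {leak:.4f}, I(Y;Z) = {rate:.4f}")
print(f"Z = Y:      I(X;Z) = {full_leak:.4f}, I(Y;Z) = {full_rate:.4f}")
```

With the identity filter the rate equals $H(Y)$ and the leakage equals $I(X;Y)$, which matches the observation that the privacy constraint is inactive once $\epsilon \ge I(X;Y)$.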

As mentioned earlier, a dual representation of ${g}_{\epsilon}(X;Y)$, the so-called privacy funnel, is introduced in [18,20], defined in (1), as the least information leakage about X such that the communication rate is greater than a positive constant, i.e., $I(Y;Z)\ge R$ for some $R>0$. Note that if ${t}_{R}(X;Y)=\epsilon $ then ${g}_{\epsilon}(X;Y)=R$.

Given ${\epsilon}_{1}<{\epsilon}_{2}$ and a joint distribution $P={P}_{X}\times {P}_{Y|X}$, we have ${\mathcal{D}}_{{\epsilon}_{1}}(P)\subset {\mathcal{D}}_{{\epsilon}_{2}}(P)$ and hence $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is non-decreasing, i.e., ${g}_{{\epsilon}_{1}}(X;Y)\le {g}_{{\epsilon}_{2}}(X;Y)$. Using a similar technique as in [30, Lemma 1], Calmon et al. [20] showed that the mapping $R\mapsto \frac{{t}_{R}(X;Y)}{R}$ is non-decreasing for $R>0$. This, in fact, implies that $\epsilon \mapsto \frac{{g}_{\epsilon}(X;Y)}{\epsilon}$ is non-increasing for $\epsilon >0$. This observation leads to a lower bound for the rate-privacy function ${g}_{\epsilon}(X;Y)$ as described in the following lemma.

**Lemma 1** ([20])**.**

For a given joint distribution P defined over $\mathcal{X}\times \mathcal{Y}$, the mapping $\epsilon \mapsto \frac{{g}_{\epsilon}(X;Y)}{\epsilon}$ is non-increasing on $\epsilon \in (0,\infty )$ and ${g}_{\epsilon}(X;Y)$ lies between two straight lines as follows:

$$\epsilon \frac{H(Y)}{I(X;Y)}\le {g}_{\epsilon}(X;Y)\le H(Y|X)+\epsilon \tag{4}$$

for $\epsilon \in (0,I(X;Y))$.

Using a simple calculation, the lower bound in (4) can be shown to be achieved by the privacy filter depicted in Figure 2 with the erasure probability

$$\delta =1-\frac{\epsilon}{I(X;Y)} \tag{5}$$

In light of Lemma 1, the possible range of the map $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is as depicted in Figure 3.
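The "simple calculation" behind the lower bound in (4) can be checked numerically. The Python sketch below assumes that the filter of Figure 2 is an erasure channel that outputs Y with probability $1-\delta $ and an erasure symbol otherwise; the joint distribution and the choice $\epsilon =I(X;Y)/2$ are hypothetical. Under these assumptions the leakage equals ε and the rate equals $\epsilon H(Y)/I(X;Y)$.

```python
import math

def mutual_info(joint):
    """Mutual information (bits) of a joint pmf given as a nested list."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return sum(p * math.log2(p / (rows[i] * cols[j]))
               for i, r in enumerate(joint)
               for j, p in enumerate(r) if p > 0)

def entropy(pmf):
    """Shannon entropy (bits) of a pmf."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def erasure_filter(n_y, delta):
    """filt[y][z] = P(Z=z | Y=y): Z = Y w.p. 1 - delta, erasure (index n_y) w.p. delta."""
    filt = []
    for y in range(n_y):
        row = [0.0] * (n_y + 1)
        row[y] = 1.0 - delta
        row[n_y] = delta
        filt.append(row)
    return filt

def push_through(joint_ay, filt):
    """Joint pmf of (A, Z) when A -o- Y -o- Z and Z is generated from Y by filt."""
    n_y, n_z = len(filt), len(filt[0])
    return [[sum(row[y] * filt[y][z] for y in range(n_y)) for z in range(n_z)]
            for row in joint_ay]

# Hypothetical joint pmf of (X, Y) with |X| = 2 and |Y| = 3.
p_xy = [[0.30, 0.10, 0.05],
        [0.05, 0.15, 0.35]]
p_y = [sum(col) for col in zip(*p_xy)]
p_yy = [[p_y[y] if y == k else 0.0 for k in range(3)] for y in range(3)]

i_xy = mutual_info(p_xy)
h_y = entropy(p_y)
eps = 0.5 * i_xy                 # any value in (0, I(X;Y)) works
delta = 1.0 - eps / i_xy         # erasure probability from (5)

filt = erasure_filter(3, delta)
leak = mutual_info(push_through(p_xy, filt))   # I(X;Z)
rate = mutual_info(push_through(p_yy, filt))   # I(Y;Z)
print(f"I(X;Z) = {leak:.6f} (eps = {eps:.6f})")
print(f"I(Y;Z) = {rate:.6f} (eps*H(Y)/I(X;Y) = {eps * h_y / i_xy:.6f})")
```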

We next show that $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is concave and continuous.

**Lemma 2.**

For any given pair of random variables $(X,Y)$ over $\mathcal{X}\times \mathcal{Y}$, the mapping $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is concave for $\epsilon \ge 0$.

**Proof.**

It suffices to show that for any $0\le {\epsilon}_{1}<{\epsilon}_{2}<{\epsilon}_{3}\le I(X;Y)$, we have

$$\frac{{g}_{{\epsilon}_{3}}(X;Y)-{g}_{{\epsilon}_{1}}(X;Y)}{{\epsilon}_{3}-{\epsilon}_{1}}\le \frac{{g}_{{\epsilon}_{2}}(X;Y)-{g}_{{\epsilon}_{1}}(X;Y)}{{\epsilon}_{2}-{\epsilon}_{1}} \tag{6}$$

which, in turn, is equivalent to

$$\left(\frac{{\epsilon}_{2}-{\epsilon}_{1}}{{\epsilon}_{3}-{\epsilon}_{1}}\right){g}_{{\epsilon}_{3}}(X;Y)+\left(\frac{{\epsilon}_{3}-{\epsilon}_{2}}{{\epsilon}_{3}-{\epsilon}_{1}}\right){g}_{{\epsilon}_{1}}(X;Y)\le {g}_{{\epsilon}_{2}}(X;Y) \tag{7}$$

Let ${P}_{{Z}_{1}|Y}:Y\to {Z}_{1}$ and ${P}_{{Z}_{3}|Y}:Y\to {Z}_{3}$ be two optimal privacy filters in ${\mathcal{D}}_{{\epsilon}_{1}}(P)$ and ${\mathcal{D}}_{{\epsilon}_{3}}(P)$ with disjoint output alphabets ${\mathcal{Z}}_{1}$ and ${\mathcal{Z}}_{3}$, respectively.

We introduce an auxiliary binary random variable $U\sim \mathsf{Bernoulli}(\lambda )$, independent of $(X,Y)$, where $\lambda :=\frac{{\epsilon}_{2}-{\epsilon}_{1}}{{\epsilon}_{3}-{\epsilon}_{1}}$ and define the following random privacy filter ${P}_{{Z}_{\lambda}|Y}$: We pick ${P}_{{Z}_{3}|Y}$ if $U=1$ and ${P}_{{Z}_{1}|Y}$ if $U=0$, and let ${Z}_{\lambda}$ be the output of this random channel which takes values in ${\mathcal{Z}}_{1}\cup {\mathcal{Z}}_{3}$. Note that $(X,Y)\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}{Z}_{\lambda}\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}U$. Then we have

$$\begin{array}{ccc}\hfill I(X;{Z}_{\lambda})& =& I(X;{Z}_{\lambda},U)=I(X;{Z}_{\lambda}|U)=\lambda I(X;{Z}_{3})+(1-\lambda )I(X;{Z}_{1})\hfill \\ & \le & \lambda {\epsilon}_{3}+(1-\lambda ){\epsilon}_{1}={\epsilon}_{2}\hfill \end{array}$$

which implies that ${P}_{{Z}_{\lambda}|Y}\in {\mathcal{D}}_{{\epsilon}_{2}}(P)$. On the other hand, we have

$$\begin{array}{ccc}\hfill {g}_{{\epsilon}_{2}}(X;Y)\ge I(Y;{Z}_{\lambda})& =& I(Y;{Z}_{\lambda},U)=I(Y;{Z}_{\lambda}|U)=\lambda I(Y;{Z}_{3})+(1-\lambda )I(Y;{Z}_{1})\hfill \\ & =& \left(\frac{{\epsilon}_{2}-{\epsilon}_{1}}{{\epsilon}_{3}-{\epsilon}_{1}}\right){g}_{{\epsilon}_{3}}(X;Y)+\left(\frac{{\epsilon}_{3}-{\epsilon}_{2}}{{\epsilon}_{3}-{\epsilon}_{1}}\right){g}_{{\epsilon}_{1}}(X;Y)\hfill \end{array}$$

which, according to (7), completes the proof. ☐
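The time-sharing step in the proof of Lemma 2 relies on the fact that randomly switching between two filters with disjoint output alphabets yields mutual informations that are convex combinations of the individual ones. The Python sketch below checks this identity numerically. The joint distribution, the two filters, and the mixing weight are hypothetical; the filters are not claimed to be optimal, since the identity holds for any pair of filters.

```python
import math

def mutual_info(joint):
    """Mutual information (bits) of a joint pmf given as a nested list."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return sum(p * math.log2(p / (rows[i] * cols[j]))
               for i, r in enumerate(joint)
               for j, p in enumerate(r) if p > 0)

def push_through(joint_ay, filt):
    """Joint pmf of (A, Z) when A -o- Y -o- Z and filt[y][z] = P(Z=z | Y=y)."""
    n_y, n_z = len(filt), len(filt[0])
    return [[sum(row[y] * filt[y][z] for y in range(n_y)) for z in range(n_z)]
            for row in joint_ay]

def time_share(filt_1, filt_3, lam):
    """Use filt_3 w.p. lam and filt_1 w.p. 1 - lam.

    Concatenating the rows keeps the two output alphabets disjoint, so the
    output symbol reveals which filter was used (the variable U in the proof).
    """
    return [[(1.0 - lam) * q for q in r1] + [lam * q for q in r3]
            for r1, r3 in zip(filt_1, filt_3)]

# Hypothetical joint pmf of (X, Y) and two hypothetical filters on Y.
p_xy = [[0.30, 0.10, 0.05],
        [0.05, 0.15, 0.35]]
p_y = [sum(col) for col in zip(*p_xy)]
p_yy = [[p_y[y] if y == k else 0.0 for k in range(3)] for y in range(3)]

filt_1 = [[0.8, 0.2], [0.5, 0.5], [0.3, 0.7]]
filt_3 = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]
lam = 0.4
mix = time_share(filt_1, filt_3, lam)

leak_1, leak_3, leak_mix = (mutual_info(push_through(p_xy, f)) for f in (filt_1, filt_3, mix))
rate_1, rate_3, rate_mix = (mutual_info(push_through(p_yy, f)) for f in (filt_1, filt_3, mix))

print(f"I(X;Z_lam) = {leak_mix:.6f} vs {lam * leak_3 + (1 - lam) * leak_1:.6f}")
print(f"I(Y;Z_lam) = {rate_mix:.6f} vs {lam * rate_3 + (1 - lam) * rate_1:.6f}")
```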

**Remark 1.**

By the concavity of $\epsilon \mapsto {g}_{\epsilon}(X;Y)$, we can show that ${g}_{\epsilon}(X;Y)$ is a strictly increasing function of $\epsilon \le I(X;Y)$. To see this, assume there exist ${\epsilon}_{1}<{\epsilon}_{2}\le I(X;Y)$ such that ${g}_{{\epsilon}_{1}}(X;Y)={g}_{{\epsilon}_{2}}(X;Y)$. Since $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is concave and non-decreasing, it follows that ${g}_{\epsilon}(X;Y)={g}_{{\epsilon}_{2}}(X;Y)$ for all $\epsilon \ge {\epsilon}_{2}$. Since ${g}_{I(X;Y)}(X;Y)=H(Y)$, this gives ${g}_{{\epsilon}_{1}}(X;Y)={g}_{{\epsilon}_{2}}(X;Y)=H(Y)$. However, because ${\epsilon}_{1}<I(X;Y)$, the upper bound in (4) yields ${g}_{{\epsilon}_{1}}(X;Y)\le H(Y|X)+{\epsilon}_{1}<H(Y|X)+I(X;Y)=H(Y)$, which is a contradiction.

**Corollary 3.**

For any given pair of random variables $(X,Y)$ over $\mathcal{X}\times \mathcal{Y}$, the mapping $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is continuous for $\epsilon \ge 0$.

**Proof.**

Concavity directly implies that the mapping $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is continuous on $(0,\infty )$ (see for example [31, Theorem 3.2]). Continuity at zero follows from the continuity of mutual information. ☐

**Remark 2.**

Using the concavity of the map $\epsilon \mapsto {g}_{\epsilon}(X;Y)$, we can provide an alternative proof for the lower bound in (4). Note that the point $(I(X;Y),H(Y))$ is always on the curve ${g}_{\epsilon}(X;Y)$, and hence by concavity, the straight line $\epsilon \mapsto \epsilon \frac{H(Y)}{I(X;Y)}$ is always below the lower convex envelope of ${g}_{\epsilon}(X;Y)$, i.e., the chord connecting $(0,{g}_{0}(X;Y))$ to $(I(X;Y),H(Y))$, and hence ${g}_{\epsilon}(X;Y)\ge \epsilon \frac{H(Y)}{I(X;Y)}$. In fact, this chord yields a better lower bound for ${g}_{\epsilon}(X;Y)$ on $\epsilon \in [0,I(X;Y)]$ as

$${g}_{\epsilon}(X;Y)\ge \epsilon \frac{H(Y)}{I(X;Y)}+{g}_{0}(X;Y)\left[1-\frac{\epsilon}{I(X;Y)}\right]$$

which reduces to the lower bound in (4) only if ${g}_{0}(X;Y)=0$.

#### 2.2. Maximal Correlation as Privacy Measure

By adopting mutual information as the privacy measure between the private and the displayed data, we make sure that only a limited number of bits of private information is revealed during the process of transferring Y. In order to have an estimation-theoretic guarantee of privacy, we propose alternatively to measure privacy using a measure of correlation, the so-called maximal correlation.

Given the collection $\mathcal{C}$ of all pairs of random variables $(U,V)\in \mathcal{U}\times \mathcal{V}$ where $\mathcal{U}$ and $\mathcal{V}$ are general alphabets, a mapping $T:\mathcal{C}\to [0,1]$ defines a measure of correlation [32] if $T(U,V)=0$ if and only if U and V are independent (in short, $U\perp \phantom{\rule{-0.166667em}{0ex}}\phantom{\rule{-0.166667em}{0ex}}\phantom{\rule{-0.166667em}{0ex}}\perp V$) and $T(U,V)$ attains its maximum value if $U=f(V)$ or $V=g(U)$ almost surely for some measurable real-valued functions f and g. There are many different examples of measures of correlation including the Hirschfeld-Gebelein-Rényi maximal correlation [32,33,34], the information measure [35], mutual information and f-divergence [36].

**Definition 4** ([34])**.**

Given random variables X and Y, the maximal correlation ${\rho}_{m}(X;Y)$ is defined as follows (recall that the correlation coefficient between U and V is defined as $\rho (U;V):=\frac{\text{cov}(U;V)}{{\sigma}_{U}{\sigma}_{V}}$, where $\text{cov}(U;V)$, ${\sigma}_{U}$ and ${\sigma}_{V}$ are the covariance between U and V and the standard deviations of U and V, respectively):

$${\rho}_{m}(X;Y):=\underset{f,g}{\sup}\,\rho (f(X),g(Y))=\underset{(f(X),g(Y))\in \mathcal{S}}{\sup}\mathbb{E}\left[f(X)g(Y)\right]$$

where $\mathcal{S}$ is the collection of pairs of real-valued random variables $f(X)$ and $g(Y)$ such that $\mathbb{E}f(X)=\mathbb{E}g(Y)=0$ and $\mathbb{E}{f}^{2}(X)=\mathbb{E}{g}^{2}(Y)=1$. If $\mathcal{S}$ is empty (which happens precisely when at least one of X and Y is constant almost surely) then one defines ${\rho}_{m}(X;Y)$ to be 0. Rényi [34] derived an equivalent characterization of maximal correlation as follows:

$${\rho}_{m}^{2}(X;Y)=\underset{f:\mathbb{E}f(X)=0,\,\mathbb{E}{f}^{2}(X)=1}{\sup}\mathbb{E}\left[{\mathbb{E}}^{2}\left[f(X)|Y\right]\right].$$
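For finite alphabets, the maximal correlation can be computed numerically. The Python sketch below relies on a standard fact that is not derived in this section: ${\rho}_{m}(X;Y)$ equals the second-largest singular value of the matrix with entries ${P}_{XY}(x,y)/\sqrt{{P}_{X}(x){P}_{Y}(y)}$. The sketch removes the top singular direction (which always has singular value 1) and runs power iteration on what remains. The function name, the iteration count, the starting vector, and the test distributions are illustrative assumptions, and the marginals are assumed to be strictly positive.

```python
import math

def maximal_correlation_sq(p_xy, iters=1000):
    """Squared maximal correlation of a finite joint pmf p_xy[x][y].

    Assumes all marginal probabilities are strictly positive.
    """
    n_x, n_y = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(p_xy[x][y] for x in range(n_x)) for y in range(n_y)]
    # q[x][y] = P(x, y) / sqrt(P(x) P(y)); its largest singular value is 1.
    q = [[p_xy[x][y] / math.sqrt(p_x[x] * p_y[y]) for y in range(n_y)]
         for x in range(n_x)]
    # v = sqrt(P_Y) is the top right-singular vector; deflate it from q^T q.
    v = [math.sqrt(p) for p in p_y]
    b = [[sum(q[x][i] * q[x][j] for x in range(n_x)) - v[i] * v[j]
          for j in range(n_y)] for i in range(n_y)]
    # Power iteration on the deflated matrix; its top eigenvalue is rho_m^2.
    w = [float(j + 1) for j in range(n_y)]
    for _ in range(iters):
        w_next = [sum(b[i][j] * w[j] for j in range(n_y)) for i in range(n_y)]
        norm = math.sqrt(sum(t * t for t in w_next))
        if norm < 1e-12:          # deflated matrix is (numerically) zero
            return 0.0
        w = [t / norm for t in w_next]
    bw = [sum(b[i][j] * w[j] for j in range(n_y)) for i in range(n_y)]
    return sum(w[i] * bw[i] for i in range(n_y))   # Rayleigh quotient, |w| = 1

# Doubly symmetric binary source with crossover 0.1 (hypothetical example).
dsbs = [[0.45, 0.05],
        [0.05, 0.45]]
# Independent pair: product of marginals (0.4, 0.6) and (0.3, 0.7).
indep = [[0.12, 0.28],
         [0.18, 0.42]]

rho_sq_dsbs = maximal_correlation_sq(dsbs)
rho_sq_indep = maximal_correlation_sq(indep)
print(rho_sq_dsbs, rho_sq_indep)
```

Power iteration is used here only to keep the sketch free of external dependencies; a singular value decomposition from a numerical library would serve the same purpose.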

Measuring privacy in terms of maximal correlation, we propose

$${\widehat{g}}_{\epsilon}(X;Y):=\underset{{P}_{Z|Y}\in {\widehat{\mathcal{D}}}_{\epsilon}(P)}{\sup}I(Y;Z)$$

as the corresponding rate-privacy tradeoff, where

$${\widehat{\mathcal{D}}}_{\epsilon}(P):=\{{P}_{Z|Y}:\phantom{\rule{3.33333pt}{0ex}}X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z,\phantom{\rule{3.33333pt}{0ex}}{\rho}_{m}^{2}(X;Z)\le \epsilon ,{P}_{XY}=P\}$$

Again, we equivalently call ${\widehat{g}}_{\epsilon}(X;Y)$ the privacy-constrained information extraction function, where here the privacy is guaranteed by ${\rho}_{m}^{2}(X;Z)\le \epsilon $.

Setting $\epsilon =0$ corresponds to the case where X and Z are required to be statistically independent, i.e., absolutely no information leakage about the private source X is allowed. This is called perfect privacy. Since the independence of X and Z is equivalent to $I(X;Z)={\rho}_{m}(X;Z)=0$, we have ${\widehat{g}}_{0}(X;Y)={g}_{0}(X;Y)$. However, for $\epsilon >0$, both ${g}_{\epsilon}(X;Y)\le {\widehat{g}}_{\epsilon}(X;Y)$ and ${g}_{\epsilon}(X;Y)\ge {\widehat{g}}_{\epsilon}(X;Y)$ might happen in general. For general $\epsilon \ge 0$, it directly follows using [23, Proposition 1] that

$${\widehat{g}}_{\epsilon}(X;Y)\le {g}_{{\epsilon}^{\prime}}(X;Y)$$

where ${\epsilon}^{\prime}:=\log (k\epsilon +1)$ and $k:=|\mathcal{X}|-1$.

Similar to ${g}_{\epsilon}(X;Y)$, we see that for ${\epsilon}_{1}\le {\epsilon}_{2}$, ${\widehat{\mathcal{D}}}_{{\epsilon}_{1}}\left(P\right)\subset {\widehat{\mathcal{D}}}_{{\epsilon}_{2}}\left(P\right)$ and hence $\epsilon \to {\widehat{g}}_{\epsilon}(X;Y)$ is non-decreasing. The following lemma is a counterpart of Lemma 1 for ${\widehat{g}}_{\epsilon}(X;Y)$.

**Lemma 5.**

For a given joint distribution ${P}_{XY}$ defined over $\mathcal{X}\times \mathcal{Y}$, $\epsilon \mapsto \frac{{\widehat{g}}_{\epsilon}(X;Y)}{\epsilon}$ is non-increasing on $(0,\infty )$.

**Proof.**

Like Lemma 1, the proof is similar to the proof of [30, Lemma 1]. We, however, give a brief proof for the sake of completeness.

For a given channel ${P}_{Z|Y}\in {\widehat{\mathcal{D}}}_{\epsilon}\left(P\right)$ and $0\le \delta \le 1$, we can define a new channel with an additional symbol e as follows

$${P}_{{Z}^{\prime}|Y}({z}^{\prime}|y)=\left\{\begin{array}{ll}(1-\delta ){P}_{Z|Y}({z}^{\prime}|y) & \text{if } {z}^{\prime}\ne e\\ \delta & \text{if } {z}^{\prime}=e\end{array}\right.$$

It is easy to check that $I(Y;{Z}^{\prime})=(1-\delta )I(Y;Z)$ and also ${\rho}_{m}^{2}(X;{Z}^{\prime})=(1-\delta ){\rho}_{m}^{2}(X;Z)$; see [37, Page 8], which implies that ${P}_{{Z}^{\prime}|Y}\in {\widehat{\mathcal{D}}}_{{\epsilon}^{\prime}}\left(P\right)$ where ${\epsilon}^{\prime}=(1-\delta )\epsilon $. Now suppose that ${P}_{Z|Y}$ achieves ${\widehat{g}}_{\epsilon}(X;Y)$, that is, ${\widehat{g}}_{\epsilon}(X;Y)=I(Y;Z)$ and ${\rho}_{m}^{2}(X;Z)=\epsilon $. We can then write

$$\frac{{\widehat{g}}_{\epsilon}(X;Y)}{\epsilon}=\frac{I(Y;Z)}{\epsilon}=\frac{I(Y;{Z}^{\prime})}{{\epsilon}^{\prime}}\le \frac{{\widehat{g}}_{{\epsilon}^{\prime}}(X;Y)}{{\epsilon}^{\prime}}$$

Therefore, for ${\epsilon}^{\prime}\le \epsilon $ we have $\frac{{\widehat{g}}_{{\epsilon}^{\prime}}(X;Y)}{{\epsilon}^{\prime}}\ge \frac{{\widehat{g}}_{\epsilon}(X;Y)}{\epsilon}$. ☐
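The scaling $I(Y;{Z}^{\prime})=(1-\delta )I(Y;Z)$ used in the proof of Lemma 5 can be checked numerically. The following is a minimal sketch only: the distribution `p_y`, the channel `channel` and the value of `delta` are hypothetical choices made for illustration, and the helper names are ours rather than part of the original development.

```python
from math import log2

def mutual_information(p_y, channel):
    """I(Y;Z) in bits, for input pmf p_y and channel[y][z] = P(Z=z | Y=y)."""
    n_z = len(channel[0])
    p_z = [sum(p_y[y] * channel[y][z] for y in range(len(p_y))) for z in range(n_z)]
    total = 0.0
    for y, py in enumerate(p_y):
        for z in range(n_z):
            joint = py * channel[y][z]
            if joint > 0:
                total += joint * log2(channel[y][z] / p_z[z])
    return total

def add_erasure(channel, delta):
    """Scale every row by (1 - delta) and append an erasure output of mass delta."""
    return [[(1 - delta) * p for p in row] + [delta] for row in channel]

# Hypothetical example values (assumptions, not taken from the paper).
p_y = [0.3, 0.7]
channel = [[0.9, 0.1], [0.2, 0.8]]
delta = 0.25

i_yz = mutual_information(p_y, channel)
i_yz_erased = mutual_information(p_y, add_erasure(channel, delta))
print(i_yz, i_yz_erased, (1 - delta) * i_yz)
```

The erasure symbol carries no information about Y because it is emitted with the same probability for every input, which is why only the factor $(1-\delta )$ survives.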

Similar to the lower bound for ${g}_{\epsilon}(X;Y)$ obtained from Lemma 1, we can obtain a lower bound for ${\widehat{g}}_{\epsilon}(X;Y)$ using Lemma 5. Before we get to the lower bound, we need a data processing lemma for maximal correlation. The following lemma proves a version of strong data processing inequality for maximal correlation from which the typical data processing inequality follows, namely, ${\rho}_{m}(X;Z)\le min\{{\rho}_{m}(Y;Z),{\rho}_{m}(X;Y)\}$ for $X,Y$ and Z satisfying $X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z$.

**Lemma 6.**

For random variables X and Y with a joint distribution ${P}_{XY}$, we have

$$\underset{\begin{array}{c}X\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}Y\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}Z\\ {\rho}_{m}(Y;Z)\ne 0\end{array}}{sup}\frac{{\rho}_{m}(X;Z)}{{\rho}_{m}(Y;Z)}={\rho}_{m}(X;Y)$$

**Proof.**

For arbitrary zero-mean and unit variance measurable functions $f\in {\mathcal{L}}^{2}\left(\mathcal{X}\right)$ and $g\in {\mathcal{L}}^{2}\left(\mathcal{Z}\right)$ and $X-\circ -Y-\circ -Z$, we have

$$\mathbb{E}\left[f(X)g(Z)\right]=\mathbb{E}\left[\mathbb{E}[f(X)|Y]\,\mathbb{E}[g(Z)|Y]\right]\le {\rho}_{m}(X;Y){\rho}_{m}(Y;Z)$$

where the inequality follows from the Cauchy-Schwarz inequality and (9). Thus we obtain ${\rho}_{m}(X;Z)\le {\rho}_{m}(X;Y){\rho}_{m}(Y;Z)$.

This bound is tight for the special case of $X\to Y\to {X}^{\prime}$, where ${P}_{{X}^{\prime}|Y}$ is the backward channel associated with ${P}_{Y|X}$. In the following, we shall show that ${\rho}_{m}(X;Y){\rho}_{m}(Y;{X}^{\prime})={\rho}_{m}(X;{X}^{\prime})$.

To this end, first note that the above implies that ${\rho}_{m}(X;Y){\rho}_{m}(Y;{X}^{\prime})\ge {\rho}_{m}(X;{X}^{\prime})$. Since ${P}_{XY}={P}_{{X}^{\prime}Y}$, it follows that ${\rho}_{m}(X;Y)={\rho}_{m}(Y;{X}^{\prime})$ and hence the above implies that ${\rho}_{m}^{2}(X;Y)\ge {\rho}_{m}(X;{X}^{\prime})$. On the other hand, we have

$$\mathbb{E}\left[{\left(\mathbb{E}[f(X)|Y]\right)}^{2}\right]=\mathbb{E}\left[\mathbb{E}[f(X)|Y]\,\mathbb{E}[f({X}^{\prime})|Y]\right]=\mathbb{E}\left[\mathbb{E}[f(X)f({X}^{\prime})|Y]\right]=\mathbb{E}\left[f(X)f({X}^{\prime})\right]$$

which together with (9) implies that

$${\rho}_{m}^{2}(X;Y)\le \underset{f:\mathbb{E}f(X)=0,\,\mathbb{E}{f}^{2}(X)=1}{\sup}\mathbb{E}\left[f(X)f({X}^{\prime})\right]\le {\rho}_{m}(X;{X}^{\prime})$$

Thus, ${\rho}_{m}^{2}(X;Y)={\rho}_{m}(X;{X}^{\prime})$ which completes the proof. ☐
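The product bound ${\rho}_{m}(X;Z)\le {\rho}_{m}(X;Y){\rho}_{m}(Y;Z)$ can be illustrated in the all-binary case, where every function of a binary variable is affine and maximal correlation therefore reduces to the absolute value of the Pearson correlation coefficient. The sketch below relies on that reduction; the input distribution and the two channels are hypothetical values chosen for illustration.

```python
from math import sqrt

def binary_max_corr(joint):
    """Maximal correlation of two binary variables, joint[a][b] = P(A=a, B=b).

    For binary alphabets this equals |Pearson correlation| of the indicators.
    """
    pa = joint[1][0] + joint[1][1]  # P(A = 1)
    pb = joint[0][1] + joint[1][1]  # P(B = 1)
    cov = joint[1][1] - pa * pb
    return abs(cov) / sqrt(pa * (1 - pa) * pb * (1 - pb))

def joint_from(p_a, channel):
    """Joint pmf from a marginal p_a and a channel[a][b] = P(B=b | A=a)."""
    return [[p_a[a] * channel[a][b] for b in range(2)] for a in range(2)]

def compose(ch1, ch2):
    """Cascade of two binary channels (matrix product)."""
    return [[sum(ch1[a][m] * ch2[m][b] for m in range(2)) for b in range(2)]
            for a in range(2)]

# Hypothetical Markov chain X -> Y -> Z (assumed values).
p_x = [0.4, 0.6]
p_y_given_x = [[0.9, 0.1], [0.3, 0.7]]
p_z_given_y = [[0.8, 0.2], [0.25, 0.75]]
p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in range(2)) for y in range(2)]

rho_xy = binary_max_corr(joint_from(p_x, p_y_given_x))
rho_yz = binary_max_corr(joint_from(p_y, p_z_given_y))
rho_xz = binary_max_corr(joint_from(p_x, compose(p_y_given_x, p_z_given_y)))
print(rho_xz, rho_xy * rho_yz)
```

In this binary setting the two printed numbers agree up to rounding, so the bound holds with equality; for larger alphabets only the inequality is guaranteed.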

Now a lower bound of ${\widehat{g}}_{\epsilon}(X;Y)$ can be readily obtained.

**Corollary 7.**

For a given joint distribution ${P}_{XY}$ defined over $\mathcal{X}\times \mathcal{Y}$, we have for any $\epsilon >0$

$${\widehat{g}}_{\epsilon}(X;Y)\ge \frac{H\left(Y\right)}{{\rho}_{m}^{2}(X;Y)}min\{\epsilon ,{\rho}_{m}^{2}(X;Y)\}$$

**Proof.**

By Lemma 6, we know that for any Markov chain $X-\circ -Y-\circ -Z$, we have ${\rho}_{m}(X;Z)\le {\rho}_{m}(X;Y)$ and hence for $\epsilon \ge {\rho}_{m}^{2}(X;Y)$, the privacy constraint ${\rho}_{m}^{2}(X;Z)\le \epsilon $ is not restrictive and hence ${\widehat{g}}_{\epsilon}(X;Y)=H\left(Y\right)$ by setting $Z=Y$. For $0<\epsilon \le {\rho}_{m}^{2}(X;Y)$, Lemma 5 implies that

$$\frac{{\widehat{g}}_{\epsilon}(X;Y)}{\epsilon}\ge \frac{H\left(Y\right)}{{\rho}_{m}^{2}(X;Y)}$$

from which the result follows. ☐

A loose upper bound of ${\widehat{g}}_{\epsilon}(X;Y)$ can be obtained using an argument similar to the one used for ${g}_{\epsilon}(X;Y)$. For the Markov chain $X-\circ -Y-\circ -Z$, we have

$$\begin{array}{rcl}I(Y;Z)& =& I(X;Z)+I(Y;Z|X)\le I(X;Z)+H(Y|X)\\ & \stackrel{(a)}{\le}& \log \left(k{\rho}_{m}^{2}(X;Z)+1\right)+H(Y|X)\end{array}\tag{11}$$

where $k:=\left|\mathcal{X}\right|-1$ and $\left(a\right)$ comes from [23, Proposition 1]. We can, therefore, conclude from (11) and Corollary 7 that

$$\epsilon \frac{H\left(Y\right)}{{\rho}_{m}^{2}(X;Y)}\le {\widehat{g}}_{\epsilon}(X;Y)\le \log \left(k\epsilon +1\right)+H(Y|X)$$

Similar to Lemma 2, the following lemma shows that ${\widehat{g}}_{\epsilon}(X;Y)$ is a concave function of ε.

**Lemma 8.**

For any given pair of random variables $(X,Y)$ with distribution P over $\mathcal{X}\times \mathcal{Y}$, the mapping $\epsilon \mapsto {\widehat{g}}_{\epsilon}(X;Y)$ is concave for $\epsilon \ge 0$.

**Proof.**

The proof is similar to that of Lemma 2 except that here for two optimal filters ${P}_{{Z}_{1}|Y}:Y\to {Z}_{1}$ and ${P}_{{Z}_{3}|Y}:Y\to {Z}_{3}$ in ${\widehat{\mathcal{D}}}_{{\epsilon}_{1}}\left(P\right)$ and ${\widehat{\mathcal{D}}}_{{\epsilon}_{3}}\left(P\right)$, respectively, and the random channel ${P}_{{Z}_{\lambda}|Y}:Y\to Z$ with output alphabet ${\mathcal{Z}}_{1}\cup {\mathcal{Z}}_{3}$ constructed using a coin flip with probability γ, we need to show that ${P}_{{Z}_{\lambda}|Y}\in {\widehat{\mathcal{D}}}_{{\epsilon}_{2}}\left(P\right)$, where $0\le {\epsilon}_{1}<{\epsilon}_{2}<{\epsilon}_{3}\le {\rho}_{m}^{2}(X;Y)$. To show this, consider $f:\mathcal{X}\to \mathbb{R}$ such that $\mathbb{E}[f(X)]=0$ and $\mathbb{E}[{f}^{2}(X)]=1$ and let U be a binary random variable as in the proof of Lemma 2. We then have

$$\begin{array}{rcl}\mathbb{E}\left[{\mathbb{E}}^{2}[f(X)|{Z}_{\lambda}]\right]& =& \mathbb{E}\left[\mathbb{E}\left[{\mathbb{E}}^{2}[f(X)|{Z}_{\lambda}]\,|\,U\right]\right]\\ & \stackrel{(a)}{=}& \gamma \mathbb{E}\left[{\mathbb{E}}^{2}[f(X)|{Z}_{3}]\right]+(1-\gamma )\mathbb{E}\left[{\mathbb{E}}^{2}[f(X)|{Z}_{1}]\right]\end{array}\tag{13}$$

where $\left(a\right)$ comes from the fact that U is independent of X. We can then conclude from (13) and the alternative characterization of maximal correlation (9) that

$$\begin{array}{rcl}{\rho}_{m}^{2}(X;{Z}_{\lambda})& =& \underset{f:\mathbb{E}[f(X)]=0,\,\mathbb{E}[{f}^{2}(X)]=1}{\sup}\mathbb{E}\left[{\mathbb{E}}^{2}[f(X)|{Z}_{\lambda}]\right]\\ & =& \underset{f:\mathbb{E}[f(X)]=0,\,\mathbb{E}[{f}^{2}(X)]=1}{\sup}\left[\gamma \mathbb{E}\left[{\mathbb{E}}^{2}[f(X)|{Z}_{3}]\right]+(1-\gamma )\mathbb{E}\left[{\mathbb{E}}^{2}[f(X)|{Z}_{1}]\right]\right]\\ & \le & \gamma {\rho}_{m}^{2}(X;{Z}_{3})+(1-\gamma ){\rho}_{m}^{2}(X;{Z}_{1})\le \gamma {\epsilon}_{3}+(1-\gamma ){\epsilon}_{1}\end{array}$$

from which we can conclude that ${P}_{{Z}_{\lambda}|Y}\in {\widehat{\mathcal{D}}}_{{\epsilon}_{2}}\left(P\right)$. ☐

#### 2.3. Non-Trivial Filters For Perfect Privacy

As will become clear later, requiring that ${g}_{0}(X;Y)=0$ is a useful assumption for the analysis of ${g}_{\epsilon}(X;Y)$. Thus, it is interesting to find a necessary and sufficient condition on the joint distribution ${P}_{XY}$ which results in ${g}_{0}(X;Y)=0$.

**Definition 9** ([38])**.**

The random variable X is said to be weakly independent of Y if the rows of the transition matrix ${P}_{X|Y}$, i.e., the set of vectors $\{{P}_{X|Y}(\cdot |y),\phantom{\rule{3.33333pt}{0ex}}y\in \mathcal{Y}\}$, are linearly dependent.

The following lemma provides a necessary and sufficient condition for ${g}_{0}(X;Y)>0$.

**Lemma 10.**

For a given $(X,Y)$ with a given joint distribution ${P}_{XY}={P}_{Y}\times {P}_{X|Y}$, ${g}_{0}(X;Y)>0$ (and equivalently ${\widehat{g}}_{0}(X;Y)>0$) if and only if X is weakly independent of Y.

**Proof.**

⇒ direction:

Assuming that ${g}_{0}(X;Y)>0$ implies that there exists a random variable Z over an alphabet $\mathcal{Z}$ such that the Markov condition $X-\circ -Y-\circ -Z$ is satisfied and $Z\perp \phantom{\rule{-0.166667em}{0ex}}\phantom{\rule{-0.166667em}{0ex}}\phantom{\rule{-0.166667em}{0ex}}\perp X$ while $I(Y;Z)>0$. Hence, for any ${z}_{1}$ and ${z}_{2}$ in $\mathcal{Z}$, we must have ${P}_{X|Z}(x|{z}_{1})={P}_{X|Z}(x|{z}_{2})$ for all $x\in \mathcal{X}$, which implies that

$$\sum _{y\in \mathcal{Y}}{P}_{X|Y}(x|y){P}_{Y|Z}(y|{z}_{1})=\sum _{y\in \mathcal{Y}}{P}_{X|Y}(x|y){P}_{Y|Z}(y|{z}_{2})$$

and hence

$$\sum _{y\in \mathcal{Y}}{P}_{X|Y}(x|y)\left[{P}_{Y|Z}(y|{z}_{1})-{P}_{Y|Z}(y|{z}_{2})\right]=0$$

Since Y is not independent of Z, there exist ${z}_{1}$ and ${z}_{2}$ such that ${P}_{Y|Z}(y|{z}_{1})\ne {P}_{Y|Z}(y|{z}_{2})$ for some $y\in \mathcal{Y}$, and hence the above shows that the set of vectors ${P}_{X|Y}(\cdot |y)$, $y\in \mathcal{Y}$, is linearly dependent.

⇐ direction:

Berger and Yeung [38, Appendix II], in a completely different context, showed that if X is weakly independent of Y, one can always construct a binary random variable Z correlated with Y which satisfies $X-\circ -Y-\circ -Z$ and $X\perp \phantom{\rule{-0.166667em}{0ex}}\phantom{\rule{-0.166667em}{0ex}}\phantom{\rule{-0.166667em}{0ex}}\perp Z$, and hence ${g}_{0}(X;Y)>0$. ☐
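To make the ⇐ direction concrete, the sketch below builds a binary Z from a vanishing linear combination ${\sum}_{y}{c}_{y}{P}_{X|Y}(\cdot |y)=0$ by setting ${P}_{Z|Y}(1|y)=\frac{1}{2}+t{c}_{y}/{P}_{Y}(y)$ for a small $t>0$. This is one construction consistent with Definition 9 and is not claimed to be the construction of [38]; the distributions, the vector `c` and the step `t` are hypothetical values chosen so that a ternary Y and a binary X give linearly dependent rows.

```python
from math import log2

def mutual_information(joint):
    """I(A;B) in bits from a joint pmf given as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (pa[a] * pb[b]))
               for a, row in enumerate(joint)
               for b, p in enumerate(row) if p > 0)

# Hypothetical source: ternary Y, binary X (assumed values).
p_y = [0.3, 0.4, 0.3]
p_x_given_y = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]  # rows indexed by y

# Coefficients with sum_y c[y] * P_{X|Y}(.|y) = 0, witnessing weak independence.
c = [1.0, -2.0, 1.0]
t = 0.05  # small enough to keep every q[y] inside [0, 1]

q = [0.5 + t * c[y] / p_y[y] for y in range(3)]      # P(Z = 1 | Y = y)
p_z_given_y = [[1 - q[y], q[y]] for y in range(3)]

joint_yz = [[p_y[y] * p_z_given_y[y][z] for z in range(2)] for y in range(3)]
joint_xz = [[sum(p_y[y] * p_x_given_y[y][x] * p_z_given_y[y][z] for y in range(3))
             for z in range(2)] for x in range(2)]

i_xz = mutual_information(joint_xz)
i_yz = mutual_information(joint_yz)
print(i_xz, i_yz)
```

Here $I(X;Z)$ vanishes up to rounding while $I(Y;Z)$ is strictly positive, so this Z is a non-trivial filter achieving perfect privacy for the assumed source.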

**Remark 3.**

Lemma 10 first appeared in [1]. However, Calmon et al. [20] studied (1), the dual version of ${g}_{\epsilon}(X;Y)$, and showed an equivalent result for ${t}_{R}(X;Y)$. In fact, they showed that for a given ${P}_{XY}$, one can always generate Z such that $I(X;Z)=0$, $I(Y;Z)>0$ and $X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z$, or equivalently ${g}_{0}(X;Y)>0$, if and only if the smallest singular value of the conditional expectation operator $f\mapsto \mathbb{E}\left[f\right(X\left)\right|Y]$ is zero. This condition can, in fact, be shown to be equivalent to the condition in Lemma 10.

**Remark 4.**

It is clear that, according to Definition 9, X is weakly independent of Y if $\left|\mathcal{Y}\right|>\left|\mathcal{X}\right|$. Hence, Lemma 10 implies that ${g}_{0}(X;Y)>0$ if Y has strictly larger alphabet than X.

In light of the above remark, in the most common case of $\left|\mathcal{Y}\right|=\left|\mathcal{X}\right|$, one might have ${g}_{0}(X;Y)=0$, which corresponds to the most conservative scenario as no privacy leakage implies no broadcasting of observable data. In such cases, the rate of increase of ${g}_{\epsilon}(X;Y)$ at $\epsilon =0$, that is ${g}_{0}^{\prime}(X;Y):=\frac{\text{d}}{\text{d}\epsilon}{g}_{\epsilon}{(X;Y)|}_{\epsilon =0}$, which corresponds to the initial efficiency of privacy-constrained information extraction, proves to be very important in characterizing the behavior of ${g}_{\epsilon}(X;Y)$ for all $\epsilon \ge 0$. This is because, for example, by concavity of $\epsilon \mapsto {g}_{\epsilon}(X;Y)$, the slope of ${g}_{\epsilon}(X;Y)$ is maximized at $\epsilon =0$ and so

$${g}_{0}^{\prime}(X;Y)=\underset{\epsilon \to 0}{\lim}\frac{{g}_{\epsilon}(X;Y)}{\epsilon}=\underset{\epsilon >0}{\sup}\frac{{g}_{\epsilon}(X;Y)}{\epsilon}$$

and hence ${g}_{\epsilon}(X;Y)\le \epsilon {g}_{0}^{\prime}(X;Y)$ for all $\epsilon \le I(X;Y)$ which, together with (4), implies that ${g}_{\epsilon}(X;Y)=\epsilon \frac{H\left(Y\right)}{I(X;Y)}$ if ${g}_{0}^{\prime}(X;Y)\le \frac{H\left(Y\right)}{I(X;Y)}$. In the sequel, we always assume that X is not weakly independent of Y, or equivalently ${g}_{0}(X;Y)=0$. For example, in light of Lemma 10 and Remark 4, we can assume that $\left|\mathcal{Y}\right|\le \left|\mathcal{X}\right|$.

It is easy to show that X is weakly independent of binary Y if and only if X and Y are independent (see, e.g., [38, Remark 2]). The following corollary, therefore, immediately follows from Lemma 10.

**Corollary 11.**

Let Y be a non-degenerate binary random variable correlated with X. Then ${g}_{0}(X;Y)=0$.

## 3. Operational Interpretations of the Rate-Privacy Function

In this section, we provide a scenario in which ${g}_{\epsilon}(X;Y)$ appears as a boundary point of an achievable rate region, thus giving an information-theoretic operational interpretation for ${g}_{\epsilon}(X;Y)$. We then proceed to present an estimation-theoretic motivation for ${\widehat{g}}_{\epsilon}(X;Y)$.

#### 3.1. Dependence Dilution

Inspired by the problems of information amplification [39] and state masking [40], Courtade [26] proposed the information-masking tradeoff problem as follows. The tuple $({R}_{u},{R}_{v},{\Delta}_{A},{\Delta}_{M})\in {\mathbb{R}}^{4}$ is said to be achievable if for two given separated sources $U\in \mathcal{U}$ and $V\in \mathcal{V}$ and any $\epsilon >0$ there exist mappings $f:{\mathcal{U}}^{n}\to \{1,2,\cdots ,{2}^{n{R}_{u}}\}$ and $g:{\mathcal{V}}^{n}\to \{1,2,\cdots ,{2}^{n{R}_{v}}\}$ such that $I({U}^{n};f\left({U}^{n}\right),g\left({V}^{n}\right))\le n({\Delta}_{M}+\epsilon )$ and $I({V}^{n};f\left({U}^{n}\right),g\left({V}^{n}\right))\ge n({\Delta}_{A}-\epsilon )$. In other words, $({R}_{u},{R}_{v},{\Delta}_{A},{\Delta}_{M})$ is achievable if there exist indices K and J of rates ${R}_{u}$ and ${R}_{v}$ given ${U}^{n}$ and ${V}^{n}$, respectively, such that the receiver in possession of $(K,J)$ can recover at most $n{\Delta}_{M}$ bits about ${U}^{n}$ and at least $n{\Delta}_{A}$ bits about ${V}^{n}$. The closure of the set of all achievable tuples $({R}_{u},{R}_{v},{\Delta}_{A},{\Delta}_{M})$ is characterized in [26]. Here, we look at a similar problem but for a joint encoder. In fact, we want to examine the achievable rate of an encoder observing both ${X}^{n}$ and ${Y}^{n}$ which masks ${X}^{n}$ and amplifies ${Y}^{n}$ at the same time, by rates ${\Delta}_{M}$ and ${\Delta}_{A}$, respectively.

We define a $({2}^{nR},n)$ dependence dilution code by an encoder

$${f}_{n}:{\mathcal{X}}^{n}\times {\mathcal{Y}}^{n}\to \{1,2,\cdots ,{2}^{nR}\}$$

and a list decoder

$${g}_{n}:\{1,2,\cdots ,{2}^{nR}\}\to {2}^{{\mathcal{Y}}^{n}}$$

where ${2}^{{\mathcal{Y}}^{n}}$ denotes the power set of ${\mathcal{Y}}^{n}$. A dependence dilution triple $(R,{\Delta}_{A},{\Delta}_{M})\in {\mathbb{R}}_{+}^{3}$ is said to be achievable if, for any $\delta >0$, there exists a $({2}^{nR},n)$ dependence dilution code that for sufficiently large n satisfies the utility constraint:

$$Pr\left({Y}^{n}\notin {g}_{n}\left(J\right)\right)<\delta \tag{14}$$

having a fixed list size

$$|{g}_{n}\left(J\right)|={2}^{n(H\left(Y\right)-{\Delta}_{A})},\phantom{\rule{2.em}{0ex}}\forall J\in \{1,2,\cdots ,{2}^{nR}\}\tag{15}$$

where $J:={f}_{n}({X}^{n},{Y}^{n})$ is the encoder’s output, and satisfies the privacy constraint:

$$\frac{1}{n}I({X}^{n};J)\le {\Delta}_{M}+\delta $$

Intuitively speaking, upon receiving J, the decoder is required to construct a list ${g}_{n}\left(J\right)\subset {\mathcal{Y}}^{n}$ of fixed size which contains likely candidates of the actual sequence ${Y}^{n}$. Without any observation, the decoder can only construct a list of size ${2}^{nH\left(Y\right)}$ which contains ${Y}^{n}$ with probability close to one. However, after J is observed and the list ${g}_{n}\left(J\right)$ is formed, the decoder’s list size can be reduced to ${2}^{n(H\left(Y\right)-{\Delta}_{A})}$, thus reducing the uncertainty about ${Y}^{n}$ by $n{\Delta}_{A}$, where $0\le n{\Delta}_{A}\le nH\left(Y\right)$. This observation led Kim et al. [39] to show that the utility constraint (14) is equivalent to the amplification requirement

$$\frac{1}{n}I({Y}^{n};J)\ge {\Delta}_{A}-\delta \tag{17}$$

which lower bounds the amount of information J carries about ${Y}^{n}$. The following theorem gives an outer bound for the achievable dependence dilution region.

**Theorem 12.**

Any achievable dependence dilution triple $(R,{\Delta}_{A},{\Delta}_{M})$ satisfies

$$\left\{\begin{array}{l}R\ge {\Delta}_{A}\\ {\Delta}_{A}\le I(Y;U)\\ {\Delta}_{M}\ge I(X;U)-I(Y;U)+{\Delta}_{A}\end{array}\right.$$

for some auxiliary random variable $U\in \mathcal{U}$ with a finite alphabet and jointly distributed with X and Y.

Before we prove this theorem, we need two preliminary lemmas. The first lemma is an extension of Fano’s inequality for list decoders and the second one makes use of a single-letterization technique to express $I({X}^{n};J)-I({Y}^{n};J)$ in a single-letter form in the sense of Csiszár and Körner [29].

**Lemma 13** ([39,41])**.**

Given a pair of random variables $(U,V)$ defined over $\mathcal{U}\times \mathcal{V}$ for finite $\mathcal{V}$ and arbitrary $\mathcal{U}$, any list decoder $g:\mathcal{U}\to {2}^{\mathcal{V}}$, $U\mapsto g\left(U\right)$ of fixed list size m (i.e., $|g(u)|=m,\phantom{\rule{3.33333pt}{0ex}}\forall u\in \mathcal{U}$), satisfies

$$H(V|U)\le {h}_{b}\left({p}_{e}\right)+{p}_{e}\log \left|\mathcal{V}\right|+(1-{p}_{e})\log m$$

where ${p}_{e}:=Pr(V\notin g\left(U\right))$ and ${h}_{b}:[0,1]\to [0,1]$ is the binary entropy function.

This lemma, applied to J and ${Y}^{n}$ in place of U and V, respectively, implies that for any list decoder with the property (14), we have

$$H({Y}^{n}|J)\le \log \left|{g}_{n}\left(J\right)\right|+n{\epsilon}_{n}\tag{18}$$

where ${\epsilon}_{n}:=\frac{1}{n}+\left(\log |\mathcal{Y}|-\frac{1}{n}\log |{g}_{n}\left(J\right)|\right){p}_{e}$ and hence ${\epsilon}_{n}\to 0$ as $n\to \infty $.
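The list-decoder form of Fano’s inequality in Lemma 13 can be sanity-checked on a toy example. The sketch below uses a hypothetical joint distribution on a four-letter alphabet and a hypothetical list decoder of size $m=2$; these choices, and all identifiers, are assumptions made for illustration only.

```python
from math import log2

def h_b(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

n = 4  # alphabet size |V|; U takes values in the same set here
m = 2  # fixed list size

# Hypothetical joint pmf: V uniform, U equals V w.p. 0.7, else one of the
# three other symbols w.p. 0.1 each. joint[u][v] = P(U=u, V=v).
joint = [[(0.7 if u == v else 0.1) / n for v in range(n)] for u in range(n)]
p_u = [sum(row) for row in joint]

def list_decoder(u):
    """Hypothetical list decoder: the observed symbol and its cyclic successor."""
    return {u, (u + 1) % n}

# Probability that V falls outside the decoded list.
p_e = sum(joint[u][v] for u in range(n) for v in range(n)
          if v not in list_decoder(u))

# Conditional entropy H(V|U) in bits.
cond_entropy = -sum(joint[u][v] * log2(joint[u][v] / p_u[u])
                    for u in range(n) for v in range(n) if joint[u][v] > 0)

bound = h_b(p_e) + p_e * log2(n) + (1 - p_e) * log2(m)
print(cond_entropy, bound)
```

With $m=1$ the bound reduces to the usual form of Fano’s inequality (with $\log |\mathcal{V}|$ in place of $\log (|\mathcal{V}|-1)$), which is why the lemma is described as an extension to list decoders.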

**Lemma 14.**

Let $({X}^{n},{Y}^{n})$ be n i.i.d. copies of a pair of random variables $(X,Y)$. Then for a random variable J jointly distributed with $({X}^{n},{Y}^{n})$, we have

$$I({X}^{n};J)-I({Y}^{n};J)=\sum _{i=1}^{n}[I({X}_{i};{U}_{i})-I({Y}_{i};{U}_{i})]$$

where ${U}_{i}:=(J,{X}_{i+1}^{n},{Y}^{i-1})$.

**Proof.**

Using the chain rule for the mutual information, we can express $I({X}^{n};J)$ as follows

$$\begin{array}{rcl}I({X}^{n};J)& =& \sum _{i=1}^{n}I({X}_{i};J|{X}_{i+1}^{n})=\sum _{i=1}^{n}I({X}_{i};J,{X}_{i+1}^{n})\\ & =& \sum _{i=1}^{n}[I({X}_{i};J,{X}_{i+1}^{n},{Y}^{i-1})-I({X}_{i};{Y}^{i-1}|J,{X}_{i+1}^{n})]\\ & =& \sum _{i=1}^{n}I({X}_{i};{U}_{i})-\sum _{i=1}^{n}I({X}_{i};{Y}^{i-1}|J,{X}_{i+1}^{n})\end{array}\tag{19}$$

Similarly, we can expand $I({Y}^{n};J)$ as

$$\begin{array}{rcl}I({Y}^{n};J)& =& \sum _{i=1}^{n}I({Y}_{i};J|{Y}^{i-1})=\sum _{i=1}^{n}I({Y}_{i};J,{Y}^{i-1})\\ & =& \sum _{i=1}^{n}[I({Y}_{i};J,{X}_{i+1}^{n},{Y}^{i-1})-I({Y}_{i};{X}_{i+1}^{n}|J,{Y}^{i-1})]\\ & =& \sum _{i=1}^{n}I({Y}_{i};{U}_{i})-\sum _{i=1}^{n}I({Y}_{i};{X}_{i+1}^{n}|J,{Y}^{i-1})\end{array}\tag{20}$$

Subtracting (20) from (19), we get

$$\begin{array}{rcl}I({X}^{n};J)-I({Y}^{n};J)& =& \sum _{i=1}^{n}[I({X}_{i};{U}_{i})-I({Y}_{i};{U}_{i})]-\sum _{i=1}^{n}[I({X}_{i};{Y}^{i-1}|J,{X}_{i+1}^{n})-I({X}_{i+1}^{n};{Y}_{i}|J,{Y}^{i-1})]\\ & \stackrel{(a)}{=}& \sum _{i=1}^{n}[I({X}_{i};{U}_{i})-I({Y}_{i};{U}_{i})]\end{array}$$

where $\left(a\right)$ follows from the Csiszár sum identity [42]. ☐
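For reference, the Csiszár sum identity invoked in step $(a)$ of the proof of Lemma 14 can be written, in the notation of that proof and with the conventions ${Y}^{0}={X}_{n+1}^{n}=\varnothing $ (our notational assumption), as

$$\sum _{i=1}^{n}I({X}_{i+1}^{n};{Y}_{i}|J,{Y}^{i-1})=\sum _{i=1}^{n}I({Y}^{i-1};{X}_{i}|J,{X}_{i+1}^{n})$$

Both sides expand, by the chain rule, into the same double sum ${\sum}_{i<j}I({X}_{j};{Y}_{i}|J,{Y}^{i-1},{X}_{j+1}^{n})$, which is why the second bracketed sum in the difference of the two expansions vanishes.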

**Proof of Theorem 12.**

The rate R can be bounded as

$$\begin{array}{rcl}nR& \ge & H(J)\ge I({Y}^{n};J)\\ & =& nH(Y)-H({Y}^{n}|J)\\ & \stackrel{(a)}{\ge}& nH(Y)-\log |{g}_{n}(J)|-n{\epsilon}_{n}\\ & \stackrel{(b)}{=}& n{\Delta}_{A}-n{\epsilon}_{n}\end{array}\tag{22}$$

where $\left(a\right)$ follows from Fano’s inequality (18) with ${\epsilon}_{n}\to 0$ as $n\to \infty $ and $\left(b\right)$ is due to (15). We can also upper bound ${\Delta}_{A}$ as

$$\begin{array}{rcl}n{\Delta}_{A}& \stackrel{(a)}{=}& H({Y}^{n})-\log |{g}_{n}(J)|\\ & \stackrel{(b)}{\le}& H({Y}^{n})-H({Y}^{n}|J)+n{\epsilon}_{n}\\ & =& \sum _{i=1}^{n}\left[H({Y}_{i})-H({Y}_{i}|{Y}^{i-1},J)\right]+n{\epsilon}_{n}\\ & \le & \sum _{i=1}^{n}\left[H({Y}_{i})-H({Y}_{i}|{Y}^{i-1},{X}_{i+1}^{n},J)\right]+n{\epsilon}_{n}\\ & =& \sum _{i=1}^{n}I({Y}_{i};{U}_{i})+n{\epsilon}_{n}\end{array}\tag{23}$$

where $\left(a\right)$ follows from (15), $\left(b\right)$ follows from (18), and in the last equality the auxiliary random variable ${U}_{i}:=({Y}^{i-1},{X}_{i+1}^{n},J)$ is introduced.

We shall now lower bound $I({X}^{n};J)$:

$$\begin{array}{rcl}n({\Delta}_{M}+\delta )& \ge & I({X}^{n};J)\\ & \stackrel{(a)}{=}& I({Y}^{n};J)+\sum _{i=1}^{n}[I({X}_{i};{U}_{i})-I({Y}_{i};{U}_{i})]\\ & \stackrel{(b)}{\ge}& n{\Delta}_{A}+\sum _{i=1}^{n}[I({X}_{i};{U}_{i})-I({Y}_{i};{U}_{i})]-n{\epsilon}_{n}\end{array}\tag{24}$$

where $\left(a\right)$ follows from Lemma 14 and $\left(b\right)$ is due to Fano’s inequality and (15) (or equivalently from (17)).

Combining (22), (23) and (24), we can write

$$\begin{array}{rcl}R& \ge & {\Delta}_{A}-{\epsilon}_{n}\\ {\Delta}_{A}& \le & I({Y}_{Q};{U}_{Q}|Q)+{\epsilon}_{n}=I({Y}_{Q};{U}_{Q},Q)+{\epsilon}_{n}\\ {\Delta}_{M}& \ge & {\Delta}_{A}+I({X}_{Q};{U}_{Q}|Q)-I({Y}_{Q};{U}_{Q}|Q)-{\epsilon}_{n}^{\prime}\\ & =& {\Delta}_{A}+I({X}_{Q};{U}_{Q},Q)-I({Y}_{Q};{U}_{Q},Q)-{\epsilon}_{n}^{\prime}\end{array}$$

where ${\epsilon}_{n}^{\prime}:={\epsilon}_{n}+\delta $ and Q is a random variable distributed uniformly over $\{1,2,\cdots ,n\}$ which is independent of $(X,Y)$ and hence $I({Y}_{Q};{U}_{Q}|Q)=\frac{1}{n}{\sum}_{i=1}^{n}I({Y}_{i};{U}_{i})$. The results follow by denoting $U:=({U}_{Q},Q)$ and noting that ${Y}_{Q}$ and ${X}_{Q}$ have the same distributions as Y and X, respectively. ☐

If the encoder does not have direct access to the private source ${X}^{n}$, then we can define the encoder mapping as ${f}_{n}:{\mathcal{Y}}^{n}\to \{1,2,\cdots ,{2}^{nR}\}$. The following corollary is an immediate consequence of Theorem 12.

**Corollary 15.**

If the encoder does not see the private source, then for every achievable dependence dilution triple $(R,{\Delta}_{A},{\Delta}_{M})$, we have

$$\left\{\begin{array}{l}R\ge {\Delta}_{A}\\ {\Delta}_{A}\le I(Y;U)\\ {\Delta}_{M}\ge I(X;U)-I(Y;U)+{\Delta}_{A}\end{array}\right.$$

for some joint distribution ${P}_{XYU}={P}_{XY}{P}_{U|Y}$ where the auxiliary random variable $U\in \mathcal{U}$ satisfies $\left|\mathcal{U}\right|\le \left|\mathcal{Y}\right|+1$.

**Remark 5.**

If source Y is required to be amplified (according to (17)) at maximum rate, that is, ${\Delta}_{A}=I(Y;U)$ for an auxiliary random variable U which satisfies $X-\circ -Y-\circ -U$, then by Corollary 15, the best privacy performance one can expect from the dependence dilution setting is

$${\Delta}_{M}^{*}=\underset{\begin{array}{c}U:\,X-\circ -Y-\circ -U\\ I(Y;U)\ge {\Delta}_{A}\end{array}}{\min}I(X;U)$$

which is equal to the dual of ${g}_{\epsilon}(X;Y)$ evaluated at ${\Delta}_{A}$, ${t}_{{\Delta}_{A}}(X;Y)$, as defined in (1).

The dependence dilution problem is closely related to the discriminatory lossy source coding problem studied in [15]. In this problem, an encoder f observes $({X}^{n},{Y}^{n})$ and wants to describe this source to a decoder, g, such that g recovers ${Y}^{n}$ within distortion level D and $I(f({X}^{n},{Y}^{n});{X}^{n})\le n{\Delta}_{M}$. If the distortion measure is the Hamming distortion, then the distortion constraint and the amplification constraint are closely related via Fano’s inequality. Moreover, the dependence dilution problem reduces to a secure lossless (list decoder of fixed size 1) source coding problem by setting ${\Delta}_{A}=H\left(Y\right)$, which was recently studied in [43].

#### 3.2. MMSE Estimation of Functions of Private Information

In this section, we provide a justification for the privacy guarantee ${\rho}_{m}^{2}(X;Z)\le \epsilon $. To this end, we recall the definition of the minimum mean squared error estimation.

**Definition 16.**

Given random variables U and V, $\mathsf{mmse}(U|V)$ is defined as the minimum error of an estimate, $g\left(V\right)$, of U based on V, measured in the mean-square sense, that is

$$\mathsf{mmse}(U|V):=\underset{g\in {\mathcal{L}}^{2}\left(\mathcal{V}\right)}{\inf}\mathbb{E}\left[{\left(U-g(V)\right)}^{2}\right]=\mathbb{E}\left[{\left(U-\mathbb{E}[U|V]\right)}^{2}\right]=\mathbb{E}\left[\mathsf{var}(U|V)\right]\tag{26}$$

where $\mathsf{var}(U|V)$ denotes the conditional variance of U given V.

It is easy to see that $\mathsf{mmse}\left(U\right|V)=0$ if and only if $U=f\left(V\right)$ for some measurable function f and $\mathsf{mmse}\left(U\right|V)=\mathsf{var}(U)$ if and only if $U\perp \phantom{\rule{-0.166667em}{0ex}}\phantom{\rule{-0.166667em}{0ex}}\phantom{\rule{-0.166667em}{0ex}}\perp V$. Hence, unlike for the case of maximal correlation, a small value of $\mathsf{mmse}\left(U\right|V)$ implies a strong dependence between U and V. Hence, although it is not a "proper" measure of correlation, in a certain sense it measures how well one random variable can be predicted from another one.

Given a non-degenerate measurable function $f:\mathcal{X}\to \mathbb{R}$, consider the following constraint on $\mathsf{mmse}(f(X)|Z)$

$$(1-\epsilon )\mathsf{var}(f(X))\le \mathsf{mmse}(f(X)|Z)\le \mathsf{var}(f(X)).\tag{27}$$

This guarantees that no adversary knowing Z can efficiently estimate $f\left(X\right)$. First consider the case where f is an identity function, i.e., $f\left(x\right)=x$. In this case, a direct calculation shows that

$$\begin{array}{rcl}\mathsf{mmse}(X|Z)& \stackrel{(a)}{=}& \mathbb{E}\left[{(X-\mathbb{E}[X|Z])}^{2}\right]=\mathbb{E}\left[{X}^{2}\right]-\mathbb{E}\left[{\left(\mathbb{E}[X|Z]\right)}^{2}\right]\\ & =& \mathsf{var}(X)\left(1-{\rho}^{2}(X;\mathbb{E}[X|Z])\right)\\ & \stackrel{(b)}{\ge}& \mathsf{var}(X)\left(1-{\rho}_{m}^{2}(X;Z)\right)\end{array}$$

where $\left(a\right)$ follows from (26) and $\left(b\right)$ is due to the definition of maximal correlation. Having imposed ${\rho}_{m}^{2}(X;Z)\le \epsilon $, we can, therefore, conclude that the MMSE of estimating X given Z satisfies

$$(1-\epsilon )\mathsf{var}(X)\le \mathsf{mmse}(X|Z)\le \mathsf{var}(X)$$

which shows that ${\rho}_{m}^{2}(X;Z)\le \epsilon $ implies (27) for $f\left(x\right)=x$. However, in the following we show that the constraint ${\rho}_{m}^{2}(X;Z)\le \epsilon $ is, indeed, equivalent to (27) for any non-degenerate measurable $f:\mathcal{X}\to \mathbb{R}$.
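The bound $\mathsf{var}(X)(1-{\rho}_{m}^{2}(X;Z))\le \mathsf{mmse}(X|Z)\le \mathsf{var}(X)$ for the identity function can be checked numerically. The sketch below takes X and Z binary with values in $\{0,1\}$, in which case maximal correlation coincides with the absolute Pearson correlation; the joint distribution is a hypothetical example, not one taken from the paper.

```python
# Hypothetical joint pmf of binary (X, Z): joint[x][z] = P(X=x, Z=z).
joint = [[0.35, 0.15],
         [0.10, 0.40]]

p_x1 = joint[1][0] + joint[1][1]          # P(X = 1)
p_z = [joint[0][z] + joint[1][z] for z in range(2)]

var_x = p_x1 * (1 - p_x1)                 # variance of a {0,1}-valued X
var_z = p_z[1] * (1 - p_z[1])
cov = joint[1][1] - p_x1 * p_z[1]

# For binary alphabets, rho_m^2 equals the squared Pearson correlation.
rho_m_sq = cov ** 2 / (var_x * var_z)

# mmse(X|Z) = E[var(X|Z)]: average the conditional Bernoulli variances.
mmse = 0.0
for z in range(2):
    p_x1_given_z = joint[1][z] / p_z[z]
    mmse += p_z[z] * p_x1_given_z * (1 - p_x1_given_z)

lower = (1 - rho_m_sq) * var_x
print(lower, mmse, var_x)
```

In the binary case the lower bound is met with equality, since the conditional mean $\mathbb{E}[X|Z]$ is itself an affine function of Z; for general alphabets only the two-sided inequality is asserted.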

**Definition 17** ([44])**.**

A joint distribution ${P}_{UV}$ satisfies a Poincaré inequality with constant $c\le 1$ if for all $f:\mathcal{U}\to \mathbb{R}$

$$c\cdot \mathsf{var}(f(U))\le \mathsf{mmse}(f(U)|V)$$

and the Poincaré constant for ${P}_{UV}$ is defined as

$$\vartheta (U;V):=\underset{f}{\inf}\frac{\mathsf{mmse}(f(U)|V)}{\mathsf{var}(f(U))}$$

The privacy constraint (27) can then be viewed as

$$\vartheta (X;Z)\ge 1-\epsilon .\tag{29}$$

**Theorem 18.**

For any joint distribution ${P}_{UV}$, we have $\vartheta (U;V)=1-{\rho}_{m}^{2}(U;V)$.

In light of Theorem 18 and (29), the privacy constraint (27) is equivalent to ${\rho}_{m}^{2}(X;Z)\le \epsilon $, that is,

$${\rho}_{m}^{2}(X;Z)\le \epsilon \Longleftrightarrow (1-\epsilon )\mathsf{var}(f(X))\le \mathsf{mmse}(f(X)|Z)\le \mathsf{var}(f(X))$$

for any non-degenerate measurable function $f:\mathcal{X}\to \mathbb{R}$.

Hence, ${\widehat{g}}_{\epsilon}(X;Y)$ characterizes the maximum information extraction from Y such that no (non-trivial) function of X can be efficiently estimated, in terms of MMSE (27), given the extracted information.

## 4. Observation Channels for Minimal and Maximal ${\mathbf{g}}_{\epsilon}(\mathbf{X};\mathbf{Y})$

In this section, we characterize the observation channels which achieve the lower or upper bounds on the rate-privacy function in (4). We first derive general conditions for achieving the lower bound and then present a large family of observation channels ${P}_{Y|X}$ which achieve the lower bound. We also give a family of ${P}_{Y|X}$ which attain the upper bound on ${g}_{\epsilon}(X;Y)$.

#### 4.1. Conditions for Minimal ${g}_{\epsilon}(X;Y)$

Assuming that ${g}_{0}(X;Y)=0$, we seek a set of conditions on ${P}_{XY}$ such that ${g}_{\epsilon}(X;Y)$ is linear in ε, or equivalently, ${g}_{\epsilon}(X;Y)=\epsilon \frac{H\left(Y\right)}{I(X;Y)}$. In order to do this, we shall examine the slope of ${g}_{\epsilon}(X;Y)$ at zero. Recall that by concavity of ${g}_{\epsilon}(X;Y)$, it is clear that ${g}_{0}^{\prime}(X;Y)\ge \frac{H\left(Y\right)}{I(X;Y)}$. We strengthen this bound in the following lemmas. For this, we need to recall the notion of Kullback-Leibler divergence. Given two probability distributions P and Q supported over a finite alphabet $\mathcal{U}$,

$$D\left(P\right|\left|Q\right):=\sum _{u\in \mathcal{U}}P\left(u\right)log\left(\frac{P\left(u\right)}{Q\left(u\right)}\right)$$

**Lemma 19.**

For a given joint distribution ${P}_{XY}={P}_{Y}\times {P}_{X|Y}$, if ${g}_{0}(X;Y)=0$, then for any $\epsilon \ge 0$

$${g}_{0}^{\prime}(X;Y)\ge \underset{y\in \mathcal{Y}}{\max}\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}$$

**Proof.**

The proof is given in Appendix A. ☐

**Remark 6.**

Note that if for a given joint distribution ${P}_{XY}$, there exists ${y}_{0}\in \mathcal{Y}$ such that $D\left({P}_{X|Y}(\cdot |{y}_{0})\,\|\,{P}_{X}(\cdot )\right)=0$, it implies that ${P}_{X|Y}(\cdot |{y}_{0})={P}_{X}(\cdot )$. Consider the binary random variable $Z\in \{1,\text{e}\}$ constructed according to the distribution ${P}_{Z|Y}(1|{y}_{0})=1$ and ${P}_{Z|Y}(\text{e}|y)=1$ for $y\in \mathcal{Y}\backslash \left\{{y}_{0}\right\}$. We can now claim that Z is independent of X, because ${P}_{X|Z}(x|1)={P}_{X|Y}(x|{y}_{0})={P}_{X}\left(x\right)$ and

$$\begin{array}{ccc}\hfill {P}_{X|Z}\left(x\right|\mathrm{e})& =& \sum _{y\ne {y}_{0}}{P}_{X|Y}\left(x\right|y){P}_{Y|Z}\left(y\right|\mathrm{e})=\sum _{y\ne {y}_{0}}{P}_{X|Y}\left(x\right|y)\frac{{P}_{Y}\left(y\right)}{1-{P}_{Y}\left({y}_{0}\right)}\hfill \\ & =& \frac{1}{1-{P}_{Y}\left({y}_{0}\right)}\sum _{y\ne {y}_{0}}{P}_{XY}(x,y)={P}_{X}\left(x\right)\hfill \end{array}$$

Clearly, Z and Y are not independent, and hence ${g}_{0}(X;Y)>0$. This implies that the right-hand side of inequality in Lemma 19 can not be infinity.

In order to prove the main result, we need the following simple lemma.

**Lemma 20.**

For any joint distribution ${P}_{XY}$, we have

$$\frac{H(Y)}{I(X;Y)}\le \underset{y\in \mathcal{Y}}{\max}\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}$$

where equality holds if and only if there exists a constant $c>0$ such that $-\log {P}_{Y}(y)=cD\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)$ for all $y\in \mathcal{Y}$.

**Proof.**

It is clear that

$$\frac{H(Y)}{I(X;Y)}=\frac{-{\sum}_{y\in \mathcal{Y}}{P}_{Y}(y)\log {P}_{Y}(y)}{{\sum}_{y\in \mathcal{Y}}{P}_{Y}(y)D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}\le \underset{y\in \mathcal{Y}}{\max}\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}$$

where the inequality follows from the fact that for any three sequences of positive numbers ${\{{a}_{i}\}}_{i=1}^{n}$, ${\{{b}_{i}\}}_{i=1}^{n}$ and ${\{{\lambda}_{i}\}}_{i=1}^{n}$ we have $\frac{{\sum}_{i=1}^{n}{\lambda}_{i}{a}_{i}}{{\sum}_{i=1}^{n}{\lambda}_{i}{b}_{i}}\le {\max}_{1\le i\le n}\frac{{a}_{i}}{{b}_{i}}$, where equality occurs if and only if $\frac{{a}_{i}}{{b}_{i}}=c$ for all $1\le i\le n$. ☐
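The inequality of Lemma 20 is easy to check numerically. The following sketch is only an illustration under stated assumptions: the function name `lemma20_sides` and the nested-list representation `joint[x][y]` of ${P}_{XY}$ are hypothetical, logarithms are base 2, and every $y$ is assumed to have positive probability and positive divergence.

```python
import math

def lemma20_sides(joint):
    """Return (H(Y)/I(X;Y), max_y -log P_Y(y) / D(P_{X|Y}(.|y) || P_X)).

    joint[x][y] holds P_XY(x, y); all P_Y(y) and all divergences must be > 0.
    """
    n_x, n_y = len(joint), len(joint[0])
    p_x = [sum(row) for row in joint]
    p_y = [sum(joint[x][y] for x in range(n_x)) for y in range(n_y)]
    h_y = -sum(p * math.log2(p) for p in p_y)
    div = []
    for y in range(n_y):
        d = 0.0
        for x in range(n_x):
            cond = joint[x][y] / p_y[y]      # P_{X|Y}(x|y)
            if cond > 0.0:
                d += cond * math.log2(cond / p_x[x])
        div.append(d)
    # I(X;Y) is the P_Y-average of the per-y divergences
    i_xy = sum(p_y[y] * div[y] for y in range(n_y))
    worst = max(-math.log2(p_y[y]) / div[y] for y in range(n_y))
    return h_y / i_xy, worst
```

Calling `lemma20_sides([[0.3, 0.1], [0.1, 0.5]])` returns a pair whose first entry is strictly smaller than the second, whereas a uniform input through a BSC, e.g. `[[0.45, 0.05], [0.05, 0.45]]`, gives equality because the per-$y$ ratio is constant.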

Now we are ready to state the main result of this subsection.

**Theorem 21.**

For a given $(X,Y)$ with joint distribution ${P}_{XY}={P}_{Y}\times {P}_{X|Y}$, if ${g}_{0}(X;Y)=0$ and $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is linear for $0\le \epsilon \le I(X;Y)$, then for any $y\in \mathcal{Y}$

$$\frac{H(Y)}{I(X;Y)}=\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}$$

**Proof.**

Note that the fact that ${g}_{0}(X;Y)=0$ and ${g}_{\epsilon}(X;Y)$ is linear in ε is equivalent to ${g}_{\epsilon}(X;Y)=\epsilon \frac{H(Y)}{I(X;Y)}$. It is, therefore, immediate from Lemmas 19 and 20 that we have

$$\begin{array}{rcl}{g}_{0}^{\prime}(X;Y)& \stackrel{(a)}{=}& \frac{H(Y)}{I(X;Y)}\stackrel{(b)}{\le}\underset{y\in \mathcal{Y}}{\max}\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}\\ & \stackrel{(c)}{\le}& {g}_{0}^{\prime}(X;Y)\end{array}\tag{31}$$

where $(a)$ follows from the fact that ${g}_{\epsilon}(X;Y)=\epsilon \frac{H(Y)}{I(X;Y)}$ and $(b)$ and $(c)$ are due to Lemmas 20 and 19, respectively. The inequality in (31) shows that

$$\frac{H(Y)}{I(X;Y)}=\underset{y\in \mathcal{Y}}{\max}\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}\tag{32}$$

According to Lemma 20, (32) implies that the ratio $\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}$ does not depend on $y\in \mathcal{Y}$ and hence the result follows. ☐

This theorem implies that if there exist ${y}_{1},{y}_{2}\in \mathcal{Y}$ for which the ratio $\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}$ takes two different values, then $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ cannot achieve the lower bound in (4), or equivalently

$${g}_{\epsilon}(X;Y)>\epsilon \frac{H\left(Y\right)}{I(X;Y)}$$

This, therefore, gives a necessary condition for the lower bound to be achievable. The following corollary simplifies this necessary condition.

**Corollary 22.**

For a given joint distribution ${P}_{XY}={P}_{Y}\times {P}_{X|Y}$, if ${g}_{0}(X;Y)=0$ and $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is linear, then the following are equivalent:

- (i)
- Y is uniformly distributed,
- (ii)
- $D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)$ is constant for all $y\in \mathcal{Y}$.

**Proof.**

$(ii)\Rightarrow (i)$:

From Theorem 21, we have for all $y\in \mathcal{Y}$

$$\frac{H(Y)}{I(X;Y)}=\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}\tag{33}$$

Suppose that $D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)$ takes the same value $D$ for every $y\in \mathcal{Y}$. Then ${\sum}_{y}{P}_{Y}(y)D=I(X;Y)$ and hence $D=I(X;Y)$, which together with (33) implies that $H(Y)=-\log {P}_{Y}(y)$ for all $y\in \mathcal{Y}$ and hence Y is uniformly distributed.

$(i)\Rightarrow (ii)$:

When Y is uniformly distributed, $-\log {P}_{Y}(y)=H(Y)$ for all $y\in \mathcal{Y}$, so (33) gives $I(X;Y)=D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)$, which implies that $D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)$ is constant for all $y\in \mathcal{Y}$. ☐

**Example 1.**

Suppose ${P}_{Y|X}$ is a binary symmetric channel (BSC) with crossover probability $0<\alpha <1$ and ${P}_{X}=\mathsf{Bernoulli}(0.5)$. In this case, ${P}_{X|Y}$ is also a BSC with input distribution ${P}_{Y}=\mathsf{Bernoulli}(0.5)$. Note that Corollary 11 implies that ${g}_{0}(X;Y)=0$. We will show that ${g}_{\epsilon}(X;Y)$ is linear as a function of $\epsilon \ge 0$ for a larger family of symmetric channels (including BSC) in Corollary 24. Hence, the BSC with uniform input nicely illustrates Corollary 22, because $D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)=1-h(\alpha )$ for $y\in \{0,1\}$.
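The value of the divergence in Example 1 can be reproduced with a few lines of code. This is a hedged sketch: the helper names `h_b` and `bsc_divergences` are hypothetical, $h$ is taken to be the binary entropy function in bits, and the crossover probability is assumed to satisfy $0<\alpha <1$.

```python
import math

def h_b(a):
    """Binary entropy function in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)

def bsc_divergences(alpha):
    """D(P_{X|Y}(.|y) || P_X) for y = 0, 1 when X is uniform and P_{Y|X} = BSC(alpha)."""
    p_x = [0.5, 0.5]
    # With uniform input, the reverse channel P_{X|Y} is again BSC(alpha).
    reverse = {0: [1.0 - alpha, alpha], 1: [alpha, 1.0 - alpha]}
    return [
        sum(p * math.log2(p / q) for p, q in zip(reverse[y], p_x) if p > 0.0)
        for y in (0, 1)
    ]
```

For any admissible α both entries of `bsc_divergences(alpha)` agree with `1 - h_b(alpha)`, so the ratio $-\log {P}_{Y}(y)/D$ equals $1/(1-h(\alpha ))=H(Y)/I(X;Y)$ for both values of $y$.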

**Example 2.**

Now suppose ${P}_{X|Y}$ is a binary asymmetric channel such that ${P}_{X|Y}(\cdot |0)=\mathsf{Bernoulli}({\alpha}_{0})$, ${P}_{X|Y}(\cdot |1)=\mathsf{Bernoulli}({\alpha}_{1})$ for some $0<{\alpha}_{0},{\alpha}_{1}<1$ and input distribution ${P}_{Y}=\mathsf{Bernoulli}(p)$, $0<p\le 0.5$. It is easy to see that if ${\alpha}_{0}+{\alpha}_{1}=1$ then $D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)$ does not depend on y and hence we can conclude from Corollary 22 (noticing that ${g}_{0}(X;Y)=0$) that in this case for any $p<0.5$, ${g}_{\epsilon}(X;Y)$ is not linear and hence for $0<\epsilon <I(X;Y)$

$${g}_{\epsilon}(X;Y)>\epsilon \frac{H\left(Y\right)}{I(X;Y)}$$

In Theorem 21, we showed that when ${g}_{\epsilon}(X;Y)$ achieves its lower bound, illustrated in (4), the slope of the mapping $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ at zero is equal to $\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}$ for any $y\in \mathcal{Y}$. We will show in the next section that the reverse direction is also true at least for a large family of binary-input symmetric-output channels, for instance when ${P}_{Y|X}$ is a BSC, thereby showing that in this case,

$${g}_{0}^{\prime}(X;Y)=\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)},\quad \forall y\in \mathcal{Y}\quad \Longleftrightarrow \quad {g}_{\epsilon}(X;Y)=\epsilon \frac{H(Y)}{I(X;Y)},\quad 0\le \epsilon \le I(X;Y)$$

#### 4.2. Special Observation Channels

In this section, we apply the results of the previous subsection to different joint distributions ${P}_{XY}$. In the first family of channels from X to Y, we look at the case where Y is binary and the reverse channel ${P}_{X|Y}$ has symmetry in a particular sense, which will be specified later. One particular case of this family of channels is when ${P}_{X|Y}$ is a BSC. As a family of observation channels which achieves the upper bound on ${g}_{\epsilon}(X;Y)$, stated in (4), we look at the class of erasure channels from X to Y, i.e., Y is an erased version of X.

#### 4.2.1. Observation Channels With Symmetric Reverse

The first example of ${P}_{XY}$ that we consider for binary Y is the so-called Binary Input Symmetric Output (BISO) ${P}_{X|Y}$, see for example [45,46]. Suppose $\mathcal{Y}=\{0,1\}$ and $\mathcal{X}=\{0,\pm 1,\pm 2,\cdots ,\pm k\}$, and for any $x\in \mathcal{X}$ we have ${P}_{X|Y}(x|1)={P}_{X|Y}(-x|0)$. This clearly implies that ${p}_{0}:={P}_{X|Y}(0|0)={P}_{X|Y}(0|1)$. With this definition of symmetry, we can always assume that the output alphabet $\mathcal{X}=\{\pm 1,\pm 2,\cdots ,\pm k\}$ has an even number of elements, because we can split $X=0$ into two outputs, $X={0}^{+}$ and $X={0}^{-}$, with ${P}_{X|Y}({0}^{-}|0)={P}_{X|Y}({0}^{+}|0)=\frac{{p}_{0}}{2}$ and ${P}_{X|Y}({0}^{-}|1)={P}_{X|Y}({0}^{+}|1)=\frac{{p}_{0}}{2}$. The new channel is essentially equivalent to the original one, see [46] for more details. This family of channels can also be characterized using the definition of quasi-symmetric channels [47, Definition 4.17]. A channel $\mathsf{W}$ is BISO if (after making $|\mathcal{X}|$ even) the transition matrix ${P}_{X|Y}$ can be partitioned along its columns into binary-input binary-output sub-arrays in which rows are permutations of each other and the column sums are equal. It is clear that binary symmetric channels and binary erasure channels are both BISO. The following lemma gives an upper bound for ${g}_{\epsilon}(X;Y)$ when ${P}_{X|Y}$ belongs to such a family of channels.

**Lemma 23.**

If the channel ${P}_{X|Y}$ is BISO, then for $\epsilon \in [0,I(X;Y)]$,

$$\epsilon \frac{H(Y)}{I(X;Y)}\le {g}_{\epsilon}(X;Y)\le H(Y)-\frac{I(X;Y)-\epsilon}{C({P}_{X|Y})}$$

where $C({P}_{X|Y})$ denotes the capacity of ${P}_{X|Y}$.

**Proof.**

The lower bound has already appeared in (4). To prove the upper bound, note that by the Markov chain $X-\circ -Y-\circ -Z$, we have for any $x\in \mathcal{X}$ and $z\in \mathcal{Z}$

$${P}_{X|Z}(x|z)={P}_{X|Y}(x|0){P}_{Y|Z}(0|z)+{P}_{X|Y}(x|1){P}_{Y|Z}(1|z)\tag{34}$$

Now suppose ${\mathcal{Z}}_{0}:=\{z:{P}_{Y|Z}(0|z)\le {P}_{Y|Z}(1|z)\}$ and similarly ${\mathcal{Z}}_{1}:=\{z:{P}_{Y|Z}(1|z)\le {P}_{Y|Z}(0|z)\}$. Then (34) allows us to write for $z\in {\mathcal{Z}}_{0}$

$${P}_{X|Z}(x|z)={P}_{X|Y}(x|0){h}_{b}^{-1}\left(H(Y|Z=z)\right)+{P}_{X|Y}(x|1)\left(1-{h}_{b}^{-1}\left(H(Y|Z=z)\right)\right)\tag{35}$$

where ${h}_{b}^{-1}:[0,1]\to [0,0.5]$ is the inverse of the binary entropy function, and for $z\in {\mathcal{Z}}_{1}$,

$${P}_{X|Z}(x|z)={P}_{X|Y}(x|0)\left(1-{h}_{b}^{-1}\left(H(Y|Z=z)\right)\right)+{P}_{X|Y}(x|1){h}_{b}^{-1}\left(H(Y|Z=z)\right)\tag{36}$$

Letting $P\otimes {h}_{b}^{-1}\left(H(Y|Z=z)\right)$ and $\tilde{P}\otimes {h}_{b}^{-1}\left(H(Y|Z=z)\right)$ denote the right-hand sides of (35) and (36), respectively, we can, hence, write

$$\begin{array}{rcl}H(X|Z)& =& \sum _{z\in \mathcal{Z}}{P}_{Z}(z)H(X|Z=z)\\ & \stackrel{(a)}{=}& \sum _{z\in {\mathcal{Z}}_{0}}{P}_{Z}(z)H\left(P\otimes {h}_{b}^{-1}\left(H(Y|Z=z)\right)\right)+\sum _{z\in {\mathcal{Z}}_{1}}{P}_{Z}(z)H\left(\tilde{P}\otimes {h}_{b}^{-1}\left(H(Y|Z=z)\right)\right)\\ & \stackrel{(b)}{\le}& \sum _{z\in {\mathcal{Z}}_{0}}{P}_{Z}(z)\left[\left(1-H(Y|Z=z)\right)H\left(P\otimes {h}_{b}^{-1}(0)\right)+H(Y|Z=z)H\left(P\otimes {h}_{b}^{-1}(1)\right)\right]\\ & & +\sum _{z\in {\mathcal{Z}}_{1}}{P}_{Z}(z)\left[\left(1-H(Y|Z=z)\right)H\left(\tilde{P}\otimes {h}_{b}^{-1}(0)\right)+H(Y|Z=z)H\left(\tilde{P}\otimes {h}_{b}^{-1}(1)\right)\right]\\ & \stackrel{(c)}{=}& \sum _{z\in {\mathcal{Z}}_{0}}{P}_{Z}(z)\left[\left(1-H(Y|Z=z)\right)H(X|Y)+H(Y|Z=z)H({X}_{\text{unif}})\right]\\ & & +\sum _{z\in {\mathcal{Z}}_{1}}{P}_{Z}(z)\left[\left(1-H(Y|Z=z)\right)H(X|Y)+H(Y|Z=z)H({X}_{\text{unif}})\right]\\ & =& H(X|Y)\left[1-H(Y|Z)\right]+H(Y|Z)H({X}_{\text{unif}})\end{array}$$

where $H({X}_{\text{unif}})$ denotes the entropy of X when Y is uniformly distributed. Here, $(a)$ is due to (35) and (36), and $(b)$ follows from convexity of $u\mapsto H\left(P\otimes {h}_{b}^{-1}(u)\right)$ for all $u\in [0,1]$ [48] and Jensen’s inequality. In $(c)$, we used the symmetry of the channel ${P}_{X|Y}$ to show that $H(X|Y=0)=H(X|Y=1)=H(X|Y)$. Hence, we obtain

$$H(Y|Z)\ge \frac{H(X|Z)-H(X|Y)}{H({X}_{\text{unif}})-H(X|Y)}=\frac{I(X;Y)-I(X;Z)}{C({P}_{X|Y})}$$

where the equality follows from the fact that for a BISO channel (and in general for any quasi-symmetric channel) the uniform input distribution is the capacity-achieving distribution [47, Lemma 4.18]. Since ${g}_{\epsilon}(X;Y)$ is attained when $I(X;Z)=\epsilon $, the conclusion immediately follows. ☐

This lemma then shows that the larger the gap between $I(X;Y)$ and $I(X;{Y}^{\prime})$ is for ${Y}^{\prime}\sim \mathsf{Bernoulli}(0.5)$, the more ${g}_{\epsilon}(X;Y)$ can deviate from its lower bound. When $Y\sim \mathsf{Bernoulli}(0.5)$, then $C({P}_{X|Y})=I(X;Y)$ and $H(Y)=1$ and hence Lemma 23 implies that

$$\frac{\epsilon}{I(X;Y)}\le {g}_{\epsilon}(X;Y)\le 1-\frac{I(X;Y)-\epsilon}{I(X;Y)}=\frac{\epsilon}{I(X;Y)}$$

and hence we have proved the following corollary.

**Corollary 24.**

If the channel ${P}_{X|Y}$ is BISO and $Y\sim \mathsf{Bernoulli}\left(0.5\right)$, then for any $\epsilon \ge 0$

$${g}_{\epsilon}(X;Y)=\frac{1}{I(X;Y)}min\{\epsilon ,I(X;Y)\}$$
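To see how the two bounds of Lemma 23 collapse in the uniform case, the following sketch evaluates both sides for a BISO reverse channel. It is an illustration under stated assumptions rather than part of the original analysis: the names `mutual_information`, `h_b` and `biso_bounds` are hypothetical, the output alphabet is assumed to be ordered as $(-k,\dots ,-1,1,\dots ,k)$ so that the row for $Y=1$ is the reversed row for $Y=0$, and the capacity is computed as the mutual information under uniform Y, as used in the proof above.

```python
import math

def h_b(a):
    """Binary entropy function in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)

def mutual_information(joint):
    """I(A;B) in bits from joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(row[b] for row in joint) for b in range(len(joint[0]))]
    return sum(v * math.log2(v / (p_a[a] * p_b[b]))
               for a, row in enumerate(joint)
               for b, v in enumerate(row) if v > 0.0)

def biso_bounds(row0, p, eps):
    """Lower and upper bounds of Lemma 23 on g_eps for P_Y = Bernoulli(p).

    row0[i] = P_{X|Y}(x_i | 0); BISO symmetry gives P_{X|Y}(. | 1) as row0 reversed.
    """
    row1 = row0[::-1]

    def i_xy_for(q):
        # joint[x][y] with P_Y(1) = q
        return mutual_information([[(1.0 - q) * a, q * b] for a, b in zip(row0, row1)])

    i_xy = i_xy_for(p)
    capacity = i_xy_for(0.5)     # uniform input achieves capacity for BISO channels
    h_y = h_b(p)
    return eps * h_y / i_xy, h_y - (i_xy - eps) / capacity
```

With `row0 = [0.9, 0.1]` (a BSC with crossover 0.1), `biso_bounds(row0, 0.5, 0.2)` returns two equal numbers, in line with Corollary 24, while `biso_bounds(row0, 0.3, 0.2)` returns a strictly smaller lower bound than upper bound.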

This corollary now enables us to prove the reverse direction of Theorem 21 for the family of BISO channels.

**Theorem 25.**

If ${P}_{X|Y}$ is a BISO channel, then the following statements are equivalent:

- (i)
- ${g}_{\epsilon}(X;Y)=\epsilon \frac{H\left(Y\right)}{I(X;Y)}$ for $0\le \epsilon \le I(X;Y)$.
- (ii)
- The initial efficiency of privacy-constrained information extraction is $${g}_{0}^{\prime}(X;Y)=\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)},\quad \forall y\in \mathcal{Y}$$

**Proof.**

(i)⇒ (ii).

This follows from Theorem 21.

(ii)⇒ (i).

Let $Y\sim \mathsf{Bernoulli}(p)$ for $0<p<1$, and, as before, $\mathcal{X}=\{\pm 1,\pm 2,\cdots ,\pm k\}$, so that ${P}_{X|Y}$ is determined by a $2\times (2k)$ matrix. We then have

$$\frac{-\log {P}_{Y}(0)}{D\left({P}_{X|Y}(\cdot |0)\,\|\,{P}_{X}(\cdot )\right)}=\frac{\log (1-p)}{H(X|Y)+{\sum}_{x=-k}^{k}{P}_{X|Y}(x|0)\log \left({P}_{X}(x)\right)}\tag{37}$$

and

$$\frac{-\log {P}_{Y}(1)}{D\left({P}_{X|Y}(\cdot |1)\,\|\,{P}_{X}(\cdot )\right)}=\frac{\log (p)}{H(X|Y)+{\sum}_{x=-k}^{k}{P}_{X|Y}(x|1)\log \left({P}_{X}(x)\right)}.\tag{38}$$

The hypothesis implies that (37) is equal to (38), that is,

$$\frac{\log (1-p)}{H(X|Y)+{\sum}_{x=-k}^{k}{P}_{X|Y}(x|0)\log \left({P}_{X}(x)\right)}=\frac{\log (p)}{H(X|Y)+{\sum}_{x=-k}^{k}{P}_{X|Y}(x|1)\log \left({P}_{X}(x)\right)}\tag{39}$$

It is shown in Appendix B that (39) holds if and only if $p=0.5$. Now we can invoke Corollary 24 to conclude that ${g}_{\epsilon}(X;Y)=\epsilon \frac{H(Y)}{I(X;Y)}$. ☐

This theorem shows that for any BISO ${P}_{X|Y}$ channel with uniform input, the optimal privacy filter is an erasure channel depicted in Figure 2. Note that if ${P}_{X|Y}$ is a BSC with uniform input ${P}_{Y}=\mathsf{Bernoulli}\left(0.5\right)$, then ${P}_{Y|X}$ is also a BSC with uniform input ${P}_{X}=\mathsf{Bernoulli}\left(0.5\right)$. The following corollary specializes Corollary 24 for this case.

**Corollary 26.**

For the joint distribution ${P}_{X}{P}_{Y|X}=\mathsf{Bernoulli}(0.5)\times \text{BSC}(\alpha )$, the binary erasure channel with erasure probability (shown in Figure 4)

$$\delta (\epsilon ,\alpha ):=1-\frac{\epsilon}{I(X;Y)}$$

for $0\le \epsilon \le I(X;Y)$, is the optimal privacy filter in (3). In other words, for $\epsilon \ge 0$

$${g}_{\epsilon}(X;Y)=\frac{1}{I(X;Y)}\min \{\epsilon ,I(X;Y)\}$$

Moreover, for a given $0<\alpha <\frac{1}{2}$, ${P}_{X}=\mathsf{Bernoulli}\left(0.5\right)$ is the only distribution for which $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is linear. That is, for ${P}_{X}{P}_{Y|X}=\mathsf{Bernoulli}\left(p\right)\times \text{BSC}\left(\alpha \right)$, $0<p<0.5$, we have

$${g}_{\epsilon}(X;Y)>\epsilon \frac{H\left(Y\right)}{I(X;Y)}$$

**Proof.**

As mentioned earlier, since ${P}_{X}=\mathsf{Bernoulli}(0.5)$ and ${P}_{Y|X}$ is $\text{BSC}(\alpha )$, it follows that ${P}_{X|Y}$ is also a BSC with uniform input and hence from Corollary 24, we have ${g}_{\epsilon}(X;Y)=\frac{\epsilon}{I(X;Y)}$. Since in this case ${g}_{\epsilon}(X;Y)$ achieves the lower bound given in Lemma 1, we conclude from Figure 2 that BEC($\delta (\epsilon ,\alpha )$), where $\delta (\epsilon ,\alpha )=1-\frac{\epsilon}{I(X;Y)}$, is an optimal privacy filter. The fact that ${P}_{X}=\mathsf{Bernoulli}(0.5)$ is the only input distribution for which $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is linear follows from the proof of Theorem 25. In particular, we saw that a necessary and sufficient condition for ${g}_{\epsilon}(X;Y)$ being linear is that the ratio $\frac{-\log {P}_{Y}(y)}{D\left({P}_{X|Y}(\cdot |y)\,\|\,{P}_{X}(\cdot )\right)}$ is constant for all $y\in \mathcal{Y}$. As shown before, this is equivalent to $Y\sim \mathsf{Bernoulli}(0.5)$. For the binary symmetric channel, this is equivalent to $X\sim \mathsf{Bernoulli}(0.5)$. ☐

The optimal privacy filter for BSC(α) and uniform X is shown in Figure 4. In fact, this corollary immediately implies that the general lower-bound given in (4) is tight for the binary symmetric channel with uniform X.
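The erasure filter of Corollary 26 can be checked by brute-force computation of the two mutual informations. The sketch below is illustrative only: the function names are hypothetical, the filter output alphabet is indexed as $(0,1,\text{e})$, logarithms are base 2, and $0<\epsilon \le I(X;Y)$ is assumed.

```python
import math

def h_b(a):
    """Binary entropy function in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)

def mutual_information(joint):
    """I(A;B) in bits from joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(row[b] for row in joint) for b in range(len(joint[0]))]
    return sum(v * math.log2(v / (p_a[a] * p_b[b]))
               for a, row in enumerate(joint)
               for b, v in enumerate(row) if v > 0.0)

def bec_filter_rates(alpha, eps):
    """Return (I(X;Z), I(Y;Z), delta) for uniform X, P_{Y|X} = BSC(alpha), P_{Z|Y} = BEC(delta)."""
    i_xy = 1.0 - h_b(alpha)
    delta = 1.0 - eps / i_xy                      # erasure probability delta(eps, alpha)
    p_z_given_y = [[1.0 - delta, 0.0, delta],     # columns: z = 0, 1, e
                   [0.0, 1.0 - delta, delta]]
    p_xy = [[0.5 * (1.0 - alpha), 0.5 * alpha],
            [0.5 * alpha, 0.5 * (1.0 - alpha)]]
    joint_yz = [[0.5 * p_z_given_y[y][z] for z in range(3)] for y in range(2)]
    joint_xz = [[sum(p_xy[x][y] * p_z_given_y[y][z] for y in range(2))
                 for z in range(3)] for x in range(2)]
    return mutual_information(joint_xz), mutual_information(joint_yz), delta
```

For example, `bec_filter_rates(0.1, 0.2)` yields a leakage $I(X;Z)$ equal to 0.2 and a utility $I(Y;Z)$ equal to $0.2/I(X;Y)$, up to floating-point error.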

#### 4.2.2. Erasure Observation Channel

Combining (8) and Lemma 1, we have for $\epsilon \le I(X;Y)$

$$\epsilon \frac{H(Y)}{I(X;Y)}+{g}_{0}(X;Y)\left[1-\frac{\epsilon}{I(X;Y)}\right]\le {g}_{\epsilon}(X;Y)\le H(Y|X)+\epsilon \tag{41}$$

In the following we show that the above upper and lower bounds coincide when ${P}_{Y|X}$ is an erasure channel, i.e., ${P}_{Y|X}(x|x)=1-\delta $ and ${P}_{Y|X}(\mathrm{e}|x)=\delta $ for all $x\in \mathcal{X}$ and $0\le \delta \le 1$.

**Lemma 27.**

For any given $(X,Y)$, if ${P}_{Y|X}$ is an erasure channel (as defined above), then

$${g}_{\epsilon}(X;Y)=H(Y|X)+\min \{\epsilon ,I(X;Y)\}$$

for any $\epsilon \ge 0$.

**Proof.**

It suffices to show that if ${P}_{Y|X}$ is an erasure channel, then ${g}_{0}(X;Y)=H\left(Y\right|X)$. This follows, since if ${g}_{0}(X;Y)=H\left(Y\right|X)$, then the lower bound in (41) becomes $H\left(Y\right|X)+\epsilon $ and thus ${g}_{\epsilon}(X;Y)=H\left(Y\right|X)+\epsilon $.

Let $\left|\mathcal{X}\right|=m$ and $\mathcal{Y}=\mathcal{X}\cup \left\{\text{e}\right\}$ where e denotes the erasure symbol. Consider the following privacy filter to generate $Z\in \mathcal{Y}$:

$${P}_{Z|Y}\left(z\right|y)=\left\{\begin{array}{cc}\frac{1}{m}\hfill & \text{if}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}y\ne \text{e},z\ne \text{e},\hfill \\ 1\hfill & \text{if}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}y=z=\text{e}.\hfill \end{array}\right.$$

For any $x\in \mathcal{X}$, we have

$${P}_{Z|X}(z|x)={P}_{Z|Y}(z|x){P}_{Y|X}(x|x)+{P}_{Z|Y}(z|\mathrm{e}){P}_{Y|X}(\mathrm{e}|x)=\left[\frac{1-\delta}{m}\right]{1}_{\{z\ne \mathrm{e}\}}+\delta {1}_{\{z=\mathrm{e}\}}$$

which implies $Z\perp \!\!\!\perp X$ and thus $I(X;Z)=0$. On the other hand, ${P}_{Z}(z)=\left[\frac{1-\delta}{m}\right]{1}_{\{z\ne \text{e}\}}+\delta {1}_{\{z=\text{e}\}}$, and therefore we have

$$\begin{array}{rcl}{g}_{0}(X;Y)& \ge & I(Y;Z)=H(Z)-H(Z|Y)=H\left(\frac{1-\delta}{m},\cdots ,\frac{1-\delta}{m},\delta \right)-(1-\delta )\log (m)\\ & =& h(\delta )=H(Y|X)\end{array}$$

It then follows from Lemma 1 that ${g}_{0}(X;Y)=H\left(Y\right|X)$, which completes the proof. ☐
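The filter used in this proof can be verified numerically for a small alphabet. The sketch below is a hedged illustration: the function names are hypothetical, the erasure symbol is represented by index $m$, logarithms are base 2, and $0<\delta <1$ is assumed so that all marginals are positive.

```python
import math

def h_b(a):
    """Binary entropy function in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)

def mutual_information(joint):
    """I(A;B) in bits from joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(row[b] for row in joint) for b in range(len(joint[0]))]
    return sum(v * math.log2(v / (p_a[a] * p_b[b]))
               for a, row in enumerate(joint)
               for b, v in enumerate(row) if v > 0.0)

def erasure_filter_check(p_x, delta):
    """Return (I(X;Z), I(Y;Z)) for the erasure channel and the filter of the proof."""
    m = len(p_x)
    # P_{Y|X}: y = x with probability 1 - delta, y = e (index m) with probability delta
    p_y_given_x = [[(1.0 - delta) if y == x else 0.0 for y in range(m)] + [delta]
                   for x in range(m)]
    # P_{Z|Y}: a non-erased y is mapped uniformly onto the m non-erased symbols, e stays e
    p_z_given_y = [[1.0 / m] * m + [0.0] for _ in range(m)] + [[0.0] * m + [1.0]]
    p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in range(m)) for y in range(m + 1)]
    joint_yz = [[p_y[y] * p_z_given_y[y][z] for z in range(m + 1)]
                for y in range(m + 1)]
    joint_xz = [[p_x[x] * sum(p_y_given_x[x][y] * p_z_given_y[y][z]
                              for y in range(m + 1))
                 for z in range(m + 1)] for x in range(m)]
    return mutual_information(joint_xz), mutual_information(joint_yz)
```

For `p_x = [0.2, 0.3, 0.5]` and `delta = 0.25`, the first returned value vanishes (perfect privacy) and the second equals `h_b(0.25)`, matching $h(\delta )=H(Y|X)$.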

**Example 3.**

In light of this lemma, we can conclude that if ${P}_{Y|X}=\text{BEC}(\delta )$, then the optimal privacy filter is a combination of an identity channel and a BSC($\alpha (\epsilon ,\delta )$), as shown in Figure 5, where $0\le \alpha (\epsilon ,\delta )\le \frac{1}{2}$ is the unique solution of

$$(1-\delta )[{h}_{b}(\alpha *p)-{h}_{b}(\alpha )]=\epsilon $$

where $X\sim \mathsf{Bernoulli}(p)$, $p\le 0.5$ and $a*b=a(1-b)+b(1-a)$. Note that it is easy to check that $I(X;Z)=(1-\delta )[{h}_{b}(\alpha *p)-{h}_{b}(\alpha )]$. Therefore, in order for this channel to be a valid privacy filter, the crossover probability, $\alpha (\epsilon ,\delta )$, must be chosen such that $I(X;Z)=\epsilon $. We note that for fixed $0<\delta <1$ and $0<p<0.5$, the map $\alpha \mapsto (1-\delta )[{h}_{b}(\alpha *p)-{h}_{b}(\alpha )]$ is monotonically decreasing on $[0,\frac{1}{2}]$ ranging over $[0,(1-\delta ){h}_{b}(p)]$ and since $\epsilon \le I(X;Y)=(1-\delta ){h}_{b}(p)$, the solution of the above equation is unique.
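Since the leakage is monotonically decreasing in α, the crossover probability $\alpha (\epsilon ,\delta )$ of Example 3 can be found by bisection. The following is a minimal sketch, not a prescribed procedure: the names `conv`, `leakage` and `solve_alpha`, as well as the tolerance, are assumptions chosen for illustration.

```python
import math

def h_b(a):
    """Binary entropy function in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1.0 - b) + b * (1.0 - a)

def leakage(alpha, p, delta):
    """I(X;Z) = (1 - delta) [h_b(alpha * p) - h_b(alpha)]."""
    return (1.0 - delta) * (h_b(conv(alpha, p)) - h_b(alpha))

def solve_alpha(eps, p, delta, tol=1e-12):
    """Bisection for the alpha in [0, 1/2] with leakage(alpha) = eps."""
    lo, hi = 0.0, 0.5   # leakage decreases from (1 - delta) h_b(p) at 0 to 0 at 1/2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if leakage(mid, p, delta) > eps:
            lo = mid    # still leaking too much: add more noise
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with `p = 0.3`, `delta = 0.2` and `eps = 0.1` the returned α lies strictly between 0 and 1/2, and plugging it back into `leakage` recovers ε up to the tolerance.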

Combining Lemmas 1 and 27 with Corollary 26, we can show the following extremal property of the BEC and BSC channels, which is similar to other existing extremal properties of the BEC and the BSC, see, e.g., [46] and [45]. For $X\sim \mathsf{Bernoulli}(0.5)$, we have for any channel ${P}_{Y|X}$,

$${g}_{\epsilon}(X;Y)\ge \frac{\epsilon}{I(X;Y)}={g}_{\epsilon}\left(\text{BSC}(\widehat{\alpha})\right)$$

where ${g}_{\epsilon}\left(\text{BSC}(\alpha )\right)$ is the rate-privacy function corresponding to ${P}_{XY}=\mathsf{Bernoulli}(0.5)\times \text{BSC}(\alpha )$ and $\widehat{\alpha}:={h}_{b}^{-1}\left(H(X|Y)\right)$. Similarly, if $X\sim \mathsf{Bernoulli}(p)$, we have for any channel ${P}_{Y|X}$ with $H(Y|X)\le 1$,

$${g}_{\epsilon}(X;Y)\le H(Y|X)+\epsilon ={g}_{\epsilon}\left(\text{BEC}(\widehat{\delta})\right)$$

where ${g}_{\epsilon}\left(\text{BEC}(\delta )\right)$ is the rate-privacy function corresponding to ${P}_{XY}=\mathsf{Bernoulli}(p)\times \text{BEC}(\delta )$ and $\widehat{\delta}:={h}_{b}^{-1}\left(H(Y|X)\right)$.

## 5. Rate-Privacy Function for Continuous Random Variables

In this section we extend the rate-privacy function ${g}_{\epsilon}(X;Y)$ to the continuous case. Specifically, we assume that the private and observable data are continuous random variables and that the filter is composed of two stages: first Gaussian noise is added and then the resulting random variable is quantized using an M-bit accuracy uniform scalar quantizer (for some positive integer $M\in \mathbb{N}$). These filters are of practical interest as they can be easily implemented. This section is divided into two subsections: in the first we discuss general properties of the rate-privacy function, and in the second we study the Gaussian case in more detail. Some observations on ${\widehat{g}}_{\epsilon}(X;Y)$ for continuous X and Y are also given.

#### 5.1. General Properties of the Rate-Privacy Function

Throughout this section we assume that the random vector $(X,Y)$ is absolutely continuous with respect to the Lebesgue measure on ${\mathbb{R}}^{2}$. Additionally, we assume that its joint density ${f}_{X,Y}$ satisfies the following.

- (a)
- There exist constants ${C}_{1}>0$, $p>1$ and a bounded function ${C}_{2}:\mathbb{R}\to \mathbb{R}$ such that $${f}_{Y}(y)\le {C}_{1}{|y|}^{-p}\quad \text{and}\quad {f}_{Y|X}(y|x)\le {C}_{2}(x){|y|}^{-p}$$
- (b)
- $\mathbb{E}\left[{X}^{2}\right]$ and $\mathbb{E}\left[{Y}^{2}\right]$ are both finite,
- (c)
- the differential entropy of $(X,Y)$ satisfies $h(X,Y)>-\infty $,
- (d)
- $H(\lfloor Y\rfloor )<\infty $, where $\lfloor a\rfloor $ denotes the largest integer ℓ such that $\ell \le a$.

Note that assumptions (b) and (c) together imply that $h(X,Y)$, $h(X)$ and $h(Y)$ are finite, i.e., the maps $x\mapsto {f}_{X}(x)|\log {f}_{X}(x)|$, $y\mapsto {f}_{Y}(y)|\log {f}_{Y}(y)|$ and $(x,y)\mapsto {f}_{X,Y}(x,y)|\log {f}_{X,Y}(x,y)|$ are integrable. We also assume that X and Y are not independent, since otherwise the problem of characterizing ${g}_{\epsilon}(X;Y)$ becomes trivial: the displayed data Z can simply equal the observable data Y.

We are interested in filters of the form ${\mathcal{Q}}_{M}(Y+\gamma N)$ where $\gamma \ge 0$, $N\sim N(0,1)$ is a standard normal random variable which is independent of X and Y, and for any positive integer M, ${\mathcal{Q}}_{M}$ denotes the M-bit accuracy uniform scalar quantizer, i.e., for all $x\in \mathbb{R}$

$${\mathcal{Q}}_{M}(x)=\frac{1}{{2}^{M}}\lfloor {2}^{M}x\rfloor $$
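A direct transcription of the quantizer and of the two-stage filter ${\mathcal{Q}}_{M}(Y+\gamma N)$ is sketched below. The function names `quantize` and `privacy_filter_sample` and the use of Python's standard `random` module as the noise source are assumptions made for illustration.

```python
import math
import random

def quantize(x, m):
    """M-bit accuracy uniform scalar quantizer Q_M(x) = floor(2^M x) / 2^M."""
    scale = 2 ** m
    return math.floor(scale * x) / scale

def privacy_filter_sample(y, gamma, m, rng=random):
    """One draw of Q_M(y + gamma * N), with N a standard normal variable."""
    return quantize(y + gamma * rng.gauss(0.0, 1.0), m)
```

For example, `quantize(0.3, 2)` gives `0.25` and `quantize(-0.3, 2)` gives `-0.5`, since the floor rounds toward minus infinity; with `gamma = 0` the filter reduces to plain quantization of the observation.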

Let ${Z}_{\gamma}=Y+\gamma N$ and ${Z}_{\gamma}^{M}={\mathcal{Q}}_{M}({Z}_{\gamma})={\mathcal{Q}}_{M}(Y+\gamma N)$. We define, for any $M\in \mathbb{N}$,

$${g}_{\epsilon ,M}(X;Y):=\underset{\begin{array}{c}\gamma \ge 0,\\ I(X;{Z}_{\gamma}^{M})\le \epsilon \end{array}}{\sup}I(Y;{Z}_{\gamma}^{M})\tag{43}$$

and similarly

$${g}_{\epsilon}(X;Y):=\underset{\begin{array}{c}\gamma \ge 0,\\ I(X;{Z}_{\gamma})\le \epsilon \end{array}}{\sup}I(Y;{Z}_{\gamma})\tag{44}$$

The next theorem shows that the previous definitions are closely related.

**Theorem 28.**

Let $\epsilon >0$ be fixed. Then $\underset{M\to \infty}{lim}{g}_{\epsilon ,M}(X;Y)={g}_{\epsilon}(X;Y)$.

**Proof.**

See Appendix C. ☐

In the limit of large M, ${g}_{\epsilon}(X;Y)$ approximates ${g}_{\epsilon ,M}(X;Y)$. This becomes relevant when ${g}_{\epsilon}(X;Y)$ is easier to compute than ${g}_{\epsilon ,M}(X;Y)$, as demonstrated in the following subsection. The following theorem summarizes some general properties of ${g}_{\epsilon}(X;Y)$.

**Theorem 29.**

The function $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is non-negative, strictly-increasing, and satisfies

$$\underset{\epsilon \to 0}{\lim}{g}_{\epsilon}(X;Y)=0\qquad \text{and}\qquad {g}_{I(X;Y)}(X;Y)=\infty $$

**Proof.**

See Appendix C. ☐

As opposed to the discrete case, in the continuous case ${g}_{\epsilon}(X;Y)$ is no longer bounded. In the following section we show that $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ can be convex, in contrast to the discrete case where it is always concave.

We can also define ${\widehat{g}}_{\epsilon ,M}(X;Y)$ and ${\widehat{g}}_{\epsilon}(X;Y)$ for continuous X and Y, similar to (43) and (44), but where the privacy constraints are replaced by ${\rho}_{m}^{2}(X;{Z}_{\gamma}^{M})\le \epsilon $ and ${\rho}_{m}^{2}(X;{Z}_{\gamma})\le \epsilon $, respectively. It is clear from Theorem 29 that ${\widehat{g}}_{0}(X;Y)={g}_{0}(X;Y)=0$ and ${\widehat{g}}_{{\rho}^{2}(X;Y)}(X;Y)=\infty $. However, although we showed that ${g}_{\epsilon}(X;Y)$ is indeed the asymptotic approximation of ${g}_{\epsilon ,M}(X;Y)$ for M large enough, it is not clear whether the same statement holds for ${\widehat{g}}_{\epsilon}(X;Y)$ and ${\widehat{g}}_{\epsilon ,M}(X;Y)$.

#### 5.2. Gaussian Information

The rate-privacy function for Gaussian Y has an interesting interpretation from an estimation-theoretic point of view. Given the private and observable data $(X,Y)$, suppose an agent is required to estimate Y based on the output of the privacy filter. We wish to know the effect of imposing a privacy constraint on the estimation performance.

The following lemma shows that ${g}_{\epsilon}(X;Y)$ bounds the best performance of the predictability of Y given the output of the privacy filter. The proof provided for this lemma does not use the Gaussianity of the noise process, so it holds for any noise process.

**Lemma 30.**

For any given private data X and Gaussian observable data Y, we have for any $\epsilon \ge 0$

$$\underset{\begin{array}{c}\gamma \ge 0,\\ I(X;{Z}_{\gamma})\le \epsilon \end{array}}{inf}\mathsf{mmse}\left(Y\right|{Z}_{\gamma})\ge \mathsf{var}\left(Y\right){2}^{-2{g}_{\epsilon}(X;Y)}$$

**Proof.**

It is a well-known fact from rate-distortion theory that for a Gaussian Y and its reconstruction $\widehat{Y}$

$$I(Y;\widehat{Y})\ge \frac{1}{2}\log \frac{\mathsf{var}(Y)}{\mathbb{E}[{(Y-\widehat{Y})}^{2}]}$$

and hence by setting $\widehat{Y}=\mathbb{E}[Y|{Z}_{\gamma}]$, where ${Z}_{\gamma}$ is an output of a privacy filter, and noting that $I(Y;\widehat{Y})\le I(Y;{Z}_{\gamma})$, we obtain

$$\mathsf{mmse}(Y|{Z}_{\gamma})\ge \mathsf{var}(Y){2}^{-2I(Y;{Z}_{\gamma})}$$

from which the result follows immediately. ☐

According to Lemma 30, the quantity ${\lambda}_{\epsilon}(X):={2}^{-2{g}_{\epsilon}(X;Y)}$ is a parameter that bounds the difficulty of estimating Gaussian Y when observing an additive perturbation Z with privacy constraint $I(X;Z)\le \epsilon $. Note that $0<{\lambda}_{\epsilon}(X)\le 1$, and therefore, provided that the privacy threshold is not trivial (i.e., $\epsilon <I(X;Y)$), the mean squared error of estimating Y given the privacy filter output is bounded away from zero; however, the bound decays exponentially as ${g}_{\epsilon}(X;Y)$ grows.
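The per-filter inequality $\mathsf{mmse}(Y|{Z}_{\gamma})\ge \mathsf{var}(Y){2}^{-2I(Y;{Z}_{\gamma})}$ from the proof can be illustrated in the simplest setting, Gaussian Y with Gaussian noise, where it holds with equality. The sketch below relies on two standard facts for that setting, stated here as assumptions: $\mathsf{mmse}(Y|{Z}_{\gamma})=\mathsf{var}(Y){\gamma}^{2}/(\mathsf{var}(Y)+{\gamma}^{2})$ and $I(Y;{Z}_{\gamma})=\frac{1}{2}{\log}_{2}(1+\mathsf{var}(Y)/{\gamma}^{2})$. The function names are hypothetical.

```python
import math

def mmse_gaussian(var_y, gamma_sq):
    """MMSE of Gaussian Y from Z = Y + gamma N (linear estimator is optimal here)."""
    return var_y * gamma_sq / (var_y + gamma_sq)

def info_gaussian(var_y, gamma_sq):
    """I(Y; Y + gamma N) in bits for Gaussian Y and standard Gaussian N."""
    return 0.5 * math.log2(1.0 + var_y / gamma_sq)

def mmse_lower_bound(var_y, i_yz_bits):
    """Right-hand side var(Y) 2^{-2 I(Y;Z)} of the bound used in Lemma 30."""
    return var_y * 2.0 ** (-2.0 * i_yz_bits)
```

For any `var_y > 0` and `gamma_sq > 0`, `mmse_gaussian(var_y, gamma_sq)` coincides with `mmse_lower_bound(var_y, info_gaussian(var_y, gamma_sq))`, so in this case the bound is tight.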

To finish this section, assume that X and Y are jointly Gaussian with correlation coefficient ρ. The value of ${g}_{\epsilon}(X;Y)$ can be easily obtained in closed form as demonstrated in the following theorem.

**Theorem 31.**

Let $(X,Y)$ be jointly Gaussian random variables with correlation coefficient ρ. For any $\epsilon \in [0,I(X;Y))$ we have

$${g}_{\epsilon}(X;Y)=\frac{1}{2}log\left(\frac{{\rho}^{2}}{{2}^{-2\epsilon}+{\rho}^{2}-1}\right)$$

**Proof.**

One can always write $Y=aX+{N}_{1}$ where ${a}^{2}={\rho}^{2}\frac{\mathsf{var}(Y)}{\mathsf{var}(X)}$ and ${N}_{1}$ is a Gaussian random variable with mean 0 and variance ${\sigma}^{2}=(1-{\rho}^{2})\mathsf{var}(Y)$ which is independent of X. On the other hand, we have ${Z}_{\gamma}=Y+\gamma N$ where N is the standard Gaussian random variable independent of $(X,Y)$ and hence ${Z}_{\gamma}=aX+{N}_{1}+\gamma N$. In order for this additive channel to be a privacy filter, it must satisfy

$$I(X;{Z}_{\gamma})\le \epsilon $$

which implies

$$\frac{1}{2}\log \left(\frac{\mathsf{var}(Y)+{\gamma}^{2}}{{\sigma}^{2}+{\gamma}^{2}}\right)\le \epsilon $$

and hence

$${\gamma}^{2}\ge \frac{{2}^{-2\epsilon}+{\rho}^{2}-1}{1-{2}^{-2\epsilon}}\mathsf{var}(Y)=:{({\gamma}^{*})}^{2}$$

Since $\gamma \mapsto I(Y;{Z}_{\gamma})$ is strictly decreasing (cf. Appendix C), we obtain

$$\begin{array}{rcl}{g}_{\epsilon}(X;Y)& =& I(Y;{Z}_{{\gamma}^{*}})=\frac{1}{2}\log \left(1+\frac{\mathsf{var}(Y)}{{({\gamma}^{*})}^{2}}\right)\\ & =& \frac{1}{2}\log \left(1+\frac{1-{2}^{-2\epsilon}}{{2}^{-2\epsilon}+{\rho}^{2}-1}\right)\end{array}$$

☐

According to (46), we conclude that the optimal privacy filter for jointly Gaussian $(X,Y)$ is an additive Gaussian channel with signal to noise ratio $\frac{1-{2}^{-2\epsilon}}{{2}^{-2\epsilon}+{\rho}^{2}-1}$, which shows that if perfect privacy is required, then the displayed data is independent of the observable data Y, i.e., ${g}_{0}(X;Y)=0$.
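The closed form of Theorem 31 can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms (so that ε is measured in bits) and $\mathsf{var}(Y)=1$ by default; the function names `g_eps_gaussian`, `noise_variance`, `leakage` and `utility` are hypothetical helpers introduced only for this illustration.

```python
import math

def g_eps_gaussian(eps, rho):
    """Closed form of Theorem 31 for jointly Gaussian (X, Y), in bits."""
    i_xy = -0.5 * math.log2(1.0 - rho**2)  # I(X;Y) for correlation rho
    if not 0.0 <= eps < i_xy:
        raise ValueError("eps must lie in [0, I(X;Y))")
    return 0.5 * math.log2(rho**2 / (2.0 ** (-2.0 * eps) + rho**2 - 1.0))

def noise_variance(eps, rho, var_y=1.0):
    """Smallest gamma^2 with I(X; Y + gamma N) <= eps (requires eps > 0)."""
    t = 2.0 ** (-2.0 * eps)
    return (t + rho**2 - 1.0) / (1.0 - t) * var_y

def leakage(gamma_sq, rho, var_y=1.0):
    """I(X; Z_gamma) for the additive Gaussian filter Z_gamma = Y + gamma N."""
    sigma_sq = (1.0 - rho**2) * var_y  # variance of N_1 in Y = aX + N_1
    return 0.5 * math.log2((var_y + gamma_sq) / (sigma_sq + gamma_sq))

def utility(gamma_sq, var_y=1.0):
    """I(Y; Z_gamma) for Gaussian Y."""
    return 0.5 * math.log2(1.0 + var_y / gamma_sq)

rho, eps = 0.8, 0.3
gamma_sq = noise_variance(eps, rho)
print(leakage(gamma_sq, rho))      # privacy constraint met with equality
print(utility(gamma_sq), g_eps_gaussian(eps, rho))  # the two values agree
```

At the optimal noise level the privacy constraint is active, and the resulting utility coincides with the closed-form expression.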

**Remark 7.**

We could assume that the privacy filter adds non-Gaussian noise to the observable data and define the rate-privacy function accordingly. To this end, we define

$${g}_{\epsilon}^{f}(X;Y):=\underset{\genfrac{}{}{0pt}{}{\gamma \ge 0,}{I(X;{Z}_{\gamma}^{f})\le \epsilon}}{\sup}I(Y;{Z}_{\gamma}^{f})$$

where ${Z}_{\gamma}^{f}=Y+\gamma {M}_{f}$ and ${M}_{f}$ is a noise process that has a stable distribution with density f and is independent of $(X,Y)$. In this case, we can use a technique similar to Oohama [49] to lower bound ${g}_{\epsilon}^{f}(X;Y)$ for jointly Gaussian $(X,Y)$. Since X and Y are jointly Gaussian, we can write $X=aY+bN$ where ${a}^{2}={\rho}^{2}\frac{\mathsf{var}\left(X\right)}{\mathsf{var}\left(Y\right)}$, $b=\sqrt{(1-{\rho}^{2})\mathsf{var}\left(X\right)}$, and N is a standard Gaussian random variable that is independent of Y. We can apply the conditional entropy power inequality (cf. [42, Page 22]) for a random variable Z that is independent of N, to obtain

$${2}^{2h(X|Z)}\ge {2}^{2h(aY|Z)}+{2}^{2h(bN)}={a}^{2}{2}^{2h(Y|Z)}+2\pi e(1-{\rho}^{2})\mathsf{var}\left(X\right)$$

and hence

$${2}^{-2I(X;Z)}{2}^{2h\left(X\right)}\ge {a}^{2}{2}^{2h\left(Y\right)}{2}^{-2I(Y;Z)}+2\pi e(1-{\rho}^{2})\mathsf{var}\left(X\right)$$

Assuming $Z={Z}_{\gamma}^{f}$ and taking the infimum on both sides of the above inequality over γ such that $I(X;{Z}_{\gamma}^{f})\le \epsilon $, we obtain

$${g}_{\epsilon}^{f}(X;Y)\ge \frac{1}{2}\log\left(\frac{{\rho}^{2}}{{2}^{-2\epsilon}+{\rho}^{2}-1}\right)={g}_{\epsilon}(X;Y)$$

which shows that for Gaussian $(X,Y)$, Gaussian noise is the worst stable additive noise in the sense of privacy-constrained information extraction.

We can also calculate ${\widehat{g}}_{\epsilon}(X;Y)$ for jointly Gaussian $(X,Y)$.

**Theorem 32.**

Let $(X,Y)$ be jointly Gaussian random variables with correlation coefficient ρ. For any $\epsilon \in [0,{\rho}^{2})$ we have that

$${\widehat{g}}_{\epsilon}(X;Y)=\frac{1}{2}log\left(\frac{{\rho}^{2}}{{\rho}^{2}-\epsilon}\right)$$

**Proof.**

Since for the correlation coefficient between Y and ${Z}_{\gamma}$ we have, for any $\gamma \ge 0$,

$${\rho}^{2}(Y;{Z}_{\gamma})=\frac{\mathsf{var}\left(Y\right)}{\mathsf{var}\left(Y\right)+{\gamma}^{2}}$$

we can conclude that

$${\rho}^{2}(X;{Z}_{\gamma})=\frac{{\rho}^{2}\mathsf{var}\left(Y\right)}{\mathsf{var}\left(Y\right)+{\gamma}^{2}}$$

Since ${\rho}_{m}^{2}(X;Z)={\rho}^{2}(X;Z)$ (see, e.g., [34]), the privacy constraint ${\rho}_{m}^{2}(X;Z)\le \epsilon $ implies that

$$\frac{{\rho}^{2}\mathsf{var}\left(Y\right)}{\mathsf{var}\left(Y\right)+{\gamma}^{2}}\le \epsilon $$

and hence

$${\gamma}^{2}\ge \frac{({\rho}^{2}-\epsilon )\mathsf{var}\left(Y\right)}{\epsilon}=:{\widehat{\gamma}}_{\epsilon}^{2}$$

By monotonicity of the map $\gamma \mapsto I(Y;{Z}_{\gamma})$, we have

$${\widehat{g}}_{\epsilon}(X;Y)=I(Y;{Z}_{{\widehat{\gamma}}_{\epsilon}})=\frac{1}{2}\log\left(1+\frac{\mathsf{var}\left(Y\right)}{{\widehat{\gamma}}_{\epsilon}^{2}}\right)=\frac{1}{2}\log\left(\frac{{\rho}^{2}}{{\rho}^{2}-\epsilon}\right)\hspace{1em}\square$$

Theorems 31 and 32 show that, unlike in the discrete case (cf. Lemmas 2 and 8), the maps $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ and $\epsilon \mapsto {\widehat{g}}_{\epsilon}(X;Y)$ are convex.
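The convexity of the two closed forms can be spot-checked on a grid. The sketch below is illustrative only: it assumes base-2 logarithms, and the names `g_gaussian`, `g_hat_gaussian` and `is_convex_on_grid` are hypothetical helpers, not part of the paper. A grid test of second differences is a numerical sanity check, not a proof.

```python
import math

def g_gaussian(eps, rho):
    """Theorem 31: rate-privacy function under the mutual information measure."""
    return 0.5 * math.log2(rho**2 / (2.0 ** (-2.0 * eps) + rho**2 - 1.0))

def g_hat_gaussian(eps, rho):
    """Theorem 32: rate-privacy function under the maximal correlation measure."""
    if not 0.0 <= eps < rho**2:
        raise ValueError("eps must lie in [0, rho^2)")
    return 0.5 * math.log2(rho**2 / (rho**2 - eps))

def is_convex_on_grid(f, lo, hi, n=200, tol=1e-12):
    """Check f(x_i) <= (f(x_{i-1}) + f(x_{i+1})) / 2 on a uniform grid."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    return all(ys[i] <= 0.5 * (ys[i - 1] + ys[i + 1]) + tol
               for i in range(1, n))

rho = 0.8
i_xy = -0.5 * math.log2(1.0 - rho**2)  # I(X;Y) for jointly Gaussian (X, Y)
print(is_convex_on_grid(lambda e: g_gaussian(e, rho), 0.0, 0.9 * i_xy))
print(is_convex_on_grid(lambda e: g_hat_gaussian(e, rho), 0.0, 0.9 * rho**2))
```

Both grids stop short of the right endpoint of the admissible interval, where the closed forms diverge.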

## 6. Conclusions

In this paper, we studied the problem of determining the maximal amount of information that one can extract by observing a random variable Y, which is correlated with another random variable X that represents sensitive or private data, while ensuring that the extracted data Z meets a privacy constraint with respect to X. Specifically, given two correlated discrete random variables X and Y, we introduced the rate-privacy function as the maximization of $I(Y;Z)$ over all stochastic “privacy filters” ${P}_{Z|Y}$ such that $pm(X;Z)\le \epsilon$, where $pm(\cdot;\cdot)$ is a privacy measure and $\epsilon\ge 0$ is a given privacy threshold. We considered two possible privacy measure functions, $pm(X;Z)=I(X;Z)$ and $pm(X;Z)={\rho}_{m}^{2}(X;Z)$ where ${\rho}_{m}$ denotes maximal correlation, resulting in the rate-privacy functions ${g}_{\epsilon}(X;Y)$ and ${\widehat{g}}_{\epsilon}(X;Y)$, respectively. We analyzed these two functions, noting that each function lies between easily evaluated upper and lower bounds, and derived their monotonicity and concavity properties. We next provided an information-theoretic interpretation for ${g}_{\epsilon}(X;Y)$ and an estimation-theoretic characterization for ${\widehat{g}}_{\epsilon}(X;Y)$. In particular, we demonstrated that the dual function of ${g}_{\epsilon}(X;Y)$ is a corner point of an outer bound on the achievable region of the dependence dilution coding problem. We also showed that ${\widehat{g}}_{\epsilon}(X;Y)$ constitutes the largest amount of information that can be extracted from Y such that no meaningful MMSE estimation of any function of X can be realized by just observing the extracted information Z. We then examined conditions on ${P}_{XY}$ under which the lower bound on ${g}_{\epsilon}(X;Y)$ is tight, hence determining the exact value of ${g}_{\epsilon}(X;Y)$. We also showed that for any given Y, if the observation channel ${P}_{Y|X}$ is an erasure channel, then ${g}_{\epsilon}(X;Y)$ attains its upper bound.
Finally, we extended the notions of the rate-privacy functions ${g}_{\epsilon}(X;Y)$ and ${\widehat{g}}_{\epsilon}(X;Y)$ to the continuous case where the observation channel consists of an additive Gaussian noise channel followed by uniform scalar quantization.

## Acknowledgments

This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

## Author Contributions

All authors of this paper contributed equally. All authors have read and approved the final manuscript.

## Conflicts of Interest

The authors declare no conflict of interest.

## Appendix A. Proof of Lemma 19

Given a joint distribution ${P}_{XY}$ defined over $\mathcal{X}\times \mathcal{Y}$ where $\mathcal{X}=\{1,2,\cdots ,m\}$ and $\mathcal{Y}=\{1,2,\cdots ,n\}$ with $n\le m$, we consider a privacy filter specified by the following distribution for $\delta >0$ and $\mathcal{Z}=\{k,\mathrm{e}\}$

$${P}_{Z|Y}(k|y)=\delta {1}_{\{y=k\}}$$

$${P}_{Z|Y}(\mathrm{e}|y)=1-\delta {1}_{\{y=k\}}$$

where ${1}_{\{\cdot\}}$ denotes the indicator function. The system of $X\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -\phantom{\rule{-2.84544pt}{0ex}}Z$ in this case is depicted in Figure 6 for the case of $k=1$.

We clearly have ${P}_{Z}\left(k\right)=\delta {P}_{Y}\left(k\right)$ and ${P}_{Z}\left(\mathrm{e}\right)=1-\delta {P}_{Y}\left(k\right)$, and hence

$${P}_{X|Z}(x|k)=\frac{{P}_{XZ}(x,k)}{\delta {P}_{Y}\left(k\right)}=\frac{{P}_{XYZ}(x,k,k)}{\delta {P}_{Y}\left(k\right)}=\frac{\delta {P}_{XY}(x,k)}{\delta {P}_{Y}\left(k\right)}={P}_{X|Y}(x|k)$$

and also,

$$\begin{array}{rcl}{P}_{X|Z}(x|\mathrm{e})& =& \frac{{P}_{XZ}(x,\mathrm{e})}{1-\delta {P}_{Y}\left(k\right)}=\frac{{\sum}_{y}{P}_{XYZ}(x,y,\mathrm{e})}{1-\delta {P}_{Y}\left(k\right)}\\ & =& \frac{{\sum}_{y\ne k}{P}_{XY}(x,y)+(1-\delta ){P}_{XY}(x,k)}{1-\delta {P}_{Y}\left(k\right)}=\frac{{P}_{X}\left(x\right)-\delta {P}_{XY}(x,k)}{1-\delta {P}_{Y}\left(k\right)}\end{array}$$

It therefore follows that for $k\in \{1,2,\cdots ,n\}$

$$H(X|Z=k)=H(X|Y=k)$$

and

$$H(X|Z=\mathrm{e})=H\left(\frac{{P}_{X}\left(1\right)-\delta {P}_{XY}(1,k)}{1-\delta {P}_{Y}\left(k\right)},\cdots ,\frac{{P}_{X}\left(m\right)-\delta {P}_{XY}(m,k)}{1-\delta {P}_{Y}\left(k\right)}\right)=:{\mathcal{h}}_{X}\left(\delta \right)$$

We then write

$$I(X;Z)=H\left(X\right)-H(X|Z)=H\left(X\right)-\delta {P}_{Y}\left(k\right)H(X|Y=k)-(1-\delta {P}_{Y}\left(k\right)){\mathcal{h}}_{X}\left(\delta \right)$$

and hence,

$$\frac{\text{d}}{\text{d}\delta}I(X;Z)=-{P}_{Y}\left(k\right)H(X|Y=k)+{P}_{Y}\left(k\right){\mathcal{h}}_{X}\left(\delta \right)-(1-\delta {P}_{Y}\left(k\right)){\mathcal{h}}_{X}^{\prime}\left(\delta \right)$$

where

$${\mathcal{h}}_{X}^{\prime}\left(\delta \right)=\frac{\text{d}}{\text{d}\delta}{\mathcal{h}}_{X}\left(\delta \right)=-\sum _{x=1}^{m}\frac{{P}_{X}\left(x\right){P}_{Y}\left(k\right)-{P}_{XY}(x,k)}{{[1-\delta {P}_{Y}\left(k\right)]}^{2}}\log\left(\frac{{P}_{X}\left(x\right)-\delta {P}_{XY}(x,k)}{1-\delta {P}_{Y}\left(k\right)}\right)$$

Using the first-order approximation of mutual information around $\delta =0$, we can write

$$\begin{array}{rcl}I(X;Z)& =& {\left.\frac{\text{d}}{\text{d}\delta}I(X;Z)\right|}_{\delta =0}\delta +o\left(\delta \right)\\ & =& \delta \left[\sum _{x=1}^{m}{P}_{XY}(x,k)\log\left(\frac{{P}_{XY}(x,k)}{{P}_{X}\left(x\right){P}_{Y}\left(k\right)}\right)\right]+o\left(\delta \right)\\ & =& \delta {P}_{Y}\left(k\right)D\left({P}_{X|Y}(\cdot|k)\,\|\,{P}_{X}(\cdot)\right)+o\left(\delta \right)\end{array}$$

Similarly, we can write

$$\begin{array}{rcl}I(Y;Z)& =& H\left(Z\right)-\sum _{y=1}^{n}{P}_{Y}\left(y\right)H(Z|Y=y)=H\left(Z\right)-{P}_{Y}\left(k\right)h\left(\delta \right)=h\left(\delta {P}_{Y}\left(k\right)\right)-{P}_{Y}\left(k\right)h\left(\delta \right)\\ & =& -\delta {P}_{Y}\left(k\right)\log\left({P}_{Y}\left(k\right)\right)-\Psi (1-\delta {P}_{Y}\left(k\right))+{P}_{Y}\left(k\right)\Psi (1-\delta )\end{array}$$

where $\Psi \left(x\right):=x\log x$, which yields

$$\frac{\text{d}}{\text{d}\delta}I(Y;Z)=-\Psi \left({P}_{Y}\left(k\right)\right)+{P}_{Y}\left(k\right)\log\left(\frac{1-\delta {P}_{Y}\left(k\right)}{1-\delta}\right)$$

From the above, we obtain

$$\begin{array}{ccc}\hfill I(Y;Z)& =& \frac{\text{d}}{\text{d}\delta}{I(Y;Z)|}_{\delta =0}\delta +o\left(\delta \right)\hfill \\ & =& -\delta \Psi \left({P}_{Y}\left(k\right)\right)+o\left(\delta \right)\hfill \end{array}$$

Clearly from (A3), in order for the filter ${P}_{Z|Y}$ specified in (A1) and (A2) to belong to ${\mathcal{D}}_{\epsilon}\left({P}_{XY}\right)$, we must have

$$\frac{\epsilon}{\delta}={P}_{Y}\left(k\right)D\left({P}_{X|Y}(\cdot|k)\,\|\,{P}_{X}(\cdot)\right)+\frac{o\left(\delta \right)}{\delta}$$

and hence from (A4), we have

$$I(Y;Z)=\frac{-\Psi \left({P}_{Y}\left(k\right)\right)}{{P}_{Y}\left(k\right)D\left({P}_{X|Y}(\cdot|k)\,\|\,{P}_{X}(\cdot)\right)}\epsilon +o\left(\delta \right)$$

This immediately implies that

$${g}_{0}^{\prime}(X;Y)=\underset{\epsilon \downarrow 0}{\lim}\frac{{g}_{\epsilon}(X;Y)}{\epsilon}\ge \frac{-\Psi \left({P}_{Y}\left(k\right)\right)}{{P}_{Y}\left(k\right)D\left({P}_{X|Y}(\cdot|k)\,\|\,{P}_{X}(\cdot)\right)}=\frac{-\log\left({P}_{Y}\left(k\right)\right)}{D\left({P}_{X|Y}(\cdot|k)\,\|\,{P}_{X}(\cdot)\right)}$$

where we have used the assumption ${g}_{0}(X;Y)=0$ in the first equality.
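The first-order argument of this appendix can be illustrated numerically: for small δ, the ratio $I(Y;Z)/I(X;Z)$ achieved by the filter should approach $-\log {P}_{Y}(k)/D({P}_{X|Y}(\cdot|k)\,\|\,{P}_{X})$. The sketch below is only a sanity check under stated assumptions: the joint pmf is an arbitrary example, `k` is a 0-based column index, logarithms are base 2, and all function names are hypothetical.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as a list of rows."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    return sum(p * math.log2(p / (row[i] * col[j]))
               for i, r in enumerate(joint)
               for j, p in enumerate(r) if p > 0)

def filter_informations(p_xy, k, delta):
    """(I(X;Z), I(Y;Z)) when Z = k w.p. delta if Y = k, and Z = e otherwise."""
    p_y = [sum(c) for c in zip(*p_xy)]
    # Joint pmfs with columns ordered as (Z = k, Z = e).
    p_xz = [[delta * r[k], sum(r) - delta * r[k]] for r in p_xy]
    p_yz = [[delta * p, (1.0 - delta) * p] if y == k else [0.0, p]
            for y, p in enumerate(p_y)]
    return mutual_information(p_xz), mutual_information(p_yz)

def slope_lower_bound(p_xy, k):
    """-log P_Y(k) / D(P_{X|Y}(.|k) || P_X), the bound on g_0'(X;Y)."""
    p_x = [sum(r) for r in p_xy]
    p_yk = sum(r[k] for r in p_xy)
    kl = sum((r[k] / p_yk) * math.log2(r[k] / (p_yk * p_x[i]))
             for i, r in enumerate(p_xy) if r[k] > 0)
    return -math.log2(p_yk) / kl

p_xy = [[0.3, 0.1],   # rows indexed by x, columns by y
        [0.1, 0.5]]
i_xz, i_yz = filter_informations(p_xy, k=0, delta=1e-4)
print(i_yz / i_xz, slope_lower_bound(p_xy, k=0))
```

Maximizing `slope_lower_bound` over `k` gives the best bound obtainable from this family of filters.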

## Appendix B. Completion of Proof of Theorem 25

To prove that the equality (39) has only one solution $p=\frac{1}{2}$, we first show the following lemma.

**Lemma 33.**

Let P and Q be two distributions over $\mathcal{X}=\{\pm 1,\pm 2,\cdots ,\pm k\}$ which satisfy $P\left(x\right)=Q(-x)$. Let ${R}_{\lambda}:=\lambda P+(1-\lambda )Q$ for $\lambda \in (0,1)$. Then

$$\frac{D\left(P\,\|\,{R}_{1-\lambda}\right)}{D\left(P\,\|\,{R}_{\lambda}\right)}<\frac{\log(1-\lambda )}{\log\left(\lambda \right)}$$

for $\lambda \in (0,\frac{1}{2})$ and

$$\frac{D\left(P\,\|\,{R}_{1-\lambda}\right)}{D\left(P\,\|\,{R}_{\lambda}\right)}>\frac{\log(1-\lambda )}{\log\left(\lambda \right)}$$

for $\lambda \in (\frac{1}{2},1)$.

Note that it is easy to see that the map $\lambda \mapsto D\left(P\,\|\,{R}_{\lambda}\right)$ is convex and strictly decreasing and hence $D(P\,\|\,{R}_{\lambda})>D(P\,\|\,{R}_{1-\lambda})$ when $\lambda \in (0,\frac{1}{2})$ and $D(P\,\|\,{R}_{\lambda})<D(P\,\|\,{R}_{1-\lambda})$ when $\lambda \in (\frac{1}{2},1)$. Inequalities (A6) and (A7) strengthen this monotonic behavior and show that $D(P\,\|\,{R}_{\lambda})>\frac{\log\left(\lambda \right)}{\log(1-\lambda )}D(P\,\|\,{R}_{1-\lambda})$ and $D(P\,\|\,{R}_{\lambda})<\frac{\log\left(\lambda \right)}{\log(1-\lambda )}D(P\,\|\,{R}_{1-\lambda})$ for $\lambda \in (0,\frac{1}{2})$ and $\lambda \in (\frac{1}{2},1)$, respectively.

**Proof.**

Without loss of generality, we can assume that $P\left(x\right)>0$ for all $x\in \mathcal{X}$. Let ${\mathcal{X}}_{+}:=\{x\in \mathcal{X}|P\left(x\right)>P(-x)\}$, ${\mathcal{X}}_{-}:=\{x\in \mathcal{X}|P\left(x\right)<P(-x)\}$ and ${\mathcal{X}}_{0}:=\{x\in \mathcal{X}|P\left(x\right)=P(-x)\}$. We notice that when $x\in {\mathcal{X}}_{+}$, then $-x\in {\mathcal{X}}_{-}$, and hence $|{\mathcal{X}}_{+}|=|{\mathcal{X}}_{-}|=m$ for some $0<m\le k$. After relabelling if needed, we can therefore assume that ${\mathcal{X}}_{+}=\{1,2,\cdots ,m\}$ and ${\mathcal{X}}_{-}=\{-m,\cdots ,-2,-1\}$. We can write

$$\begin{array}{rcl}D\left(P\,\|\,{R}_{\lambda}\right)& =& \sum _{x=-k}^{k}P\left(x\right)\log\left(\frac{P\left(x\right)}{\lambda P\left(x\right)+(1-\lambda )Q\left(x\right)}\right)=\sum _{x=-k}^{k}P\left(x\right)\log\left(\frac{P\left(x\right)}{\lambda P\left(x\right)+(1-\lambda )P(-x)}\right)\\ & \stackrel{\left(a\right)}{=}& \sum _{x=1}^{m}\left[P\left(x\right)\log\left(\frac{P\left(x\right)}{\lambda P\left(x\right)+(1-\lambda )P(-x)}\right)+P(-x)\log\left(\frac{P(-x)}{\lambda P(-x)+(1-\lambda )P\left(x\right)}\right)\right]\\ & \stackrel{\left(b\right)}{=}& \sum _{x=1}^{m}\left[P\left(x\right)\log\left(\frac{1}{\lambda +(1-\lambda ){\zeta}_{x}}\right)+P\left(x\right){\zeta}_{x}\log\left(\frac{1}{\lambda +\frac{(1-\lambda )}{{\zeta}_{x}}}\right)\right]\\ & \stackrel{\left(c\right)}{=}& \sum _{x=1}^{m}P\left(x\right)\Xi (\lambda ,{\zeta}_{x})\,\log\left(\frac{1}{\lambda}\right)\end{array}$$

where $\left(a\right)$ follows from the fact that for $x\in {\mathcal{X}}_{0}$, $\log\left(\frac{P\left(x\right)}{{R}_{\lambda}\left(x\right)}\right)=0$ for any $\lambda \in (0,1)$, and in $\left(b\right)$ and $\left(c\right)$ we introduced ${\zeta}_{x}:=\frac{P(-x)}{P\left(x\right)}$ and

$$\Xi (\lambda ,\zeta ):=\frac{1}{\log\left(\frac{1}{\lambda}\right)}\left(\log\left(\frac{1}{\lambda +(1-\lambda )\zeta}\right)+\zeta \log\left(\frac{1}{\lambda +\frac{(1-\lambda )}{\zeta}}\right)\right)$$

Similarly, we can write

$$\begin{array}{rcl}D\left(P\,\|\,{R}_{1-\lambda}\right)& =& \sum _{x=-k}^{k}P\left(x\right)\log\left(\frac{P\left(x\right)}{(1-\lambda )P\left(x\right)+\lambda Q\left(x\right)}\right)=\sum _{x=-k}^{k}P\left(x\right)\log\left(\frac{P\left(x\right)}{(1-\lambda )P\left(x\right)+\lambda P(-x)}\right)\\ & =& \sum _{x=1}^{m}\left[P\left(x\right)\log\left(\frac{P\left(x\right)}{(1-\lambda )P\left(x\right)+\lambda P(-x)}\right)+P(-x)\log\left(\frac{P(-x)}{(1-\lambda )P(-x)+\lambda P\left(x\right)}\right)\right]\\ & =& \sum _{x=1}^{m}\left[P\left(x\right)\log\left(\frac{1}{1-\lambda +\lambda {\zeta}_{x}}\right)+P\left(x\right){\zeta}_{x}\log\left(\frac{1}{1-\lambda +\frac{\lambda}{{\zeta}_{x}}}\right)\right]\\ & =& \sum _{x=1}^{m}P\left(x\right)\Xi (1-\lambda ,{\zeta}_{x})\,\log\left(\frac{1}{1-\lambda}\right)\end{array}$$

which implies that

$$\frac{D\left(P\,\|\,{R}_{\lambda}\right)}{-\log\left(\lambda \right)}-\frac{D\left(P\,\|\,{R}_{1-\lambda}\right)}{-\log(1-\lambda )}=\sum _{x=1}^{m}P\left(x\right)\left[\Xi (\lambda ,{\zeta}_{x})-\Xi (1-\lambda ,{\zeta}_{x})\right]$$

Hence, in order to show (A6), it suffices to verify that

$$\Phi (\lambda ,\zeta ):=\Xi (\lambda ,\zeta )-\Xi (1-\lambda ,\zeta )>0$$

for any $\lambda \in (0,\frac{1}{2})$ and $\zeta \in (1,\infty )$. Since $\log\left(\lambda \right)\log(1-\lambda )$ is always positive for $\lambda \in (0,\frac{1}{2})$, it suffices to show that

$$h\left(\zeta \right):=\Phi (\lambda ,\zeta )\log(1-\lambda )\log\left(\lambda \right)>0$$

for $\lambda \in (0,\frac{1}{2})$ and $\zeta \in (1,\infty )$. We have

$${h}^{\prime \prime}\left(\zeta \right)=A(\lambda ,\zeta )B(\lambda ,\zeta )$$

where

$$A(\lambda ,\zeta ):=\frac{1+\zeta}{{(1-\lambda +\lambda \zeta )}^{2}{(\lambda +(1-\lambda )\zeta )}^{2}\zeta}$$

and

$$B(\lambda ,\zeta ):={\lambda}^{2}(1+\lambda (\lambda -2){(\zeta -1)}^{2}+\zeta (\zeta -1))\log\left(\lambda \right)-{(1-\lambda )}^{2}({\lambda}^{2}{(\zeta -1)}^{2}+\zeta )\log(1-\lambda ).$$

We have

$$\frac{{\partial}^{2}}{\partial {\zeta}^{2}}B(\lambda ,\zeta )=2{\lambda}^{2}{(1-\lambda )}^{2}\log\left(\frac{\lambda}{1-\lambda}\right)<0$$

because $\lambda \in (0,\frac{1}{2})$ and hence $\lambda <1-\lambda $. This implies that the map $\zeta \mapsto B(\lambda ,\zeta )$ is concave for any $\lambda \in (0,\frac{1}{2})$ and $\zeta \in (1,\infty )$. Moreover, since $\zeta \mapsto B(\lambda ,\zeta )$ is a quadratic polynomial with negative leading coefficient, it is clear that ${\lim}_{\zeta \to \infty}B(\lambda ,\zeta )=-\infty $. Consider now $g\left(\lambda \right):=B(\lambda ,1)={\lambda}^{2}\log\left(\lambda \right)-{(1-\lambda )}^{2}\log(1-\lambda )$. We have ${\lim}_{\lambda \to 0}g\left(\lambda \right)=g\left(\frac{1}{2}\right)=0$ and ${g}^{\prime \prime}\left(\lambda \right)=2\log\left(\frac{\lambda}{1-\lambda}\right)<0$ for $\lambda \in (0,\frac{1}{2})$. This implies that $\lambda \mapsto g\left(\lambda \right)$ is concave over $(0,\frac{1}{2})$ and hence $g\left(\lambda \right)>0$ over $(0,\frac{1}{2})$, which implies that $B(\lambda ,1)>0$. This, together with the fact that $\zeta \mapsto B(\lambda ,\zeta )$ is concave and approaches $-\infty $ as $\zeta \to \infty $, implies that there exists a real number $c=c\left(\lambda \right)>1$ such that $B(\lambda ,\zeta )>0$ for all $\zeta \in (1,c)$ and $B(\lambda ,\zeta )<0$ for all $\zeta \in (c,\infty )$. Since $A(\lambda ,\zeta )>0$, it follows from (A10) that $\zeta \mapsto h\left(\zeta \right)$ is convex over $(1,c)$ and concave over $(c,\infty )$. Since $h\left(1\right)={h}^{\prime}\left(1\right)=0$ and ${\lim}_{\zeta \to \infty}h\left(\zeta \right)=\infty $, we can conclude that $h\left(\zeta \right)>0$ over $(1,\infty )$. That is, $\Phi (\lambda ,\zeta )>0$ and thus $\Xi (\lambda ,\zeta )-\Xi (1-\lambda ,\zeta )>0$, for $\lambda \in (0,\frac{1}{2})$ and $\zeta \in (1,\infty )$.

The inequality (A7) can be proved by (A6) and switching λ to $1-\lambda $. ☐
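The two inequalities of Lemma 33 can be spot-checked numerically. The sketch below is illustrative and not a substitute for the proof: it assumes the support is listed in the order $-k,\cdots ,-1,1,\cdots ,k$, so that reversing the list implements $Q(x)=P(-x)$, and the names `kl` and `lemma33_gap` are hypothetical helpers. A positive gap corresponds to (A6) and a negative gap to (A7).

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def lemma33_gap(p, lam):
    """D(P||R_lam)/(-log lam) - D(P||R_{1-lam})/(-log(1-lam)).

    `p` lists P on the support ordered as -k, ..., -1, 1, ..., k, so the
    reversed list is Q with Q(x) = P(-x).
    """
    q = p[::-1]

    def mix(weight):
        # R_weight = weight * P + (1 - weight) * Q
        return [weight * a + (1.0 - weight) * b for a, b in zip(p, q)]

    left = kl(p, mix(lam)) / -math.log2(lam)
    right = kl(p, mix(1.0 - lam)) / -math.log2(1.0 - lam)
    return left - right

p = [0.1, 0.6, 0.1, 0.2]          # P on {-2, -1, 1, 2}
for lam in (0.1, 0.25, 0.4):
    print(lam, lemma33_gap(p, lam), lemma33_gap(p, 1.0 - lam))
```

For each λ below one half the first printed gap is positive and the second, evaluated at $1-\lambda $, is its negative, which reflects how (A7) follows from (A6).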

Letting $P(\cdot)={P}_{X|Y}(\cdot|1)$, $Q(\cdot)={P}_{X|Y}(\cdot|0)$ and $\lambda =\Pr(Y=1)=p$, we have ${R}_{p}\left(x\right)={P}_{X}\left(x\right)=pP\left(x\right)+(1-p)Q\left(x\right)$ and ${R}_{1-p}\left(x\right)={P}_{X}(-x)=(1-p)P\left(x\right)+pQ\left(x\right)$. Since $D({P}_{X|Y}(\cdot|0)\,\|\,{P}_{X}(\cdot))=D(P\,\|\,{R}_{1-p})$, we can conclude from Lemma 33 that

$$\frac{D\left({P}_{X|Y}(\cdot|0)\,\|\,{P}_{X}(\cdot)\right)}{-\log(1-p)}<\frac{D\left({P}_{X|Y}(\cdot|1)\,\|\,{P}_{X}(\cdot)\right)}{-\log\left(p\right)}$$

over $p\in (0,\frac{1}{2})$ and

$$\frac{D\left({P}_{X|Y}(\cdot|0)\,\|\,{P}_{X}(\cdot)\right)}{-\log(1-p)}>\frac{D\left({P}_{X|Y}(\cdot|1)\,\|\,{P}_{X}(\cdot)\right)}{-\log\left(p\right)}$$

over $p\in (\frac{1}{2},1)$, and hence equation (39) has only the solution $p=\frac{1}{2}$.

## Appendix C. Proof of Theorems 28 and 29

The proof of Theorem 29 does not depend on the proof of Theorem 28, so, there is no harm in proving the former theorem first. The following version of the data-processing inequality will be required.

**Lemma 34.**

Let X and Y be absolutely continuous random variables such that X, Y and $(X,Y)$ have finite differential entropies. If V is an absolutely continuous random variable independent of X and Y, then

$$I(X;Y+V)\le I(X;Y)$$

with equality if and only if X and Y are independent.

**Proof.**

Since $X\phantom{\rule{-2.84544pt}{0ex}}-\circ -Y\phantom{\rule{-2.84544pt}{0ex}}-\circ -(Y+V)$, the data processing inequality implies that $I(X;Y+V)\le I(X;Y)$. It therefore suffices to show that this inequality is tight if and only if X and Y are independent. It is known that the data processing inequality is tight if and only if $X\phantom{\rule{-2.84544pt}{0ex}}-\circ -(Y+V)\phantom{\rule{-2.84544pt}{0ex}}-\circ -Y$. This is equivalent to saying that for any measurable set $A\subset \mathbb{R}$ and for ${P}_{Y+V}$-almost all z, $\Pr(X\in A|Y+V=z,Y=y)=\Pr(X\in A|Y+V=z)$. On the other hand, due to the independence of V and $(X,Y)$, we have $\Pr(X\in A|Y+V=z,Y=y)=\Pr(X\in A|Y=z-v)$. Hence, the equality holds if and only if $\Pr(X\in A|Y+V=z)=\Pr(X\in A|Y=z-v)$, which implies that X and Y must be independent. ☐

**Lemma 35.**

In the notation of Section 5.1, the function $\gamma \mapsto I(Y;{Z}_{\gamma})$ is strictly decreasing and continuous. Additionally, it satisfies

$$I(Y;{Z}_{\gamma})\le \frac{1}{2}\log\left(1+\frac{\mathsf{var}\left(Y\right)}{{\gamma}^{2}}\right)$$

with equality if and only if Y is Gaussian. In particular, $I(Y;{Z}_{\gamma})\to 0$ as $\gamma \to \infty $.

**Proof.**

Recall that, by assumption b), $\mathsf{var}\left(Y\right)$ is finite. The finiteness of the entropy of Y follows from assumption, the corresponding statement for $Y+\gamma N$ follows from a routine application of the entropy power inequality [50, Theorem 17.7.3] and the fact that $\mathsf{var}(Y+\gamma N)=\mathsf{var}\left(Y\right)+{\gamma}^{2}<\infty $, and for $(Y,Y+\gamma N)$ the same conclusion follows by the chain rule for differential entropy. The data processing inequality, as stated in Lemma 34, implies

$$I(Y;{Z}_{\gamma +\delta})\le I(Y;Y+\gamma N)=I(Y;{Z}_{\gamma})$$

Clearly Y and $Y+\gamma N$ are not independent; therefore the inequality is strict and thus $\gamma \mapsto I(Y;{Z}_{\gamma})$ is strictly decreasing.

Continuity will be studied for $\gamma =0$ and $\gamma >0$ separately. Recall that $h\left(\gamma N\right)=\frac{1}{2}log\left(2\pi e{\gamma}^{2}\right)$. In particular, $\underset{\gamma \to 0}{lim}h\left(\gamma N\right)=-\infty$. The entropy power inequality shows then that $\underset{\gamma \to 0}{lim}I(Y;Y+\gamma N)=\infty$. This coincides with the convention $I(Y;{Z}_{0})=I(Y;Y)=\infty $. For $\gamma >0$, let ${\left({\gamma}_{n}\right)}_{n\ge 1}$ be a sequence of positive numbers such that ${\gamma}_{n}\to \gamma $. Observe that

$$\begin{array}{cc}\hfill I(Y;{Z}_{{\gamma}_{n}})& =h(Y+{\gamma}_{n}N)-h\left({\gamma}_{n}N\right)=h(Y+{\gamma}_{n}N)-\frac{1}{2}log\left(2\pi e{\gamma}_{n}^{2}\right)\hfill \end{array}$$

Since $\underset{n\to \infty}{lim}\frac{1}{2}log\left(2\pi e{\gamma}_{n}^{2}\right)=\frac{1}{2}log\left(2\pi e{\gamma}^{2}\right)$, we only have to show that $h(Y+{\gamma}_{n}N)\to h(Y+\gamma N)$ as $n\to \infty $ to establish the continuity at γ. This, in fact, follows from de Bruijn’s identity (cf., [50, Theorem 17.7.2]).

Since the channel from Y to ${Z}_{\gamma}$ is an additive Gaussian noise channel, we have $I(Y;{Z}_{\gamma})\le \frac{1}{2}\log\left(1+\frac{\mathsf{var}\left(Y\right)}{{\gamma}^{2}}\right)$ with equality if and only if Y is Gaussian. The claimed limit as $\gamma \to \infty $ is clear. ☐

**Lemma 36.**

The function $\gamma \mapsto I(X;{Z}_{\gamma})$ is strictly-decreasing and continuous. Moreover, $I(X;{Z}_{\gamma})\to 0$ when $\gamma \to \infty $.

**Proof.**

The strictly decreasing behavior of $\gamma \mapsto I(X;{Z}_{\gamma})$ is proved as in the previous lemma.

To prove continuity, let $\gamma \ge 0$ be fixed. Let ${\left({\gamma}_{n}\right)}_{n\ge 1}$ be any sequence of positive numbers converging to γ. First suppose that $\gamma >0$. Observe that

$$I(X;{Z}_{{\gamma}_{n}})=h(Y+{\gamma}_{n}N)-h(Y+{\gamma}_{n}N|X)$$

for all $n\ge 1$. As shown in Lemma 35, $h(Y+{\gamma}_{n}N)\to h(Y+\gamma N)$ as $n\to \infty $. Therefore, it is enough to show that $h(Y+{\gamma}_{n}N|X)\to h(Y+\gamma N|X)$ as $n\to \infty $. Note that by de Bruijn’s identity, we have $h(Y+{\gamma}_{n}N|X=x)\to h(Y+\gamma N|X=x)$ as $n\to \infty $ for all $x\in \mathbb{R}$. Note also that since

$$h({Z}_{{\gamma}_{n}}|X=x)\le \frac{1}{2}\log\left(2\pi e\,\mathsf{var}({Z}_{{\gamma}_{n}}|x)\right)$$

we can write

$$h({Z}_{{\gamma}_{n}}|X)\le \mathbb{E}\left[\frac{1}{2}\log\left(2\pi e\,\mathsf{var}({Z}_{{\gamma}_{n}}|X)\right)\right]\le \frac{1}{2}\log\left(2\pi e\,\mathbb{E}\left[\mathsf{var}({Z}_{{\gamma}_{n}}|X)\right]\right)$$

and hence we can apply the dominated convergence theorem to show that $h(Y+{\gamma}_{n}N|X)\to h(Y+\gamma N|X)$ as $n\to \infty $.

To prove the continuity at $\gamma =0$, we first note that Linder and Zamir [51, Page 2028] showed that $h(Y+{\gamma}_{n}N|X=x)\to h\left(Y\right|X=x)$ as $n\to \infty $, then, as before, by dominated convergence theorem we can show that $h(Y+{\gamma}_{n}N|X)\to h\left(Y\right|X)$. Similarly [51] implies that $h(Y+{\gamma}_{n}N)\to h\left(Y\right)$. This concludes the proof of the continuity of $\gamma \mapsto I(X;{Z}_{\gamma})$.

Furthermore, by the data processing inequality and the previous lemma,

$$0\le I(X;{Z}_{\gamma})\le I(Y;{Z}_{\gamma})\le \frac{1}{2}\log\left(1+\frac{\mathsf{var}\left(Y\right)}{{\gamma}^{2}}\right)$$

and hence we conclude that $\underset{\gamma \to \infty}{\lim}I(X;{Z}_{\gamma})=0$. ☐

**Proof of Theorem 29.**

The nonnegativity of ${g}_{\epsilon}(X;Y)$ follows directly from definition.

By Lemma 36, for every $0<\epsilon \le I(X;Y)$ there exists a unique ${\gamma}_{\epsilon}\in [0,\infty )$ such that $I(X;{Z}_{{\gamma}_{\epsilon}})=\epsilon $, so ${g}_{\epsilon}(X;Y)=I(Y;{Z}_{{\gamma}_{\epsilon}})$. Moreover, $\epsilon \mapsto {\gamma}_{\epsilon}$ is strictly decreasing. Since $\gamma \mapsto I(Y;{Z}_{\gamma})$ is strictly-decreasing, we conclude that $\epsilon \mapsto {g}_{\epsilon}(X;Y)$ is strictly increasing.

The fact that $\epsilon \mapsto {\gamma}_{\epsilon}$ is strictly decreasing, also implies that ${\gamma}_{\epsilon}\to \infty $ as $\epsilon \to 0$. In particular,

$$\underset{\epsilon \to 0}{lim}{g}_{\epsilon}(X;Y)=\underset{\epsilon \to 0}{lim}I(Y;{Z}_{{\gamma}_{\epsilon}})=\underset{{\gamma}_{\epsilon}\to \infty}{lim}I(Y;{Z}_{{\gamma}_{\epsilon}})=\underset{\gamma \to \infty}{lim}I(Y;{Z}_{\gamma})=0$$

By the data processing inequality we have that $I(X;{Z}_{\gamma})\le I(X;Y)$ for all $\gamma \ge 0$, i.e., any filter satisfies the privacy constraint for $\epsilon =I(X;Y)$. Thus, ${g}_{I(X;Y)}(X;Y)\ge I(Y;Y)=\infty$. ☐

In order to prove Theorem 28, we first recall the following theorem by Rényi [52].

**Theorem 37** ([52])**.**

If U is an absolutely continuous random variable with density ${f}_{U}\left(x\right)$ and if $H(\lfloor U\rfloor )<\infty $, then

$$\underset{n\to \infty}{\lim}H\left({n}^{-1}\lfloor nU\rfloor \right)-\log\left(n\right)=-{\int}_{\mathbb{R}}{f}_{U}\left(x\right)\log{f}_{U}\left(x\right)\mathrm{d}x$$

provided that the integral on the right hand side exists.

We will need the following consequence of the previous theorem.

**Lemma 38.**

If U is an absolutely continuous random variable with density ${f}_{U}\left(x\right)$ and if $H(\lfloor U\rfloor )<\infty $, then $H\left({\mathcal{Q}}_{M}\left(U\right)\right)-M\ge H\left({\mathcal{Q}}_{M+1}\left(U\right)\right)-(M+1)$ for all $M\ge 1$ and

$$\underset{M\to \infty}{\lim}H\left({\mathcal{Q}}_{M}\left(U\right)\right)-M=-{\int}_{\mathbb{R}}{f}_{U}\left(x\right)\log{f}_{U}\left(x\right)\mathrm{d}x$$

provided that the integral on the right hand side exists.

The previous lemma follows from the fact that ${\mathcal{Q}}_{M+1}\left(U\right)$ is constructed by refining the quantization partition for ${\mathcal{Q}}_{M}\left(U\right)$.

**Lemma 39.**

For any $\gamma \ge 0$,

$$\underset{M\to \infty}{lim}I(X;{Z}_{\gamma}^{M})=I(X;{Z}_{\gamma})\phantom{\rule{1.em}{0ex}}\phantom{\rule{1.em}{0ex}}\phantom{\rule{1.em}{0ex}}\mathrm{and}\phantom{\rule{1.em}{0ex}}\phantom{\rule{1.em}{0ex}}\phantom{\rule{1.em}{0ex}}\underset{M\to \infty}{lim}I(Y;{Z}_{\gamma}^{M})=I(Y;{Z}_{\gamma})$$

**Proof.**

Observe that

$$\begin{array}{cc}\hfill I(X;{Z}_{\gamma}^{M})& =I(X;{\mathcal{Q}}_{M}(Y+\gamma N))\hfill \\ & =H\left({\mathcal{Q}}_{M}(Y+\gamma N)\right)-H\left({\mathcal{Q}}_{M}(Y+\gamma N)\right|X)\hfill \\ & =[H\left({\mathcal{Q}}_{M}(Y+\gamma N)\right)-M]-{\int}_{\mathbb{R}}{f}_{X}\left(x\right)[H\left({\mathcal{Q}}_{M}(Y+\gamma N)\right|X=x)-M]\mathrm{d}x\hfill \end{array}$$

By the previous lemma, the integrand is decreasing in M, and thus we can take the limit with respect to M inside the integral. Thus,

$$\underset{M\to \infty}{lim}I(X;{Z}_{\gamma}^{M})=h(Y+\gamma N)-h(Y+\gamma N|X)=I(X;{Z}_{\gamma})$$

The proof for $I(Y;{Z}_{\gamma}^{M})$ is analogous. ☐

**Lemma 40.**

Fix $M\in \mathbb{N}$. Assume that ${f}_{Y}\left(y\right)\le C{\left|y\right|}^{-p}$ for some positive constant C and $p>1$. For integer k and $\gamma \ge 0$, let

$${p}_{k,\gamma}:=\Pr\left({\mathcal{Q}}_{M}(Y+\gamma N)=\frac{k}{{2}^{M}}\right)$$

Then

$${p}_{k,\gamma}\le \frac{C{2}^{(p-1)M+p}}{{k}^{p}}+{1}_{\{\gamma >0\}}\frac{\gamma {2}^{M+1}}{k\sqrt{2\pi}}{e}^{-{k}^{2}/{2}^{2M+3}{\gamma}^{2}}$$

**Proof.**

The case $\gamma =0$ is trivial, so we assume that $\gamma >0$. For notational simplicity, let ${r}_{a}=\frac{a}{{2}^{M}}$ for all $a\in \mathbb{Z}$. Assume that $k\ge 0$. Observe that

$$\begin{array}{cc}\hfill {p}_{k,\gamma}& ={\int}_{-\infty}^{\infty}{\int}_{-\infty}^{\infty}{f}_{\gamma N}\left(n\right){f}_{Y}\left(y\right){1}_{\left[{r}_{k},{r}_{k+1}\right)}(y+n)\mathrm{d}y\mathrm{d}n\hfill \\ & ={\int}_{-\infty}^{\infty}\frac{{e}^{-{n}^{2}/2{\gamma}^{2}}}{\sqrt{2\pi {\gamma}^{2}}}Pr\left(Y\in \left[{r}_{k},{r}_{k+1}\right)-n\right)\mathrm{d}n\hfill \end{array}$$

We will estimate the above integral by breaking it up into two pieces.

First, we consider

$$\begin{array}{c}\hfill \underset{-\infty}{\overset{\frac{{r}_{k}}{2}}{\int}}\frac{{e}^{-{n}^{2}/2{\gamma}^{2}}}{\sqrt{2\pi {\gamma}^{2}}}Pr\left(Y\in \left[{r}_{k},{r}_{k+1}\right)-n\right)\mathrm{d}n\end{array}$$

When $n\le \frac{{r}_{k}}{2}$, then ${r}_{k}-n\ge {r}_{k}/2$. By the assumption on the density of Y,

$$\Pr\left(Y\in \left[{r}_{k},{r}_{k+1}\right)-n\right)\le \frac{C}{{2}^{M}}{\left(\frac{{r}_{k}}{2}\right)}^{-p}$$

(The previous estimate is the only contribution when $\gamma =0$.) Therefore,

$$\begin{array}{cc}\hfill \underset{-\infty}{\overset{\frac{{r}_{k}}{2}}{\int}}\frac{{e}^{-{n}^{2}/2{\gamma}^{2}}}{\sqrt{2\pi {\gamma}^{2}}}\Pr\left(Y\in \left[{r}_{k},{r}_{k+1}\right)-n\right)\mathrm{d}n& \le \frac{C}{{2}^{M}}{\left(\frac{{r}_{k}}{2}\right)}^{-p}\underset{-\infty}{\overset{\frac{{r}_{k}}{2}}{\int}}\frac{{e}^{-{n}^{2}/2{\gamma}^{2}}}{\sqrt{2\pi {\gamma}^{2}}}\mathrm{d}n\hfill \\ & \le \frac{C{2}^{(p-1)M+p}}{{k}^{p}}\hfill \end{array}$$

Using the trivial bound $Pr\left(Y\in \left[{r}_{k},{r}_{k+1}\right)-n\right)\le 1$ and well known estimates for the error function, we obtain that

$$\begin{array}{cc}\hfill \underset{\frac{{r}_{k}}{2}}{\overset{\infty}{\int}}\frac{{e}^{-{n}^{2}/2{\gamma}^{2}}}{\sqrt{2\pi {\gamma}^{2}}}Pr\left(Y\in \left[{r}_{k},{r}_{k+1}\right)-n\right)\mathrm{d}n& <\frac{1}{\sqrt{2\pi}}\frac{2\gamma}{{r}_{k}}{e}^{-{r}_{k}^{2}/8{\gamma}^{2}}\hfill \\ & =\frac{\gamma {2}^{M+1}}{k\sqrt{2\pi}}{e}^{-{k}^{2}/{2}^{2M+3}{\gamma}^{2}}\hfill \end{array}$$

Therefore,

$${p}_{k,\gamma}\le \frac{C{2}^{(p-1)M+p}}{{k}^{p}}+\frac{\gamma {2}^{M+1}}{k\sqrt{2\pi}}{e}^{-{k}^{2}/{2}^{2M+3}{\gamma}^{2}}$$

The proof for $k<0$ is completely analogous. ☐

**Lemma 41.**

Fix $M\in \mathbb{N}$. Assume that ${f}_{Y}\left(y\right)\le C{\left|y\right|}^{-p}$ for some positive constant C and $p>1$. The mapping $\gamma \mapsto H\left({\mathcal{Q}}_{M}(Y+\gamma N)\right)$ is continuous.

**Proof.**

Let ${\left({\gamma}_{n}\right)}_{n\ge 1}$ be a sequence of non-negative real numbers converging to ${\gamma}_{0}$. First, we will prove continuity at ${\gamma}_{0}>0$. Without loss of generality, assume that ${\gamma}_{n}>0$ for all $n\in \mathbb{N}$. Define ${\gamma}_{*}=inf\left\{{\gamma}_{n}\right|n\ge 1\}$ and ${\gamma}^{*}=sup\left\{{\gamma}_{n}\right|n\ge 1\}$. Clearly $0<{\gamma}_{*}\le {\gamma}^{*}<\infty $. Recall that

$${p}_{k,\gamma}={\int}_{\mathbb{R}}\frac{{e}^{-{z}^{2}/2{\gamma}^{2}}}{\sqrt{2\pi {\gamma}^{2}}}Pr\left(Y\in \left[\frac{k}{{2}^{M}},\frac{k+1}{{2}^{M}}\right)-z\right)\mathrm{d}z$$

Since, for all $n\in \mathbb{N}$ and $z\in \mathbb{R}$,

$$\begin{array}{cc}\hfill \frac{{e}^{-{z}^{2}/2{\gamma}_{n}^{2}}}{\sqrt{2\pi {\gamma}_{n}^{2}}}Pr\left(Y\in \left[\frac{k}{{2}^{M}},\frac{k+1}{{2}^{M}}\right)-z\right)& \le \frac{{e}^{-{z}^{2}/2{\left({\gamma}^{*}\right)}^{2}}}{\sqrt{2\pi {\gamma}_{*}^{2}}}\hfill \end{array}$$

the dominated convergence theorem implies that

$$\underset{n\to \infty}{lim}{p}_{k,{\gamma}_{n}}={p}_{k,{\gamma}_{0}}$$

The previous lemma implies that for all $n\ge 0$ and $\left|k\right|>0$,

$${p}_{k,{\gamma}_{n}}\le \frac{C{2}^{(p-1)M+p}}{{k}^{p}}+\frac{{\gamma}_{n}{2}^{M+1}}{k\sqrt{2\pi}}{e}^{-{k}^{2}/{2}^{2M+3}{\gamma}_{n}^{2}}$$

Thus, for k large enough, ${p}_{k,{\gamma}_{n}}\le \frac{A}{{k}^{p}}$ for a suitable positive constant A that does not depend on n. Since the function $x\mapsto -xlog\left(x\right)$ is increasing in $[0,1/2]$, there exists ${K}^{\prime}>0$ such that for $\left|k\right|>{K}^{\prime}$

$$-{p}_{k,{\gamma}_{n}}log\left({p}_{k,{\gamma}_{n}}\right)\le \frac{A}{{k}^{p}}log\left({A}^{-1}{k}^{p}\right)$$

Since $\sum _{\left|k\right|>{K}^{\prime}}\frac{A}{{k}^{p}}log\left({A}^{-1}{k}^{p}\right)<\infty$, for any $\epsilon>0$ there exists ${K}_{\epsilon}$ such that

$$\sum _{\left|k\right|>{K}_{\epsilon}}\frac{A}{{k}^{p}}log\left({A}^{-1}{k}^{p}\right)<\epsilon$$

In particular, for all $n\ge 0$,

$$\begin{array}{cc}\hfill H\left(\mathcal{Q}(Y+{\gamma}_{n}N)\right)-\sum _{\left|k\right|\le {K}_{\epsilon}}-{p}_{k,{\gamma}_{n}}log\left({p}_{k,{\gamma}_{n}}\right)& =\sum _{\left|k\right|>{K}_{\epsilon}}-{p}_{k,{\gamma}_{n}}log\left({p}_{k,{\gamma}_{n}}\right)<\epsilon \hfill \end{array}$$

Therefore, for all $n\ge 1$,

$$\begin{array}{cc}& \left|H\left(\mathcal{Q}(Y+{\gamma}_{n}N)\right)-H\left(\mathcal{Q}(Y+{\gamma}_{0}N)\right)\right|\hfill \\ & \le \sum _{\left|k\right|>{K}_{\epsilon}}-{p}_{k,{\gamma}_{n}}log\left({p}_{k,{\gamma}_{n}}\right)+\left|\sum _{\left|k\right|\le {K}_{\epsilon}}{p}_{k,{\gamma}_{0}}log\left({p}_{k,{\gamma}_{0}}\right)-{p}_{k,{\gamma}_{n}}log\left({p}_{k,{\gamma}_{n}}\right)\right|+\sum _{\left|k\right|>{K}_{\epsilon}}-{p}_{k,{\gamma}_{0}}log\left({p}_{k,{\gamma}_{0}}\right)\hfill \\ & \le \epsilon +\left|\sum _{\left|k\right|\le {K}_{\epsilon}}{p}_{k,{\gamma}_{0}}log\left({p}_{k,{\gamma}_{0}}\right)-{p}_{k,{\gamma}_{n}}log\left({p}_{k,{\gamma}_{n}}\right)\right|+\epsilon \hfill \end{array}$$

By continuity of the function $x\mapsto -xlog\left(x\right)$ on $[0,1]$ and equation (A11), we conclude that

$$\underset{n\to \infty}{lim\; sup}\left|H\left(\mathcal{Q}(Y+{\gamma}_{n}N)\right)-H\left(\mathcal{Q}(Y+{\gamma}_{0}N)\right)\right|\le 3\epsilon$$

Since ϵ is arbitrary,

$$\underset{n\to \infty}{lim}H\left(\mathcal{Q}(Y+{\gamma}_{n}N)\right)=H\left(\mathcal{Q}(Y+{\gamma}_{0}N)\right)$$

as we wanted to prove.

To prove continuity at ${\gamma}_{0}=0$, observe that equation (A11) holds in this case as well. The rest is analogous to the case ${\gamma}_{0}>0$. ☐
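The continuity statement can be illustrated numerically on the finite part of the entropy sum, that is, the terms with $\left|k\right|\le K$ that the proof controls through the continuity of $x\mapsto -xlog\left(x\right)$. The sketch below again assumes a standard Cauchy Y, which is an illustrative choice and not part of the paper; `truncated_entropy`, the cut-off `K`, and the step count are hypothetical names and parameters.

```python
import math


def cauchy_cdf(y):
    """Distribution function of the standard Cauchy law (illustrative choice of Y)."""
    return 0.5 + math.atan(y) / math.pi


def bin_probability(k, gamma, M, steps=1000):
    """Approximate p_{k,gamma} = Pr(Y + gamma N in [k/2^M, (k+1)/2^M))."""
    lo, hi = k / 2 ** M, (k + 1) / 2 ** M
    if gamma == 0.0:
        return cauchy_cdf(hi) - cauchy_cdf(lo)
    width = 10.0 * gamma  # Gaussian mass beyond 10 standard deviations is negligible
    h = 2.0 * width / steps
    total = 0.0
    for i in range(steps + 1):
        n = -width + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        density = math.exp(-n * n / (2.0 * gamma ** 2)) / math.sqrt(2.0 * math.pi * gamma ** 2)
        total += weight * density * (cauchy_cdf(hi - n) - cauchy_cdf(lo - n))
    return total * h


def truncated_entropy(gamma, M=1, K=40):
    """Sum of -p log2(p) over |k| <= K; the remainder over |k| > K is NOT included."""
    total = 0.0
    for k in range(-K, K + 1):
        p = bin_probability(k, gamma, M)
        if p > 0.0:
            total -= p * math.log2(p)
    return total


for gamma in (0.0, 0.01, 0.5, 0.501):
    print(gamma, truncated_entropy(gamma))
```

The printed values for nearby γ, including the pair at and near $\gamma =0$, should differ only slightly. The tail over $\left|k\right|>K$ is omitted here; in the proof it is the previous lemma that makes this tail uniformly small.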

**Lemma 42.**

The functions $\gamma \mapsto I(X;{Z}_{\gamma}^{M})$ and $\gamma \mapsto I(Y;{Z}_{\gamma}^{M})$ are continuous for each $M\in \mathbb{N}$.

**Proof.**

Since $H\left({\mathcal{Q}}_{M}(Y+\gamma N)\right|Y=y)$ and $H\left({\mathcal{Q}}_{M}(Y+\gamma N)\right|X=x)$ for $x,y\in \mathbb{R}$ are bounded by M, and ${f}_{Y|X}\left(y\right|x)$ satisfies assumption (b), the conclusion follows from the dominated convergence theorem. ☐

**Proof of Theorem 28.**

For every $M\in \mathbb{N}$, let ${\Gamma}_{\epsilon}^{M}:=\{\gamma \ge 0|I(X;{Z}_{\gamma}^{M})\le \epsilon\}$. The Markov chain $X\to Y\to {Z}_{\gamma}\to {Z}_{\gamma}^{M+1}\to {Z}_{\gamma}^{M}$ and the data processing inequality imply that

$$I(X;{Z}_{\gamma})\ge I(X;{Z}_{\gamma}^{M+1})\ge I(X;{Z}_{\gamma}^{M})$$

and, in particular,

$$\epsilon =I(X;{Z}_{{\gamma}_{\epsilon}})\ge I(X;{Z}_{{\gamma}_{\epsilon}}^{M+1})\ge I(X;{Z}_{{\gamma}_{\epsilon}}^{M})$$

where ${\gamma}_{\epsilon}$ is as defined in the proof of Theorem 29. This implies then that

$${\gamma}_{\epsilon}\in {\Gamma}_{\epsilon}^{M+1}\subset {\Gamma}_{\epsilon}^{M}$$

and thus

$$I(Y;{Z}_{{\gamma}_{\epsilon}}^{M})\le {g}_{\epsilon ,M}(X;Y)$$

Taking limits in both sides, Lemma 39 implies

$${g}_{\epsilon}(X;Y)=I(Y;{Z}_{{\gamma}_{\epsilon}})\le \underset{M\to \infty}{lim\; inf}{g}_{\epsilon ,M}(X;Y)$$

Observe that

$$\begin{array}{cc}\hfill {g}_{\epsilon ,M}(X;Y)& =\underset{\gamma \in {\Gamma}_{\epsilon}^{M}}{sup}I(Y;{Z}_{\gamma}^{M})\hfill \\ \hfill & \le \underset{\gamma \in {\Gamma}_{\epsilon}^{M}}{sup}I(Y;{Z}_{\gamma})\hfill \\ \hfill & =I(Y;{Z}_{{\gamma}_{\epsilon ,min}^{M}})\hfill \end{array}$$

where the inequality follows from Markovity and ${\gamma}_{\epsilon ,min}^{M}:={inf}_{{\Gamma}_{\epsilon}^{M}}\gamma $. By equation (A12), ${\gamma}_{\epsilon}\in {\Gamma}_{\epsilon}^{M+1}\subset {\Gamma}_{\epsilon}^{M}$ and in particular ${\gamma}_{\epsilon ,min}^{M}\le {\gamma}_{\epsilon ,min}^{M+1}\le {\gamma}_{\epsilon}$. Thus, $\left\{{\gamma}_{\epsilon ,min}^{M}\right\}$ is an increasing sequence in M and bounded from above and, hence, has a limit. Let ${\gamma}_{\epsilon ,min}=\underset{M\to \infty}{lim}{\gamma}_{\epsilon ,min}^{M}$. Clearly

$${\gamma}_{\epsilon ,min}\le {\gamma}_{\epsilon}$$

By the previous lemma we know that $I(X;{Z}_{\gamma}^{M})$ is continuous, so ${\Gamma}_{\epsilon}^{M}$ is closed for all $M\in \mathbb{N}$. Thus, we have that ${\gamma}_{\epsilon ,min}^{M}={min}_{{\Gamma}_{\epsilon}^{M}}\gamma $ and in particular ${\gamma}_{\epsilon ,min}^{M}\in {\Gamma}_{\epsilon}^{M}$. By the inclusion ${\Gamma}_{\epsilon}^{M+1}\subset {\Gamma}_{\epsilon}^{M}$, we have then that ${\gamma}_{\epsilon ,min}^{M+n}\in {\Gamma}_{\epsilon}^{M}$ for all $n\in \mathbb{N}$. By closedness of ${\Gamma}_{\epsilon}^{M}$ we have then that ${\gamma}_{\epsilon ,min}\in {\Gamma}_{\epsilon}^{M}$ for all $M\in \mathbb{N}$. In particular,

$$I(X;{Z}_{{\gamma}_{\epsilon ,min}}^{M})\le \epsilon$$

for all $M\in \mathbb{N}$. By Lemma 39,

$$I(X;{Z}_{{\gamma}_{\epsilon ,min}})\le \epsilon =I(X;{Z}_{{\gamma}_{\epsilon}})$$

and by the monotonicity of $\gamma \mapsto I(X;{Z}_{\gamma})$, we obtain that ${\gamma}_{\epsilon}\le {\gamma}_{\epsilon ,min}$. Combining the previous inequality with (A15) we conclude that ${\gamma}_{\epsilon ,min}={\gamma}_{\epsilon}$. Taking limits in the inequality (A14)

$$\underset{M\to \infty}{lim\; sup}{g}_{\epsilon ,M}(X;Y)\le \underset{M\to \infty}{lim\; sup}I(Y;{Z}_{{\gamma}_{\epsilon ,min}^{M}})=I(Y;{Z}_{{\gamma}_{\epsilon ,min}})$$

Plugging ${\gamma}_{\epsilon ,min}={\gamma}_{\epsilon}$ in above we conclude that

$$\underset{M\to \infty}{lim\; sup}{g}_{\epsilon ,M}(X;Y)\le I(Y;{Z}_{{\gamma}_{\epsilon}})={g}_{\epsilon}(X;Y)$$

and therefore $\underset{M\to \infty}{lim}{g}_{\epsilon ,M}(X;Y)={g}_{\epsilon}(X;Y)$. ☐
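The two structural facts used in the proof, namely that $I(X;{Z}_{\gamma}^{M})$ is non-decreasing in M and that the feasible sets of γ are nested, can be observed on a toy model. The sketch below is only an illustration under assumptions that are not in the paper: X is taken uniform on $\{-1,+1\}$ (so it has no density, unlike the setting of the theorem), $Y=X+W$ with W standard Gaussian, the quantizer output is restricted to the range $[-64,64)$, and γ is restricted to a finite grid. All function and variable names are hypothetical. Coarser quantizations are obtained by merging adjacent bins of the finest one, which mirrors the Markov chain ${Z}_{\gamma}^{M+1}\to {Z}_{\gamma}^{M}$.

```python
import math


def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def fine_pmf(x, gamma, M, L=64):
    """P(Q_M(Z_gamma) = k/2^M | X = x) for bins covering [-L, L) in the toy model."""
    sigma = math.sqrt(1.0 + gamma * gamma)  # Z_gamma given X = x is N(x, 1 + gamma^2)
    step = 2.0 ** (-M)
    cdf = [normal_cdf((-L + i * step - x) / sigma) for i in range(2 * L * 2 ** M + 1)]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]


def coarsen(pmf):
    """Merge adjacent bins: the law of Q_{M-1}(Z) obtained from the law of Q_M(Z)."""
    return [pmf[i] + pmf[i + 1] for i in range(0, len(pmf), 2)]


def mutual_information(cond):
    """I(X;Z) in bits, X uniform on the keys of cond, cond[x] the pmf of Z given x."""
    xs = list(cond)
    total = 0.0
    for k in range(len(cond[xs[0]])):
        pz = sum(cond[x][k] for x in xs) / len(xs)
        for x in xs:
            p = cond[x][k]
            if p > 0.0:
                total += p / len(xs) * math.log2(p / pz)
    return total


def information_by_resolution(gamma, M_max=3):
    """Return {M: I(X; Z_gamma^M)} for M = 0, ..., M_max."""
    cond = {x: fine_pmf(x, gamma, M_max) for x in (-1.0, 1.0)}
    info = {}
    for M in range(M_max, -1, -1):
        info[M] = mutual_information(cond)
        cond = {x: coarsen(p) for x, p in cond.items()}
    return info


EPS = 0.1                                   # privacy level in bits (illustrative)
GAMMAS = [0.25 * i for i in range(21)]      # finite grid standing in for gamma >= 0
INFO = {g: information_by_resolution(g) for g in GAMMAS}
# Smallest grid point in the feasible set {gamma : I(X; Z_gamma^M) <= EPS}.
GAMMA_MIN = {M: min(g for g in GAMMAS if INFO[g][M] <= EPS) for M in range(4)}
print(GAMMA_MIN)
```

On this grid the smallest feasible γ should not decrease as M grows, in line with ${\gamma}_{\epsilon ,min}^{M}\le {\gamma}_{\epsilon ,min}^{M+1}$. The sketch does not compute the limit ${\gamma}_{\epsilon}$ itself.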

## References

- Asoodeh, S.; Alajaji, F.; Linder, T. Notes on information-theoretic privacy. In Proceedings of the 52nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 30 September–3 October 2014; pp. 1272–1278.
- Asoodeh, S.; Alajaji, F.; Linder, T. On maximal correlation, mutual information and data privacy. In Proceedings of the IEEE 14th Canadian Workshop on Information Theory (CWIT), St. John’s, NL, Canada, 6–9 July 2015; pp. 27–31.
- Warner, S.L. Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. J. Am. Stat. Assoc. **1965**, 60, 63–69.
- Blum, A.; Ligett, K.; Roth, A. A learning theory approach to non-interactive database privacy. In Proceedings of the Fortieth Annual ACM Symposium on the Theory of Computing, Victoria, BC, Canada, 17–20 May 2008; pp. 1123–1127.
- Dinur, I.; Nissim, K. Revealing information while preserving privacy. In Proceedings of the Twenty-Second Symposium on Principles of Database Systems, San Diego, CA, USA, 9–11 June 2003; pp. 202–210.
- Rubinstein, P.B.; Bartlett, L.; Huang, J.; Taft, N. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. J. Priv. Confid. **2012**, 4, 65–100.
- Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Privacy aware learning. 2014; arXiv: 1210.2085.
- Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Conference on Theory of Cryptography (TCC’06), New York, NY, USA, 5–7 March 2006; pp. 265–284.
- Dwork, C. Differential privacy: A survey of results. In Theory and Applications of Models of Computation, Proceedings of the 5th International Conference, TAMC 2008, Xi’an, China, 25–29 April 2008; Agrawal, M., Du, D., Duan, Z., Li, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2008. Lecture Notes in Computer Science. Volume 4978, pp. 1–19.
- Dwork, C.; Lei, J. Differential privacy and robust statistics. In Proceedings of the 41st Annual ACM Symposium on the Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009; pp. 437–442.
- Kairouz, P.; Oh, S.; Viswanath, P. Extremal mechanisms for local differential privacy. 2014; arXiv: 1407.1338v2.
- Calmon, F.P.; Varia, M.; Médard, M.; Christiansen, M.M.; Duffy, K.R.; Tessaro, S. Bounds on inference. In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2–4 October 2013; pp. 567–574.
- Yamamoto, H. A source coding problem for sources with additional outputs to keep secret from the receiver or wiretappers. IEEE Trans. Inf. Theory **1983**, 29, 918–923.
- Sankar, L.; Rajagopalan, S.; Poor, H. Utility-privacy tradeoffs in databases: An information-theoretic approach. IEEE Trans. Inf. Forensics Secur. **2013**, 8, 838–852.
- Tandon, R.; Sankar, L.; Poor, H. Discriminatory lossy source coding: side information privacy. IEEE Trans. Inf. Theory **2013**, 59, 5665–5677.
- Calmon, F.; Fawaz, N. Privacy against statistical inference. In Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 1–5 October 2012; pp. 1401–1408.
- Rebollo-Monedero, D.; Forne, J.; Domingo-Ferrer, J. From t-closeness-like privacy to postrandomization via information theory. IEEE Trans. Knowl. Data Eng. **2010**, 22, 1623–1636.
- Makhdoumi, A.; Salamatian, S.; Fawaz, N.; Médard, M. From the information bottleneck to the privacy funnel. In Proceedings of the IEEE Information Theory Workshop (ITW), Hobart, Australia, 2–5 November 2014; pp. 501–505.
- Tishby, N.; Pereira, F.C.; Bialek, W. The information bottleneck method. 2000; arXiv: physics/0004057.
- Calmon, F.P.; Makhdoumi, A.; Médard, M. Fundamental limits of perfect privacy. In Proceedings of the IEEE Int. Symp. Inf. Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 1796–1800.
- Wyner, A.D. The Wire-Tap Channel. Bell Syst. Tech. J. **1975**, 54, 1355–1387.
- Makhdoumi, A.; Fawaz, N. Privacy-utility tradeoff under statistical uncertainty. In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2–4 October 2013; pp. 1627–1634.
- Li, C.T.; El Gamal, A. Maximal correlation secrecy. 2015; arXiv: 1412.5374.
- Ahlswede, R.; Gács, P. Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab. **1976**, 4, 925–939.
- Anantharam, V.; Gohari, A.; Kamath, S.; Nair, C. On maximal correlation, hypercontractivity, and the data processing inequality studied by Erkip and Cover. 2014; arXiv:1304.6133v1.
- Courtade, T. Information masking and amplification: The source coding setting. In Proceedings of the IEEE Int. Symp. Inf. Theory (ISIT), Boston, MA, USA, 1–6 July 2012; pp. 189–193.
- Goldwasser, S.; Micali, S. Probabilistic encryption. J. Comput. Syst. Sci. **1984**, 28, 270–299.
- Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1997.
- Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011.
- Shulman, N.; Feder, M. The uniform distribution as a universal prior. IEEE Trans. Inf. Theory **2004**, 50, 1356–1362.
- Rudin, W. Real and Complex Analysis, 3rd ed.; McGraw Hill: New York, NY, USA, 1987.
- Gebelein, H. Das statistische Problem der Korrelation als Variations- und Eigenwert-problem und sein Zusammenhang mit der Ausgleichungsrechnung. Zeitschrift für Angewandte Mathematik und Mechanik **1941**, 21, 364–379. (In German)
- Hirschfeld, H.O. A connection between correlation and contingency. Camb. Philos. Soc. **1935**, 31, 520–524.
- Rényi, A. On measures of dependence. Acta Mathematica Academiae Scientiarum Hungarica **1959**, 10, 441–451.
- Linfoot, E.H. An informational measure of correlation. Inf. Control **1957**, 1, 85–89.
- Csiszár, I. Information-type measures of difference of probability distributions and indirect observation. Studia Scientiarum Mathematicarum Hungarica **1967**, 2, 229–318.
- Zhao, L. Common Randomness, Efficiency, and Actions. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2011.
- Berger, T.; Yeung, R. Multiterminal source encoding with encoder breakdown. IEEE Trans. Inf. Theory **1989**, 35, 237–244.
- Kim, Y.H.; Sutivong, A.; Cover, T. State amplification. IEEE Trans. Inf. Theory **2008**, 54, 1850–1859.
- Merhav, N.; Shamai, S. Information rates subject to state masking. IEEE Trans. Inf. Theory **2007**, 53, 2254–2261.
- Ahlswede, R.; Körner, J. Source coding with side information and a converse for degraded broadcast channels. IEEE Trans. Inf. Theory **1975**, 21, 629–637.
- Kim, Y.H.; El Gamal, A. Network Information Theory; Cambridge University Press: Cambridge, UK, 2012.
- Asoodeh, S.; Alajaji, F.; Linder, T. Lossless secure source coding, Yamamoto’s setting. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 30 September–2 October 2015.
- Raginsky, M. Logarithmic Sobolev inequalities and strong data processing theorems for discrete channels. In Proceedings of the IEEE Int. Sym. Inf. Theory (ISIT), Istanbul, Turkey, 7–12 July 2013; pp. 419–423.
- Geng, Y.; Nair, C.; Shamai, S.; Wang, Z.V. On broadcast channels with binary inputs and symmetric outputs. IEEE Trans. Inf. Theory **2013**, 59, 6980–6989.
- Sutskover, I.; Shamai, S.; Ziv, J. Extremes of information combining. IEEE Trans. Inf. Theory **2005**, 51, 1313–1325.
- Alajaji, F.; Chen, P.N. Information Theory for Single User Systems, Part I. Course Notes, Queen’s University. Available online: http://www.mast.queensu.ca/math474/it-lecture-notes.pdf (accessed on 4 March 2015).
- Chayat, N.; Shamai, S. Extension of an entropy property for binary input memoryless symmetric channels. IEEE Trans. Inf. Theory **1989**, 35, 1077–1079.
- Oohama, Y. Gaussian multiterminal source coding. IEEE Trans. Inf. Theory **1997**, 43, 2254–2261.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 2006.
- Linder, T.; Zamir, R. On the asymptotic tightness of the Shannon lower bound. IEEE Trans. Inf. Theory **2008**, 40, 2026–2031.
- Rényi, A. On the dimension and entropy of probability distributions. Acta Mathematica Academiae Scientiarum Hungarica **1959**, 10, 193–215.

**Figure 2.** Privacy filter that achieves the lower bound in (4) where ${Z}_{\delta}$ is the output of an erasure privacy filter with erasure probability specified in (5).

**Figure 4.** Optimal privacy filter for ${P}_{Y|X}=BSC\left(\alpha \right)$ with uniform X where $\delta (\epsilon ,\alpha )$ is specified in (40).

**Figure 5.** Optimal privacy filter for ${P}_{Y|X}=BEC\left(\delta \right)$ where $\delta (\epsilon ,\alpha )$ is specified in (42).

**Figure 6.** The privacy filter associated with (A1) and (A2) with $k=1$. We have ${P}_{Z|Y}(\cdot |1)=\mathsf{Bernoulli}\left(\delta \right)$ and ${P}_{Z|Y}(\cdot |y)=\mathsf{Bernoulli}\left(0\right)$ for $y\in \{2,3,\cdots ,n\}$.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).