Information Extraction Under Privacy Constraints

Asoodeh, Shahab; Diaz, Mario; Alajaji, Fady; Linder, Tamás

doi:10.3390/info7010015


Information Extraction Under Privacy Constraints^†

by

Shahab Asoodeh

^*,

Mario Diaz

,

Fady Alajaji

and

Tamás Linder

Department of Mathematics and Statistics, Queen’s University, Kingston, Canada

^*

Author to whom correspondence should be addressed.

^†

Parts of the results in this paper were presented at the 52nd Allerton Conference on Communications, Control and Computing [1] and the 14th Canadian Workshop on Information Theory [2].

Information 2016, 7(1), 15; https://doi.org/10.3390/info7010015

Submission received: 1 November 2015 / Revised: 24 February 2016 / Accepted: 3 March 2016 / Published: 10 March 2016

(This article belongs to the Special Issue Communication Theory)


Abstract

:

A privacy-constrained information extraction problem is considered where for a pair of correlated discrete random variables

(X, Y)

governed by a given joint distribution, an agent observes Y and wants to convey to a potentially public user as much information about Y as possible while limiting the amount of information revealed about X. To this end, the so-called rate-privacy function is investigated to quantify the maximal amount of information (measured in terms of mutual information) that can be extracted from Y under a privacy constraint between X and the extracted information, where privacy is measured using either mutual information or maximal correlation. Properties of the rate-privacy function are analyzed and its information-theoretic and estimation-theoretic interpretations are presented for both the mutual information and maximal correlation privacy measures. It is also shown that the rate-privacy function admits a closed-form expression for a large family of joint distributions of

(X, Y)

. Finally, the rate-privacy function under the mutual information privacy measure is considered for the case where

(X, Y)

has a joint probability density function by studying the problem where the extracted information is a uniform quantization of Y corrupted by additive Gaussian noise. The asymptotic behavior of the rate-privacy function is studied as the quantization resolution grows without bound and it is observed that not all of the properties of the rate-privacy function carry over from the discrete to the continuous case.

Keywords:

data privacy; equivocation; rate-privacy function; information theory; minimum mean-squared error estimation; additive channels; mutual information; maximal correlation

1. Introduction

With the emergence of user-customized services, there is an increasing desire to balance the need to share data against the need to protect sensitive and private information. For example, individuals who join a social network are asked to provide information about themselves which might compromise their privacy. However, they agree to do so, to some extent, in order to benefit from customized services such as recommendations and personalized searches. As another example, a participatory technology for estimating road traffic requires each individual to provide her start and destination points as well as the travel time. However, most participating individuals prefer to provide somewhat distorted or false information to protect their privacy. Furthermore, suppose a software company wants to gather statistical information on how people use its software. Since many users might have used the software to handle personal or sensitive information (for example, a browser for anonymous web surfing or financial management software), they may not want to share their data with the company. On the other hand, the company cannot legally collect the raw data either, so it needs to entice its users. In all these situations, a tradeoff between utility and privacy breach is required, and the question is how to achieve this tradeoff. For example, how can a company collect high-quality aggregate information about users while strongly guaranteeing to its users that it is not storing user-specific information?

To deal with such privacy considerations, Warner [3] proposed the randomized response model in which each individual user randomizes her own data using a local randomizer (i.e., a noisy channel) before sharing the data with an untrusted data collector to be aggregated. As opposed to conditional security, see, e.g., [4,5,6], the randomized response model assumes that the adversary can have unlimited computational power and thus it provides unconditional privacy. This model, in which the control of private data remains in the users’ hands, has been extensively studied since Warner. As a special case of the randomized response model, Duchi et al. [7], inspired by the well-known privacy guarantee called differential privacy introduced by Dwork et al. [8,9,10], introduced locally differential privacy (LDP). Given a random variable

X \in X

, another random variable

Z \in Z

is said to be the ε-LDP version of X if there exists a channel

Q : X \to Z

such that

\frac{Q (B | x)}{Q (B | x^{'})} \leq exp (ε)

for all measurable

B \subset Z

and all

x, x^{'} \in X

. The channel Q is then called an ε-LDP mechanism. Using Jensen’s inequality, it is straightforward to see that any ε-LDP mechanism leaks at most ε bits of private information, i.e., the mutual information between X and Z satisfies

I (X; Z) \leq ε

.
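The LDP condition and the resulting leakage bound can be checked numerically on a toy example. The sketch below is an illustration only, not a construction from the paper: it assumes binary randomized response as the mechanism, a hypothetical prior on the private bit, and natural logarithms (so leakage is measured in nats). The helper names `randomized_response` and `ldp_level` are introduced here for illustration.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in nats of a joint pmf stored as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_row @ p_col)[mask])))

def randomized_response(eps):
    """Binary channel Q[x, z] that keeps the input bit with probability e^eps / (1 + e^eps)."""
    keep = np.exp(eps) / (1.0 + np.exp(eps))
    return np.array([[keep, 1.0 - keep], [1.0 - keep, keep]])

def ldp_level(channel):
    """Smallest eps with Q(z|x) <= exp(eps) Q(z|x') for all x, x', z.

    For finite alphabets and strictly positive entries it is enough to
    compare single output symbols, which is what this function does.
    """
    return float(np.max(np.log(channel.max(axis=0) / channel.min(axis=0))))

eps = 0.5
Q = randomized_response(eps)
p_x = np.array([0.3, 0.7])                       # hypothetical prior on the private bit
leakage = mutual_information(p_x[:, None] * Q)   # I(X; Z) in nats
print(ldp_level(Q), leakage)
```

In this example the measured LDP level equals the chosen ε, and the leakage stays below it, as the bound above predicts.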

There have been numerous studies on the tradeoff between privacy and utility for different examples of randomized response models with different choices of utility and privacy measures. For instance, Duchi et al. [7] studied the optimal ε-LDP mechanism

M : X \to Z

which minimizes the risk of estimation of a parameter θ related to

P_{X}

. Kairouz et al. [11] studied an optimal ε-LDP mechanism in the sense of mutual information, where an individual would like to release an ε-LDP version Z of X that preserves as much information about X as possible. Calmon et al. [12] proposed a novel privacy measure (which includes maximal correlation and chi-square correlation) between X and Z and studied the optimal privacy mechanism (according to their privacy measure) which minimizes the error probability

Pr (\hat{X} (Z) \neq X)

for any estimator

\hat{X} : Z \to X

.

In all above examples of randomized response models, given a private source, denoted by X, the mechanism generates Z which can be publicly displayed without breaching the desired privacy level. However, in a more realistic model of privacy, we can assume that for any given private data X, nature generates Y, via a fixed channel

P_{Y | X}

. Now we aim to release a public display Z of Y such that the amount of information in Y is preserved as much as possible while Z satisfies a privacy constraint with respect to X. Consider two communicating agents Alice and Bob. Alice collects all her measurements from an observation into a random variable Y and ultimately wants to reveal this information to Bob in order to receive a payoff. However, she is worried about her private data, represented by X, which is correlated with Y. For instance, X might represent her precise location and Y might represent measurements of the traffic load of a route she has taken. She wants to reveal these measurements to an online road monitoring system to receive some utility. However, she does not want to reveal too much information about her exact location. In such situations, the utility is measured with respect to Y and privacy is measured with respect to X. The question raised in this situation then concerns the maximum payoff Alice can get from Bob (by revealing Z to him) without compromising her privacy. Hence, it is of interest to characterize such competing objectives in the form of a quantitative tradeoff. Such a characterization provides a controllable balance between utility and privacy.

This model of privacy first appears in Yamamoto’s work [13] in which the rate-distortion-equivocation function is defined as the tradeoff between a distortion-based utility and privacy. Recently, Sankar et al. [14], using the quantize-and-bin scheme [15], generalized Yamamoto’s model to study privacy in databases from an information-theoretic point of view. Calmon and Fawaz [16] and Monedero et al. [17] also independently used distortion and mutual information for utility and privacy, respectively, to define a privacy-distortion function which resembles the classical rate-distortion function. More recently, Makhdoumi et al. [18] proposed to use mutual information for both utility and privacy measures and defined the privacy funnel as the corresponding privacy-utility tradeoff, given by

t_{R} (X; Y) : = \min_{\substack{P_{Z | Y} : X -\circ- Y -\circ- Z \\ I (Y; Z) \geq R}} I (X; Z)

(1)

where

X -\circ- Y -\circ- Z

denotes that

X, Y

and Z form a Markov chain in this order. Leveraging well-known algorithms for the information bottleneck problem [19], they provided a locally optimal greedy algorithm to evaluate

t_{R} (X; Y)

. Asoodeh et al. [1], independently, defined the rate-privacy function,

g_{ε} (X; Y)

, as the maximum achievable

I (Y; Z)

such that Z satisfies

I (X; Z) \leq ε

, which is a dual representation of the privacy funnel (1), and showed that for discrete X and Y,

g_{0} (X; Y) > 0

if and only if X is weakly independent of Y (cf. Definition 9). Recently, Calmon et al. [20] proved an equivalent result for

t_{R} (X; Y)

using a different approach. They also obtained lower and upper bounds for

t_{R} (X; Y)

which can be easily translated to bounds for

g_{ε} (X; Y)

(cf. Lemma 1). In this paper, we develop further properties of

g_{ε} (X; Y)

and also determine necessary and sufficient conditions on

P_{X Y}

, satisfying some symmetry conditions, for

g_{ε} (X; Y)

to achieve its upper and lower bounds.

The problem treated in this paper can also be contrasted with the better-studied concept of secrecy following the pioneering work of Wyner [21]. While in secrecy problems the aim is to keep information secret only from wiretappers, in privacy problems the aim is to keep the private information (not necessarily all the information) secret from everyone including the intended receiver.

1.1. Our Model and Main Contributions

Using mutual information as a measure of both utility and privacy, we formulate the corresponding privacy-utility tradeoff for discrete random variables X and Y via the rate-privacy function,

g_{ε} (X; Y)

, in which the mutual information between Y and displayed data (i.e., the mechanism’s output), Z, is maximized over all channels

P_{Z | Y}

such that the mutual information between Z and X is no larger than a given ε. We also formulate a similar rate-privacy function

{\hat{g}}_{ε} (X; Y)

where the privacy is measured in terms of the squared maximal correlation,

ρ_{m}^{2}

, between X and Z. In studying

g_{ε} (X; Y)

and

{\hat{g}}_{ε} (X; Y)

, any channel

Q : Y \to Z

that satisfies

I (X; Z) \leq ε

or

ρ_{m}^{2} (X; Z) \leq ε

, respectively, preserves the desired level of privacy and is hence called a privacy filter. Interpreting

I (Y; Z)

as the number of bits that a privacy filter can reveal about Y without compromising privacy, we present the rate-privacy function as a formulation of the problem of maximal privacy-constrained information extraction from Y.

We remark that using maximal correlation as a privacy measure is by no means new as it appears in other works, see, e.g., [22,23] and [12] for different utility functions. We do not put any likelihood constraints on the privacy filters as opposed to the definition of LDP. In fact, the optimal privacy filters that we obtain in this work induce channels

P_{Z | X}

that do not satisfy the LDP property.

The quantity

g_{ε} (X; Y)

is related to a notion of the reverse strong data processing inequality as follows. Given a joint distribution

P_{X Y}

, the strong data processing coefficient was introduced in [24,25], as the smallest

s (X; Y) \leq 1

such that

I (X; Z) \leq s (X; Y) I (Y; Z)

for all

P_{Z | Y}

satisfying the Markov condition

X -\circ- Y -\circ- Z

. In the rate-privacy function, we instead seek an upper bound on the maximum achievable rate at which Y can display information,

I (Y; Z)

, while meeting the privacy constraint

I (X; Z) \leq ε

. The connection between the rate-privacy function and the strong data processing inequality is further studied in [20] to mirror all the results of [25] in the context of privacy.

The contributions of this work are as follows:

We study lower and upper bounds of $g_{ε} (X; Y)$ . The lower bound, in particular, establishes a multiplicative bound on $I (Y; Z)$ for any optimal privacy filter. Specifically, we show that for a given $(X, Y)$ and $ε > 0$ there exists a channel $Q : Y \to Z$ such that $I (X; Z) \leq ε$ and

$I (Y; Z) \geq λ (X; Y) ε$

(2)

where $λ (X; Y) \geq 1$ is a constant depending on the joint distribution $P_{X Y}$ . We then give conditions on $P_{X Y}$ such that the upper and lower bounds are tight. For example, we show that the lower bound is achieved when Y is binary and the channel from Y to X is symmetric. We show that this corresponds to the fact that both $Y = 0$ and $Y = 1$ induce distributions $P_{X | Y} (\cdot | 0)$ and $P_{X | Y} (\cdot | 1)$ which are equidistant from $P_{X}$ in the sense of Kullback-Leibler divergence. We then show that the upper bound is achieved when Y is an erased version of X, or equivalently, $P_{Y | X}$ is an erasure channel.
We propose an information-theoretic setting in which $g_{ε} (X; Y)$ appears as a natural upper bound for the achievable rate in the so-called "dependence dilution" coding problem. Specifically, we examine the joint-encoder version of an amplification-masking tradeoff, a setting recently introduced by Courtade [26], and we show that the dual of $g_{ε} (X; Y)$ upper bounds the masking rate. We also present an estimation-theoretic motivation for the privacy measure $ρ_{m}^{2} (X; Z) \leq ε$ . In fact, by imposing $ρ_{m}^{2} (X; Z) \leq ε$ , we require that an adversary who observes Z cannot efficiently estimate $f (X)$ , for any function f. This is reminiscent of semantic security [27] in the cryptography community. An encryption mechanism is said to be semantically secure if the adversary’s advantage for correctly guessing any function of the private data given an observation of the mechanism’s output (i.e., the ciphertext) is negligible. This, in fact, justifies the use of maximal correlation as a measure of privacy. The use of mutual information as a privacy measure can also be justified using Fano’s inequality. Note that $I (X; Z) \leq ε$ can be shown to imply that $Pr (\hat{X} (Z) \neq X) \geq \frac{H (X) - 1 - ε}{log (| X |)}$ and hence the probability of the adversary incorrectly guessing X is lower-bounded.
We also study the rate of increase $g_{0}^{'} (X; Y)$ of $g_{ε} (X; Y)$ at $ε = 0$ and show that this rate can characterize the behavior of $g_{ε} (X; Y)$ for any $ε \geq 0$ provided that $g_{0} (X; Y) = 0$ . This again has connections with the results of [25]. Letting

$Γ (R) : = \max_{\substack{P_{Z | Y} : X -\circ- Y -\circ- Z \\ I (Y; Z) \leq R}} I (X; Z)$

one can easily show that $Γ^{'} (0) = {lim}_{R \to 0} \frac{Γ (R)}{R} = s (X; Y),$ and hence the rate of increase of $Γ (R)$ at $R = 0$ characterizes the strong data processing coefficient. Note that here we have $Γ (0) = 0$ .
Finally, we generalize the rate-privacy function to the continuous case where X and Y are both continuous and show that some of the properties of $g_{ε} (X; Y)$ in the discrete case do not carry over to the continuous case. In particular, we assume that the privacy filter belongs to a family of additive noise channels followed by an M-level uniform scalar quantizer and give asymptotic bounds as $M \to \infty$ for the rate-privacy function.
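As a small worked instance of the Fano-type bound quoted in the contributions above, consider a hypothetical case (the numbers are chosen here purely for illustration, with logarithms to base 2): X is uniform over 1024 values, so H(X) = 10 bits, and the privacy filter guarantees a leakage of at most ε = 1 bit. The bound then reads:

```latex
\Pr (\hat{X} (Z) \neq X) \;\geq\; \frac{H (X) - 1 - ε}{\log | \mathcal{X} |} = \frac{10 - 1 - 1}{10} = 0.8
```

Under these assumed numbers, any estimator of X based on Z errs with probability at least 0.8.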

1.2. Organization

The rest of the paper is organized as follows. In Section 2, we define and study the rate-privacy function for discrete random variables for two different privacy measures, which, respectively, lead to the information-theoretic and estimation-theoretic interpretations of the rate-privacy function. In Section 3, we provide such interpretations for the rate-privacy function in terms of quantities from information and estimation theory. Having obtained lower and upper bounds of the rate-privacy function, in Section 4 we determine the conditions on

P_{X Y}

such that these bounds are tight. The rate-privacy function is then generalized and studied in Section 5 for continuous random variables.

2. Utility-Privacy Measures: Definitions and Properties

Consider two random variables X and Y, defined over finite alphabets

X

and

Y

, respectively, with a fixed joint distribution

P_{X Y}

. Let X represent the private data and let Y be the observable data, correlated with X and generated by the channel

P_{Y | X}

predefined by nature, which we call the observation channel. Suppose there exists a channel

P_{Z | Y}

such that Z, the displayed data made available to public users, has limited dependence with X. Such a channel is called the privacy filter. This setup is shown in Figure 1. The objective is then to find a privacy filter which gives rise to the highest dependence between Y and Z. To make this goal precise, one needs to specify a measure for both utility (dependence between Y and Z) and also privacy (dependence between X and Z).

2.1. Mutual Information as Privacy Measure

Adopting mutual information as a measure of both privacy and utility, we are interested in characterizing the following quantity, which we call the rate-privacy function (since mutual information is adopted for utility, the privacy-utility tradeoff characterizes the optimal rate for a given privacy level, where rate indicates the precision of the displayed data Z with respect to the observable data Y for a privacy filter, which suggests the name),

g_{ε} (X; Y) : = sup_{P_{Z | Y} \in D_{ε} (P)} I (Y; Z)

(3)

where

(X, Y)

has fixed distribution

P_{X Y} = P

and

D_{ε} (P) : = {P_{Z | Y} : X -\circ- Y -\circ- Z, I (X; Z) \leq ε}

(here

X -\circ- Y -\circ- Z

means that

X, Y,

and Z form a Markov chain in this order). Equivalently, we call

g_{ε} (X; Y)

the privacy-constrained information extraction function, as Z can be thought of as the extracted information from Y under privacy constraint

I (X; Z) \leq ε

.
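To make the definition concrete, the following sketch evaluates the leakage I(X;Z) and the utility I(Y;Z) of a candidate privacy filter on finite alphabets. It only evaluates one filter and does not solve the maximization in (3). The joint pmf and the function name `leakage_and_utility` are hypothetical choices made for this illustration, and natural logarithms are used.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in nats of a joint pmf stored as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_row @ p_col)[mask])))

def leakage_and_utility(p_xy, filt):
    """Return (I(X;Z), I(Y;Z)) when Z is drawn from Y via filt[y, z] = P(z | y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    filt = np.asarray(filt, dtype=float)
    p_y = p_xy.sum(axis=0)
    p_xz = p_xy @ filt              # uses the Markov chain X - Y - Z
    p_yz = p_y[:, None] * filt
    return mutual_information(p_xz), mutual_information(p_yz)

p_xy = np.array([[0.4, 0.1], [0.1, 0.4]])   # hypothetical joint pmf: rows x, columns y

# Z = Y: no privacy constraint is enforced, utility equals H(Y).
leak_id, util_id = leakage_and_utility(p_xy, np.eye(2))
# Z constant: perfect privacy, but nothing is revealed about Y either.
leak_const, util_const = leakage_and_utility(p_xy, np.ones((2, 1)))

print(leak_id, util_id, leak_const, util_const)
```

A filter belongs to the feasible set for a given ε exactly when the first returned value is at most ε. The two extreme filters above bracket the tradeoff.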

Note that since

I (Y; Z)

is a convex function of

P_{Z | Y}

and furthermore the constraint set

D_{ε} (P)

is convex, [28, Theorem 32.2] implies that we can restrict

D_{ε} (P)

in (3) to

{P_{Z | Y} : X -\circ- Y -\circ- Z, I (X; Z) = ε}

whenever

ε \leq I (X; Y)

. Note also that, for finite

X

and

Y

, the map

P_{Z | Y} \mapsto I (Y; Z)

is continuous and

D_{ε} (P)

is compact, so the supremum in (3) is indeed a maximum. In this case, using the Support Lemma [29], one can readily show that it suffices that the random variable Z is supported on an alphabet

Z

with cardinality

| Z | \leq | Y | + 1

. Note further that by the Markov condition

X -\circ- Y -\circ- Z

, we can always restrict

ε \geq 0

to only

0 \leq ε < I (X; Y)

, because

I (X; Z) \leq I (X; Y)

and hence for

ε \geq I (X; Y)

the privacy constraint is removed and thus by setting

Z = Y

, we obtain

g_{ε} (X; Y) = H (Y)

.

As mentioned earlier, a dual representation of

g_{ε} (X; Y)

, the so-called privacy funnel, is introduced in [18,20], defined in (1), as the least information leakage about X such that the communication rate is at least a positive constant, i.e.,

I (Y; Z) \geq R

for some

R > 0

. Note that if

t_{R} (X; Y) = ε

then

g_{ε} (X; Y) = R

.

Given

ε_{1} < ε_{2}

and a joint distribution

P = P_{X} \times P_{Y | X}

, we have

D_{ε_{1}} (P) \subset D_{ε_{2}} (P)

and hence

ε \to g_{ε} (X; Y)

is non-decreasing, i.e.,

g_{ε_{1}} (X; Y) \leq g_{ε_{2}} (X; Y)

. Using a similar technique as in [30, Lemma 1], Calmon et al. [20] showed that the mapping

R \mapsto \frac{t_{R} (X; Y)}{R}

is non-decreasing for

R > 0

. This, in fact, implies that

ε \mapsto \frac{g_{ε} (X; Y)}{ε}

is non-increasing for

ε > 0

. This observation leads to a lower bound for the rate-privacy function

g_{ε} (X; Y)

as described in the following lemma.

Lemma 1

([20]). For a given joint distribution P defined over

X \times Y

, the mapping

ε \mapsto \frac{g_{ε} (X; Y)}{ε}

is non-increasing on

ε \in (0, \infty)

and

g_{ε} (X; Y)

lies between two straight lines as follows:

ε \frac{H (Y)}{I (X; Y)} \leq g_{ε} (X; Y) \leq H (Y | X) + ε

(4)

for

ε \in (0, I (X; Y))

.

Using a simple calculation, the lower bound in (4) can be shown to be achieved by the privacy filter depicted in Figure 2 with the erasure probability

δ = 1 - \frac{ε}{I (X; Y)}

(5)

In light of Lemma 1, the possible range of the map

ε \mapsto g_{ε} (X; Y)

is as depicted in Figure 3.
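The claim that an erasure filter with erasure probability (5) meets the lower bound in (4) can be verified numerically. The sketch below assumes that the filter of Figure 2 outputs Y with probability 1 - δ and an erasure symbol otherwise. The joint pmf and the value of ε are hypothetical, and natural logarithms are used.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in nats of a joint pmf stored as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_row @ p_col)[mask])))

def erasure_filter(n_y, delta):
    """filt[y, z]: Z = Y with probability 1 - delta, erasure (last column) with probability delta."""
    return np.hstack([(1.0 - delta) * np.eye(n_y), np.full((n_y, 1), delta)])

p_xy = np.array([[0.4, 0.1], [0.1, 0.4]])   # hypothetical joint pmf: rows x, columns y
p_y = p_xy.sum(axis=0)
i_xy = mutual_information(p_xy)             # I(X; Y)
h_y = float(-np.sum(p_y * np.log(p_y)))     # H(Y)

eps = 0.05                                  # any value in (0, I(X; Y))
delta = 1.0 - eps / i_xy                    # erasure probability from (5)
filt = erasure_filter(2, delta)

leakage = mutual_information(p_xy @ filt)           # I(X; Z)
utility = mutual_information(p_y[:, None] * filt)   # I(Y; Z)
print(leakage, utility, eps * h_y / i_xy)
```

With these inputs the leakage equals ε and the utility equals ε H(Y)/I(X;Y), which is the left-hand side of (4).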

We next show that

ε \mapsto g_{ε} (X; Y)

is concave and continuous.

Lemma 2.

For any given pair of random variables

(X, Y)

over

X \times Y

, the mapping

ε \mapsto g_{ε} (X; Y)

is concave for

ε \geq 0

.

Proof.

It suffices to show that for any

0 \leq ε_{1} < ε_{2} < ε_{3} \leq I (X; Y)

, we have

\frac{g_{ε_{3}} (X; Y) - g_{ε_{1}} (X; Y)}{ε_{3} - ε_{1}} \leq \frac{g_{ε_{2}} (X; Y) - g_{ε_{1}} (X; Y)}{ε_{2} - ε_{1}}

(6)

which, in turn, is equivalent to

(\frac{ε_{2} - ε_{1}}{ε_{3} - ε_{1}}) g_{ε_{3}} (X; Y) + (\frac{ε_{3} - ε_{2}}{ε_{3} - ε_{1}}) g_{ε_{1}} (X; Y) \leq g_{ε_{2}} (X; Y)

(7)

Let

P_{Z_{1} | Y} : Y \to Z_{1}

and

P_{Z_{3} | Y} : Y \to Z_{3}

be two optimal privacy filters in

D_{ε_{1}} (P)

and

D_{ε_{3}} (P)

with disjoint output alphabets

Z_{1}

and

Z_{3}

, respectively.

We introduce an auxiliary binary random variable

U \sim Bernoulli (λ)

, independent of

(X, Y)

, where

λ : = \frac{ε_{2} - ε_{1}}{ε_{3} - ε_{1}}

and define the following random privacy filter

P_{Z_{λ} | Y}

: We pick

P_{Z_{3} | Y}

if

U = 1

and

P_{Z_{1} | Y}

if

U = 0

, and let

Z_{λ}

be the output of this random channel which takes values in

Z_{1} \cup Z_{3}

. Note that

(X, Y) -\circ- Z_{λ} -\circ- U

. Then we have

\begin{aligned} I (X; Z_{λ}) &= I (X; Z_{λ}, U) = I (X; Z_{λ} | U) = λ I (X; Z_{3}) + (1 - λ) I (X; Z_{1}) \\ &\leq λ ε_{3} + (1 - λ) ε_{1} = ε_{2} \end{aligned}

which implies that

P_{Z_{λ} | Y} \in D_{ε_{2}} (P)

. On the other hand, we have

\begin{aligned} g_{ε_{2}} (X; Y) \geq I (Y; Z_{λ}) &= I (Y; Z_{λ}, U) = I (Y; Z_{λ} | U) = λ I (Y; Z_{3}) + (1 - λ) I (Y; Z_{1}) \\ &= (\frac{ε_{2} - ε_{1}}{ε_{3} - ε_{1}}) g_{ε_{3}} (X; Y) + (\frac{ε_{3} - ε_{2}}{ε_{3} - ε_{1}}) g_{ε_{1}} (X; Y) \end{aligned}

which, according to (7), completes the proof. ☐
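The time-sharing step of the proof can be illustrated numerically. The sketch below mixes two filters with disjoint output alphabets and checks that both I(X;Z) and I(Y;Z) of the mixture are the corresponding convex combinations. The two filters are arbitrary (they are not optimal filters), and the joint pmf and mixing weight are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in nats of a joint pmf stored as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_row @ p_col)[mask])))

def time_share(filt_1, filt_3, lam):
    """Use filt_3 with probability lam and filt_1 otherwise.

    Stacking the columns side by side keeps the two output alphabets
    disjoint, so the output symbol reveals which filter was used.
    """
    return np.hstack([(1.0 - lam) * filt_1, lam * filt_3])

p_xy = np.array([[0.4, 0.1], [0.1, 0.4]])   # hypothetical joint pmf: rows x, columns y
p_y = p_xy.sum(axis=0)

filt_1 = np.array([[0.9, 0.1], [0.2, 0.8]])  # an arbitrary noisy filter
filt_3 = np.eye(2)                           # Z = Y
lam = 0.3
filt_mix = time_share(filt_1, filt_3, lam)

def leak(filt):
    return mutual_information(p_xy @ filt)          # I(X; Z)

def util(filt):
    return mutual_information(p_y[:, None] * filt)  # I(Y; Z)

leak_gap = leak(filt_mix) - (lam * leak(filt_3) + (1 - lam) * leak(filt_1))
util_gap = util(filt_mix) - (lam * util(filt_3) + (1 - lam) * util(filt_1))
print(leak_gap, util_gap)
```

Both gaps vanish up to floating-point error, which is the linearity used in the two displayed chains of equalities.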

Remark 1.

By the concavity of

ε \mapsto g_{ε} (X; Y)

, we can show that

g_{ε} (X; Y)

is a strictly increasing function of

ε \leq I (X; Y)

. To see this, assume there exists

ε_{1} < ε_{2} \leq I (X; Y)

such that

g_{ε_{1}} (X; Y) = g_{ε_{2}} (X; Y)

. Since

ε \mapsto g_{ε} (X; Y)

is concave and non-decreasing, it follows that

g_{ε} (X; Y) = g_{ε_{2}} (X; Y)

for all

ε \geq ε_{2}

. Since

g_{I (X; Y)} (X; Y) = H (Y)

, we must then have

g_{ε_{1}} (X; Y) = g_{ε_{2}} (X; Y) = H (Y)

, which contradicts the upper bound shown in (4) because

H (Y | X) + ε_{1} < H (Y)

.

Corollary 3.

For any given pair of random variables

(X, Y)

over

X \times Y

, the mapping

ε \mapsto g_{ε} (X; Y)

is continuous for

ε \geq 0

.

Proof.

Concavity directly implies that the mapping

ε \mapsto g_{ε} (X; Y)

is continuous on

(0, \infty)

(see for example [31, Theorem 3.2]). Continuity at zero follows from the continuity of mutual information. ☐

Remark 2.

Using the concavity of the map

ε \mapsto g_{ε} (X; Y)

, we can provide an alternative proof for the lower bound in (4). Note that the point

(I (X; Y), H (Y))

is always on the curve

g_{ε} (X; Y)

, and hence by concavity, the straight line

ε \mapsto ε \frac{H (Y)}{I (X; Y)}

is always below the lower convex envelope of

g_{ε} (X; Y)

, i.e., the chord connecting

(0, g_{0} (X; Y))

to

(I (X; Y), H (Y))

, and hence

g_{ε} (X; Y) \geq ε \frac{H (Y)}{I (X; Y)}

. In fact, this chord yields a better lower bound for

g_{ε} (X; Y)

on

ε \in [0, I (X; Y)]

as

g_{ε} (X; Y) \geq ε \frac{H (Y)}{I (X; Y)} + g_{0} (X; Y) [1 - \frac{ε}{I (X; Y)}]

(8)

which reduces to the lower bound in (4) only if

g_{0} (X; Y) = 0

.
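The chord bound (8) follows from the definition of concavity in one line. As a sketch, write ε as a convex combination of the endpoints 0 and I(X;Y) with weight θ (a symbol introduced here for illustration only):

```latex
θ := \frac{ε}{I (X; Y)} \in [0, 1], \qquad ε = (1 - θ) \cdot 0 + θ \cdot I (X; Y),
\\
g_{ε} (X; Y) \;\geq\; (1 - θ) \, g_{0} (X; Y) + θ \, g_{I (X; Y)} (X; Y)
= g_{0} (X; Y) \left[ 1 - \frac{ε}{I (X; Y)} \right] + ε \frac{H (Y)}{I (X; Y)} .
```

The last equality uses the fact that the rate-privacy function equals H(Y) at ε = I(X;Y).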

2.2. Maximal Correlation as Privacy Measure

By adopting mutual information as the privacy measure between the private and the displayed data, we make sure that only a limited number of bits of private information are revealed during the process of transferring Y. In order to have an estimation-theoretic guarantee of privacy, we propose alternatively to measure privacy using a measure of correlation, the so-called maximal correlation.

Given the collection

C

of all pairs of random variables

(U, V) \in U \times V

where

U

and

V

are general alphabets, a mapping

T : C \to [0, 1]

defines a measure of correlation [32] if

T (U, V) = 0

if and only if U and V are independent (in short,

U ⊥ ⊥ V

) and

T (U, V)

attains its maximum value if

U = f (V)

or

V = g (U)

almost surely for some measurable real-valued functions f and g. There are many different examples of measures of correlation including the Hirschfeld-Gebelein-Rényi maximal correlation [32,33,34], the information measure [35], mutual information and f-divergence [36].

Definition 4

([34]). Given random variables X and Y, the maximal correlation

ρ_{m} (X; Y)

is defined as follows (recall that the correlation coefficient between U and V, is defined as

ρ (U; V) : = \frac{cov (U; V)}{σ_{U} σ_{V}}

, where

cov (U; V), σ_{U}

and

σ_{V}

are the covariance between U and V and the standard deviations of U and V, respectively):

ρ_{m} (X; Y) : = sup_{f, g} ρ (f (X), g (Y)) = sup_{(f (X), g (Y)) \in S} E [f (X) g (Y)]

where

S

is the collection of pairs of real-valued random variables

f (X)

and

g (Y)

such that

E f (X) = E g (Y) = 0

and

E f^{2} (X) = E g^{2} (Y) = 1

. If

S

is empty (which happens precisely when at least one of X and Y is constant almost surely) then one defines

ρ_{m} (X; Y)

to be 0. Rényi [34] derived an equivalent characterization of maximal correlation as follows:

ρ_{m}^{2} (X; Y) = sup_{f : E f (X) = 0, E f^{2} (X) = 1} E [E^{2} [f (X) | Y]] .

(9)
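For finite alphabets, maximal correlation can be computed without searching over functions f and g. The sketch below relies on a standard fact that is not stated in this section: the maximal correlation equals the second largest singular value of the matrix with entries P(x,y)/sqrt(P(x)P(y)). The function name and the test distributions are illustrative assumptions.

```python
import numpy as np

def maximal_correlation(joint):
    """Maximal correlation of a finite joint pmf (rows x, columns y).

    Assumption (standard result, not taken from this section): it is the
    second largest singular value of B[x, y] = P(x, y) / sqrt(P(x) P(y)).
    """
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    keep_x, keep_y = p_x > 0, p_y > 0        # drop symbols of probability zero
    b = joint[np.ix_(keep_x, keep_y)] / np.sqrt(np.outer(p_x[keep_x], p_y[keep_y]))
    s = np.linalg.svd(b, compute_uv=False)   # singular values, sorted in descending order
    # A constant X or Y leaves a single singular value; the text defines rho_m = 0 then.
    return float(s[1]) if s.size > 1 else 0.0

# Uniform bit through a binary symmetric channel with crossover 0.2.
bsc = np.array([[0.4, 0.1], [0.1, 0.4]])
rho_bsc = maximal_correlation(bsc)

# Independent pair: the joint pmf is a product of marginals.
indep = np.outer([0.3, 0.7], [0.2, 0.5, 0.3])
rho_indep = maximal_correlation(indep)

print(rho_bsc, rho_indep)
```

For the binary symmetric example the result is 1 - 2(0.2) = 0.6, and for the independent pair it is zero up to rounding.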

Measuring privacy in terms of maximal correlation, we propose

{\hat{g}}_{ε} (X; Y) : = sup_{P_{Z | Y} \in {\hat{D}}_{ε} (P)} I (Y; Z)

as the corresponding rate-privacy tradeoff, where

{\hat{D}}_{ε} (P) : = {P_{Z | Y} : X -\circ- Y -\circ- Z, ρ_{m}^{2} (X; Z) \leq ε, P_{X Y} = P}

Again, we equivalently call

{\hat{g}}_{ε} (X; Y)

the privacy-constrained information extraction function, where privacy is now guaranteed by

ρ_{m}^{2} (X; Z) \leq ε

.

Setting

ε = 0

corresponds to the case where X and Z are required to be statistically independent, i.e., absolutely no information leakage about the private source X is allowed. This is called perfect privacy. Since the independence of X and Z is equivalent to

I (X; Z) = ρ_{m} (X; Z) = 0

, we have

{\hat{g}}_{0} (X; Y) = g_{0} (X; Y)

. However, for

ε > 0

, both

g_{ε} (X; Y) \leq {\hat{g}}_{ε} (X; Y)

and

g_{ε} (X; Y) \geq {\hat{g}}_{ε} (X; Y)

might happen in general. For general

ε \geq 0

, it directly follows using [23, Proposition 1] that

{\hat{g}}_{ε} (X; Y) \leq g_{ε^{'}} (X; Y)

where

ε^{'} : = log (k ε + 1)

and

k : = | X | - 1

. Similar to

g_{ε} (X; Y)

, we see that for

ε_{1} \leq ε_{2}

,

{\hat{D}}_{ε_{1}} (P) \subset {\hat{D}}_{ε_{2}} (P)

and hence

ε \to {\hat{g}}_{ε} (X; Y)

is non-decreasing. The following lemma is a counterpart of Lemma 1 for

{\hat{g}}_{ε} (X; Y)

.

Lemma 5.

For a given joint distribution

P_{X Y}

defined over

X \times Y

,

ε \mapsto \frac{{\hat{g}}_{ε} (X; Y)}{ε}

is non-increasing on

(0, \infty)

.

Proof.

Like Lemma 1, the proof is similar to the proof of [30, Lemma 1]. We, however, give a brief proof for the sake of completeness.

For a given channel

P_{Z | Y} \in {\hat{D}}_{ε} (P)

and

δ \geq 0

, we can define a new channel with an additional symbol e as follows

P_{Z^{'} | Y} (z^{'} | y) = \begin{cases} (1 - δ) P_{Z | Y} (z^{'} | y) & \text{if } z^{'} \neq e \\ δ & \text{if } z^{'} = e \end{cases}

(10)

It is easy to check that

I (Y; Z^{'}) = (1 - δ) I (Y; Z)

and also

ρ_{m}^{2} (X; Z^{'}) = (1 - δ) ρ_{m}^{2} (X; Z)

; see [37, Page 8], which implies that

P_{Z^{'} | Y} \in {\hat{D}}_{ε^{'}} (P)

where

ε^{'} = (1 - δ) ε

. Now suppose that

P_{Z | Y}

achieves

{\hat{g}}_{ε} (X; Y)

, that is,

{\hat{g}}_{ε} (X; Y) = I (Y; Z)

and

ρ_{m}^{2} (X; Z) = ε

. We can then write

\frac{{\hat{g}}_{ε} (X; Y)}{ε} = \frac{I (Y; Z)}{ε} = \frac{I (Y; Z^{'})}{ε^{'}} \leq \frac{{\hat{g}}_{ε^{'}} (X; Y)}{ε^{'}}

Therefore, for

ε^{'} \leq ε

we have

\frac{{\hat{g}}_{ε^{'}} (X; Y)}{ε^{'}} \geq \frac{{\hat{g}}_{ε} (X; Y)}{ε}

. ☐
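The two scaling identities used in this proof can be checked numerically for the erasure construction (10). The sketch below assumes the singular-value characterization of maximal correlation for finite alphabets (a standard fact not stated in this section). The joint pmf, the filter, and δ are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in nats of a joint pmf stored as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_row @ p_col)[mask])))

def maximal_correlation(joint):
    """Second largest singular value of P(x, z) / sqrt(P(x) P(z)); assumes positive marginals."""
    joint = np.asarray(joint, dtype=float)
    b = joint / np.sqrt(np.outer(joint.sum(axis=1), joint.sum(axis=0)))
    s = np.linalg.svd(b, compute_uv=False)
    return float(s[1]) if s.size > 1 else 0.0

def add_erasure(filt, delta):
    """Channel (10): scale the filter by 1 - delta and append an erasure column of mass delta."""
    return np.hstack([(1.0 - delta) * filt, np.full((filt.shape[0], 1), delta)])

p_xy = np.array([[0.4, 0.1], [0.1, 0.4]])   # hypothetical joint pmf: rows x, columns y
p_y = p_xy.sum(axis=0)
filt = np.array([[0.9, 0.1], [0.2, 0.8]])   # hypothetical filter P(z | y)
delta = 0.25
filt_e = add_erasure(filt, delta)

util, util_e = (mutual_information(p_y[:, None] * f) for f in (filt, filt_e))
rho2, rho2_e = (maximal_correlation(p_xy @ f) ** 2 for f in (filt, filt_e))
print(util_e, (1 - delta) * util, rho2_e, (1 - delta) * rho2)
```

Both the utility and the squared maximal correlation shrink by the same factor 1 - δ, which is why the ratio of utility to ε is preserved by the construction.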

Similar to the lower bound for

g_{ε} (X; Y)

obtained from Lemma 1, we can obtain a lower bound for

{\hat{g}}_{ε} (X; Y)

using Lemma 5. Before we get to the lower bound, we need a data processing lemma for maximal correlation. The following lemma proves a version of strong data processing inequality for maximal correlation from which the typical data processing inequality follows, namely,

ρ_{m} (X; Z) \leq min {ρ_{m} (Y; Z), ρ_{m} (X; Y)}

for

X, Y

and Z satisfying

X -\circ- Y -\circ- Z

.

Lemma 6.

For random variables X and Y with a joint distribution

P_{X Y}

, we have

\sup_{\substack{X -\circ- Y -\circ- Z \\ ρ_{m} (Y; Z) \neq 0}} \frac{ρ_{m} (X; Z)}{ρ_{m} (Y; Z)} = ρ_{m} (X; Y)

Proof.

For arbitrary zero-mean and unit variance measurable functions

f \in L^{2} (X)

and

g \in L^{2} (Z)

and

X -\circ- Y -\circ- Z

, we have

E [f (X) g (Z)] = E [E [f (X) | Y] E [g (Z) | Y]] \leq ρ_{m} (X; Y) ρ_{m} (Y; Z)

where the inequality follows from the Cauchy-Schwarz inequality and (9). Thus we obtain

ρ_{m} (X; Z) \leq ρ_{m} (X; Y) ρ_{m} (Y; Z)

.

This bound is tight for the special case of

X \to Y \to X^{'}

, where

P_{X^{'} | Y}

is the backward channel associated with

P_{Y | X}

. In the following, we shall show that

ρ_{m} (X; Y) ρ_{m} (Y; X^{'}) = ρ_{m} (X; X^{'})

.

To this end, first note that the above implies that

ρ_{m} (X; Y) ρ_{m} (Y; X^{'}) \geq ρ_{m} (X; X^{'})

. Since

P_{X Y} = P_{X^{'} Y}

, it follows that

ρ_{m} (X; Y) = ρ_{m} (Y; X^{'})

and hence the above implies that

ρ_{m}^{2} (X; Y) \geq ρ_{m} (X; X^{'})

. On the other hand, we have

E [{[E [f (X) | Y]]}^{2}] = E [E [f (X) | Y] E [f (X^{'}) | Y]] = E [E [f (X) f (X^{'}) | Y]] = E [f (X) f (X^{'})]

which together with (9) implies that

ρ_{m}^{2} (X; Y) \leq sup_{f : E f (X) = 0, E f^{2} (X) = 1} E [f (X) f (X^{'})] \leq ρ_{m} (X; X^{'})

Thus,

ρ_{m}^{2} (X; Y) = ρ_{m} (X; X^{'})

which completes the proof. ☐
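Both parts of Lemma 6 can be sanity-checked on a randomly drawn Markov chain. The sketch below again assumes the singular-value characterization of maximal correlation for finite alphabets (a standard fact not stated in this section). The alphabet sizes and the random seed are arbitrary choices.

```python
import numpy as np

def maximal_correlation(joint):
    """Second largest singular value of P(u, v) / sqrt(P(u) P(v)); assumes positive marginals."""
    joint = np.asarray(joint, dtype=float)
    b = joint / np.sqrt(np.outer(joint.sum(axis=1), joint.sum(axis=0)))
    s = np.linalg.svd(b, compute_uv=False)
    return float(s[1]) if s.size > 1 else 0.0

rng = np.random.default_rng(0)
p_xy = rng.dirichlet(np.ones(12)).reshape(3, 4)   # random joint pmf: 3 values of x, 4 of y
p_y = p_xy.sum(axis=0)
filt = rng.dirichlet(np.ones(3), size=4)          # random channel P(z | y), rows sum to one

# Markov chain X - Y - Z.
rho_xy = maximal_correlation(p_xy)
rho_yz = maximal_correlation(p_y[:, None] * filt)
rho_xz = maximal_correlation(p_xy @ filt)

# Tightness case: X' is generated from Y through the backward channel P(x | y),
# so that P(x, x') = sum_y P(x, y) P(x', y) / P(y).
p_xx = p_xy @ (p_xy / p_y).T
rho_xx = maximal_correlation(p_xx)

print(rho_xz, rho_xy * rho_yz, rho_xx, rho_xy ** 2)
```

The first pair of printed values illustrates the product inequality, and the second pair illustrates the equality attained by the backward channel.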

Now a lower bound of

{\hat{g}}_{ε} (X; Y)

can be readily obtained.

Corollary 7.

For a given joint distribution

P_{X Y}

defined over

X \times Y

, we have for any

ε > 0

{\hat{g}}_{ε} (X; Y) \geq \frac{H (Y)}{ρ_{m}^{2} (X; Y)} min {ε, ρ_{m}^{2} (X; Y)}

Proof.

By Lemma 6, we know that for any Markov chain

X -\circ- Y -\circ- Z

, we have

ρ_{m} (X; Z) \leq ρ_{m} (X; Y)

and hence for

ε \geq ρ_{m}^{2} (X; Y)

, the privacy constraint

ρ_{m}^{2} (X; Z) \leq ε

is not restrictive and hence

{\hat{g}}_{ε} (X; Y) = H (Y)

by setting

Y = Z

. For

0 < ε \leq ρ_{m}^{2} (X; Y)

, Lemma 5 implies that

\frac{{\hat{g}}_{ε} (X; Y)}{ε} \geq \frac{H (Y)}{ρ_{m}^{2} (X; Y)}

from which the result follows. ☐

A loose upper bound of

{\hat{g}}_{ε} (X; Y)

can be obtained using an argument similar to the one used for

g_{ε} (X; Y)

. For the Markov chain

X -\circ- Y -\circ- Z

, we have

\begin{aligned} I (Y; Z) &= I (X; Z) + I (Y; Z | X) \leq I (X; Z) + H (Y | X) \\ &\overset{(a)}{\leq} log (k ρ_{m}^{2} (X; Z) + 1) + H (Y | X) \end{aligned}

(11)

where

k : = | X | - 1

and

(a)

comes from [23, Proposition 1]. We can, therefore, conclude from (11) and Corollary 7 that

ε \frac{H (Y)}{ρ_{m}^{2} (X; Y)} \leq {\hat{g}}_{ε} (X; Y) \leq log (k ε + 1) + H (Y | X)

(12)
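To get a feeling for how far apart the two sides of (12) can be, the following sketch evaluates them for a doubly symmetric binary pair, that is, X uniform on {0, 1} and Y the output of a BSC with crossover probability `alpha`. This choice of source, the function names, and the use of base-2 logarithms are our assumptions for illustration. For this pair the maximal correlation is |1 - 2 alpha|, H(Y) = 1 and k = 1.

```python
import math

def h_b(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def g_hat_bounds(eps, alpha):
    """Lower bound (Corollary 7) and upper bound (11) on g-hat, in bits,
    for X ~ Bernoulli(0.5) observed through a BSC(alpha)."""
    rho2 = (1 - 2 * alpha) ** 2        # rho_m^2(X;Y) for this pair
    h_y = 1.0                          # H(Y) in bits
    k = 1                              # |X| - 1
    lower = h_y * min(eps, rho2) / rho2
    upper = math.log2(k * eps + 1) + h_b(alpha)   # H(Y|X) = h_b(alpha)
    return lower, upper

for eps in (0.1, 0.3, 0.64):
    print(eps, g_hat_bounds(eps, alpha=0.1))
```

The upper bound does not vanish as ε tends to zero, which is one way to see why it is described as loose.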

Similar to Lemma 2, the following lemma shows that

{\hat{g}}_{ε} (X; Y)

is a concave function of ε.

Lemma 8.

For any given pair of random variables

(X, Y)

with distribution P over

X \times Y

, the mapping

ε \mapsto {\hat{g}}_{ε} (X; Y)

is concave for

ε \geq 0

.

Proof.

The proof is similar to that of Lemma 2 except that here for two optimal filters

P_{Z_{1} | Y} : Y \to Z_{1}

and

P_{Z_{3} | Y} : Y \to Z_{3}

in

{\hat{D}}_{ε_{1}} (P)

and

{\hat{D}}_{ε_{3}} (P)

, respectively, and the random channel

P_{Z_{λ} | Y} : Y \to Z

with output alphabet

Z_{1} \cup Z_{3}

constructed using a coin flip with probability γ, we need to show that

P_{Z_{λ} | Y} \in {\hat{D}}_{ε_{2}} (P)

, where

0 \leq ε_{1} < ε_{2} < ε_{3} \leq ρ_{m}^{2} (X; Y)

. To show this, consider

f : X \to R

such that

E [f (X)] = 0

and

E [f^{2} (X)] = 1

and let U be a binary random variable as in the proof of Lemma 2. We then have

\begin{matrix} E [E^{2} [f (X) | Z_{λ}]] & = & E [E [E^{2} [f (X) | Z_{λ}] | U]] \\ \overset{(a)}{=} & γ E [E^{2} [f (X) | Z_{3}]] + (1 - γ) E [E^{2} [f (X) | Z_{1}]] \end{matrix}

(13)

where

(a)

comes from the fact that U is independent of X. We can then conclude from (13) and the alternative characterization of maximal correlation (9) that

\begin{matrix} ρ_{m}^{2} (X; Z_{λ}) & = & sup_{f : E [f (X)] = 0, E [f^{2} (X)] = 1} E [E^{2} [f (X) | Z_{λ}]] \\ = & sup_{f : E [f (X)] = 0, E [f^{2} (X)] = 1} [γ E [E^{2} [f (X) | Z_{3}]] + (1 - γ) E [E^{2} [f (X) | Z_{1}]]] \\ \leq & γ ρ_{m}^{2} (X; Z_{3}) + (1 - γ) ρ_{m}^{2} (X; Z_{1}) \leq γ ε_{3} + (1 - γ) ε_{1} \end{matrix}

from which we can conclude that

P_{Z_{λ} | Y} \in {\hat{D}}_{ε_{2}} (P)

. ☐
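The averaging step (13) can be checked numerically. The sketch below is an illustration under our own assumptions: X and Y are binary, the two filters are BSCs with hypothetical crossover probabilities, and the time-shared filter outputs a symbol from the disjoint union of the two output alphabets. Because X is binary, the normalized function f is unique up to sign, so the final inequality of the proof holds with equality in this special case.

```python
def rho_m_squared(joint):
    """Squared maximal correlation for a joint pmf with a binary row variable."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(p * p / (pa[a] * pb[b])
               for a, row in enumerate(joint)
               for b, p in enumerate(row) if p > 0) - 1.0

def filtered_joint(joint_xy, channel):
    """Joint pmf of (X, Z) when Z is produced from Y through channel[y][z]."""
    return [[sum(row[y] * channel[y][z] for y in range(len(row)))
             for z in range(len(channel[0]))] for row in joint_xy]

def bsc(a):
    return [[1 - a, a], [a, 1 - a]]

joint_xy = [[0.40, 0.10], [0.15, 0.35]]      # hypothetical P_XY
gamma = 0.3                                  # probability of using the filter to Z_3
joint_xz1 = filtered_joint(joint_xy, bsc(0.4))
joint_xz3 = filtered_joint(joint_xy, bsc(0.1))

# Time-shared filter: output alphabet is the disjoint union of Z_1 and Z_3.
joint_mix = [[(1 - gamma) * p for p in r1] + [gamma * p for p in r3]
             for r1, r3 in zip(joint_xz1, joint_xz3)]

lhs = rho_m_squared(joint_mix)
rhs = gamma * rho_m_squared(joint_xz3) + (1 - gamma) * rho_m_squared(joint_xz1)
print(lhs, rhs)
```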

2.3. Non-Trivial Filters For Perfect Privacy

As will become clear later, requiring that

g_{0} (X; Y) = 0

is a useful assumption for the analysis of

g_{ε} (X; Y)

. Thus, it is interesting to find a necessary and sufficient condition on the joint distribution

P_{X Y}

which results in

g_{0} (X; Y) = 0

.

Definition 9

([38]). The random variable X is said to be weakly independent of Y if the rows of the transition matrix

P_{X | Y}

, i.e., the set of vectors

{P_{X | Y} (\cdot | y), y \in Y}

, are linearly dependent.

The following lemma provides a necessary and sufficient condition for

g_{0} (X; Y) > 0

.

Lemma 10.

For a given

(X, Y)

with a given joint distribution

P_{X Y} = P_{Y} \times P_{X | Y}

,

g_{0} (X; Y) > 0

(and equivalently

{\hat{g}}_{0} (X; Y) > 0

) if and only if X is weakly independent of Y.

Proof.

⇒ direction:

Assuming that

g_{0} (X; Y) > 0

implies that there exists a random variable Z over an alphabet

Z

such that the Markov condition

X -\circ- Y -\circ- Z

is satisfied and

Z ⊥ ⊥ X

while

I (Y; Z) > 0

. Hence, for any

z_{1}

and

z_{2}

in

Z

, we must have

P_{X | Z} (x | z_{1}) = P_{X | Z} (x | z_{2})

for all

x \in X

, which implies that

\sum_{y \in Y} P_{X | Y} (x | y) P_{Y | Z} (y | z_{1}) = \sum_{y \in Y} P_{X | Y} (x | y) P_{Y | Z} (y | z_{2})

and hence

\sum_{y \in Y} P_{X | Y} (x | y) [P_{Y | Z} (y | z_{1}) - P_{Y | Z} (y | z_{2})] = 0

Since Y is not independent of Z, there exist

z_{1}

and

z_{2}

such that

P_{Y | Z} (y | z_{1}) \neq P_{Y | Z} (y | z_{2})

and hence the above shows that the set of vectors

P_{X | Y} (\cdot | y)

,

y \in Y

is linearly dependent.

⇐ direction:

Berger and Yeung [38, Appendix II], in a completely different context, showed that if X is weakly independent of Y, one can always construct a binary random variable Z correlated with Y which satisfies

X -\circ- Y -\circ- Z

and

X ⊥ ⊥ Z

, and hence

g_{0} (X; Y) > 0

. ☐
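The ⇐ direction can be made concrete with a small construction. The sketch below is one way to build such a binary Z and is not claimed to be the construction of [38]: given a nonzero vector v with Σ_y v(y) P_{X|Y}(·|y) = 0 (such a v automatically sums to zero because each row sums to one), we perturb P_Y by ±t v to obtain the two conditional distributions P_{Y|Z}(·|z). The distribution, the vector `v` and the step `t` are hypothetical choices for illustration.

```python
import math

def mutual_information(joint):
    """Mutual information (bits) of a joint pmf given as a nested list."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for a, row in enumerate(joint)
               for b, p in enumerate(row) if p > 0)

# Hypothetical example: ternary Y, binary X; rows are P_{X|Y}(.|y).
p_y = [1 / 3, 1 / 3, 1 / 3]
p_x_given_y = [[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]]
v = [1.0, -2.0, 1.0]   # sum_y v[y] * P_{X|Y}(.|y) = 0: the rows are linearly dependent
t = 0.1                # small enough that all perturbed probabilities stay in [0, 1]

# Uniform binary Z with P_{Y|Z}(y|z) = P_Y(y) + t v[y] (z = 0) or P_Y(y) - t v[y] (z = 1).
p_y_given_z = [[p_y[y] + sign * t * v[y] for y in range(3)] for sign in (+1, -1)]
joint_yz = [[0.5 * p_y_given_z[z][y] for z in range(2)] for y in range(3)]

# Markov chain X - Y - Z: P_XZ(x, z) = sum_y P_{X|Y}(x|y) P_YZ(y, z).
joint_xz = [[sum(p_x_given_y[y][x] * joint_yz[y][z] for y in range(3))
             for z in range(2)] for x in range(2)]

i_xz = mutual_information(joint_xz)
i_yz = mutual_information(joint_yz)
print(i_xz, i_yz)
```

Here I(X; Z) is zero up to rounding while I(Y; Z) is strictly positive, so this Z certifies that perfect privacy does not force zero utility for this example.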

Remark 3.

Lemma 10 first appeared in [1]. However, Calmon et al. [20] studied (1), the dual version of

g_{ε} (X; Y)

, and showed an equivalent result for

t_{R} (X; Y)

. In fact, they showed that for a given

P_{X Y}

, one can always generate Z such that

I (X; Z) = 0

,

I (Y; Z) > 0

and

X -\circ- Y -\circ- Z

, or equivalently

g_{0} (X; Y) > 0

, if and only if the smallest singular value of the conditional expectation operator

f \mapsto E [f (X) | Y]

is zero. This condition can, in fact, be shown to be equivalent to the condition in Lemma 10.

Remark 4.

It is clear that, according to Definition 9, X is weakly independent of Y if

| Y | > | X |

. Hence, Lemma 10 implies that

g_{0} (X; Y) > 0

if Y has strictly larger alphabet than X.
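Definition 9 is a rank condition, so it can be tested mechanically: X is weakly independent of Y exactly when the matrix whose rows are P_{X|Y}(·|y) has rank smaller than |Y|. The following sketch does this with a small Gaussian elimination; the function names and the tolerance are our choices, and the three matrices are hypothetical examples.

```python
def rank(rows, tol=1e-12):
    """Rank of a small real matrix via Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        pivot = max(range(r, len(m)), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_weakly_independent(p_x_given_y):
    """True if the rows P_{X|Y}(.|y), y in Y, are linearly dependent."""
    return rank(p_x_given_y) < len(p_x_given_y)

print(is_weakly_independent([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]]))  # |Y| = 3 > |X| = 2
print(is_weakly_independent([[0.9, 0.1], [0.1, 0.9]]))              # BSC-type reverse channel
print(is_weakly_independent([[0.3, 0.7], [0.3, 0.7]]))              # X independent of Y
```

The first case illustrates Remark 4: three vectors in a two-dimensional space are always linearly dependent.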

In light of the above remark, in the most common case of

| Y | = | X |

, one might have

g_{0} (X; Y) = 0

, which corresponds to the most conservative scenario as no privacy leakage implies no broadcasting of observable data. In such cases, the rate of increase of

g_{ε} (X; Y)

at

ε = 0

, that is

g_{0}^{'} (X; Y) : = \frac{d}{d ε} g_{ε} {(X; Y) |}_{ε = 0}

, which corresponds to the initial efficiency of privacy-constrained information extraction, proves to be very important in characterizing the behavior of

g_{ε} (X; Y)

for all

ε \geq 0

. This is because, for example, by concavity of

ε \mapsto g_{ε} (X; Y)

, the slope of

g_{ε} (X; Y)

is maximized at

ε = 0

and so

g_{0}^{'} (X; Y) = lim_{ε \to 0} \frac{g_{ε} (X; Y)}{ε} = sup_{ε > 0} \frac{g_{ε} (X; Y)}{ε}

and hence

g_{ε} (X; Y) \leq ε g_{0}^{'} (X; Y)

for all

ε \leq I (X; Y)

which, together with (4), implies that

g_{ε} (X; Y) = ε \frac{H (Y)}{I (X; Y)}

if

g_{0}^{'} (X; Y) \leq \frac{H (Y)}{I (X; Y)}

. In the sequel, we always assume that X is not weakly independent of Y, or equivalently

g_{0} (X; Y) = 0

. For example, in light of Lemma 10 and Remark 4, we can assume that

| Y | \leq | X |

.

It is easy to show that X is weakly independent of binary Y if and only if X and Y are independent (see, e.g., [38, Remark 2]). The following corollary, therefore, immediately follows from Lemma 10.

Corollary 11.

Let Y be a non-degenerate binary random variable correlated with X. Then

g_{0} (X; Y) = 0

.

3. Operational Interpretations of the Rate-Privacy Function

In this section, we provide a scenario in which

g_{ε} (X; Y)

appears as a boundary point of an achievable rate region, thus giving an information-theoretic operational interpretation for

g_{ε} (X; Y)

. We then proceed to present an estimation-theoretic motivation for

{\hat{g}}_{ε} (X; Y)

.

3.1. Dependence Dilution

Inspired by the problems of information amplification [39] and state masking [40], Courtade [26] proposed the information-masking tradeoff problem as follows. The tuple

(R_{u}, R_{v}, Δ_{A}, Δ_{M}) \in R^{4}

is said to be achievable if for two given separated sources

U \in U

and

V \in V

and any

ε > 0

there exist mappings

f : U^{n} \to {1, 2, \dots, 2^{n R_{u}}}

and

g : V^{n} \to {1, 2, \dots, 2^{n R_{v}}}

such that

I (U^{n}; f (U^{n}), g (V^{n})) \leq n (Δ_{M} + ε)

and

I (V^{n}; f (U^{n}), g (V^{n})) \geq n (Δ_{A} - ε)

. In other words,

(R_{u}, R_{v}, Δ_{A}, Δ_{M})

is achievable if there exist indices K and J of rates

R_{u}

and

R_{v}

given

U^{n}

and

V^{n}

, respectively, such that the receiver in possession of

(K, J)

can recover at most

n Δ_{M}

bits about

U^{n}

and at least

n Δ_{A}

about

V^{n}

. The closure of the set of all achievable tuples

(R_{u}, R_{v}, Δ_{A}, Δ_{M})

is characterized in [26]. Here, we look at a similar problem but for a joint encoder. In fact, we want to examine the achievable rate of an encoder observing both

X^{n}

and

Y^{n}

which masks

X^{n}

and amplifies

Y^{n}

at the same time, by rates

Δ_{M}

and

Δ_{A}

, respectively.

We define a

(2^{n R}, n)

dependence dilution code by an encoder

f_{n} : X^{n} \times Y^{n} \to {1, 2, \dots, 2^{n R}}

and a list decoder

g_{n} : {1, 2, \dots, 2^{n R}} \to 2^{Y^{n}}

where

2^{Y^{n}}

denotes the power set of

Y^{n}

. A dependence dilution triple

(R, Δ_{A}, Δ_{M}) \in R_{+}^{3}

is said to be achievable if, for any

δ > 0

, there exists a

(2^{n R}, n)

dependence dilution code that for sufficiently large n satisfies the utility constraint:

Pr (Y^{n} \notin g_{n} (J)) < δ

(14)

having a fixed list size

| g_{n} (J) | = 2^{n (H (Y) - Δ_{A})}, \forall J \in {1, 2, \dots, 2^{n R}}

(15)

where

J : = f_{n} (X^{n}, Y^{n})

is the encoder’s output, and satisfies the privacy constraint:

\frac{1}{n} I (X^{n}; J) \leq Δ_{M} + δ

(16)

Intuitively speaking, upon receiving J, the decoder is required to construct a list

g_{n} (J) \subset Y^{n}

of fixed size which contains likely candidates of the actual sequence

Y^{n}

. Without any observation, the decoder can only construct a list of size

2^{n H (Y)}

which contains

Y^{n}

with probability close to one. However, after J is observed and the list

g_{n} (J)

is formed, the decoder’s list size can be reduced to

2^{n (H (Y) - Δ_{A})}

, thus reducing the uncertainty about

Y^{n}

by

0 \leq n Δ_{A} \leq n H (Y)

. This observation led Kim et al. [39] to show that the utility constraint (14) is equivalent to the amplification requirement

\frac{1}{n} I (Y^{n}; J) \geq Δ_{A} - δ

(17)

which lower bounds the amount of information J carries about

Y^{n}

. The following theorem gives an outer bound for the achievable dependence dilution region.

Theorem 12.

Any achievable dependence dilution triple

(R, Δ_{A}, Δ_{M})

satisfies

\{\begin{matrix} R \geq Δ_{A} \\ Δ_{A} \leq I (Y; U) \\ Δ_{M} \geq I (X; U) - I (Y; U) + Δ_{A} \end{matrix}

for some auxiliary random variable

U \in U

with a finite alphabet and jointly distributed with X and Y.

Before we prove this theorem, we need two preliminary lemmas. The first lemma is an extension of Fano’s inequality for list decoders and the second one makes use of a single-letterization technique to express

I (X^{n}; J) - I (Y^{n}; J)

in a single-letter form in the sense of Csiszár and Körner [29].

Lemma 13

([39,41]). Given a pair of random variables

(U, V)

defined over

U \times V

for finite

V

and arbitrary

U

, any list decoder

g : U \to 2^{V}

,

U \mapsto g (U)

of fixed list size m (i.e.,

| g (u) | = m, \forall u \in U

), satisfies

H (V | U) \leq h_{b} (p_{e}) + p_{e} log | V | + (1 - p_{e}) log m

where

p_{e} : = Pr (V \notin g (U))

and

h_{b} : [0, 1] \to [0, 1]

is the binary entropy function.

This lemma, applied to J and

Y^{n}

in place of U and V, respectively, implies that for any list decoder with the property (14), we have

H (Y^{n} | J) \leq log | g_{n} (J) | + n ε_{n}

(18)

where

ε_{n} : = \frac{1}{n} + (log | Y | - \frac{1}{n} log | g_{n} (J) |) p_{e}

and hence

ε_{n} \to 0

as

n \to \infty

.
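For completeness, the step from Lemma 13 to (18) can be spelled out as follows. This is our own expansion, assuming logarithms to base 2 so that the binary entropy is at most 1:

```latex
\begin{aligned}
H(Y^{n} \mid J)
  &\leq h_{b}(p_{e}) + p_{e} \log |\mathcal{Y}|^{n} + (1 - p_{e}) \log |g_{n}(J)| \\
  &\leq 1 + n\, p_{e} \log |\mathcal{Y}| + (1 - p_{e}) \log |g_{n}(J)| \\
  &= \log |g_{n}(J)| + n \left[ \frac{1}{n} + \left( \log |\mathcal{Y}| - \frac{1}{n} \log |g_{n}(J)| \right) p_{e} \right] \\
  &= \log |g_{n}(J)| + n\, \varepsilon_{n} .
\end{aligned}
```

By (15), the factor multiplying p_e equals log |Y| - H(Y) + Δ_A, which is bounded, and by (14) p_e can be made arbitrarily small, which is why ε_n vanishes as n grows.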

Lemma 14.

Let

(X^{n}, Y^{n})

be n i.i.d. copies of a pair of random variables

(X, Y)

. Then for a random variable J jointly distributed with

(X^{n}, Y^{n})

, we have

I (X^{n}; J) - I (Y^{n}; J) = \sum_{i = 1}^{n} [I (X_{i}; U_{i}) - I (Y_{i}; U_{i})]

where

U_{i} : = (J, X_{i + 1}^{n}, Y^{i - 1})

.

Proof.

Using the chain rule for the mutual information, we can express

I (X^{n}; J)

as follows

\begin{matrix} I (X^{n}; J) & = & \sum_{i = 1}^{n} I (X_{i}; J | X_{i + 1}^{n}) = \sum_{i = 1}^{n} I (X_{i}; J, X_{i + 1}^{n}) \\ = & \sum_{i = 1}^{n} [I (X_{i}; J, X_{i + 1}^{n}, Y^{i - 1}) - I (X_{i}; Y^{i - 1} | J, X_{i + 1}^{n})] \\ = & \sum_{i = 1}^{n} I (X_{i}; U_{i}) - \sum_{i = 1}^{n} I (X_{i}; Y^{i - 1} | J, X_{i + 1}^{n}) \end{matrix}

(19)

Similarly, we can expand

I (Y^{n}; J)

as

\begin{matrix} I (Y^{n}; J) & = & \sum_{i = 1}^{n} I (Y_{i}; J | Y^{i - 1}) = \sum_{i = 1}^{n} I (Y_{i}; J, Y^{i - 1}) \\ = & \sum_{i = 1}^{n} [I (Y_{i}; J, X_{i + 1}^{n}, Y^{i - 1}) - I (Y_{i}; X_{i + 1}^{n} | J, Y^{i - 1})] \\ = & \sum_{i = 1}^{n} I (Y_{i}; U_{i}) - \sum_{i = 1}^{n} I (Y_{i}; X_{i + 1}^{n} | J, Y^{i - 1}) \end{matrix}

(20)

Subtracting (20) from (19), we get

\begin{matrix} I (X^{n}; J) - I (Y^{n}; J) & = & \sum_{i = 1}^{n} [I (X_{i}; U_{i}) - I (Y_{i}; U_{i})] - \sum_{i = 1}^{n} [I (X_{i}; Y^{i - 1} | J, X_{i + 1}^{n}) - I (X_{i + 1}^{n}; Y_{i} | J, Y^{i - 1})] \\ \overset{(a)}{=} & \sum_{i = 1}^{n} [I (X_{i}; U_{i}) - I (Y_{i}; U_{i})] \end{matrix}

where

(a)

follows from the Csiszár sum identity [42]. ☐
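For the reader's convenience, the form of the Csiszár sum identity used in step (a) is recalled below, conditioned on J. This is the standard statement of the identity written in the notation of Lemma 14:

```latex
\sum_{i = 1}^{n} I\left( X_{i+1}^{n} ; Y_{i} \mid Y^{i-1}, J \right)
  = \sum_{i = 1}^{n} I\left( Y^{i-1} ; X_{i} \mid X_{i+1}^{n}, J \right) .
```

Since mutual information is symmetric in its two arguments, the two sums inside the second bracket of the display above cancel, leaving only the single-letter terms.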

Proof of Theorem 12.

The rate R can be bounded as

\begin{matrix} n R & \geq & H (J) \geq I (Y^{n}; J) \\ = & n H (Y) - H (Y^{n} | J) \\ \overset{(a)}{\geq} & n H (Y) - log | g_{n} (J) | - n ε_{n} \end{matrix}

(21)

\begin{matrix} \overset{(b)}{=} & n Δ_{A} - n ε_{n} \end{matrix}

(22)

where

(a)

follows from Fano’s inequality (18) with

ε_{n} \to 0

as

n \to \infty

and

(b)

is due to (15). We can also upper bound

Δ_{A}

as

\begin{matrix} n Δ_{A} & \overset{(a)}{=} & H (Y^{n}) - log | g_{n} (J) | \\ & \overset{(b)}{\leq} & H (Y^{n}) - H (Y^{n} | J) + n ε_{n} \\ & = & \sum_{i = 1}^{n} [H (Y_{i}) - H (Y_{i} | Y^{i - 1}, J)] + n ε_{n} \\ & \leq & \sum_{i = 1}^{n} [H (Y_{i}) - H (Y_{i} | Y^{i - 1}, X_{i + 1}^{n}, J)] + n ε_{n} \\ & = & \sum_{i = 1}^{n} I (Y_{i}; U_{i}) + n ε_{n} \end{matrix}

(23)

where

(a)

follows from (15),

(b)

follows from (18), and in the last equality the auxiliary random variable

U_{i} : = (Y^{i - 1}, X_{i + 1}^{n}, J)

is introduced.

We shall now lower bound

I (X^{n}; J)

:

\begin{matrix} n (Δ_{M} + δ) & \geq & I (X^{n}; J) \\ \overset{(a)}{=} & I (Y^{n}; J) + \sum_{i = 1}^{n} [I (X_{i}; U_{i}) - I (Y_{i}; U_{i})] \\ \overset{(b)}{\geq} & n Δ_{A} + \sum_{i = 1}^{n} [I (X_{i}; U_{i}) - I (Y_{i}; U_{i})] - n ε_{n} \end{matrix}

(24)

where

(a)

follows from Lemma 14 and

(b)

is due to Fano’s inequality and (15) (or equivalently from (17)).

Combining (22), (23) and (24), we can write

\begin{matrix} R & \geq & Δ_{A} - ε_{n} \\ Δ_{A} & \leq & I (Y_{Q}; U_{Q} | Q) + ε_{n} = I (Y_{Q}; U_{Q}, Q) + ε_{n} \\ Δ_{M} & \geq & Δ_{A} + I (X_{Q}; U_{Q} | Q) - I (Y_{Q}; U_{Q} | Q) - ε_{n}^{'} \\ = & Δ_{A} + I (X_{Q}; U_{Q}, Q) - I (Y_{Q}; U_{Q}, Q) - ε_{n}^{'} \end{matrix}

where

ε_{n}^{'} : = ε_{n} + δ

and Q is a random variable distributed uniformly over

{1, 2, \dots, n}

which is independent of

(X, Y)

and hence

I (Y_{Q}; U_{Q} | Q) = \frac{1}{n} \sum_{i = 1}^{n} I (Y_{i}; U_{i})

. The results follow by denoting

U : = (U_{Q}, Q)

and noting that

Y_{Q}

and

X_{Q}

have the same distributions as Y and X, respectively. ☐

If the encoder does not have direct access to the private source

X^{n}

, then we can define the encoder mapping as

f_{n} : Y^{n} \to {1, 2, \dots, 2^{n R}}

. The following corollary is an immediate consequence of Theorem 12.

Corollary 15.

If the encoder does not see the private source, then for every achievable dependence dilution triple

(R, Δ_{A}, Δ_{M})

, we have

\{\begin{matrix} R \geq Δ_{A} \\ Δ_{A} \leq I (Y; U) \\ Δ_{M} \geq I (X; U) - I (Y; U) + Δ_{A} \end{matrix}

for some joint distribution

P_{X Y U} = P_{X Y} P_{U | Y}

where the auxiliary random variable

U \in U

satisfies

| U | \leq | Y | + 1

.

Remark 5.

If source Y is required to be amplified (according to (17)) at maximum rate, that is,

Δ_{A} = I (Y; U)

for an auxiliary random variable U which satisfies

X -\circ- Y -\circ- U

, then by Corollary 15, the best privacy performance one can expect from the dependence dilution setting is

Δ_{M}^{*} = min_{\begin{matrix} U : X -\circ- Y -\circ- U \\ I (Y; U) \geq Δ_{A} \end{matrix}} I (X; U)

(25)

which is equal to the dual of

g_{ε} (X; Y)

evaluated at

Δ_{A}

,

t_{Δ_{A}} (X; Y)

, as defined in (1).
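Any particular filter U satisfying the Markov chain gives an upper bound on the minimum in (25). As a rough illustration, the sketch below evaluates the pair (I(Y; U), I(X; U)) when X is uniform on {0, 1}, Y is the output of a BSC with crossover `alpha`, and U is obtained from Y through a further BSC with crossover `beta`. This binary model and the function names are our assumptions, and the BSC filter is not claimed to be optimal for (25).

```python
import math

def h_b(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_filter_point(alpha, beta):
    """Return (I(Y;U), I(X;U)) in bits for X ~ Bernoulli(0.5), Y = BSC_alpha(X), U = BSC_beta(Y)."""
    cascade = alpha * (1 - beta) + (1 - alpha) * beta   # crossover of the X -> U channel
    amplification = 1.0 - h_b(beta)                     # I(Y;U), a feasible Delta_A
    leakage = 1.0 - h_b(cascade)                        # I(X;U), upper bounds Delta_M^*
    return amplification, leakage

for beta in (0.0, 0.05, 0.2, 0.5):
    print(beta, bsc_filter_point(alpha=0.1, beta=beta))
```

At `beta = 0` the filter reveals Y completely and leaks I(X; Y) bits about X, while at `beta = 0.5` both quantities vanish.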

The dependence dilution problem is closely related to the discriminatory lossy source coding problem studied in [15]. In this problem, an encoder f observes

(X^{n}, Y^{n})

and wants to describe this source to a decoder, g, such that g recovers

Y^{n}

within distortion level D and

I (f (X^{n}, Y^{n}); X^{n}) \leq n Δ_{M}

. If the distortion measure is the Hamming distortion, then the distortion constraint and the amplification constraint are closely related via Fano’s inequality. Moreover, the dependence dilution problem reduces to a secure lossless (list decoder of fixed size 1) source coding problem by setting

Δ_{A} = H (Y)

, which was recently studied in [43].

3.2. MMSE Estimation of Functions of Private Information

In this section, we provide a justification for the privacy guarantee

ρ_{m}^{2} (X; Z) \leq ε

. To this end, we recall the definition of the minimum mean squared error estimation.

Definition 16.

Given random variables U and V,

mmse (U | V)

is defined as the minimum error of an estimate,

g (V)

, of U based on V, measured in the mean-square sense, that is

mmse (U | V) : = inf_{g \in L^{2} (V)} E [{(U - g (V))}^{2}] = E [{(U - E [U | V])}^{2}] = E [var (U | V)]

(26)

where

var (U | V)

denotes the conditional variance of U given V.

It is easy to see that

mmse (U | V) = 0

if and only if

U = f (V)

for some measurable function f and

mmse (U | V) = var (U)

if and only if

U ⊥ ⊥ V

. Hence, unlike for the case of maximal correlation, a small value of

mmse (U | V)

implies a strong dependence between U and V. Hence, although it is not a "proper" measure of correlation, in a certain sense it measures how well one random variable can be predicted from another one.
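The two extreme cases just mentioned are easy to reproduce for finite alphabets. The sketch below computes mmse(U | V) as the expected conditional variance, following (26); the function names and the two example distributions are hypothetical.

```python
def mmse(joint_uv, u_values):
    """mmse(U|V) = E[var(U|V)] for a finite joint pmf joint_uv[u][v]."""
    total = 0.0
    for v in range(len(joint_uv[0])):
        p_v = sum(row[v] for row in joint_uv)
        if p_v == 0:
            continue
        cond_mean = sum(u * row[v] for u, row in zip(u_values, joint_uv)) / p_v
        total += sum(row[v] * (u - cond_mean) ** 2 for u, row in zip(u_values, joint_uv))
    return total

def variance(joint_uv, u_values):
    """var(U) computed from the U-marginal of joint_uv."""
    p_u = [sum(row) for row in joint_uv]
    mean = sum(u * p for u, p in zip(u_values, p_u))
    return sum(p * (u - mean) ** 2 for u, p in zip(u_values, p_u))

u_values = [0.0, 1.0]
deterministic = [[0.3, 0.0], [0.0, 0.7]]      # U is a function of V
independent = [[0.12, 0.28], [0.18, 0.42]]    # product of (0.4, 0.6) and (0.3, 0.7)

print(mmse(deterministic, u_values))
print(mmse(independent, u_values), variance(independent, u_values))
```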

Given a non-degenerate measurable function

f : X \to R

, consider the following constraint on

mmse (f (X) | Z)

(1 - ε) var (f (X)) \leq mmse (f (X) | Z) \leq var (f (X)) .

(27)

This guarantees that no adversary knowing Z can efficiently estimate

f (X)

. First consider the case where f is an identity function, i.e.,

f (x) = x

. In this case, a direct calculation shows that

\begin{matrix} mmse (X | Z) & \overset{(a)}{=} & E [{(X - E [X | Z])}^{2}] = E [X^{2}] - E [{(E [X | Z])}^{2}] \\ = & var (X) (1 - ρ^{2} (X; E [X | Z])) \\ \overset{(b)}{\geq} & var (X) (1 - ρ_{m}^{2} (X; Z)) \end{matrix}

where

(a)

follows from (26) and

(b)

is due to the definition of maximal correlation. Having imposed

ρ_{m}^{2} (X; Z) \leq ε

, we can, therefore, conclude that the MMSE of estimating X given Z satisfies

(1 - ε) var (X) \leq mmse (X | Z) \leq var (X)

(28)

which shows that

ρ_{m}^{2} (X; Z) \leq ε

implies (27) for

f (x) = x

. However, in the following we show that the constraint

ρ_{m}^{2} (X; Z) \leq ε

is, indeed, equivalent to (27) for any non-degenerate measurable

f : X \to R

.

Definition 17

([44]). A joint distribution

P_{U V}

satisfies a Poincaré inequality with constant

c \leq 1

if for all

f : U \to R

c \cdot var (f (U)) \leq mmse (f (U) | V)

and the Poincaré constant for

P_{U V}

is defined as

ϑ (U; V) : = inf_{f} \frac{mmse (f (U) | V)}{var (f (U))}

The privacy constraint (27) can then be viewed as

ϑ (X; Z) \geq 1 - ε .

(29)

Theorem 18

([44]). For any joint distribution

P_{U V}

, we have

ϑ (U; V) = 1 - ρ_{m}^{2} (U; V)

In light of Theorem 18 and (29), the privacy constraint (27) is equivalent to

ρ_{m}^{2} (X; Z) \leq ε

, that is,

ρ_{m}^{2} (X; Z) \leq ε ⟺ (1 - ε) var (f (X)) \leq mmse (f (X) | Z) \leq var (f (X))

for any non-degenerate measurable function

f : X \to R

.
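Theorem 18 can be verified directly when X is binary, because every non-degenerate function of a binary X is an affine function of X, so the ratio mmse(f(X) | Z)/var(f(X)) is the same for all such f and the Poincaré constant reduces to mmse(X | Z)/var(X). The sketch below compares this ratio with one minus the squared maximal correlation, computed through the Frobenius-norm formula that is valid for a binary row variable. The joint distribution and helper names are hypothetical.

```python
def rho_m_squared(joint):
    """Squared maximal correlation for a joint pmf with a binary row variable."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(p * p / (pa[a] * pb[b])
               for a, row in enumerate(joint)
               for b, p in enumerate(row) if p > 0) - 1.0

def poincare_constant_binary(joint_xz):
    """mmse(X|Z)/var(X) for X in {0, 1}; rows of joint_xz are indexed by x."""
    p_x1 = sum(joint_xz[1])
    var_x = p_x1 * (1 - p_x1)
    mmse = 0.0
    for z in range(len(joint_xz[0])):
        p_z = joint_xz[0][z] + joint_xz[1][z]
        if p_z > 0:
            m = joint_xz[1][z] / p_z          # E[X | Z = z]
            mmse += p_z * m * (1 - m)         # P_Z(z) * var(X | Z = z)
    return mmse / var_x

joint_xz = [[0.20, 0.10, 0.10],
            [0.05, 0.25, 0.30]]               # hypothetical P_XZ, ternary Z
theta = poincare_constant_binary(joint_xz)
rho2 = rho_m_squared(joint_xz)
print(theta, 1 - rho2)
```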

Hence,

{\hat{g}}_{ε} (X; Y)

characterizes the maximum information extraction from Y such that no (non-trivial) function of X can be efficiently estimated, in terms of MMSE (27), given the extracted information.

4. Observation Channels for Minimal and Maximal $g_{ε} (X; Y)$

In this section, we characterize the observation channels which achieve the lower or upper bounds on the rate-privacy function in (4). We first derive general conditions for achieving the lower bound and then present a large family of observation channels

P_{Y | X}

which achieve the lower bound. We also give a family of

P_{Y | X}

which attain the upper bound on

g_{ε} (X; Y)

.

4.1. Conditions for Minimal $g_{ε} (X; Y)$

Assuming that

g_{0} (X; Y) = 0

, we seek a set of conditions on

P_{X Y}

such that

g_{ε} (X; Y)

is linear in ε, or equivalently,

g_{ε} (X; Y) = ε \frac{H (Y)}{I (X; Y)}

. In order to do this, we shall examine the slope of

g_{ε} (X; Y)

at zero. Recall that by concavity of

g_{ε} (X; Y)

, it is clear that

g_{0}^{'} (X; Y) \geq \frac{H (Y)}{I (X; Y)}

. We strengthen this bound in the following lemmas. For this, we need to recall the notion of Kullback-Leibler divergence. Given two probability distributions P and Q supported over a finite alphabet

U

,

D (P | | Q) : = \sum_{u \in U} P (u) log (\frac{P (u)}{Q (u)})

(30)

Lemma 19.

For a given joint distribution

P_{X Y} = P_{Y} \times P_{X | Y}

, if

g_{0} (X; Y) = 0

, then for any

ε \geq 0

g_{0}^{'} (X; Y) \geq max_{y \in Y} \frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

Proof.

The proof is given in Appendix A. ☐

Remark 6.

Note that if for a given joint distribution

P_{X Y}

, there exists

y_{0} \in Y

such that

D (P_{X | Y} (\cdot | y_{0}) | | P_{X} (\cdot)) = 0

, it implies that

P_{X | Y} (\cdot | y_{0}) = P_{X} (\cdot)

. Consider the binary random variable

Z \in {1, e}

constructed according to the distribution

P_{Z | Y} (1 | y_{0}) = 1

and

P_{Z | Y} (e | y) = 1

for

y \in Y \setminus {y_{0}}

. We can now claim that Z is independent of X, because

P_{X | Z} (x | 1) = P_{X | Y} (x | y_{0}) = P_{X} (x)

and

\begin{matrix} P_{X | Z} (x | e) & = & \sum_{y \neq y_{0}} P_{X | Y} (x | y) P_{Y | Z} (y | e) = \sum_{y \neq y_{0}} P_{X | Y} (x | y) \frac{P_{Y} (y)}{1 - P_{Y} (y_{0})} \\ = & \frac{1}{1 - P_{Y} (y_{0})} \sum_{y \neq y_{0}} P_{X Y} (x, y) = P_{X} (x) \end{matrix}

Clearly, Z and Y are not independent, and hence

g_{0} (X; Y) > 0

. This implies that, under the hypothesis of Lemma 19, the right-hand side of the inequality in Lemma 19 cannot be infinite.

In order to prove the main result, we need the following simple lemma.

Lemma 20.

For any joint distribution

P_{X Y}

, we have

\frac{H (Y)}{I (X; Y)} \leq max_{y \in Y} \frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

where equality holds if and only if there exists a constant

c > 0

such that

- log P_{Y} (y) = c D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))

for all

y \in Y

.

Proof.

It is clear that

\frac{H (Y)}{I (X; Y)} = \frac{- \sum_{y \in Y} P_{Y} (y) log P_{Y} (y)}{\sum_{y \in Y} P_{Y} (y) D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))} \leq max_{y \in Y} \frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

where the inequality follows from the fact that for any three sequences of positive numbers

{a_{i}}_{i = 1}^{n}

,

{b_{i}}_{i = 1}^{n}

and

{λ_{i}}_{i = 1}^{n}

we have

\frac{\sum_{i = 1}^{n} λ_{i} a_{i}}{\sum_{i = 1}^{n} λ_{i} b_{i}} \leq {max}_{1 \leq i \leq n} \frac{a_{i}}{b_{i}}

where equality occurs if and only if

\frac{a_{i}}{b_{i}} = c

for all

1 \leq i \leq n

. ☐
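Lemma 20 and its equality condition can be illustrated numerically. The sketch below computes both sides of the inequality from a joint pmf, using the decomposition of I(X; Y) as the P_Y-average of the divergences that appears in the proof. The two joint distributions are hypothetical examples: an asymmetric binary pair, for which the inequality is strict, and a doubly symmetric binary pair, for which every ratio coincides.

```python
import math

def lemma20_sides(joint_xy):
    """Return (H(Y)/I(X;Y), max_y -log P_Y(y) / D(P_{X|Y}(.|y) || P_X)), logs in bits.

    Assumes P_Y(y) > 0 and a strictly positive divergence for every y.
    """
    px = [sum(row) for row in joint_xy]
    py = [sum(col) for col in zip(*joint_xy)]
    h_y = -sum(q * math.log2(q) for q in py)
    # d[y] = D(P_{X|Y}(.|y) || P_X), computed from the joint pmf.
    d = [sum((joint_xy[x][y] / py[y]) * math.log2(joint_xy[x][y] / (py[y] * px[x]))
             for x in range(len(px)) if joint_xy[x][y] > 0)
         for y in range(len(py))]
    i_xy = sum(q * dy for q, dy in zip(py, d))          # I(X;Y) = sum_y P_Y(y) d[y]
    return h_y / i_xy, max(-math.log2(q) / dy for q, dy in zip(py, d))

asymmetric = [[0.40, 0.10], [0.15, 0.35]]
symmetric = [[0.45, 0.05], [0.05, 0.45]]     # uniform X through a BSC(0.1)
print(lemma20_sides(asymmetric))
print(lemma20_sides(symmetric))
```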

Now we are ready to state the main result of this subsection.

Theorem 21.

For a given

(X, Y)

with joint distribution

P_{X Y} = P_{Y} \times P_{X | Y}

, if

g_{0} (X; Y) = 0

and

ε \mapsto g_{ε} (X; Y)

is linear for

0 \leq ε \leq I (X; Y)

, then for any

y \in Y

\frac{H (Y)}{I (X; Y)} = \frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

Proof.

Note that the fact that

g_{0} (X; Y) = 0

and

g_{ε} (X; Y)

is linear in ε is equivalent to

g_{ε} (X; Y) = ε \frac{H (Y)}{I (X; Y)}

. It is, therefore, immediate from Lemmas 19 and 20 that we have

\begin{matrix} g_{0}^{'} (X; Y) & \overset{(a)}{=} & \frac{H (Y)}{I (X; Y)} \overset{(b)}{\leq} max_{y \in Y} \frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))} \\ & \overset{(c)}{\leq} & g_{0}^{'} (X; Y) \end{matrix}

(31)

where

(a)

follows from the fact that

g_{ε} (X; Y) = ε \frac{H (Y)}{I (X; Y)}

and

(b)

and

(c)

are due to Lemmas 20 and 19, respectively. The inequality in (31) shows that

\frac{H (Y)}{I (X; Y)} = max_{y \in Y} \frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

(32)

According to Lemma 20, (32) implies that the ratio of

\frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

does not depend on

y \in Y

and hence the result follows. ☐

This theorem implies that if there exist

y_{1}, y_{2} \in Y

at which the ratio

\frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

takes two different values, then

ε \mapsto g_{ε} (X; Y)

cannot achieve the lower bound in (4), or equivalently

g_{ε} (X; Y) > ε \frac{H (Y)}{I (X; Y)}

This, therefore, gives a necessary condition for the lower bound to be achievable. The following corollary simplifies this necessary condition.

Corollary 22.

For a given joint distribution

P_{X Y} = P_{Y} \times P_{X | Y}

, if

g_{0} (X; Y) = 0

and

ε \mapsto g_{ε} (X; Y)

is linear, then the following are equivalent:

(i): Y is uniformly distributed,
(ii): $D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))$ is constant for all $y \in Y$ .

Proof.

(ii) \Rightarrow (i)

:

From Theorem 21, we have for all

y \in Y

\frac{H (Y)}{I (X; Y)} = \frac{- log (P_{Y} (y))}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

(33)

Letting

D : = D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))

denote the common value of the divergence over all

y \in Y

, we have

\sum_{y} P_{Y} (y) D = I (X; Y)

and hence

D = I (X; Y)

, which together with (33) implies that

H (Y) = - log (P_{Y} (y))

for all

y \in Y

and hence Y is uniformly distributed.

(i) \Rightarrow (ii)

:

When Y is uniformly distributed, we have from (33) that

I (X; Y) = D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))

which implies that

D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))

is constant for all

y \in Y

. ☐

Example 1.

Suppose

P_{Y | X}

is a binary symmetric channel (BSC) with crossover probability

0 < α < 1

and

P_{X} = Bernoulli (0.5)

. In this case,

P_{X | Y}

is also a BSC with input distribution

P_{Y} = Bernoulli (0.5)

. Note that Corollary 11 implies that

g_{0} (X; Y) = 0

. We will show that

g_{ε} (X; Y)

is linear as a function of

ε \geq 0

for a larger family of symmetric channels (including BSC) in Corollary 24. Hence, the BSC with uniform input nicely illustrates Corollary 22, because

D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot)) = 1 - h_{b} (α)

for

y \in {0, 1}

.

Example 2.

Now suppose

P_{X | Y}

is a binary asymmetric channel such that

P_{X | Y} (\cdot | 0) = Bernoulli (α_{0})

,

P_{X | Y} (\cdot | 1) = Bernoulli (α_{1})

for some

0 < α_{0}, α_{1} < 1

and input distribution

P_{Y} = Bernoulli (p)

,

0 < p \leq 0.5

. It is easy to see that if

α_{0} + α_{1} = 1

then

D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))

does not depend on y and hence we can conclude from Corollary 22 (noticing that

g_{0} (X; Y) = 0

) that in this case for any

p < 0.5

,

g_{ε} (X; Y)

is not linear and hence for

0 < ε < I (X; Y)

g_{ε} (X; Y) > ε \frac{H (Y)}{I (X; Y)}

In Theorem 21, we showed that when

g_{ε} (X; Y)

achieves its lower bound, given in (4), the slope of the mapping

ε \mapsto g_{ε} (X; Y)

at zero is equal to

\frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

for any

y \in Y

. We will show in the next section that the reverse direction is also true at least for a large family of binary-input symmetric output channels, for instance when

P_{Y | X}

is a BSC, thus showing that in this case,

g_{0}^{'} (X; Y) = \frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}, \forall y \in Y ⟺ g_{ε} (X; Y) = ε \frac{H (Y)}{I (X; Y)}, 0 \leq ε \leq I (X; Y)

4.2. Special Observation Channels

In this section, we apply the results of the last section to different joint distributions

P_{X Y}

. In the first family of channels from X to Y, we look at the case where Y is binary and the reverse channel

P_{X | Y}

has symmetry in a particular sense, which will be specified later. One particular case of this family of channels is when

P_{X | Y}

is a BSC. As a family of observation channels which achieves the upper bound of

g_{ε} (X; Y)

, stated in (4), we look at the class of erasure channels from

X \to Y

, i.e., Y is an erasure version of X.

4.2.1. Observation Channels With Symmetric Reverse

The first example of

P_{X Y}

that we consider for binary Y is the so-called Binary Input Symmetric Output (BISO)

P_{X | Y}

, see for example [45,46]. Suppose

Y = {0, 1}

and

X = {0, \pm 1, \pm 2, \dots, \pm k}

, and for any

x \in X

we have

P_{X | Y} (x | 1) = P_{X | Y} (- x | 0)

. This clearly implies that

p_{0} : = P_{X | Y} (0 | 0) = P_{X | Y} (0 | 1)

. We notice that with this definition of symmetry, we can always assume that the output alphabet

X = {\pm 1, \pm 2, \dots, \pm k}

has an even number of elements because we can split

X = 0

into two outputs,

X = 0^{+}

and

X = 0^{-}

, with

P_{X | Y} (0^{-} | 0) = P_{X | Y} (0^{+} | 0) = \frac{p_{0}}{2}

and

P_{X | Y} (0^{-} | 1) = P_{X | Y} (0^{+} | 1) = \frac{p_{0}}{2}

. The new channel is clearly essentially equivalent to the original one, see [46] for more details. This family of channels can also be characterized using the definition of quasi-symmetric channels [47, Definition 4.17]. A channel

W

is BISO if (after making

| X |

even) the transition matrix

P_{X | Y}

can be partitioned along its columns into binary-input binary-output sub-arrays in which rows are permutations of each other and the column sums are equal. It is clear that binary symmetric channels and binary erasure channels are both BISO. The following lemma gives an upper bound for

g_{ε} (X; Y)

when

P_{X | Y}

belongs to such a family of channels.
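Before stating the bound, we note that the symmetry condition defining a BISO reverse channel is straightforward to test. In the sketch below, which is only an illustration, each conditional distribution P_{X|Y}(·|y) is stored as a dictionary from integer outputs x to probabilities, and the function checks that P_{X|Y}(x|1) = P_{X|Y}(-x|0) for every output. The representation, the function name and the numerical values are our assumptions.

```python
def is_biso(p_x_given_y, tol=1e-12):
    """Check P_{X|Y}(x|1) == P_{X|Y}(-x|0) for all outputs x.

    p_x_given_y is a pair (p0, p1) of dicts mapping integer x to probability.
    """
    p0, p1 = p_x_given_y
    symbols = set(p0) | set(p1)
    symbols |= {-x for x in symbols}          # close the alphabet under negation
    return all(abs(p1.get(x, 0.0) - p0.get(-x, 0.0)) <= tol for x in symbols)

alpha, erasure = 0.1, 0.25
bsc = ({-1: 1 - alpha, 1: alpha}, {-1: alpha, 1: 1 - alpha})
bec = ({-1: 1 - erasure, 0: erasure}, {0: erasure, 1: 1 - erasure})   # output 0 is the erasure
asymmetric = ({-1: 0.9, 1: 0.1}, {-1: 0.3, 1: 0.7})

print(is_biso(bsc), is_biso(bec), is_biso(asymmetric))
```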

Lemma 23.

If the channel

P_{X | Y}

is BISO, then for

ε \in [0, I (X; Y)]

,

ε \frac{H (Y)}{I (X; Y)} \leq g_{ε} (X; Y) \leq H (Y) - \frac{I (X; Y) - ε}{C (P_{X | Y})}

where

C (P_{X | Y})

denotes the capacity of

P_{X | Y}

.

Proof.

The lower bound has already appeared in (4). To prove the upper bound note that by Markovity

X -\circ- Y -\circ- Z

, we have for any

x \in X

and

z \in Z

P_{X | Z} (x | z) = P_{X | Y} (x | 0) P_{Y | Z} (0 | z) + P_{X | Y} (x | 1) P_{Y | Z} (1 | z)

(34)

Now suppose

Z_{0} : = {z : P_{Y | Z} (0 | z) \leq P_{Y | Z} (1 | z)}

and similarly

Z_{1} : = {z : P_{Y | Z} (1 | z) \leq P_{Y | Z} (0 | z)}

. Then (34) allows us to write for

z \in Z_{0}

P_{X | Z} (x | z) = P_{X | Y} (x | 0) h_{b}^{- 1} (H (Y | Z = z)) + P_{X | Y} (x | 1) (1 - h_{b}^{- 1} (H (Y | Z = z)))

(35)

where

h_{b}^{- 1} : [0, 1] \to [0, 0.5]

is the inverse of binary entropy function, and for

z \in Z_{1}

,

P_{X | Z} (x | z) = P_{X | Y} (x | 0) (1 - h_{b}^{- 1} (H (Y | Z = z))) + P_{X | Y} (x | 1) h_{b}^{- 1} (H (Y | Z = z))

(36)

Letting

P \otimes h_{b}^{- 1} (H (Y | z))

and

\tilde{P} \otimes h_{b}^{- 1} (H (Y | z))

denote the right-hand sides of (35) and (36), respectively, we can, hence, write

\begin{matrix} H (X | Z) & = & \sum_{z \in Z} P_{Z} (z) H (X | Z = z) \\ \overset{(a)}{=} & \sum_{z \in Z_{0}} P_{Z} (z) H (P \otimes h_{b}^{- 1} (H (Y | Z = z))) + \sum_{z \in Z_{1}} P_{Z} (z) H (\tilde{P} \otimes h_{b}^{- 1} (H (Y | Z = z))) \\ \overset{(b)}{\leq} & \sum_{z \in Z_{0}} P_{Z} (z) [(1 - H (Y | Z = z)) H (P \otimes h_{b}^{- 1} (0)) + H (Y | Z = z) H (P \otimes h_{b}^{- 1} (1))] \\ + \sum_{z \in Z_{1}} P_{Z} (z) [(1 - H (Y | Z = z)) H (\tilde{P} \otimes h_{b}^{- 1} (0)) + H (Y | Z = z) H (\tilde{P} \otimes h_{b}^{- 1} (1))] \\ \overset{(c)}{=} & \sum_{z \in Z_{0}} P_{Z} (z) [(1 - H (Y | Z = z)) H (X | Y) + H (Y | Z = z) H (X_{unif})] \\ + \sum_{z \in Z_{1}} P_{Z} (z) [(1 - H (Y | Z = z)) H (X | Y) + H (Y | Z = z) H (X_{unif})] \\ = & H (X | Y) [1 - H (Y | Z)] + H (Y | Z) H (X_{unif}) \end{matrix}

where

H (X_{unif})

denotes the entropy of X when Y is uniformly distributed. Here,

(a)

is due to (35) and (36),

(b)

follows from the convexity of

u \mapsto H (P \otimes h_{b}^{- 1} (u))

for all

u \in [0, 1]

[48] and Jensen’s inequality. In

(c)

, we used the symmetry of channel

P_{X | Y}

to show that

H (X | Y = 0) = H (X | Y = 1) = H (X | Y)

. Hence, we obtain

H (Y | Z) \geq \frac{H (X | Z) - H (X | Y)}{H (X_{unif}) - H (X | Y)} = \frac{I (X; Y) - I (X; Z)}{C (P_{X | Y})}

where the equality follows from the fact that for BISO channel (and in general for any quasi-symmetric channel) the uniform input distribution is the capacity-achieving distribution [47, Lemma 4.18]. Since

g_{ε} (X; Y)

is attained when

I (X; Z) = ε

, the conclusion immediately follows. ☐

This lemma then shows that the larger the gap between

I (X; Y)

and

I (X; Y^{'})

is for

Y^{'} \sim Bernoulli (0.5)

, the more

g_{ε} (X; Y)

deviates from its lower bound. When

Y \sim Bernoulli (0.5)

, then

C (P_{X | Y}) = I (X; Y)

and

H (Y) = 1

and hence Lemma 23 implies that

\frac{ε}{I (X; Y)} \leq g_{ε} (X; Y) \leq 1 - \frac{I (X; Y) - ε}{I (X; Y)} = \frac{ε}{I (X; Y)}

and hence we have proved the following corollary.

Corollary 24.

If the channel

P_{X | Y}

is BISO and

Y \sim Bernoulli (0.5)

, then for any

ε \geq 0

g_{ε} (X; Y) = \frac{1}{I (X; Y)} min {ε, I (X; Y)}

This corollary now enables us to prove the reverse direction of Theorem 21 for the family of BISO channels.

Theorem 25.

If

P_{X | Y}

is a BISO channel, then the following statements are equivalent:

(i): $g_{ε} (X; Y) = ε \frac{H (Y)}{I (X; Y)}$ for $0 \leq ε \leq I (X; Y)$ .
(ii): The initial efficiency of privacy-constrained information extraction is

$g_{0}^{'} (X; Y) = \frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}, \forall y \in Y$

Proof.

(i)⇒ (ii).

This follows from Theorem 21.

(ii)⇒ (i).

Let

Y \sim Bernoulli (p)

for

0 < p < 1

, and, as before,

X = {\pm 1, \pm 2, \dots, \pm k}

, so that

P_{X | Y}

is determined by a

2 \times (2 k)

matrix. We then have

\frac{- log P_{Y} (0)}{D (P_{X | Y} (\cdot | 0) | | P_{X} (\cdot))} = \frac{log (1 - p)}{H (X | Y) + \sum_{x = - k}^{k} P_{X | Y} (x | 0) log (P_{X} (x))}

(37)

and

\frac{- log P_{Y} (1)}{D (P_{X | Y} (\cdot | 1) | | P_{X} (\cdot))} = \frac{log (p)}{H (X | Y) + \sum_{x = - k}^{k} P_{X | Y} (x | 1) log (P_{X} (x))} .

(38)

The hypothesis implies that (37) is equal to (38), that is,

\frac{log (1 - p)}{H (X | Y) + \sum_{x = - k}^{k} P_{X | Y} (x | 0) log (P_{X} (x))} = \frac{log (p)}{H (X | Y) + \sum_{x = - k}^{k} P_{X | Y} (x | 1) log (P_{X} (x))}

(39)

It is shown in Appendix B that (39) holds if and only if

p = 0.5

. Now we can invoke Corollary 24 to conclude that

g_{ε} (X; Y) = ε \frac{H (Y)}{I (X; Y)}

. ☐

This theorem shows that for any BISO

P_{X | Y}

channel with uniform input, the optimal privacy filter is an erasure channel depicted in Figure 2. Note that if

P_{X | Y}

is a BSC with uniform input

P_{Y} = Bernoulli (0.5)

, then

P_{Y | X}

is also a BSC with uniform input

P_{X} = Bernoulli (0.5)

. The following corollary specializes Corollary 24 for this case.

Corollary 26.

For the joint distribution

P_{X} P_{Y | X} = Bernoulli (0.5) \times BSC (α)

, the binary erasure channel with erasure probability (shown in Figure 4)

δ (ε, α) : = 1 - \frac{ε}{I (X; Y)}

(40)

for

0 \leq ε \leq I (X; Y)

, is the optimal privacy filter in (3). In other words, for

ε \geq 0

g_{ε} (X; Y) = \frac{1}{I (X; Y)} min {ε, I (X; Y)}

Moreover, for a given

0 < α < \frac{1}{2}

,

P_{X} = Bernoulli (0.5)

is the only distribution for which

ε \mapsto g_{ε} (X; Y)

is linear. That is, for

P_{X} P_{Y | X} = Bernoulli (p) \times BSC (α)

,

0 < p < 0.5

, we have

g_{ε} (X; Y) > ε \frac{H (Y)}{I (X; Y)}

Proof.

As mentioned earlier, since

P_{X} = Bernoulli (0.5)

and

P_{Y | X}

is

BSC (α)

, it follows that

P_{X | Y}

is also a BSC with uniform input and hence from Corollary 24, we have

g_{ε} (X; Y) = \frac{ε}{I (X; Y)}

. As in this case

g_{ε} (X; Y)

achieves the lower bound given in Lemma 1, we conclude from Figure 2 that BEC(

δ (ε, α)

), where

δ (ε, α) = 1 - \frac{ε}{I (X; Y)}

, is an optimal privacy filter. The fact that

P_{X} = Bernoulli (0.5)

is the only input distribution for which

ε \mapsto g_{ε} (X; Y)

is linear follows from the proof of Theorem 25. In particular, we saw that a necessary and sufficient condition for

g_{ε} (X; Y)

being linear is that the ratio

\frac{- log P_{Y} (y)}{D (P_{X | Y} (\cdot | y) | | P_{X} (\cdot))}

is constant for all

y \in Y

. As shown before, this is equivalent to

Y \sim Bernoulli (0.5)

. For the binary symmetric channel, this is equivalent to

X \sim Bernoulli (0.5)

. ☐

The optimal privacy filter for BSC(α) and uniform X is shown in Figure 4. In fact, this corollary immediately implies that the general lower-bound given in (4) is tight for the binary symmetric channel with uniform X.
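As a numerical sanity check of Corollary 26, the following Python sketch builds the joint distribution Bernoulli(0.5) × BSC(α), applies a binary erasure filter with the erasure probability from (40), and evaluates both mutual informations. It is only an illustration under stated assumptions: the function names (`hb`, `mutual_information`, `bec_filter_check`) are hypothetical, logarithms are taken in base 2, and the erasure symbol is stored as the third output index.

```python
import math


def hb(q):
    """Binary entropy function in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def mutual_information(joint):
    """I(A;B) in bits for a joint pmf stored as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        pab * math.log2(pab / (pa[a] * pb[b]))
        for a, row in enumerate(joint)
        for b, pab in enumerate(row)
        if pab > 0.0
    )


def bec_filter_check(alpha, eps):
    """Return (I(X;Z), I(Y;Z), I(X;Y)) for the erasure filter of (40)."""
    i_xy = 1.0 - hb(alpha)        # I(X;Y) for uniform X through BSC(alpha)
    delta = 1.0 - eps / i_xy      # erasure probability from (40)
    # Joint pmf of (X, Y): X uniform, Y equals X flipped with probability alpha.
    p_xy = [[0.5 * (1.0 - alpha), 0.5 * alpha],
            [0.5 * alpha, 0.5 * (1.0 - alpha)]]
    # Filter P_{Z|Y}: rows indexed by y in {0, 1}, columns by z in {0, 1, e}.
    p_z_given_y = [[1.0 - delta, 0.0, delta],
                   [0.0, 1.0 - delta, delta]]
    p_xz = [[sum(p_xy[x][y] * p_z_given_y[y][z] for y in range(2))
             for z in range(3)] for x in range(2)]
    # Y is uniform here, so P_{YZ}(y, z) = 0.5 * P_{Z|Y}(z | y).
    p_yz = [[0.5 * p_z_given_y[y][z] for z in range(3)] for y in range(2)]
    return mutual_information(p_xz), mutual_information(p_yz), i_xy
```

For instance, with α = 0.1 and ε = 0.2 the first returned value should equal ε and the second should equal ε / I(X;Y), up to floating-point error, which is the value of the rate-privacy function stated in the corollary.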

4.2.2. Erasure Observation Channel

Combining (8) and Lemma 1, we have for

ε \leq I (X; Y)

ε \frac{H (Y)}{I (X; Y)} + g_{0} (X; Y) [1 - \frac{ε}{I (X; Y)}] \leq g_{ε} (X; Y) \leq H (Y | X) + ε

(41)

In the following we show that the above upper and lower bounds coincide when

P_{Y | X}

is an erasure channel, i.e.,

P_{Y | X} (x | x) = 1 - δ

and

P_{Y | X} (e | x) = δ

for all

x \in X

and

0 \leq δ \leq 1

.

Lemma 27.

For any given

(X, Y)

, if

P_{Y | X}

is an erasure channel (as defined above), then

g_{ε} (X; Y) = H (Y | X) + min {ε, I (X; Y)}

for any

ε \geq 0

.

Proof.

It suffices to show that if

P_{Y | X}

is an erasure channel, then

g_{0} (X; Y) = H (Y | X)

. This follows, since if

g_{0} (X; Y) = H (Y | X)

, then the lower bound in (41) becomes

H (Y | X) + ε

and thus

g_{ε} (X; Y) = H (Y | X) + ε

.

Let

| X | = m

and

Y = X \cup {e}

where e denotes the erasure symbol. Consider the following privacy filter to generate

Z \in Y

:

P_{Z | Y} (z | y) = \{\begin{matrix} \frac{1}{m} & if y \neq e, z \neq e, \\ 1 & if y = z = e . \end{matrix}

For any

x \in X

, we have

P_{Z | X} (z | x) = P_{Z | Y} (z | x) P_{Y | X} (x | x) + P_{Z | Y} (z | e) P_{Y | X} (e | x) = [\frac{1 - δ}{m}] 1_{{z \neq e}} + δ 1_{{z = e}}

which implies

Z ⊥ ⊥ X

and thus

I (X; Z) = 0

. On the other hand,

P_{Z} (z) = [\frac{1 - δ}{m}] 1_{{z \neq e}} + δ 1_{{z = e}}

, and therefore we have

\begin{matrix} g_{0} (X; Y) & \geq & I (Y; Z) = H (Z) - H (Z | Y) = H (\frac{1 - δ}{m}, \dots, \frac{1 - δ}{m}, δ) - (1 - δ) log (m) \\ = & h (δ) = H (Y | X) \end{matrix}

It then follows from Lemma 1 that

g_{0} (X; Y) = H (Y | X)

, which completes the proof. ☐
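The filter used in the proof of Lemma 27 can be checked numerically. The sketch below is an illustration only: the helper names are hypothetical, logarithms are in base 2, and the erasure symbol e is stored as the last index. It constructs an erasure observation channel for an arbitrary P_X, applies the filter that maps every non-erased symbol to a uniformly chosen non-erased symbol and keeps erasures, and returns I(X;Z), I(Y;Z) and H(Y|X) = h(δ).

```python
import math


def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in pmf if q > 0.0)


def mutual_information(joint):
    """I(A;B) in bits for a joint pmf stored as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        pab * math.log2(pab / (pa[a] * pb[b]))
        for a, row in enumerate(joint)
        for b, pab in enumerate(row)
        if pab > 0.0
    )


def perfect_privacy_filter_check(p_x, delta):
    """Return (I(X;Z), I(Y;Z), H(Y|X)) for the filter in the proof of Lemma 27."""
    m = len(p_x)
    e = m  # index of the erasure symbol in Y and Z
    # Erasure observation channel: Y = X w.p. 1 - delta, Y = e w.p. delta.
    p_xy = [[0.0] * (m + 1) for _ in range(m)]
    for x in range(m):
        p_xy[x][x] = p_x[x] * (1.0 - delta)
        p_xy[x][e] = p_x[x] * delta
    # Filter: non-erased y is mapped to a uniform non-erased z; e stays e.
    p_z_given_y = [[1.0 / m] * m + [0.0] for _ in range(m)] + [[0.0] * m + [1.0]]
    p_xz = [[sum(p_xy[x][y] * p_z_given_y[y][z] for y in range(m + 1))
             for z in range(m + 1)] for x in range(m)]
    p_y = [sum(p_xy[x][y] for x in range(m)) for y in range(m + 1)]
    p_yz = [[p_y[y] * p_z_given_y[y][z] for z in range(m + 1)]
            for y in range(m + 1)]
    return (mutual_information(p_xz), mutual_information(p_yz),
            entropy([delta, 1.0 - delta]))
```

For any choice of P_X and δ, the first value should vanish (perfect privacy) and the second should coincide with the third, in agreement with g_0(X;Y) = H(Y|X).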

Example 3.

In light of this lemma, we can conclude that if

P_{Y | X} = BEC (δ)

, then the optimal privacy filter is a combination of an identity channel and a BSC(

α (ε, δ)

), as shown in Figure 5, where

0 \leq α (ε, δ) \leq \frac{1}{2}

is the unique solution of

(1 - δ) [h_{b} (α * p) - h_{b} (α)] = ε

(42)

where

X \sim Bernoulli (p)

,

p \leq 0.5

and

a * b = a (1 - b) + b (1 - a)

. Note that it is easy to check that

I (X; Z) = (1 - δ) [h_{b} (α * p) - h_{b} (α)]

. Therefore, in order for this channel to be a valid privacy filter, the crossover probability,

α (ε, δ)

, must be chosen such that

I (X; Z) = ε

. We note that for fixed

0 < δ < 1

and

0 < p < 0.5

, the map

α \mapsto (1 - δ) [h_{b} (α * p) - h_{b} (α)]

is monotonically decreasing on

[0, \frac{1}{2}]

ranging over

[0, (1 - δ) h_{b} (p)]

and since

ε \leq I (X; Y) = (1 - δ) h_{b} (p)

, the solution of the above equation is unique.

Combining Lemmas 1 and 27 with Corollary 26, we can show the following extremal property of the BEC and BSC channels, which is similar to other existing extremal properties of the BEC and the BSC, see, e.g., [46] and [45]. For

X \sim Bernoulli (0.5)

, we have for any channel

P_{Y | X}

,

g_{ε} (X; Y) \geq \frac{ε}{I (X; Y)} = g_{ε} (BSC (\hat{α}))

where

g_{ε} (BSC (α))

is the rate-privacy function corresponding to

P_{X Y} = Bernoulli (0.5) \times BSC (α)

and

\hat{α} : = h_{b}^{- 1} (H (X | Y))

. Similarly, if

X \sim Bernoulli (p)

, we have for any channel

P_{Y | X}

with

H (Y | X) \leq 1

,

g_{ε} (X; Y) \leq H (Y | X) + ε = g_{ε} (BEC (\hat{δ}))

where

g_{ε} (BEC (δ))

is the rate-privacy function corresponding to

P_{X Y} = Bernoulli (p) \times BEC (δ)

and

\hat{δ} : = h_{b}^{- 1} (H (Y | X))

.

5. Rate-Privacy Function for Continuous Random Variables

In this section we extend the rate-privacy function

g_{ε} (X; Y)

to the continuous case. Specifically, we assume that the private and observable data are continuous random variables and that the filter is composed of two stages: first Gaussian noise is added and then the resulting random variable is quantized using an M-bit accuracy uniform scalar quantizer (for some positive integer

M \in N

). These filters are of practical interest as they can be easily implemented. This section is divided into two subsections: in the first we discuss general properties of the rate-privacy function, and in the second we study the Gaussian case in more detail. Some observations on

{\hat{g}}_{ε} (X; Y)

for continuous X and Y are also given.

5.1. General Properties of the Rate-Privacy Function

Throughout this section we assume that the random vector

(X, Y)

is absolutely continuous with respect to the Lebesgue measure on

R^{2}

. Additionally, we assume that its joint density

f_{X, Y}

satisfies the following.

(a): There exist constants $C_{1} > 0$ , $p > 1$ and a bounded function $C_{2} : R \to R$ such that

$f_{Y} (y) \leq C_{1} {| y |}^{- p}$

and also for $x \in R$

$f_{Y | X} (y | x) \leq C_{2} (x) {| y |}^{- p}$
(b): $E [X^{2}]$ and $E [Y^{2}]$ are both finite,
(c): the differential entropy of $(X, Y)$ satisfies $h (X, Y) > - \infty$ ,
(d): $H (⌊ Y ⌋) < \infty$ , where $⌊ a ⌋$ denotes the largest integer ℓ such that $ℓ \leq a$ .

Note that assumptions (b) and (c) together imply that

h (X, Y)

,

h (X)

and

h (Y)

are finite, i.e., the maps

x \mapsto f_{X} (x) | log f_{X} (x) |, y \mapsto f_{Y} (y) | log f_{Y} (y) |

and

(x, y) \mapsto f_{X, Y} (x, y) | log (f_{X, Y} (x, y)) |

are integrable. We also assume that X and Y are not independent, since otherwise the problem of characterizing

g_{ε} (X; Y)

becomes trivial: the displayed data Z can simply equal the observable data Y.

We are interested in filters of the form

Q_{M} (Y + γ N)

where

γ \geq 0

,

N \sim N (0, 1)

is a standard normal random variable which is independent of X and Y, and for any positive integer M,

Q_{M}

denotes the M-bit accuracy uniform scalar quantizer, i.e., for all

x \in R

Q_{M} (x) = \frac{1}{2^{M}} ⌊2^{M} x⌋

Let

Z_{γ} = Y + γ N

and

Z_{γ}^{M} = Q_{M} (Z_{γ}) = Q_{M} (Y + γ N)

. We define, for any

M \in N

,

g_{ε, M} (X; Y) : = sup_{\begin{matrix} γ \geq 0, \\ I (X; Z_{γ}^{M}) \leq ε \end{matrix}} I (Y; Z_{γ}^{M})

(43)

and similarly

g_{ε} (X; Y) : = sup_{\begin{matrix} γ \geq 0, \\ I (X; Z_{γ}) \leq ε \end{matrix}} I (Y; Z_{γ})

(44)

The next theorem shows that the previous definitions are closely related.

Theorem 28.

Let

ε > 0

be fixed. Then

lim_{M \to \infty} g_{ε, M} (X; Y) = g_{ε} (X; Y)

.

Proof.

See Appendix C. ☐
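To make the second stage of the filter concrete, the M-bit uniform scalar quantizer Q_M defined above can be sketched in a few lines of Python. The function name `quantize` is an illustrative assumption; the formula is the one given in the text.

```python
import math


def quantize(x, m_bits):
    """M-bit uniform scalar quantizer: Q_M(x) = floor(2^M x) / 2^M."""
    scale = 2 ** m_bits
    return math.floor(scale * x) / scale


print(quantize(0.3, 2))       # 0.25
print(quantize(-0.3, 2))      # -0.5
print(quantize(math.pi, 4))   # 3.125
```

By construction 0 ≤ x − Q_M(x) < 2^{−M}, so the quantization error vanishes as M grows, which is the intuition behind comparing (43) with (44) for large M.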

In the limit of large M,

g_{ε} (X; Y)

approximates

g_{ε, M} (X; Y)

. This becomes relevant when

g_{ε} (X; Y)

is easier to compute than

g_{ε, M} (X; Y)

, as demonstrated in the following subsection. The following theorem summarizes some general properties of

g_{ε} (X; Y)

.

Theorem 29.

The function

ε \mapsto g_{ε} (X; Y)

is non-negative, strictly-increasing, and satisfies

lim_{ε \to 0} g_{ε} (X; Y) = 0 \quad \text{and} \quad g_{I (X; Y)} (X; Y) = \infty

Proof.

See Appendix C. ☐

As opposed to the discrete case, in the continuous case

g_{ε} (X; Y)

is no longer bounded. In the following section we show that

ε \mapsto g_{ε} (X; Y)

can be convex, in contrast to the discrete case where it is always concave.

We can also define

{\hat{g}}_{ε, M} (X; Y)

and

{\hat{g}}_{ε} (X; Y)

for continuous X and Y, similar to (43) and (44), but where the privacy constraints are replaced by

ρ_{m}^{2} (X; Z_{γ}^{M}) \leq ε

and

ρ_{m}^{2} (X; Z_{γ}) \leq ε

, respectively. It is clear from Theorem 29 that

{\hat{g}}_{0} (X; Y) = g_{0} (X; Y) = 0

and

{\hat{g}}_{ρ^{2} (X; Y)} (X; Y) = \infty

. However, although we showed that

g_{ε} (X; Y)

is indeed the asymptotic approximation of

g_{ε, M} (X; Y)

for M large enough, it is not clear that the same statement holds for

{\hat{g}}_{ε} (X; Y)

and

{\hat{g}}_{ε, M} (X; Y)

.

5.2. Gaussian Information

The rate-privacy function for Gaussian Y has an interesting interpretation from an estimation theoretic point of view. Given the private and observable data

(X, Y)

, suppose an agent is required to estimate Y based on the output of the privacy filter. We wish to know the effect of imposing a privacy constraint on the estimation performance.

The following lemma shows that

g_{ε} (X; Y)

bounds the best achievable performance in predicting Y from the output of the privacy filter. The proof of this lemma does not use the Gaussianity of the noise process, so it holds for any noise process.

Lemma 30.

For any given private data X and Gaussian observable data Y, we have for any

ε \geq 0

inf_{\begin{matrix} γ \geq 0, \\ I (X; Z_{γ}) \leq ε \end{matrix}} mmse (Y | Z_{γ}) \geq var (Y) 2^{- 2 g_{ε} (X; Y)}

Proof.

It is a well-known fact from rate-distortion theory that for a Gaussian Y and its reconstruction

\hat{Y}

I (Y; \hat{Y}) \geq \frac{1}{2} log \frac{var (Y)}{E [{(Y - \hat{Y})}^{2}]}

and hence by setting

\hat{Y} = E [Y | Z_{γ}]

, where

Z_{γ}

is an output of a privacy filter, and noting that

I (Y; \hat{Y}) \leq I (Y; Z_{γ})

, we obtain

mmse (Y | Z_{γ}) \geq var (Y) 2^{- 2 I (Y; Z_{γ})}

(45)

from which the result follows immediately. ☐

According to Lemma 30, the quantity

λ_{ε} (X) : = 2^{- 2 g_{ε} (X; Y)}

is a parameter that bounds the difficulty of estimating Gaussian Y when observing an additive perturbation Z with privacy constraint

I (X; Z) \leq ε

. Note that

0 < λ_{ε} (X) \leq 1

, and therefore, provided that the privacy threshold is not trivial (i.e.,

ε < I (X; Y)

), the mean squared error of estimating Y given the privacy filter output is bounded away from zero; however, the bound decays exponentially in

g_{ε} (X; Y)

.
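For Gaussian Y observed through additive Gaussian noise, the pair (Y, Z_γ) is jointly Gaussian, the optimal estimator is linear, and inequality (45) holds with equality. The short Python sketch below illustrates this special case; the function names are hypothetical and logarithms are in base 2.

```python
import math


def awgn_information(var_y, gamma):
    """I(Y; Y + gamma N) in bits for Gaussian Y and standard normal N."""
    return 0.5 * math.log2(1.0 + var_y / gamma ** 2)


def awgn_mmse(var_y, gamma):
    """mmse(Y | Y + gamma N) for Gaussian Y (linear estimation is optimal)."""
    return var_y * gamma ** 2 / (var_y + gamma ** 2)


def mmse_lower_bound(var_y, information_bits):
    """Right-hand side of (45): var(Y) 2^(-2 I(Y; Z))."""
    return var_y * 2.0 ** (-2.0 * information_bits)
```

With, say, var(Y) = 2 and γ = 0.7, `awgn_mmse(2.0, 0.7)` and `mmse_lower_bound(2.0, awgn_information(2.0, 0.7))` agree up to rounding, so in this case the bound of Lemma 30 cannot be improved.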

To finish this section, assume that X and Y are jointly Gaussian with correlation coefficient ρ. The value of

g_{ε} (X; Y)

can be easily obtained in closed form as demonstrated in the following theorem.

Theorem 31.

Let

(X, Y)

be jointly Gaussian random variables with correlation coefficient ρ. For any

ε \in [0, I (X; Y))

we have

g_{ε} (X; Y) = \frac{1}{2} log (\frac{ρ^{2}}{2^{- 2 ε} + ρ^{2} - 1})

Proof.

One can always write

Y = a X + N_{1}

where

a^{2} = ρ^{2} \frac{var (Y)}{var (X)}

and

N_{1}

is a Gaussian random variable with mean 0 and variance

σ^{2} = (1 - ρ^{2}) var (Y)

which is independent of X. On the other hand, we have

Z_{γ} = Y + γ N

where N is the standard Gaussian random variable independent of

(X, Y)

and hence

Z_{γ} = a X + N_{1} + γ N

. In order for this additive channel to be a privacy filter, it must satisfy

I (X; Z_{γ}) \leq ε

which implies

\frac{1}{2} log (\frac{var (Y) + γ^{2}}{σ^{2} + γ^{2}}) \leq ε

and hence

γ^{2} \geq \frac{2^{- 2 ε} + ρ^{2} - 1}{1 - 2^{- 2 ε}} var (Y) = : {(γ^{*})}^{2}

Since

γ \mapsto I (Y; Z_{γ})

is strictly decreasing (cf. Appendix C), we obtain

\begin{matrix} g_{ε} (X; Y) & = & I (Y; Z_{γ^{*}}) = \frac{1}{2} log (1 + \frac{var (Y)}{{(γ^{*})}^{2}}) \\ = & \frac{1}{2} log (1 + \frac{1 - 2^{- 2 ε}}{2^{- 2 ε} + ρ^{2} - 1}) ☐ \end{matrix}

(46)

According to (46), we conclude that the optimal privacy filter for jointly Gaussian

(X, Y)

is an additive Gaussian channel with signal to noise ratio

\frac{1 - 2^{- 2 ε}}{2^{- 2 ε} + ρ^{2} - 1}

, which shows that if perfect privacy is required, then the displayed data is independent of the observable data Y, i.e.,

g_{0} (X; Y) = 0

.
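The closed form of Theorem 31 can be cross-checked against the quantities appearing in its proof. The Python sketch below is illustrative only (hypothetical function names, base-2 logarithms): it computes the threshold on γ² implied by the privacy constraint and evaluates the leakage I(X; Z_γ) and the utility I(Y; Z_γ) at that threshold.

```python
import math


def gaussian_rate_privacy(eps, rho):
    """Closed form of Theorem 31 in bits, for 0 <= eps < I(X;Y)."""
    return 0.5 * math.log2(rho ** 2 / (2.0 ** (-2.0 * eps) + rho ** 2 - 1.0))


def critical_noise_variance(eps, rho, var_y):
    """Smallest gamma^2 satisfying I(X; Z_gamma) <= eps (requires eps > 0)."""
    shrink = 2.0 ** (-2.0 * eps)
    return (shrink + rho ** 2 - 1.0) / (1.0 - shrink) * var_y


def leakage_bits(gamma_sq, rho, var_y):
    """I(X; Z_gamma) for jointly Gaussian (X, Y) and Z_gamma = Y + gamma N."""
    sigma_sq = (1.0 - rho ** 2) * var_y  # variance of Y given X
    return 0.5 * math.log2((var_y + gamma_sq) / (sigma_sq + gamma_sq))


def utility_bits(gamma_sq, var_y):
    """I(Y; Z_gamma) for Gaussian Y."""
    return 0.5 * math.log2(1.0 + var_y / gamma_sq)
```

For example, with ρ = 0.8, var(Y) = 2 and ε = 0.3, the leakage at the critical noise variance equals ε, and the utility there matches `gaussian_rate_privacy(0.3, 0.8)`.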

Remark 7.

We could assume that the privacy filter adds non-Gaussian noise to the observable data and define the rate-privacy function accordingly. To this end, we define

g_{ε}^{f} (X; Y) : = sup_{\begin{matrix} γ \geq 0, \\ I (X; Z_{γ}^{f}) \leq ε \end{matrix}} I (Y; Z_{γ}^{f})

where

Z_{γ}^{f} = Y + γ M_{f}

and

M_{f}

is a noise process that has a stable distribution with density f and is independent of

(X, Y)

. In this case, we can use a technique similar to Oohama [49] to lower bound

g_{ε}^{f} (X; Y)

for jointly Gaussian

(X, Y)

. Since X and Y are jointly Gaussian, we can write

X = a Y + b N

where

a^{2} = ρ^{2} \frac{var (X)}{var (Y)}

,

b = \sqrt{(1 - ρ^{2}) var X}

, and N is a standard Gaussian random variable that is independent of Y. We can apply the conditional entropy power inequality (cf., [42, Page 22]) for a random variable Z that is independent of N, to obtain

2^{2 h (X | Z)} \geq 2^{2 h (a Y | Z)} + 2^{2 h (b N)} = a^{2} 2^{2 h (Y | Z)} + 2 π e (1 - ρ^{2}) var (X)

and hence

2^{- 2 I (X; Z)} 2^{2 h (X)} \geq a^{2} 2^{2 h (Y)} 2^{- 2 I (Y; Z)} + 2 π e (1 - ρ^{2}) var (X)

Assuming

Z = Z_{γ}^{f}

and taking the infimum of both sides of the above inequality over γ such that

I (X; Z_{γ}^{f}) \leq ε

, we obtain

g_{ε}^{f} (X; Y) \geq \frac{1}{2} log (\frac{ρ^{2}}{2^{- 2 ε} + ρ^{2} - 1}) = g_{ε} (X; Y)

which shows that for Gaussian

(X, Y)

, Gaussian noise is the worst stable additive noise in the sense of privacy-constrained information extraction.

We can also calculate

{\hat{g}}_{ε} (X; Y)

for jointly Gaussian

(X, Y)

.

Theorem 32.

Let

(X, Y)

be jointly Gaussian random variables with correlation coefficient ρ. For any

ε \in [0, ρ^{2})

we have that

{\hat{g}}_{ε} (X; Y) = \frac{1}{2} log (\frac{ρ^{2}}{ρ^{2} - ε})

Proof.

Since for the correlation coefficient between Y and

Z_{γ}

we have for any

γ \geq 0

,

ρ^{2} (Y; Z_{γ}) = \frac{var (Y)}{var (Y) + γ^{2}}

we can conclude that

ρ^{2} (X; Z_{γ}) = \frac{ρ^{2} var (Y)}{var (Y) + γ^{2}}

Since

ρ_{m}^{2} (X; Z) = ρ^{2} (X; Z)

(see, e.g., [34]), the privacy constraint

ρ_{m}^{2} (X; Z) \leq ε

implies that

\frac{ρ^{2} var (Y)}{var (Y) + γ^{2}} \leq ε

and hence

γ^{2} \geq \frac{(ρ^{2} - ε) var (Y)}{ε} = : {\hat{γ}}_{ε}^{2}

By monotonicity of the map

γ \mapsto I (Y; Z_{γ})

, we have

{\hat{g}}_{ε} (X; Y) = I (Y; Z_{{\hat{γ}}_{ε}}) = \frac{1}{2} log (1 + \frac{var (Y)}{{\hat{γ}}_{ε}^{2}}) = \frac{1}{2} log (\frac{ρ^{2}}{ρ^{2} - ε}) ☐

Theorems 31 and 32 show that, unlike in the discrete case (cf. Lemmas 2 and 8),

ε \mapsto g_{ε} (X; Y)

and

ε \mapsto {\hat{g}}_{ε} (X; Y)

are convex.
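The closed form of Theorem 32 and the convexity of both rate-privacy functions in the jointly Gaussian case can be illustrated numerically. The sketch below uses hypothetical function names and base-2 logarithms; the midpoint test is only a spot check of convexity on a chosen interval, not a proof.

```python
import math


def gaussian_rate_privacy(eps, rho):
    """Closed form of Theorem 31 (mutual information privacy), in bits."""
    return 0.5 * math.log2(rho ** 2 / (2.0 ** (-2.0 * eps) + rho ** 2 - 1.0))


def maxcorr_rate_privacy(eps, rho):
    """Closed form of Theorem 32 (maximal correlation privacy), in bits."""
    return 0.5 * math.log2(rho ** 2 / (rho ** 2 - eps))


def maxcorr_noise_variance(eps, rho, var_y):
    """Smallest gamma^2 with rho^2(X; Z_gamma) <= eps (requires eps > 0)."""
    return (rho ** 2 - eps) * var_y / eps


def midpoint_convex(f, a, b):
    """Check f((a + b) / 2) <= (f(a) + f(b)) / 2 for one pair (a, b)."""
    return f(0.5 * (a + b)) <= 0.5 * (f(a) + f(b))
```

With ρ = 0.8, var(Y) = 1.5 and ε = 0.2, the noise variance returned by `maxcorr_noise_variance` gives squared correlation ρ² var(Y) / (var(Y) + γ²) = ε, and (1/2) log(1 + var(Y)/γ²) coincides with `maxcorr_rate_privacy(0.2, 0.8)`.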

6. Conclusions

In this paper, we studied the problem of determining the maximal amount of information that one can extract by observing a random variable Y, which is correlated with another random variable X that represents sensitive or private data, while ensuring that the extracted data Z meets a privacy constraint with respect to X. Specifically, given two correlated discrete random variables X and Y, we introduced the rate-privacy function as the maximization of

I (Y; Z)

over all stochastic “privacy filters”

P_{Z | Y}

such that

pm (X; Z) \leq ϵ

, where

pm (\cdot; \cdot)

is a privacy measure and

ϵ \geq 0

is a given privacy threshold. We considered two possible privacy measure functions,

pm (X; Z) = I (X; Z)

and

pm (X; Z) = ρ_{m}^{2} (X; Z)

where

ρ_{m}

denotes maximal correlation, resulting in the rate-privacy functions

g_{ϵ} (X; Y)

and

{\hat{g}}_{ϵ} (X; Y)

, respectively. We analyzed these two functions, noting that each function lies between easily evaluated upper and lower bounds, and derived their monotonicity and concavity properties. We next provided an information-theoretic interpretation for

g_{ϵ} (X; Y)

and an estimation-theoretic characterization for

{\hat{g}}_{ϵ} (X; Y)

. In particular, we demonstrated that the dual function of

g_{ϵ} (X; Y)

is a corner point of an outer bound on the achievable region of the dependence dilution coding problem. We also showed that

{\hat{g}}_{ϵ} (X; Y)

constitutes the largest amount of information that can be extracted from Y such that no meaningful MMSE estimation of any function of X can be realized by just observing the extracted information Z. We then examined conditions on

P_{X Y}

under which the lower bound on

g_{ϵ} (X; Y)

is tight, hence determining the exact value of

g_{ϵ} (X; Y)

. We also showed that for any given Y, if the observation channel

P_{Y | X}

is an erasure channel, then

g_{ϵ} (X; Y)

attains its upper bound. Finally, we extended the notions of the rate-privacy functions

g_{ϵ} (X; Y)

and

{\hat{g}}_{ϵ} (X; Y)

to the continuous case where the observation channel consists of an additive Gaussian noise channel followed by uniform scalar quantization.

Acknowledgments

This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Author Contributions

All authors of this paper contributed equally. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 19

Given a joint distribution

P_{X Y}

defined over

X \times Y

where

X = {1, 2, \dots, m}

and

Y = {1, 2, \dots, n}

with

n \leq m

, we consider a privacy filter specified by the following distribution for

δ > 0

and

Z = {k, e}

\begin{matrix} P_{Z | Y} (k | y) & = & δ 1_{{y = k}} \end{matrix}

(A1)

\begin{matrix} P_{Z | Y} (e | y) & = & 1 - δ 1_{{y = k}} \end{matrix}

(A2)

where

1_{{\cdot}}

denotes the indicator function. The system of

X -\circ- Y -\circ- Z

in this case is depicted in Figure 6 for the case of

k = 1

.

We clearly have

P_{Z} (k) = δ P_{Y} (k)

and

P_{Z} (e) = 1 - δ P_{Y} (k)

, and hence

P_{X | Z} (x | k) = \frac{P_{X Z} (x, k)}{δ P_{Y} (k)} = \frac{P_{X Y Z} (x, k, k)}{δ P_{Y} (k)} = \frac{δ P_{X Y} (x, k)}{δ P_{Y} (k)} = P_{X | Y} (x | k)

and also,

\begin{matrix} P_{X | Z} (x | e) & = & \frac{P_{X Z} (x, e)}{1 - δ P_{Y} (k)} = \frac{\sum_{y} P_{X Y Z} (x, y, e)}{1 - δ P_{Y} (k)} \\ = & \frac{\sum_{y \neq k} P_{X Y} (x, y) + (1 - δ) P_{X Y} (x, k)}{1 - δ P_{Y} (k)} = \frac{P_{X} (x) - δ P_{X Y} (x, k)}{1 - δ P_{Y} (k)} \end{matrix}

It, therefore, follows that for

k \in {1, 2, \dots, n}

H (X | Z = k) = H (X | Y = k)

and

H (X | Z = e) = H (\frac{P_{X} (1) - δ P_{X Y} (1, k)}{1 - δ P_{Y} (k)}, \dots, \frac{P_{X} (m) - δ P_{X Y} (m, k)}{1 - δ P_{Y} (k)}) = : h_{X} (δ)

We then write

I (X; Z) = H (X) - H (X | Z) = H (X) - δ P_{Y} (k) H (X | Y = k) - (1 - δ P_{Y} (k)) h_{X} (δ)

and hence,

\frac{d}{d δ} I (X; Z) = - P_{Y} (k) H (X | Y = k) + P_{Y} (k) h_{X} (δ) - (1 - δ P_{Y} (k)) h_{X}^{'} (δ)

where

h_{X}^{'} (δ) = \frac{d}{d δ} h_{X} (δ) = - \sum_{x = 1}^{m} \frac{P_{X} (x) P_{Y} (k) - P_{X Y} (x, k)}{{[1 - δ P_{Y} (k)]}^{2}} log (\frac{P_{X} (x) - δ P_{X Y} (x, k)}{1 - δ P_{Y} (k)})

Using the first-order approximation of mutual information for

δ = 0

, we can write

\begin{matrix} I (X; Z) & = & \frac{d}{d δ} {I (X; Z) |}_{δ = 0} δ + o (δ) \\ = & δ [\sum_{x = 1}^{m} P_{X Y} (x, k) log (\frac{P_{X Y} (x, k)}{P_{X} (x) P_{Y} (k)})] + o (δ) \\ = & δ P_{Y} (k) D (P_{X | Y} (\cdot | k) | | P_{X} (\cdot)) + o (δ) \end{matrix}

(A3)

Similarly, we can write

\begin{matrix} I (Y; Z) & = & h (Z) - \sum_{y = 1}^{n} P_{Y} (y) h (Z | Y = y) = h (Z) - P_{Y} (k) h (δ) = h (δ P_{Y} (k)) - P_{Y} (k) h (δ) \\ = & - δ P_{Y} (k) log (P_{Y} (k)) - Ψ (1 - δ P_{Y} (k)) + P_{Y} (k) Ψ (1 - δ) \end{matrix}

where

Ψ (x) : = x log x

which yields

\frac{d}{d δ} I (Y; Z) = - Ψ (P_{Y} (k)) + P_{Y} (k) log (\frac{1 - δ P_{Y} (k)}{1 - δ})

From the above, we obtain

\begin{matrix} I (Y; Z) & = & \frac{d}{d δ} {I (Y; Z) |}_{δ = 0} δ + o (δ) \\ = & - δ Ψ (P_{Y} (k)) + o (δ) \end{matrix}

(A4)

Clearly from (A3), in order for the filter

P_{Z | Y}

specified in (A1) and (A2) to belong to

D_{ε} (P_{X Y})

, we must have

\frac{ε}{δ} = P_{Y} (k) D (P_{X | Y} (\cdot | k) | | P_{X} (\cdot)) + \frac{o (δ)}{δ}

and hence from (A4), we have

I (Y; Z) = \frac{- Ψ (P_{Y} (k))}{P_{Y} (k) D (P_{X | Y} (\cdot | k) | | P_{X} (\cdot))} ε + o (δ)

This immediately implies that

g_{0}^{'} (X; Y) = lim_{ε ↓ 0} \frac{g_{ε} (X; Y)}{ε} \geq \frac{- Ψ (P_{Y} (k))}{P_{Y} (k) D (P_{X | Y} (\cdot | k) | | P_{X} (\cdot))} = \frac{- log (P_{Y} (k))}{D (P_{X | Y} (\cdot | k) | | P_{X} (\cdot))}

(A5)

where we have used the assumption

g_{0} (X; Y) = 0

in the first equality.
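The first-order expansions (A3) and (A4) can be checked numerically: for small δ, the ratio I(Y;Z)/I(X;Z) under the filter (A1) and (A2) should approach the right-hand side of (A5). The following Python sketch is illustrative only; the function names are hypothetical, logarithms are in base 2, and the alphabets are indexed from 0 rather than 1.

```python
import math


def mutual_information(joint):
    """I(A;B) in bits for a joint pmf stored as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        pab * math.log2(pab / (pa[a] * pb[b]))
        for a, row in enumerate(joint)
        for b, pab in enumerate(row)
        if pab > 0.0
    )


def kl_divergence(p, q):
    """D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def slope_check(p_xy, k, delta):
    """Return (I(Y;Z) / I(X;Z), -log P_Y(k) / D(P_{X|Y=k} || P_X))."""
    m, n = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(p_xy[x][y] for x in range(m)) for y in range(n)]
    # Filter (A1)-(A2): output index 0 stands for the symbol k, index 1 for e.
    p_z_given_y = [[delta, 1.0 - delta] if y == k else [0.0, 1.0]
                   for y in range(n)]
    p_xz = [[sum(p_xy[x][y] * p_z_given_y[y][z] for y in range(n))
             for z in range(2)] for x in range(m)]
    p_yz = [[p_y[y] * p_z_given_y[y][z] for z in range(2)] for y in range(n)]
    ratio = mutual_information(p_yz) / mutual_information(p_xz)
    p_x_given_k = [p_xy[x][k] / p_y[k] for x in range(m)]
    predicted = -math.log2(p_y[k]) / kl_divergence(p_x_given_k, p_x)
    return ratio, predicted
```

For a joint pmf with dependent X and Y and a small value such as δ = 10^{-4}, the two returned numbers should agree to within a relative error of order δ.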

Appendix B. Completion of Proof of Theorem 25

To prove that the equality (39) has only one solution

p = \frac{1}{2}

, we first show the following lemma.

Lemma 33.

Let P and Q be two distributions over

X = {\pm 1, \pm 2, \dots, \pm k}

which satisfy

P (x) = Q (- x)

. Let

R_{λ} : = λ P + (1 - λ) Q

for

λ \in (0, 1)

. Then

\frac{D (P | | R_{1 - λ})}{D (P | | R_{λ})} < \frac{log (1 - λ)}{log (λ)}

(A6)

for

λ \in (0, \frac{1}{2})

and

\frac{D (P | | R_{1 - λ})}{D (P | | R_{λ})} > \frac{log (1 - λ)}{log (λ)}

(A7)

for

λ \in (\frac{1}{2}, 1)

.

Note that it is easy to see that the map

λ \mapsto D (P | | R_{λ})

is convex and strictly decreasing and hence

D (P | | R_{λ}) > D (P | | R_{1 - λ})

when

λ \in (0, \frac{1}{2})

and

D (P | | R_{λ}) < D (P | | R_{1 - λ})

when

λ \in (\frac{1}{2}, 1)

. Inequalities (A6) and (A7) strengthen this monotonic behavior and show that

D (P | | R_{λ}) > \frac{log (λ)}{log (1 - λ)} D (P | | R_{1 - λ})

and

D (P | | R_{λ}) < \frac{log (λ)}{log (1 - λ)} D (P | | R_{1 - λ})

for

λ \in (0, \frac{1}{2})

and

λ \in (\frac{1}{2}, 1)

, respectively.

Proof.

Without loss of generality, we can assume that

P (x) > 0

for all

x \in X

. Let

X_{+} : = {x \in X | P (x) > P (- x)}

,

X_{-} : = {x \in X | P (x) < P (- x)}

and

X_{0} : = {x \in X | P (x) = P (- x)}

. We notice that when

x \in X_{+}

, then

- x \in X_{-}

, and hence

| X_{+} | = | X_{-} | = m

for some

0 < m \leq k

. After relabelling if needed, we can therefore assume that

X_{+} = {1, 2, \dots, m}

and

X_{-} = {- m, \dots, - 2, - 1}

. We can write

\begin{matrix} D (P | | R_{λ}) & = & \sum_{x = - k}^{k} P (x) log (\frac{P (x)}{λ P (x) + (1 - λ) Q (x)}) = \sum_{x = - k}^{k} P (x) log (\frac{P (x)}{λ P (x) + (1 - λ) P (- x)}) \\ \overset{(a)}{=} & \sum_{x = 1}^{m} [P (x) log (\frac{P (x)}{λ P (x) + (1 - λ) P (- x)}) + P (- x) log (\frac{P (- x)}{λ P (- x) + (1 - λ) P (x)})] \\ \overset{(b)}{=} & \sum_{x = 1}^{m} [P (x) log (\frac{1}{λ + (1 - λ) ζ_{x}}) + P (x) ζ_{x} log (\frac{1}{λ + \frac{(1 - λ)}{ζ_{x}}})] \\ \overset{(c)}{=} & \sum_{x = 1}^{m} P (x) Ξ (λ, ζ_{x}) log (\frac{1}{λ}) \end{matrix}

where

(a)

follows from the fact that for

x \in X_{0}

,

log (\frac{P (x)}{R_{λ} (x)}) = 0

for any

λ \in (0, 1)

, and in

(b)

and

(c)

we introduced

ζ_{x} : = \frac{P (- x)}{P (x)}

and

Ξ (λ, ζ) : = \frac{1}{log (\frac{1}{λ})} (log (\frac{1}{λ + (1 - λ) ζ}) + ζ log (\frac{1}{λ + \frac{(1 - λ)}{ζ}}))

Similarly, we can write

\begin{matrix} D (P | | R_{1 - λ}) & = & \sum_{x = - k}^{k} P (x) log (\frac{P (x)}{(1 - λ) P (x) + λ Q (x)}) = \sum_{x = - k}^{k} P (x) log (\frac{P (x)}{(1 - λ) P (x) + λ P (- x)}) \\ = & \sum_{x = 1}^{m} [P (x) log (\frac{P (x)}{(1 - λ) P (x) + λ P (- x)}) + P (- x) log (\frac{P (- x)}{(1 - λ) P (- x) + λ P (x)})] \\ = & \sum_{x = 1}^{m} [P (x) log (\frac{1}{1 - λ + λ ζ_{x}}) + P (x) ζ_{x} log (\frac{1}{1 - λ + \frac{λ}{ζ_{x}}})] \\ = & \sum_{x = 1}^{m} P (x) Ξ (1 - λ, ζ_{x}) log (\frac{1}{1 - λ}) \end{matrix}

which implies that

\frac{D (P | | R_{λ})}{- log (λ)} - \frac{D (P | | R_{1 - λ})}{- log (1 - λ)} = \sum_{x = 1}^{m} P (x) [Ξ (λ, ζ_{x}) - Ξ (1 - λ, ζ_{x})]

Hence, in order to show (A6), it suffices to verify that

Φ (λ, ζ) : = Ξ (λ, ζ) - Ξ (1 - λ, ζ) > 0

(A8)

for any

λ \in (0, \frac{1}{2})

and

ζ \in (1, \infty)

. Since

log (λ) log (1 - λ)

is always positive for

λ \in (0, \frac{1}{2})

, it suffices to show that

h (ζ) : = Φ (λ, ζ) log (1 - λ) log (λ) > 0

(A9)

for

λ \in (0, \frac{1}{2})

and

ζ \in (1, \infty)

. We have

h^{''} (ζ) = A (λ, ζ) B (λ, ζ)

(A10)

where

A (λ, ζ) : = \frac{1 + ζ}{{(1 - λ + λ ζ)}^{2} {(λ + (1 - λ) ζ)}^{2} ζ}

and

B (λ, ζ) : = λ^{2} (1 + λ (λ - 2) {(ζ - 1)}^{2} + ζ (ζ - 1)) log (λ) - {(1 - λ)}^{2} (λ^{2} {(ζ - 1)}^{2} + ζ) log (1 - λ) .

We have

\frac{\partial^{2}}{\partial ζ^{2}} B (λ, ζ) = 2 λ^{2} {(1 - λ)}^{2} log (\frac{λ}{1 - λ}) < 0

because

λ \in (0, \frac{1}{2})

and hence

λ < 1 - λ

. This implies that the map

ζ \mapsto B (λ, ζ)

is concave for any

λ \in (0, \frac{1}{2})

and

ζ \in (1, \infty)

. Moreover, since

ζ \mapsto B (λ, ζ)

is a quadratic polynomial with negative leading coefficient, it is clear that

{lim}_{ζ \to \infty} B (λ, ζ) = - \infty

. Consider now

g (λ) : = B (λ, 1) = λ^{2} log (λ) - {(1 - λ)}^{2} log (1 - λ)

. We have

{lim}_{λ \to 0} g (λ) = g (\frac{1}{2}) = 0

and

g^{''} (λ) = 2 log (\frac{λ}{1 - λ}) < 0

for

λ \in (0, \frac{1}{2})

. This implies that

λ \mapsto g (λ)

is concave over

(0, \frac{1}{2})

and hence

g (λ) > 0

over

(0, \frac{1}{2})

which implies that

B (λ, 1) > 0

. This, together with the fact that

ζ \mapsto B (λ, ζ)

is concave and approaches

- \infty

as

ζ \to \infty

, implies that there exists a real number

c = c (λ) > 1

such that

B (λ, ζ) > 0

for all

ζ \in (1, c)

and

B (λ, ζ) < 0

for all

ζ \in (c, \infty)

. Since

A (λ, ζ) > 0

, it follows from (A10) that

ζ \mapsto h (ζ)

is convex over

(1, c)

and concave over

(c, \infty)

. Since

h (1) = h^{'} (1) = 0

and

{lim}_{ζ \to \infty} h (ζ) = \infty

, we can conclude that

h (ζ) > 0

over

(1, \infty)

. That is,

Φ (λ, ζ) > 0

and thus

Ξ (λ, ζ) - Ξ (1 - λ, ζ) > 0

, for

λ \in (0, \frac{1}{2})

and

ζ \in (1, \infty)

.

Inequality (A7) follows from (A6) by replacing λ with

1 - λ

. ☐

Letting

P (\cdot) = P_{X | Y} (\cdot | 1)

and

Q (\cdot) = P_{X | Y} (\cdot | 0)

and

λ = Pr (Y = 1) = p

, we have

R_{p} (x) = P_{X} (x) = p P (x) + (1 - p) Q (x)

and

R_{1 - p} (x) = P_{X} (- x) = (1 - p) P (x) + p Q (x)

. Since

D (P_{X | Y} (\cdot | 0) | | P_{X} (\cdot)) = D (P | | R_{1 - p})

, we can conclude from Lemma 33 that

\frac{D (P_{X | Y} (\cdot | 0) | | P_{X} (\cdot))}{- log (1 - p)} < \frac{D (P_{X | Y} (\cdot | 1) | | P_{X} (\cdot))}{- log (p)}

over

p \in (0, \frac{1}{2})

and

\frac{D (P_{X | Y} (\cdot | 0) | | P_{X} (\cdot))}{- log (1 - p)} > \frac{D (P_{X | Y} (\cdot | 1) | | P_{X} (\cdot))}{- log (p)}

over

p \in (\frac{1}{2}, 1)

, and hence equation (39) has the unique solution

p = \frac{1}{2}

.
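Inequalities (A6) and (A7) are easy to probe numerically. The sketch below is an illustration, not a substitute for the proof: the function names are hypothetical, P is specified by its values on the positive and negative symbols, Q is obtained by reflection, and the sign of the difference between the two sides of (A6) is returned. It assumes P(x) > 0 for all x and P ≠ Q, so that the divergences are finite and nonzero.

```python
import math


def kl_divergence(p, q):
    """D(p || q) in nats; the ratios in (A6)-(A7) do not depend on the base."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def lemma33_gap(p_pos, p_neg, lam):
    """Left side minus right side of (A6) for R_lam = lam P + (1 - lam) Q.

    p_pos[i] = P(i + 1) and p_neg[i] = P(-(i + 1)); Q(x) = P(-x).
    """
    p = list(p_pos) + list(p_neg)
    q = list(p_neg) + list(p_pos)  # reflection of P

    def mixture(weight):
        return [weight * a + (1.0 - weight) * b for a, b in zip(p, q)]

    d_lam = kl_divergence(p, mixture(lam))
    d_one_minus_lam = kl_divergence(p, mixture(1.0 - lam))
    return d_one_minus_lam / d_lam - math.log(1.0 - lam) / math.log(lam)
```

For instance, with P(1) = 0.4, P(−1) = 0.1 and P(2) = P(−2) = 0.25, the gap is negative for λ in (0, 1/2) and positive for λ in (1/2, 1), as Lemma 33 predicts.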

Appendix C. Proof of Theorems 28 and 29

The proof of Theorem 29 does not depend on the proof of Theorem 28, so, there is no harm in proving the former theorem first. The following version of the data-processing inequality will be required.

Lemma 34.

Let X and Y be absolutely continuous random variables such that X, Y and

(X, Y)

have finite differential entropies. If V is an absolutely continuous random variable independent of X and Y, then

I (X; Y + V) \leq I (X; Y)

with equality if and only if X and Y are independent.

Proof.

Since

X -\circ- Y -\circ- (Y + V)

, the data processing inequality implies that

I (X; Y + V) \leq I (X; Y)

. It therefore suffices to show that this inequality is tight if and only if X and Y are independent. It is known that the data processing inequality is tight if and only if

X -\circ- (Y + V) -\circ- Y

. This is equivalent to saying that for any measurable set

A \subset R

and for

P_{Y + V}

almost all z,

Pr (X \in A | Y + V = z, Y = y) = Pr (X \in A | Y + V = z)

. On the other hand, due to the independence of V and

(X, Y)

, we have

Pr (X \in A | Y + V = z, Y = y) = Pr (X \in A | Y = z - v)

. Hence, the equality holds if and only if

Pr (X \in A | Y + V = z) = Pr (X \in A | Y = z - v)

which implies that X and Y must be independent. ☐

Lemma 35.

In the notation of Section 5.1, the function

γ \mapsto I (Y; Z_{γ})

is strictly-decreasing and continuous. Additionally, it satisfies

I (Y; Z_{γ}) \leq \frac{1}{2} log (1 + \frac{var (Y)}{γ^{2}})

with equality if and only if Y is Gaussian. In particular,

I (Y; Z_{γ}) \to 0

as

γ \to \infty

.

Proof.

Recall that, by assumption (b),

var (Y)

is finite. The finiteness of the entropy of Y follows from assumptions (b) and (c), the corresponding statement for

Y + γ N

follows from a routine application of the entropy power inequality [50, Theorem 17.7.3] and the fact that

var (Y + γ N) = var (Y) + γ^{2} < \infty

, and for

(Y, Y + γ N)

the same conclusion follows by the chain rule for differential entropy. The data processing inequality, as stated in Lemma 34, implies

I (Y; Z_{γ + δ}) \leq I (Y; Y + γ N) = I (Y; Z_{γ})

Clearly Y and

Y + γ N

are not independent, therefore the inequality is strict and thus

γ \mapsto I (Y; Z_{γ})

is strictly-decreasing.

Continuity will be studied for

γ = 0

and

γ > 0

separately. Recall that

h (γ N) = \frac{1}{2} log (2 π e γ^{2})

. In particular,

lim_{γ \to 0} h (γ N) = - \infty

. The entropy power inequality shows then that

lim_{γ \to 0} I (Y; Y + γ N) = \infty

. This coincides with the convention

I (Y; Z_{0}) = I (Y; Y) = \infty

. For

γ > 0

, let

{(γ_{n})}_{n \geq 1}

be a sequence of positive numbers such that

γ_{n} \to γ

. Observe that

\begin{matrix} I (Y; Z_{γ_{n}}) & = h (Y + γ_{n} N) - h (γ_{n} N) = h (Y + γ_{n} N) - \frac{1}{2} log (2 π e γ_{n}^{2}) \end{matrix}

Since

lim_{n \to \infty} \frac{1}{2} log (2 π e γ_{n}^{2}) = \frac{1}{2} log (2 π e γ^{2})

, we only have to show that

h (Y + γ_{n} N) \to h (Y + γ N)

as

n \to \infty

to establish the continuity at γ. This, in fact, follows from de Bruijn’s identity (cf., [50, Theorem 17.7.2]).

Since the channel from Y to

Z_{γ}

is an additive Gaussian noise channel, we have

I (Y; Z_{γ}) \leq \frac{1}{2} log (1 + \frac{var (Y)}{γ^{2}})

with equality if and only if Y is Gaussian. The claimed limit as

γ \to \infty

is clear. ☐
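As a numerical sanity check of Lemma 35, the following sketch evaluates the mutual information between Y and Y + γN (in nats) for a non-Gaussian input and compares it with the Gaussian upper bound. It assumes that N is standard Gaussian, as the expression for h(γN) above indicates, and that Y is uniform on [-a, a]; the uniform input and all function names are hypothetical choices made only for illustration.

```python
import math

def gaussian_bound(var_y, gamma):
    """Upper bound (1/2) log(1 + var(Y) / gamma^2) from Lemma 35, in nats."""
    return 0.5 * math.log(1.0 + var_y / gamma ** 2)

def std_normal_cdf(t):
    """CDF of the standard Gaussian distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def mi_uniform_input(a, gamma, grid=20001):
    """Approximate I(Y; Y + gamma*N) in nats for Y ~ Uniform[-a, a], N ~ N(0, 1).

    Uses I(Y; Z) = h(Z) - h(gamma*N), where h(Z) is computed by a Riemann sum
    over the density of Z = Y + gamma*N (a difference of two Gaussian CDFs).
    """
    lim = a + 10.0 * gamma          # integration window for z
    dz = 2.0 * lim / (grid - 1)
    h_z = 0.0
    for i in range(grid):
        z = -lim + i * dz
        f = (std_normal_cdf((z + a) / gamma)
             - std_normal_cdf((z - a) / gamma)) / (2.0 * a)
        if f > 0.0:
            h_z -= f * math.log(f) * dz
    h_noise = 0.5 * math.log(2.0 * math.pi * math.e * gamma ** 2)
    return h_z - h_noise

if __name__ == "__main__":
    a = 1.0
    for gamma in (0.25, 0.5, 1.0, 2.0):
        # var(Y) = a^2 / 3 for the uniform input
        print(gamma, mi_uniform_input(a, gamma), gaussian_bound(a * a / 3.0, gamma))
```

For the uniform input the computed values should decrease with γ and stay strictly below the bound, in line with the lemma, since equality requires Y to be Gaussian.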

Lemma 36.

The function

γ \mapsto I (X; Z_{γ})

is strictly-decreasing and continuous. Moreover,

I (X; Z_{γ}) \to 0

when

γ \to \infty

.

Proof.

The strictly-decreasing behavior of

γ \mapsto I (X; Z_{γ})

is proved as in the previous lemma.

To prove continuity, let

γ \geq 0

be fixed. Let

{(γ_{n})}_{n \geq 1}

be any sequence of positive numbers converging to γ. First suppose that

γ > 0

. Observe that

I (X; Z_{γ_{n}}) = h (Y + γ_{n} N) - h (Y + γ_{n} N | X)

for all

n \geq 1

. As shown in Lemma 35,

h (Y + γ_{n} N) \to h (Y + γ N)

as

n \to \infty

. Therefore, it is enough to show that

h (Y + γ_{n} N | X) \to h (Y + γ N | X)

as

n \to \infty

. Note that by de Bruijn’s identity, we have

h (Y + γ_{n} N | X = x) \to h (Y + γ N | X = x)

as

n \to \infty

for all

x \in R

. Note also that since

h (Z_{γ_{n}} | X = x) \leq \frac{1}{2} log (2 π e var (Z_{γ_{n}} | x))

we can write

h (Z_{γ_{n}} | X) \leq E [\frac{1}{2} log (2 π e var (Z_{γ_{n}} | X))] \leq \frac{1}{2} log (2 π e E [var (Z_{γ_{n}} | X)])

and hence we can apply the dominated convergence theorem to show that

h (Y + γ_{n} N | X) \to h (Y + γ N | X)

as

n \to \infty

.

To prove the continuity at

γ = 0

, we first note that Linder and Zamir [51, Page 2028] showed that

h (Y + γ_{n} N | X = x) \to h (Y | X = x)

as

n \to \infty

. Then, as before, by the dominated convergence theorem we can show that

h (Y + γ_{n} N | X) \to h (Y | X)

. Similarly [51] implies that

h (Y + γ_{n} N) \to h (Y)

. This concludes the proof of the continuity of

γ \mapsto I (X; Z_{γ})

.

Furthermore, by the data processing inequality and the previous lemma,

0 \leq I (X; Z_{γ}) \leq I (Y; Z_{γ}) \leq \frac{1}{2} log (1 + \frac{var (Y)}{γ^{2}})

and hence we conclude that

lim_{γ \to \infty} I (X; Z_{γ}) = 0

. ☐

Proof of Theorem 29.

The nonnegativity of

g_{ε} (X; Y)

follows directly from definition.

By Lemma 36, for every

0 < ε \leq I (X; Y)

there exists a unique

γ_{ε} \in [0, \infty)

such that

I (X; Z_{γ_{ε}}) = ε

, so

g_{ε} (X; Y) = I (Y; Z_{γ_{ε}})

. Moreover,

ε \mapsto γ_{ε}

is strictly decreasing. Since

γ \mapsto I (Y; Z_{γ})

is strictly-decreasing, we conclude that

ε \mapsto g_{ε} (X; Y)

is strictly increasing.

The fact that

ε \mapsto γ_{ε}

is strictly decreasing, together with Lemma 36, implies that

γ_{ε} \to \infty

as

ε \to 0

. In particular,

lim_{ε \to 0} g_{ε} (X; Y) = lim_{ε \to 0} I (Y; Z_{γ_{ε}}) = lim_{γ_{ε} \to \infty} I (Y; Z_{γ_{ε}}) = lim_{γ \to \infty} I (Y; Z_{γ}) = 0

By the data processing inequality we have that

I (X; Z_{γ}) \leq I (X; Y)

for all

γ \geq 0

, i.e., any filter satisfies the privacy constraint for

ε = I (X; Y)

. Thus,

g_{I (X; Y)} (X; Y) \geq I (Y; Y) = \infty

. ☐
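The construction of γ_ε in the proof can be illustrated numerically. The sketch below assumes, purely for illustration, that (X, Y) is jointly Gaussian with correlation coefficient rho and that the filter is Z_γ = Y + γN with standard Gaussian N, as in Section 5.1. In that special case both mutual informations have standard closed forms, so the bisection result can be checked against an explicit formula. All function names are hypothetical.

```python
import math

def mi_xz(rho, var_y, gamma):
    """I(X; Z_gamma) in nats for jointly Gaussian (X, Y), Z_gamma = Y + gamma*N."""
    rho_xz_sq = rho ** 2 * var_y / (var_y + gamma ** 2)  # squared corr. of (X, Z)
    return -0.5 * math.log(1.0 - rho_xz_sq)

def mi_yz(var_y, gamma):
    """I(Y; Z_gamma) in nats for Gaussian Y and gamma > 0."""
    return 0.5 * math.log(1.0 + var_y / gamma ** 2)

def gamma_eps(rho, var_y, eps, tol=1e-12):
    """Solve I(X; Z_gamma) = eps by bisection; requires 0 < eps < I(X; Y)."""
    lo, hi = 0.0, 1.0
    while mi_xz(rho, var_y, hi) > eps:      # grow hi until the constraint holds
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if mi_xz(rho, var_y, mid) > eps:    # too little noise: leakage above eps
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate_privacy_gaussian(rho, var_y, eps):
    """I(Y; Z_gamma) evaluated at the noise level where the constraint is tight."""
    return mi_yz(var_y, gamma_eps(rho, var_y, eps))

if __name__ == "__main__":
    for eps in (0.01, 0.1, 0.3):
        print(eps, gamma_eps(0.8, 1.0, eps), rate_privacy_gaussian(0.8, 1.0, eps))
```

Bisection is used only because the map from γ to the leakage is continuous and strictly decreasing by Lemma 36; in this Gaussian example the root could equally be written in closed form.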

In order to prove Theorem 28, we first recall the following theorem by Rényi [52].

Theorem 37

([52]). If U is an absolutely continuous random variable with density

f_{U} (x)

and if

H (⌊ U ⌋) < \infty

, then

lim_{n \to \infty} H (n^{- 1} ⌊ n U ⌋) - log (n) = - \int_{R} f_{U} (x) log f_{U} (x) d x

provided that the integral on the right hand side exists.

We will need the following consequence of the previous theorem.

Lemma 38.

If U is an absolutely continuous random variable with density

f_{U} (x)

and if

H (⌊ U ⌋) < \infty

, then

H (Q_{M} (U)) - M \geq H (Q_{M + 1} (U)) - (M + 1)

for all

M \geq 1

and

lim_{M \to \infty} H (Q_{M} (U)) - M = - \int_{R} f_{U} (x) log f_{U} (x) d x

provided that the integral on the right hand side exists.

The previous lemma follows from the fact that

Q_{M + 1} (U)

is constructed by refining the quantization partition for

Q_{M} (U)

.
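The monotone convergence stated in Lemma 38 can be observed numerically. The sketch below takes U to be standard Gaussian (an illustrative choice, not one made in the text) and uses the quantizer with cells [k/2^M, (k+1)/2^M), which matches the cells used in the proof of Lemma 40. Entropies are in bits, and the function names are hypothetical.

```python
import math

def std_normal_cdf(t):
    """CDF of the standard Gaussian distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def quantized_entropy_bits(m, span=10.0):
    """H(Q_M(U)) in bits for U ~ N(0, 1), with cells [k/2^M, (k+1)/2^M)."""
    step = 2.0 ** (-m)
    k_max = int(span / step)        # cells outside [-span, span] are negligible
    h = 0.0
    for k in range(-k_max, k_max):
        p = std_normal_cdf((k + 1) * step) - std_normal_cdf(k * step)
        if p > 0.0:
            h -= p * math.log2(p)
    return h

# Differential entropy of N(0, 1) in bits: (1/2) log2(2*pi*e).
h_true = 0.5 * math.log2(2.0 * math.pi * math.e)

# H(Q_M(U)) - M for M = 1, ..., 8.
gaps = [quantized_entropy_bits(m) - m for m in range(1, 9)]

if __name__ == "__main__":
    for m, g in enumerate(gaps, start=1):
        print(m, g, g - h_true)
```

Each refinement of the partition adds at most one bit of entropy, which is why the printed sequence should be non-increasing while approaching the differential entropy from above.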

Lemma 39.

For any

γ \geq 0

,

lim_{M \to \infty} I (X; Z_{γ}^{M}) = I (X; Z_{γ}) and lim_{M \to \infty} I (Y; Z_{γ}^{M}) = I (Y; Z_{γ})

Proof.

Observe that

\begin{matrix} I (X; Z_{γ}^{M}) & = I (X; Q_{M} (Y + γ N)) \\ = H (Q_{M} (Y + γ N)) - H (Q_{M} (Y + γ N) | X) \\ = [H (Q_{M} (Y + γ N)) - M] - \int_{R} f_{X} (x) [H (Q_{M} (Y + γ N) | X = x) - M] d x \end{matrix}

By Lemma 38, the integrand is non-increasing in M, and thus, by the monotone convergence theorem, we can take the limit with respect to M inside the integral. Thus,

lim_{M \to \infty} I (X; Z_{γ}^{M}) = h (Y + γ N) - h (Y + γ N | X) = I (X; Z_{γ})

The proof for

I (Y; Z_{γ}^{M})

is analogous. ☐
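To make the convergence in Lemma 39 concrete, the following sketch computes the mutual information between Y and the quantized output Q_M(Y + γN) in bits and compares it with the unquantized limit. It assumes that Y and N are independent standard Gaussians, which is an illustrative special case rather than the setting of the lemma, and the function names are hypothetical.

```python
import math

def std_normal_cdf(t):
    """CDF of the standard Gaussian distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def quantized_gaussian_entropy_bits(mean, std, m, span=10.0):
    """H(Q_M(W)) in bits for W ~ N(mean, std^2); cells are [k/2^M, (k+1)/2^M)."""
    step = 2.0 ** (-m)
    k_lo = math.floor((mean - span * std) / step)
    k_hi = math.ceil((mean + span * std) / step)
    h = 0.0
    for k in range(k_lo, k_hi):
        p = (std_normal_cdf(((k + 1) * step - mean) / std)
             - std_normal_cdf((k * step - mean) / std))
        if p > 0.0:
            h -= p * math.log2(p)
    return h

def mi_y_quantized(gamma, m, n_y=321, y_span=8.0):
    """I(Y; Q_M(Y + gamma*N)) in bits for independent Y, N ~ N(0, 1)."""
    # Output entropy: Y + gamma*N ~ N(0, 1 + gamma^2).
    h_out = quantized_gaussian_entropy_bits(0.0, math.sqrt(1.0 + gamma ** 2), m)
    # Conditional entropy: average H(Q_M(y + gamma*N)) over the density of Y.
    dy = 2.0 * y_span / (n_y - 1)
    h_cond = 0.0
    for i in range(n_y):
        y = -y_span + i * dy
        pdf = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
        h_cond += pdf * quantized_gaussian_entropy_bits(y, gamma, m) * dy
    return h_out - h_cond

if __name__ == "__main__":
    gamma = 1.0
    limit = 0.5 * math.log2(1.0 + 1.0 / gamma ** 2)   # I(Y; Z_gamma) in bits
    for m in (1, 2, 3, 4):
        print(m, mi_y_quantized(gamma, m), limit)
```

The quantized values should increase with M and remain below the unquantized mutual information, as the data processing inequality requires.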

Lemma 40.

Fix

M \in N

. Assume that

f_{Y} (y) \leq C {| y |}^{- p}

for some positive constant C and

p > 1

. For integer k and

γ \geq 0

, let

p_{k, γ} : = Pr (Q_{M} (Y + γ N) = \frac{k}{2^{M}})

Then

p_{k, γ} \leq \frac{C 2^{(p - 1) M + p}}{k^{p}} + 1_{{γ > 0}} \frac{γ 2^{M + 1}}{k \sqrt{2 π}} e^{- k^{2} / 2^{2 M + 3} γ^{2}}

Proof.

The case

γ = 0

is trivial, so we assume that

γ > 0

. For notational simplicity, let

r_{a} = \frac{a}{2^{M}}

for all

a \in Z

. Assume that

k > 0

. Observe that

\begin{matrix} p_{k, γ} & = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} f_{γ N} (n) f_{Y} (y) 1_{[r_{k}, r_{k + 1})} (y + n) d y d n \\ = \int_{- \infty}^{\infty} \frac{e^{- n^{2} / 2 γ^{2}}}{\sqrt{2 π γ^{2}}} Pr (Y \in [r_{k}, r_{k + 1}) - n) d n \end{matrix}

We will estimate the above integral by breaking it up into two pieces.

First, we consider

\begin{matrix} \int_{- \infty}^{\frac{r_{k}}{2}} \frac{e^{- n^{2} / 2 γ^{2}}}{\sqrt{2 π γ^{2}}} Pr (Y \in [r_{k}, r_{k + 1}) - n) d n \end{matrix}

When

n \leq \frac{r_{k}}{2}

, then

r_{k} - n \geq r_{k} / 2

. By the assumption on the density of Y,

\begin{matrix} Pr (Y \in [r_{k}, r_{k + 1}) - n) & \leq \frac{C}{2^{M}} {(\frac{r_{k}}{2})}^{- p} \end{matrix}

(The previous estimate is the only contribution when

γ = 0

.) Therefore,

\begin{matrix} \int_{- \infty}^{\frac{r_{k}}{2}} \frac{e^{- n^{2} / 2 γ^{2}}}{\sqrt{2 π γ^{2}}} Pr (Y \in [r_{k}, r_{k + 1}) - n) d n & \leq \frac{C}{2^{M}} {(\frac{r_{k}}{2})}^{- p} \int_{- \infty}^{\frac{r_{k}}{2}} \frac{e^{- n^{2} / 2 γ^{2}}}{\sqrt{2 π γ^{2}}} d n \\ \leq \frac{C 2^{(p - 1) M + p}}{k^{p}} \end{matrix}

Using the trivial bound

Pr (Y \in [r_{k}, r_{k + 1}) - n) \leq 1

and well known estimates for the error function, we obtain that

\begin{matrix} \int_{\frac{r_{k}}{2}}^{\infty} \frac{e^{- n^{2} / 2 γ^{2}}}{\sqrt{2 π γ^{2}}} Pr (Y \in [r_{k}, r_{k + 1}) - n) d n & < \frac{1}{\sqrt{2 π}} \frac{2 γ}{r_{k}} e^{- r_{k}^{2} / 8 γ^{2}} \\ = \frac{γ 2^{M + 1}}{k \sqrt{2 π}} e^{- k^{2} / 2^{2 M + 3} γ^{2}} \end{matrix}

Therefore,

p_{k, γ} \leq \frac{C 2^{(p - 1) M + p}}{k^{p}} + \frac{γ 2^{M + 1}}{k \sqrt{2 π}} e^{- k^{2} / 2^{2 M + 3} γ^{2}}

The proof for

k < 0

is completely analogous. ☐
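For completeness, the two elementary computations used in the proof of Lemma 40 can be written out for k > 0. The first line only substitutes r_k = k / 2^M. The second line assumes that the "well known estimate" is the standard Gaussian tail bound, namely that the tail integral of the standard Gaussian density from x > 0 to infinity is smaller than the density at x divided by x; the source does not name the estimate explicitly.

```latex
\begin{align*}
\frac{C}{2^{M}} \left(\frac{r_{k}}{2}\right)^{-p}
  &= \frac{C}{2^{M}} \cdot \frac{2^{p}\, 2^{Mp}}{k^{p}}
   = \frac{C\, 2^{(p-1)M + p}}{k^{p}}, \\
\int_{r_{k}/2}^{\infty} \frac{e^{-n^{2}/2\gamma^{2}}}{\sqrt{2\pi\gamma^{2}}}\, dn
  &= \int_{r_{k}/2\gamma}^{\infty} \frac{e^{-t^{2}/2}}{\sqrt{2\pi}}\, dt
   < \frac{1}{\sqrt{2\pi}} \cdot \frac{2\gamma}{r_{k}}\, e^{-r_{k}^{2}/8\gamma^{2}}
   = \frac{\gamma\, 2^{M+1}}{k \sqrt{2\pi}}\, e^{-k^{2}/2^{2M+3}\gamma^{2}}.
\end{align*}
```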

Lemma 41.

Fix

M \in N

. Assume that

f_{Y} (y) \leq C {| y |}^{- p}

for some positive constant C and

p > 1

. The mapping

γ \mapsto H (Q_{M} (Y + γ N))

is continuous.

Proof.

Let

{(γ_{n})}_{n \geq 1}

be a sequence of non-negative real numbers converging to

γ_{0}

. First, we will prove continuity at

γ_{0} > 0

. Without loss of generality, assume that

γ_{n} > 0

for all

n \in N

. Define

γ_{*} = inf {γ_{n} | n \geq 1}

and

γ^{*} = sup {γ_{n} | n \geq 1}

. Clearly

0 < γ_{*} \leq γ^{*} < \infty

. Recall that

p_{k, γ} = \int_{R} \frac{e^{- z^{2} / 2 γ^{2}}}{\sqrt{2 π γ^{2}}} Pr (Y \in [\frac{k}{2^{M}}, \frac{k + 1}{2^{M}}) - z) d z

Since, for all

n \in N

and

z \in R

,

\begin{matrix} \frac{e^{- z^{2} / 2 γ_{n}^{2}}}{\sqrt{2 π γ_{n}^{2}}} Pr (Y \in [\frac{k}{2^{M}}, \frac{k + 1}{2^{M}}) - z) & \leq \frac{e^{- z^{2} / 2 {(γ^{*})}^{2}}}{\sqrt{2 π γ_{*}^{2}}} \end{matrix}

the dominated convergence theorem implies that

lim_{n \to \infty} p_{k, γ_{n}} = p_{k, γ_{0}}

(A11)

The previous lemma implies that for all

n \geq 0

and

| k | > 0

,

p_{k, γ_{n}} \leq \frac{C 2^{(p - 1) M + p}}{k^{p}} + \frac{γ_{n} 2^{M + 1}}{k \sqrt{2 π}} e^{- k^{2} / 2^{2 M + 3} γ_{n}^{2}}

Thus, for k large enough,

p_{k, γ_{n}} \leq \frac{A}{k^{p}}

for a suitable positive constant A that does not depend on n. Since the function

x \mapsto - x log (x)

is increasing in

[0, 1 / 2]

, there exists

K^{'} > 0

such that for

| k | > K^{'}

- p_{k, γ_{n}} log (p_{k, γ_{n}}) \leq \frac{A}{k^{p}} log (A^{- 1} k^{p})

Since

\sum_{| k | > K^{'}} \frac{A}{k^{p}} log (A^{- 1} k^{p}) < \infty

, for any

ϵ > 0

there exists

K_{ϵ}

such that

\sum_{| k | > K_{ϵ}} \frac{A}{k^{p}} log (A^{- 1} k^{p}) < ϵ

In particular, for all

n \geq 0

,

\begin{matrix} H (Q_{M} (Y + γ_{n} N)) - \sum_{| k | \leq K_{ϵ}} - p_{k, γ_{n}} log (p_{k, γ_{n}}) & = \sum_{| k | > K_{ϵ}} - p_{k, γ_{n}} log (p_{k, γ_{n}}) < ϵ \end{matrix}

Therefore, for all

n \geq 1

,

\begin{matrix} |H (Q_{M} (Y + γ_{n} N)) - H (Q_{M} (Y + γ_{0} N))| \\ \leq \sum_{| k | > K_{ϵ}} - p_{k, γ_{n}} log (p_{k, γ_{n}}) + |\sum_{| k | \leq K_{ϵ}} p_{k, γ_{0}} log (p_{k, γ_{0}}) - p_{k, γ_{n}} log (p_{k, γ_{n}})| + \sum_{| k | > K_{ϵ}} - p_{k, γ_{0}} log (p_{k, γ_{0}}) \\ \leq ϵ + |\sum_{| k | \leq K_{ϵ}} p_{k, γ_{0}} log (p_{k, γ_{0}}) - p_{k, γ_{n}} log (p_{k, γ_{n}})| + ϵ \end{matrix}

By continuity of the function

x \mapsto - x log (x)

on

[0, 1]

and equation (A11), we conclude that

\underset{n \to \infty}{lim sup} |H (Q_{M} (Y + γ_{n} N)) - H (Q_{M} (Y + γ_{0} N))| \leq 3 ϵ

Since ϵ is arbitrary,

lim_{n \to \infty} H (Q_{M} (Y + γ_{n} N)) = H (Q_{M} (Y + γ_{0} N))

as we wanted to prove.

To prove continuity at

γ_{0} = 0

, observe that equation (A11) holds in this case as well. The rest is analogous to the case

γ_{0} > 0

. ☐

Lemma 42.

The functions

γ \mapsto I (X; Z_{γ}^{M})

and

γ \mapsto I (Y; Z_{γ}^{M})

are continuous for each

M \in N

.

Proof.

Since

H (Q_{M} (Y + γ N) | Y = y)

and

H (Q_{M} (Y + γ N) | X = x)

for

x, y \in R

are bounded by M, and

f_{Y | X} (y | x)

satisfies assumption (b), the conclusion follows from the dominated convergence theorem. ☐

Proof of Theorem 28.

For every

M \in N

, let

Γ_{ϵ}^{M} : = {γ \geq 0 | I (X; Z_{γ}^{M}) \leq ϵ}

. The Markov chain

X \to Y \to Z_{γ} \to Z_{γ}^{M + 1} \to Z_{γ}^{M}

and the data processing inequality imply that

I (X; Z_{γ}) \geq I (X; Z_{γ}^{M + 1}) \geq I (X; Z_{γ}^{M})

and, in particular,

ϵ = I (X; Z_{γ_{ϵ}}) \geq I (X; Z_{γ_{ϵ}}^{M + 1}) \geq I (X; Z_{γ_{ϵ}}^{M})

where

γ_{ϵ}

is as defined in the proof of Theorem 29. This implies then that

γ_{ϵ} \in Γ_{ϵ}^{M + 1} \subset Γ_{ϵ}^{M}

(A12)

and thus

I (Y; Z_{γ_{ϵ}}^{M}) \leq g_{ϵ, M} (X; Y)

Taking limits in both sides, Lemma 39 implies

g_{ϵ} (X; Y) = I (Y; Z_{γ_{ϵ}}) \leq \underset{M \to \infty}{lim inf} g_{ϵ, M} (X; Y)

(A13)

Observe that

\begin{matrix} g_{ϵ, M} (X; Y) & = sup_{γ \in Γ_{ϵ}^{M}} I (Y; Z_{γ}^{M}) \\ \leq sup_{γ \in Γ_{ϵ}^{M}} I (Y; Z_{γ}) \\ = I (Y; Z_{γ_{ϵ, min}^{M}}) \end{matrix}

(A14)

where the inequality follows from Markovity, the last equality follows from Lemma 35, and

γ_{ϵ, min}^{M} : = {inf}_{Γ_{ϵ}^{M}} γ

. By equation (A12),

γ_{ϵ} \in Γ_{ϵ}^{M + 1} \subset Γ_{ϵ}^{M}

and in particular

γ_{ϵ, min}^{M} \leq γ_{ϵ, min}^{M + 1} \leq γ_{ϵ}

. Thus,

{γ_{ϵ, min}^{M}}

is an increasing sequence in M and bounded from above and, hence, has a limit. Let

γ_{ϵ, min} = lim_{M \to \infty} γ_{ϵ, min}^{M}

. Clearly

γ_{ϵ, min} \leq γ_{ϵ}

(A15)

By the previous lemma we know that

I (X; Z_{γ}^{M})

is continuous, so

Γ_{ϵ}^{M}

is closed for all

M \in N

. Thus, we have that

γ_{ϵ, min}^{M} = {min}_{Γ_{ϵ}^{M}} γ

and in particular

γ_{ϵ, min}^{M} \in Γ_{ϵ}^{M}

. By the inclusion

Γ_{ϵ}^{M + 1} \subset Γ_{ϵ}^{M}

, we have then that

γ_{ϵ, min}^{M + n} \in Γ_{ϵ}^{M}

for all

n \in N

. By closedness of

Γ_{ϵ}^{M}

we have then that

γ_{ϵ, min} \in Γ_{ϵ}^{M}

for all

M \in N

. In particular,

I (X; Z_{γ_{ϵ, min}}^{M}) \leq ϵ

for all

M \in N

. By Lemma 39,

I (X; Z_{γ_{ϵ, min}}) \leq ϵ = I (X; Z_{γ_{ϵ}})

and by the monotonicity of

γ \mapsto I (X; Z_{γ})

, we obtain that

γ_{ϵ} \leq γ_{ϵ, min}

. Combining the previous inequality with (A15) we conclude that

γ_{ϵ, min} = γ_{ϵ}

. Taking limits in the inequality (A14) and using the continuity established in Lemma 35, we obtain

\underset{M \to \infty}{lim sup} g_{ϵ, M} (X; Y) \leq \underset{M \to \infty}{lim sup} I (Y; Z_{γ_{ϵ, min}^{M}}) = I (Y; Z_{γ_{ϵ, min}})

Plugging

γ_{ϵ, min} = γ_{ϵ}

into the above, we conclude that

\underset{M \to \infty}{lim sup} g_{ϵ, M} (X; Y) \leq I (Y; Z_{γ_{ϵ}}) = g_{ϵ} (X; Y)

which, together with (A13), yields

lim_{M \to \infty} g_{ϵ, M} (X; Y) = g_{ϵ} (X; Y)

. ☐

References

1. Asoodeh, S.; Alajaji, F.; Linder, T. Notes on information-theoretic privacy. In Proceedings of the 52nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 30 September–3 October 2014; pp. 1272–1278.
2. Asoodeh, S.; Alajaji, F.; Linder, T. On maximal correlation, mutual information and data privacy. In Proceedings of the IEEE 14th Canadian Workshop on Information Theory (CWIT), St. John’s, NL, Canada, 6–9 July 2015; pp. 27–31.
3. Warner, S.L. Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. J. Am. Stat. Assoc. 1965, 60, 63–69.
4. Blum, A.; Ligett, K.; Roth, A. A learning theory approach to non-interactive database privacy. In Proceedings of the Fortieth Annual ACM Symposium on the Theory of Computing, Victoria, BC, Canada, 17–20 May 2008; pp. 1123–1127.
5. Dinur, I.; Nissim, K. Revealing information while preserving privacy. In Proceedings of the Twenty-Second Symposium on Principles of Database Systems, San Diego, CA, USA, 9–11 June 2003; pp. 202–210.
6. Rubinstein, P.B.; Bartlett, L.; Huang, J.; Taft, N. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. J. Priv. Confid. 2012, 4, 65–100.
7. Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Privacy aware learning. 2014; arXiv: 1210.2085.
8. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Conference on Theory of Cryptography (TCC’06), New York, NY, USA, 5–7 March 2006; pp. 265–284.
9. Dwork, C. Differential privacy: A survey of results. In Theory and Applications of Models of Computation, Proceedings of the 5th International Conference, TAMC 2008, Xi’an, China, 25–29 April 2008; Agrawal, M., Du, D., Duan, Z., Li, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2008. Lecture Notes in Computer Science. Volume 4978, pp. 1–19.
10. Dwork, C.; Lei, J. Differential privacy and robust statistics. In Proceedings of the 41st Annual ACM Symposium on the Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009; pp. 437–442.
11. Kairouz, P.; Oh, S.; Viswanath, P. Extremal mechanisms for local differential privacy. 2014; arXiv: 1407.1338v2.
12. Calmon, F.P.; Varia, M.; Médard, M.; Christiansen, M.M.; Duffy, K.R.; Tessaro, S. Bounds on inference. In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2–4 October 2013; pp. 567–574.
13. Yamamoto, H. A source coding problem for sources with additional outputs to keep secret from the receiver or wiretappers. IEEE Trans. Inf. Theory 1983, 29, 918–923.
14. Sankar, L.; Rajagopalan, S.; Poor, H. Utility-privacy tradeoffs in databases: An information-theoretic approach. IEEE Trans. Inf. Forensics Secur. 2013, 8, 838–852.
15. Tandon, R.; Sankar, L.; Poor, H. Discriminatory lossy source coding: side information privacy. IEEE Trans. Inf. Theory 2013, 59, 5665–5677.
16. Calmon, F.; Fawaz, N. Privacy against statistical inference. In Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 1–5 October 2012; pp. 1401–1408.
17. Rebollo-Monedero, D.; Forne, J.; Domingo-Ferrer, J. From t-closeness-like privacy to postrandomization via information theory. IEEE Trans. Knowl. Data Eng. 2010, 22, 1623–1636.
18. Makhdoumi, A.; Salamatian, S.; Fawaz, N.; Médard, M. From the information bottleneck to the privacy funnel. In Proceedings of the IEEE Information Theory Workshop (ITW), Hobart, Australia, 2–5 November 2014; pp. 501–505.
19. Tishby, N.; Pereira, F.C.; Bialek, W. The information bottleneck method. 2000; arXiv: physics/0004057.
20. Calmon, F.P.; Makhdoumi, A.; Médard, M. Fundamental limits of perfect privacy. In Proceedings of the IEEE Int. Symp. Inf. Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 1796–1800.
21. Wyner, A.D. The Wire-Tap Channel. Bell Syst. Tech. J. 1975, 54, 1355–1387.
22. Makhdoumi, A.; Fawaz, N. Privacy-utility tradeoff under statistical uncertainty. In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2–4 October 2013; pp. 1627–1634.
23. Li, C.T.; El Gamal, A. Maximal correlation secrecy. 2015; arXiv: 1412.5374.
24. Ahlswede, R.; Gács, P. Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab. 1976, 4, 925–939.
25. Anantharam, V.; Gohari, A.; Kamath, S.; Nair, C. On maximal correlation, hypercontractivity, and the data processing inequality studied by Erkip and Cover. 2014; arXiv:1304.6133v1.
26. Courtade, T. Information masking and amplification: The source coding setting. In Proceedings of the IEEE Int. Symp. Inf. Theory (ISIT), Boston, MA, USA, 1–6 July 2012; pp. 189–193.
27. Goldwasser, S.; Micali, S. Probabilistic encryption. J. Comput. Syst. Sci. 1984, 28, 270–299.
28. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1997.
29. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011.
30. Shulman, N.; Feder, M. The uniform distribution as a universal prior. IEEE Trans. Inf. Theory 2004, 50, 1356–1362.
31. Rudin, W. Real and Complex Analysis, 3rd ed.; McGraw Hill: New York, NY, USA, 1987.
32. Gebelein, H. Das statistische Problem der Korrelation als Variations- und Eigenwert-problem und sein Zusammenhang mit der Ausgleichungsrechnung. Zeitschrift für Angewandte Mathematik und Mechanik 1941, 21, 364–379. (In German)
33. Hirschfeld, H.O. A connection between correlation and contingency. Camb. Philos. Soc. 1935, 31, 520–524.
34. Rényi, A. On measures of dependence. Acta Mathematica Academiae Scientiarum Hungarica 1959, 10, 441–451.
35. Linfoot, E.H. An informational measure of correlation. Inf. Control 1957, 1, 85–89.
36. Csiszár, I. Information-type measures of difference of probability distributions and indirect observation. Studia Scientiarum Mathematicarum Hungarica 1967, 2, 229–318.
37. Zhao, L. Common Randomness, Efficiency, and Actions. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2011.
38. Berger, T.; Yeung, R. Multiterminal source encoding with encoder breakdown. IEEE Trans. Inf. Theory 1989, 35, 237–244.
39. Kim, Y.H.; Sutivong, A.; Cover, T. State amplification. IEEE Trans. Inf. Theory 2008, 54, 1850–1859.
40. Merhav, N.; Shamai, S. Information rates subject to state masking. IEEE Trans. Inf. Theory 2007, 53, 2254–2261.
41. Ahlswede, R.; Körner, J. Source coding with side information and a converse for degraded broadcast channels. IEEE Trans. Inf. Theory 1975, 21, 629–637.
42. Kim, Y.H.; El Gamal, A. Network Information Theory; Cambridge University Press: Cambridge, UK, 2012.
43. Asoodeh, S.; Alajaji, F.; Linder, T. Lossless secure source coding, Yamamoto’s setting. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 30 September–2 October 2015.
44. Raginsky, M. Logarithmic Sobolev inequalities and strong data processing theorems for discrete channels. In Proceedings of the IEEE Int. Sym. Inf. Theory (ISIT), Istanbul, Turkey, 7–12 July 2013; pp. 419–423.
45. Geng, Y.; Nair, C.; Shamai, S.; Wang, Z.V. On broadcast channels with binary inputs and symmetric outputs. IEEE Trans. Inf. Theory 2013, 59, 6980–6989.
46. Sutskover, I.; Shamai, S.; Ziv, J. Extremes of information combining. IEEE Trans. Inf. Theory 2005, 51, 1313–1325.
47. Alajaji, F.; Chen, P.N. Information Theory for Single User Systems, Part I. Course Notes, Queen’s University. Available online: http://www.mast.queensu.ca/math474/it-lecture-notes.pdf (accessed on 4 March 2015).
48. Chayat, N.; Shamai, S. Extension of an entropy property for binary input memoryless symmetric channels. IEEE Trans. Inf. Theory 1989, 35, 1077–1079.
49. Oohama, Y. Gaussian multiterminal source coding. IEEE Trans. Inf. Theory 1997, 43, 2254–2261.
50. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 2006.
51. Linder, T.; Zamir, R. On the asymptotic tightness of the Shannon lower bound. IEEE Trans. Inf. Theory 1994, 40, 2026–2031.
52. Rényi, A. On the dimension and entropy of probability distributions. Acta Mathematica Academiae Scientiarum Hungarica 1959, 10, 193–215.

Figure 1. Information-theoretic privacy.

Figure 2. Privacy filter that achieves the lower bound in (4) where

Z_{δ}

is the output of an erasure privacy filter with erasure probability specified in (5).

Figure 3. The region of

g_{ε} (X; Y)

in terms of ε specified by (4).

Figure 4. Optimal privacy filter for

P_{Y | X} = BSC (α)

with uniform X where

δ (ε, α)

is specified in (40).

Figure 5. Optimal privacy filter for

P_{Y | X} = BEC (δ)

where

δ (ε, α)

is specified in (42).

Figure 6. The privacy filter associated with (A1) and (A2) with

k = 1

. We have

P_{Z | Y} (\cdot | 1) = Bernoulli (δ)

and

P_{Z | Y} (\cdot | y) = Bernoulli (0)

for

y \in {2, 3, \dots, n}

.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Asoodeh, S.; Diaz, M.; Alajaji, F.; Linder, T. Information Extraction Under Privacy Constraints. Information 2016, 7, 15. https://doi.org/10.3390/info7010015

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
