Abstract
Differential privacy (DP) has become a de facto standard for achieving data privacy. However, the utility of DP solutions under the premise of privacy first is often unacceptable in real-world applications. In this paper, we propose best-effort differential privacy (B-DP) to honor the preference for utility first, and design two new metrics, the point belief degree and the regional average belief degree, to evaluate its privacy from a new perspective of the preference for privacy. Therein, the preference for privacy and the preference for utility are referred to as expected privacy protection (EPP) and expected data utility (EDU), respectively. We also investigate how to realize B-DP with an existing DP mechanism (KRR) and a newly constructed mechanism (EXP) in dynamic check-in data collection and publishing. Extensive experiments on two real-world check-in datasets verify the effectiveness of the concept of B-DP. Our newly constructed EXP also satisfies a better B-DP than KRR, providing a good trade-off between privacy and utility.
1. Introduction
With the explosive progress of mobile Internet and location technology, LBS (Location-Based Service) applications, including Brightkite, Gowalla, Facebook and other social network platforms, generate a large amount of check-in data every day. Check-in data generally include information such as time, location, POI (Point of Interest) attributes, mood and comments, and hence check-in data have become a carrier of a user's life trajectory and interest tendencies [1,2,3,4]. However, a data analyst's mining and analysis of check-in data may directly or indirectly expose the sensitive information of a data provider [5,6,7,8,9]. Many privacy protection methods have been proposed [10,11,12,13,14,15,16]. Some of them [10,11] rely on specific attack assumptions and background knowledge, and others [12,13,14,15] are based on differential privacy (DP) [17]. DP provides provable privacy protection that is independent of the background knowledge and computational power of an attacker. The protection level of DP is measured by the privacy budget [17]: when the privacy budget is relatively small, the privacy protection is strong, but the utility is often poor [17]. With the gradual integration of DP into practical applications, utility has become the bottleneck of its development and popularization.
In general, there is a contradiction between privacy and utility, and a trade-off is necessary [18,19]. In [19], the authors discussed a monotone trade-off in the semi-honest model: when the utility becomes worse, the privacy protection becomes stronger, and when the utility gets better, the privacy protection gets weaker. Many other DP theoretical studies, including strict $\epsilon$-DP [17] and relaxed $(\epsilon,\delta)$-DP [20], give privacy priority and then make the data as available as possible, which is a trade-off that satisfies utility as much as possible under the privacy guarantee. Unfortunately, the applications of DP in the real world do not seem to follow this principle completely. One of the best examples is the four applications in Apple's MacOS Sierra (version 10.12), i.e., Emojis, New words, Deeplinks and Lookup Hints. When they collect data, the privacy budget is set to only 1 or 2 per datum, but the overall privacy budget for the four applications is as high as 16 per day [21]. Furthermore, Apple renews the available privacy budget every day, which results in a potential privacy loss of 16 times the number of days that a user participated in DP data collection for the four applications [21]. This is far beyond the reasonable protection scope of DP [22].
Based on the above facts, when there is a contradiction between privacy and utility, privacy is no longer the priority suggested by DP theoretical studies; the most desirable way is to balance the preference for privacy and the preference for utility, which we refer to as expected privacy protection (EPP) and expected data utility (EDU), respectively. However, few researchers have proposed solutions to reasonably balance EPP and EDU, except the authors in [23]. They proposed an adaptive DP and its mechanisms in a rational model, which achieve a balance between an approximate EDU and the EPP by adding conditional filtering noise [23]. However, if the privacy protection intensity under this balance, which satisfies the data analyst, is not what the data provider expects, then the EPP of the data provider is still not met. In addition, the absolute value of the conditional filtering noise falls in (0.5, 1.5), which makes the mechanism vulnerable to background-knowledge attacks. Therefore, this paper proposes best-effort differential privacy (B-DP), which satisfies the EDU first and then satisfies the EPP as much as possible. We face at least the following two basic challenges.
- If the EDU is satisfied first, privacy protection may no longer be guaranteed by DP. How can we evaluate the degree to which the EPP is satisfied under B-DP?
- If there is a reasonable metric for the degree to which the EPP is satisfied under B-DP, does there exist a mechanism (or algorithm) that realizes B-DP?
With the above challenges of B-DP in mind, this paper explores a typical application, the dynamic collection and publishing of continuous check-in data, where the check-in scenario is a semi-honest model with an honest but curious data collector. Each check-in user visiting a POI generates a check-in state and submits a perturbed check-in state to the POI Center (the data collector) for privacy protection. The POI Center calculates the frequency of check-in users from the received check-in states, which approximates the check-in data distribution and is published to data analysts. We assume that one check-in state is perturbed to exactly one check-in state and that each publication must satisfy the EDU first and then satisfy the EPP as much as possible; moreover, the privacy to be protected is the check-in state of a user, and the utility to be realized is the distribution of the check-in data, with relative error as its metric (see Section 4.1 for more details). In fact, since the relative error is used as the metric of the published distribution, a distribution-dependent privacy protection mechanism is needed to satisfy the EPP as much as possible under the constraint of the EDU. In addition, since each publication must satisfy the EDU first and then the EPP as much as possible, an algorithm is needed to keep the privacy protection under the EDU constraint satisfied continuously throughout the dynamic publishing. Therefore, this mechanism and algorithm will be proposed from a new perspective, which differs from the existing methods in the literature.
1.1. Our Contributions
The main contributions of this paper are concluded as follows.
- A privacy protection concept, B-DP, and two metrics of the privacy guarantee degree are put forward. B-DP is an expansion of the concept of DP that satisfies the EDU first and then provides the EPP as much as possible, which makes it useful for real-world applications. It uses two new metrics, the point belief degree (see Definition 4) and the regional average belief degree (see Definition 5), to quantify the degree of privacy protection for any expected privacy budget (see Section 4.2), rather than, as in DP itself, using the privacy budget $\epsilon$ to evaluate only the single EPP whose expected privacy budget equals $\epsilon$. In addition, the regional average belief degree can serve as the average guarantee degree of the EPP over a region containing multiple expected privacy budgets. To the best of our knowledge, this discussion and definition of B-DP is new, and the two metrics explore and analyze the performance of privacy from a new perspective of the preference for privacy.
- An EXP mechanism is proposed (see Definition 10). The newly constructed EXP mechanism can be applied to categorical data for privacy protection; it smartly adjusts the privacy budget of each value based on its probability in the data distribution, so that it realizes a better B-DP than the existing KRR mechanism [24,25]. This also verifies that B-DP can be realized to provide a good trade-off between privacy and utility.
- A dynamic algorithm, together with the implementation algorithms of two perturbation mechanisms, is proposed to realize the dynamic collection and publishing of continuous check-in data while satisfying B-DP. The two perturbation mechanisms are the newly constructed EXP and a classical DP mechanism, KRR [25,26] (a simple local differential privacy (LDP) mechanism). We take KRR as an example to show how to realize B-DP based on existing DP mechanisms for categorical data. Moreover, the number of domain values of both KRR and EXP is more than 2, and the randomized algorithms based on them take one value as input and produce one value as output. In addition, the dynamic algorithm can be applied to other social behavior applications beyond check-in data.
1.2. Outline
The remainder of this paper is organized as follows: Section 2 summarizes the related work on trade-off methods, the relative error utility metric and LDP mechanisms. Section 3 presents the conceptual background of DP and the details of the KRR mechanism and the utility metrics. Section 4 introduces the system model, the relevant definitions of B-DP, including the two metrics of the guarantee degree, and the model symbolization of the check-in data. Section 5 introduces the design and implementation of the B-DP mechanisms, and Section 6 describes the design of the B-DP algorithm for dynamic collection and publishing. Section 7 provides the experimental evaluation of the dynamic collection and publishing algorithm based on the two B-DP mechanisms. Finally, we provide a discussion and conclusion in Section 8.
2. Related Work
DP has become a research hotspot in the field of privacy protection since Dwork [12] proposed it in 2006. The model of DP starts from the traditional centralized setting [15,18], gradually extends to the distributed setting [27], and develops into the local setting [24,28] and even the personalized local setting [29]. This is not only the evolution of the DP technique, but also a comprehensive embodiment of its gradual integration with real-world applications. However, no matter how it evolves, the two themes running through DP are privacy and utility [18], which are also the focus of this paper. Table 1 summarizes the main related work from the perspective of privacy versus utility priority, their metrics, the privacy mechanisms used, and the problems addressed with respect to EPP and EDU. We divide it into three categories below.
Table 1.
Comparison of existing literature with the method proposed in this paper.
- Trade-off model with utility first. The majority of DP research is based on the trade-off model with privacy first, while there is little work on the trade-off model with utility first. Therein, Katrina et al. [30] proposed a generalized "noise reduction" framework based on the modified "Above Threshold" algorithm [33] for private empirical risk minimization (ERM) on the premise of utility priority, but the scheme is only applicable to the ERM framework, where the minimized privacy loss may not meet the EPP. Liu et al. first showed that DP satisfies a monotone trade-off between privacy and utility, with an associated bounded monotone trade-off, under the semi-honest model; they also showed that there is no trade-off under the rational model and that a unilateral trade-off can lead to a utility disaster or a privacy disaster [18,23,34]. They further presented an adaptive DP and its mechanisms under the rational model, which realize the trade-off between an approximate EDU and the EPP by adding conditional filtering noise [23], but the mechanisms may fail to meet the data provider's expectation of privacy protection and are easily attacked via background knowledge because of the conditional filtering noise. Most importantly, the above two utility-first studies [23,30] do not provide quantitative metrics for the unmet privacy protection, i.e., the unmet degree of the EPP, whereas this paper presents two detailed quantitative metrics, the point belief degree and the regional average belief degree, to evaluate privacy from a new perspective of the preference for privacy.
- Utility metrics of relative error. Maryam et al. [31] studied DP in real-world applications and discussed how to add Laplace noise [12] from the viewpoint of utility. They studied the relationship between the cumulative probability of the noise and the privacy level in the Laplace mechanism and, combined with the relative error metric, discussed how to use a DP mechanism reasonably without losing the established utility. However, that work does not delve into how the guarantee degree of privacy protection changes when the utility is satisfied. Xiao et al. [18] presented a DP publishing algorithm for batch queries using a resampling technique with correlated noise to reduce the added noise and improve data utility. When the algorithm picks the priority items each time, it relies on noisy intermediate results, which do not sufficiently reflect the original order of the data. This biases the adjustment of the privacy budget allocation, so query items that should be optimized may not be, affecting the utility of the published data. Nevertheless, that work is a classical example of optimizing utility with privacy first, which runs counter to the theme of this paper. In addition, the above two schemes are essentially based on central DP and use the continuous Laplace mechanism, which differs from the (discrete) LDP data statistics and release required by the check-in application in this paper. Therefore, these schemes cannot be directly applied to the applications this paper considers.
- LDP mechanisms. In 1965, Warner first proposed the randomized response technique (W-RR) to collect statistical data on sensitive topics while keeping the sensitive data of contributing individuals confidential [35]. Although W-RR can strictly satisfy $\epsilon$-LDP [25] in a single survey, multiple collections from the same individuals weaken the privacy protection intensity [12]. Therefore, Erlingsson et al. [28] used a double perturbation scheme combining a permanent randomized response with an instantaneous randomized response, namely RAPPOR, to expand the application of W-RR; it has been used by Google in the Chrome browser to collect users' behavior data. RAPPOR uses Bloom filters [36] as the encoding method, mapping the statistical attributes into a binary vector; the mapping relation and the Lasso regression method [37] are then combined to reconstruct the frequency statistics of the original attribute strings. Due to the high communication cost of RAPPOR, Bassily et al. [32] proposed the S-Hist method, in which each user first encodes his attributes, then randomly selects one bit, perturbs it with the randomized response technique, and sends the result to the data collector, thereby reducing the communication cost. Chen et al. [29] proposed a PCEP mechanism and designed with it a PLDP (personalized LDP) scheme for spatial data, aiming to protect users' location information while counting the number of users in an area. The privacy budget of that scheme is determined by the users' personalization, and hence the utility depends on the users' individual settings. The mechanism combines the S-Hist method [32] with the random projection technique [38]; although it greatly reduces the communication cost, it suffers from unstable query precision. For the check-in application with multiple check-in spots in this paper, the KRR mechanism [24,25] fits naturally, since it requires no prior knowledge of the data distribution, but it does not realize B-DP well. In addition, DP has already been studied in many applications, such as social networks [39,40], recommender systems [41], data publishing [42,43,44], deep learning [45], reinforcement learning [46] and federated learning [47].
3. Preliminaries
In this section, the key notations used in this paper are given in Table 2.
Table 2.
Notations.
3.1. Differential Privacy (DP)
Differential privacy (DP), broadly speaking, is a privacy protection technique that does not depend on an attacker’s background knowledge and computational power [17,20,48]. It can be generally divided into central DP and LDP depending on whether it is based on a trusted data collector [33]. The formal definitions of these two types of DP are given as follows.
Definition 1
($(\epsilon,\delta)$-(Central) DP [17,20]). Let $M$ be a randomized algorithm and $S$ a set of all possible outputs of $M$. For a given dataset $D$ and any adjacent dataset $D'$ that differ on at most one record, if $M$ satisfies the following inequality, then $M$ is said to satisfy $(\epsilon,\delta)$-(central) DP:
$$\Pr[M(D) \in S] \le e^{\epsilon}\,\Pr[M(D') \in S] + \delta,$$
where $e^{\epsilon}$ bounds the risk of privacy disclosure and is controlled by the randomness of the algorithm $M$, the parameter $\epsilon$ is called the privacy budget and represents the level of privacy protection, and $\delta$ represents the probability of failing to satisfy $\epsilon$-(central) DP. When $\delta = 0$, $M$ satisfies $\epsilon$-(central) DP.
Definition 2
($(\epsilon,\delta)$-LDP [25,26]). A randomized algorithm $M$ over a domain $\chi$ is said to satisfy $(\epsilon,\delta)$-LDP if, for any $x, x' \in \chi$ and any output $y$ in the range of $M$, $M$ satisfies
$$\Pr[M(x) = y] \le e^{\epsilon}\,\Pr[M(x') = y] + \delta,$$
where $\epsilon$ and $\delta$ have the same meanings as in Definition 1.
In the check-in application of this paper, the POI Center is an honest but curious data collector: even if the POI Center or another attacker obtains the perturbed check-in state submitted by a user, they cannot conclusively infer the original check-in state of the user. For a randomized algorithm $M$ to satisfy $(\epsilon,\delta)$-LDP when protecting the check-in state of a user, it needs to meet the following definition.
Definition 3
(Check-in state of $(\epsilon,\delta)$-LDP). A user $u$ generates a check-in at a POI, whose check-in state variable is denoted as $X$ with $X \in \{x_1, \ldots, x_n\}$. Assume that the original check-in state of $u$ is $x_i$ or $x_j$ for $i \ne j$, and that $x_i$ and $x_j$ both generate the same perturbed check-in state $x_k$ after being perturbed by a randomized algorithm $M$, where the perturbed check-in state variable is $Y$ with $Y \in \{x_1, \ldots, x_n\}$. If there exists an $\epsilon$ such that $M$ satisfies the following constraint for all $i, j, k$,
$$q_{ik} \le e^{\epsilon}\, q_{jk} + \delta,$$
where $q_{ik}$ and $q_{jk}$ are the perturbation probabilities of the original check-in states $x_i$ and $x_j$ to the check-in state $x_k$, respectively, then $M$ enables the check-in state to satisfy $(\epsilon,\delta)$-LDP. When $\delta = 0$, $M$ satisfies $\epsilon$-LDP.
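As a concrete illustration of the constraint above, the following sketch (ours, not code from the paper) computes the smallest $\epsilon$ for which a given perturbation probability matrix satisfies $(\epsilon,\delta)$-LDP, under the assumed convention that rows index true states and columns index outputs:

```python
import numpy as np

def ldp_budget(Q, delta=0.0):
    """Smallest eps such that Q[i, k] <= e^eps * Q[j, k] + delta for all
    pairs of inputs i, j and every output k (the Definition 3 constraint)."""
    n = Q.shape[0]
    eps = 0.0
    for k in range(n):                 # every output state
        for i in range(n):             # every ordered pair of inputs
            for j in range(n):
                num = Q[i, k] - delta
                if num > 0 and Q[j, k] > 0:
                    eps = max(eps, float(np.log(num / Q[j, k])))
    return eps

# A 3-state KRR matrix built with eps = 1 should report roughly 1.
p, q = np.e / (np.e + 2), 1 / (np.e + 2)
Q = np.full((3, 3), q) + (p - q) * np.eye(3)
print(ldp_budget(Q))                   # ~1.0
```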
Property 1
(Parallel composition [49]). Assume that randomized algorithms $M_1, M_2, \ldots, M_k$ have privacy budgets $\epsilon_1, \epsilon_2, \ldots, \epsilon_k$, respectively. Then, for disjoint datasets $D_1, D_2, \ldots, D_k$, the combined algorithm $M(M_1(D_1), \ldots, M_k(D_k))$ provides $(\max_i \epsilon_i)$-(local) DP; that is, the level of privacy protection it provides depends on the largest privacy budget.
3.2. KRR Mechanism
KRR is an LDP mechanism [24,25] that satisfies the following probability distribution:
$$\Pr[Y = y \mid X = x] = \begin{cases} p = \dfrac{e^{\epsilon}}{e^{\epsilon} + n - 1}, & y = x,\\[4pt] q = \dfrac{1}{e^{\epsilon} + n - 1}, & y \ne x, \end{cases}$$
where $n$ is the size of the domain and $p + (n-1)q = 1$.
KRR is a more general form of the randomized response mechanism W-RR; that is, when $n = 2$, KRR degenerates into W-RR.
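The following minimal sketch (ours, under the distribution above) samples from KRR; the state is kept with probability $p$ and otherwise replaced by one of the remaining $n-1$ states uniformly:

```python
import numpy as np

def krr_perturb(x, n, eps, rng):
    """KRR: keep the true state x (an index in 0..n-1) with probability
    p = e^eps / (e^eps + n - 1); otherwise output one of the other
    n - 1 states uniformly at random (each with probability q)."""
    p = np.exp(eps) / (np.exp(eps) + n - 1)
    if rng.random() < p:
        return x
    y = int(rng.integers(n - 1))       # uniform over the remaining states
    return y if y < x else y + 1       # skip the true state x

rng = np.random.default_rng(0)
print(krr_perturb(2, n=5, eps=1.0, rng=rng))
```

With n = 2, this reduces exactly to W-RR.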
3.3. Utility Metrics
In this paper, the worst relative error over the POIs in the check-in statistics measures the overall utility of the check-in application, where the relative error of $POI_i$ is calculated as
$$RE_i = \frac{|\hat{c}_i - c_i|}{\max(c_i, \Delta)},$$
where $\hat{c}_i$ is the estimated check-in statistic after LDP protection, $c_i$ is the real check-in statistic of $POI_i$, and the constant $\Delta$ avoids a denominator that is 0 or too small [18,50,51]. For the convenience of analysis, this paper approximates the utility metric by the relative root mean square error:
$$RE_i \approx \frac{\sqrt{E\big[(\hat{c}_i - c_i)^2\big]}}{\max(c_i, \Delta)},$$
where $E[(\hat{c}_i - c_i)^2]$ is the expectation of the squared error between the real statistical result and the statistical estimate after LDP protection, and $\Delta$ is defined as above. The maximum relative error over the $n$ POIs is then
$$RE_{\max} = \max_{1 \le i \le n} RE_i.$$
As above, $c_i$ can represent not only the data distribution, but also frequencies or counts.
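For concreteness, a small sketch of the metric (the function name and example values are ours):

```python
import numpy as np

def max_relative_error(est, real, delta):
    """Worst relative error over the POIs: |est_i - real_i| divided by
    max(real_i, delta); delta guards against a zero or tiny denominator."""
    est = np.asarray(est, dtype=float)
    real = np.asarray(real, dtype=float)
    return float(np.max(np.abs(est - real) / np.maximum(real, delta)))

print(max_relative_error([95, 210, 0], [100, 200, 2], delta=5))  # 0.4
```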
4. Problem Formulations
4.1. System Model
As shown in Figure 1, there are three types of participants in the check-in model: check-in users (data providers), the POI Center (data collector), and data analysts (for example, POI managers). Each check-in user visiting a POI generates a check-in state and sends it to the POI Center through a terminal with the check-in APP, where each check-in state corresponds to a count and the check-in state is categorical data. The POI Center calculates the counts and frequency of check-in users visiting POIs from the received check-in states, where the frequency approximates the check-in data distribution and is published to data analysts. In addition, it is assumed that check-in users are mutually independent and that each check-in user submits only one check-in state per publication. It is also assumed that the check-in scenario is a semi-honest model, in which the POI Center is an honest but curious data collector, and the check-in state of a user is sensitive. Hence, a user adopts a perturbation mechanism (for example, an LDP mechanism) to perturb his check-in state for privacy protection before sending it to the POI Center. Therein, it is assumed that one check-in state is perturbed to exactly one check-in state.
Figure 1.
POI check-in model, showing the original and perturbed check-in states, the check-in counts and check-in frequency (data distribution) in the perturbation phase and in the reconstruction phase, and the perturbation mechanism K. More details are given in Section 4.3.
In this paper, we focus on the dynamic collection and publishing of continuous check-in data with both privacy and utility requirements, where the privacy to be protected is the check-in state of a user and the utility to be realized is the distribution of the check-in data, with relative error as its metric. The privacy refers to the EPP, i.e., a user's preference for privacy, and the utility refers to the EDU, i.e., a data analyst's preference for utility. Moreover, each publication must satisfy the EDU first and then satisfy the EPP as much as possible. Thereby, we adopt B-DP based on the LDP model, including perturbation, aggregation, reconstruction and publishing, and we additionally initialize or update the perturbation mechanism K so that every publication satisfies the EPP as much as possible with the EDU satisfied first, as shown in Figure 1.
4.2. The Related Concepts of B-DP
In the concept of best-effort differential privacy (B-DP), there is an expected privacy protection (EPP) and an expected data utility (EDU). When the two cannot be satisfied simultaneously, the EDU should be satisfied first and the EPP should be satisfied as much as possible. Since the protection level of DP is measured by the privacy budget [17], the preference for privacy in B-DP also refers to a preference for the privacy budget. Hence, the EPP refers to a data provider's preference for the privacy budget, and we call this privacy budget the expected privacy budget, symbolized as $\epsilon_e$. We use $W$ to symbolize the expected privacy protection region, which refers to a data provider's preference for a region containing multiple expected privacy budgets.
We use $\eta$ to symbolize the EDU. In this paper, the expectation of the maximum relative error of Formula (7) measures data utility. When it is less than or equal to $\eta$, the EDU is satisfied; when it is equal, the EDU is just satisfied. The privacy budget of a DP mechanism that just satisfies the EDU is symbolized as $\epsilon_\eta$.
Definition 4
($\epsilon_e$-Point belief degree). The guarantee degree of the EPP under the expected privacy budget $\epsilon_e$ that a DP mechanism can provide is defined as the point belief degree, denoted $B(\epsilon_e)$. Moreover,
$$B(\epsilon_e) = \sum_{i=1}^{n} P_i \cdot \mathbb{1}(\epsilon_i \le \epsilon_e),$$
where $n$ is the number of POIs in the check-in application, $P_i$ is the probability of $x_i$ in the data distribution handled by the DP mechanism, $\epsilon_i$ is the actual privacy budget of $x_i$, and $\mathbb{1}(\cdot)$ is an indicator of whether the EPP is satisfied, defined as follows:
$$\mathbb{1}(\epsilon_i \le \epsilon_e) = \begin{cases} 1, & \epsilon_i \le \epsilon_e,\\ 0, & \text{otherwise}. \end{cases}$$
Definition 5
($W$-Regional average belief degree). The average guarantee degree of the EPP under the expected privacy protection region $W$ that a DP mechanism can provide is defined as the regional average belief degree, denoted $\bar{B}(W)$. When $W = [\epsilon_l, \epsilon_r]$, it is defined as the mean of the point belief degrees $B(\epsilon_e)$ over all $\epsilon_e \in W$,
where $B(\epsilon_e)$ follows the definition of the point belief degree.
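A minimal sketch of the two metrics as we read Definitions 4 and 5 (the discretized mean below is an assumption; the exact averaging in the paper may differ):

```python
import numpy as np

def point_belief(P, eps_actual, eps_e):
    """Point belief degree: probability mass of the states whose actual
    budget already meets the expected budget, sum_i P_i * 1[eps_i <= eps_e]."""
    P = np.asarray(P, dtype=float)
    eps_actual = np.asarray(eps_actual, dtype=float)
    return float(P[eps_actual <= eps_e].sum())

def regional_belief(P, eps_actual, eps_lo, eps_hi, grid=1000):
    """Regional average belief degree: the point belief degree averaged
    over expected budgets spanning the region [eps_lo, eps_hi]."""
    es = np.linspace(eps_lo, eps_hi, grid)
    return float(np.mean([point_belief(P, eps_actual, e) for e in es]))

P, eps_i = [0.5, 0.3, 0.2], [0.8, 1.5, 3.0]
print(point_belief(P, eps_i, 1.0))      # 0.5: only the first state qualifies
print(regional_belief(P, eps_i, 0.5, 2.0))
```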
Definition 6
($(\eta, B(\epsilon_e))$-B-DP). A DP mechanism that just satisfies the EDU $\eta$ with point belief degree $B(\epsilon_e)$ of the expected privacy budget $\epsilon_e$ is said to satisfy $(\eta, B(\epsilon_e))$-B-DP. Therein, the $(\eta, B(\epsilon_e))$-B-DP whose point belief degree is maximum is defined as $(\eta, \epsilon_e)$-Best-B-DP.
Definition 7
($(\eta, \bar{B}(W))$-B-DP). A DP mechanism that just satisfies the EDU $\eta$ with regional average belief degree $\bar{B}(W)$ of the expected privacy protection region $W$ is said to satisfy $(\eta, \bar{B}(W))$-B-DP. Therein, the $(\eta, \bar{B}(W))$-B-DP whose regional average belief degree is maximum is defined as $(\eta, W)$-Best-B-DP.
Note that, in general, B-DP includes both central B-DP and local B-DP, depending on whether it is based on a trusted data collector, just as for DP. This paper focuses on local B-DP.
4.3. Model Symbolization
Let $\{POI_1, \ldots, POI_n\}$ represent the $n$ POIs in the check-in scenario, and let the check-in state space be $\chi = \{x_1, \ldots, x_n\}$, where $x_i$ is the check-in state of $POI_i$. Let $X$, $Y$ and $\hat{X}$ be the variables of the original check-in state, the perturbed check-in state and the estimated check-in state of a user $u$, respectively. Let $P$, $P'$ and $\hat{P}$ be the probability distributions of the original check-ins, the perturbed check-ins and the estimated check-ins, respectively, where $P_i = \Pr(X = x_i)$, $P'_i = \Pr(Y = x_i)$ and $\hat{P}_i = \Pr(\hat{X} = x_i)$. Assume that all users follow the same probability distribution law; that is, $P$, $P'$ and $\hat{P}$ are the same for any user $u$. Let $C$, $C'$ and $\hat{C}$ represent the original check-in counts vector, the perturbed check-in counts vector and the estimated check-in counts vector of $m$ users, respectively.
Definition 8
(Random perturbation and perturbation probability matrix $Q$). The process by which any user $u$ changes the check-in state from $x_i$ to $x_j$ with a certain perturbation probability is called random perturbation, and the perturbation probability is denoted as $q_{ij} = \Pr(Y = x_j \mid X = x_i)$. The matrix composed of $q_{ij}$ for all $i, j$ is called the perturbation probability matrix $Q$, where $q_{ij} \ge 0$ and $\sum_{j=1}^{n} q_{ij} = 1$ for any $i$.
Therefore, the perturbed probability distribution $P'$, the original probability distribution $P$ and the perturbation probability matrix $Q$ have the following relationship:
$$P' = P\,Q.$$
From this relationship, it can be seen that $P'_j = \sum_{i=1}^{n} P_i\, q_{ij}$ for any $j$. Obviously, $P'$ and $P$ are not always equal, and hence the result of the perturbation is biased. Assume that $Q$ is always invertible and define its inverse matrix $R = Q^{-1}$. Then we obtain the following theorem.
Theorem 1.
The check-in counts vector $C$ is perturbed by the perturbation probability matrix $Q$ to obtain the perturbed check-in counts vector $C'$, which is then corrected by the inverse matrix $R$. The estimated check-in counts vector $\hat{C} = C' R$ satisfies $E[\hat{C}] = C$.
Proof.
Since $E[C'] = C Q$, $\hat{C} = C' R$ and $R = Q^{-1}$, it follows that $E[\hat{C}] = E[C'] R = C Q Q^{-1} = C$. Therefore, the result follows. □
Theorem 1 states that the estimated check-in counts vector $\hat{C}$ obtained after the correction by the inverse matrix $R$ is an unbiased estimate of the original check-in counts vector $C$.
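The whole pipeline of Theorem 1, perturb with Q, aggregate, correct with the inverse matrix R, can be sketched as follows (a toy simulation under our conventions, with a KRR matrix standing in for the mechanism):

```python
import numpy as np

rng = np.random.default_rng(1)

def collect_and_reconstruct(states, Q):
    """Each user perturbs a state via row Q[x]; the collector tallies the
    perturbed counts C' and corrects them with R = Q^{-1}, so that the
    estimate C'R is unbiased for the original counts (Theorem 1)."""
    n = Q.shape[0]
    perturbed = np.zeros(n)
    for x in states:
        perturbed[rng.choice(n, p=Q[x])] += 1
    R = np.linalg.inv(Q)
    return perturbed @ R               # unbiased estimate of the counts

n, eps = 4, 2.0                        # KRR matrix as the mechanism
p, q = np.exp(eps) / (np.exp(eps) + n - 1), 1 / (np.exp(eps) + n - 1)
Q = np.full((n, n), q) + (p - q) * np.eye(n)
states = rng.integers(n, size=5000)    # 5000 users, uniform check-ins
print(collect_and_reconstruct(states, Q))   # close to [1250, 1250, 1250, 1250]
```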
Here, since the estimate is unbiased, the relative root mean square error of the original check-in counts and the estimated check-in counts for $n$ POIs and $m$ users can be written as
$$RE_i = \frac{\sqrt{\mathrm{Var}(\hat{C}_i)}}{\max(C_i, \Delta)},$$
where the variance $\mathrm{Var}(\hat{C}_i)$ is determined by $P$, $Q$, $R$ and the number of users $m$.
Theorem 2.
The relative root mean square error between the original probability of check-ins and the estimated probability of check-ins is $RE_i = \sqrt{\mathrm{Var}(\hat{P}_i)}/\max(P_i, \Delta)$, where $\mathrm{Var}(\hat{P}_i) = \mathrm{Var}(\hat{C}_i)/m^2$.
Proof.
Since $\hat{P}$ is an unbiased estimate (Theorem 1), the relative root mean square error between the original probability of check-ins and the estimated probability of check-ins satisfies $E[(\hat{P}_i - P_i)^2] = \mathrm{Var}(\hat{P}_i)$.
Therefore, $RE_i = \sqrt{\mathrm{Var}(\hat{P}_i)}/\max(P_i, \Delta)$, and the result follows. □
If the EDU is just satisfied, i.e., the expected maximum relative error equals $\eta$, then $Q$ should satisfy the following constraints according to B-DP.
Assume that there is a randomized algorithm $M$ with a perturbation probability matrix $Q$ that can provide the expected privacy budget $\epsilon_e$ with point belief degree $B(\epsilon_e)$. Then, for $M$ to satisfy $(\eta, \epsilon_e)$-Best-B-DP, it still needs to maximize $B(\epsilon_e)$. Therefore, $Q$ should satisfy the following optimization problem.
Similarly, assume that $M$ can provide the expected privacy protection region $W$ with regional average belief degree $\bar{B}(W)$, where $B(\epsilon_e)$ is the point belief degree of each expected privacy budget $\epsilon_e \in W$. Then, for $M$ to satisfy $(\eta, W)$-Best-B-DP, it still needs to maximize $\bar{B}(W)$. Therefore, $Q$ should satisfy the following optimization problem.
In each of the optimization problems (15) and (16), the perturbation probability matrix $Q$ contains $n^2$ unknown variables, subject to $n^2$ non-negativity constraints, $n$ equality (row-sum) constraints and one EDU constraint. Directly solving the optimization problem is therefore a huge challenge, especially for a large domain size $n$. Hence, two simplified models are considered in this paper, and we present their details one by one in the following section.
5. Design and Implementation of B-DP Mechanism
This section covers the design of two B-DP mechanisms and their implementation algorithms. One is based on the classical LDP mechanism KRR, and the other is based on the newly constructed mechanism EXP. The number of domain values of both mechanisms is more than 2. Moreover, we combine three data distributions with typical non-uniformity and the two B-DP mechanisms to directly illustrate and analyze the two metrics proposed in this paper, the point belief degree and the regional average belief degree.
5.1. B-DP Mechanism Based on KRR
Without prior knowledge of the data distribution, we assume that it is uniform. Setting $P_i = 1/n$ for $i = 1, \ldots, n$, the KRR probabilities are $p = e^{\epsilon}/(e^{\epsilon}+n-1)$ and $q = 1/(e^{\epsilon}+n-1)$, and the expected maximum relative error is determined by $\epsilon$, $n$, $m$ and $\Delta$. Therefore, the privacy budget of KRR here is not arbitrary; it is constrained by the EDU $\eta$.
Definition 9
($\eta$-KRR). KRR that just meets the EDU $\eta$ is called $\eta$-KRR.
Here, it can also derive the following theorem.
Theorem 3.
If there exists $\eta$-KRR, then it satisfies $\epsilon_\eta$-LDP. Moreover, the point belief degree of $\eta$-KRR is
$$B(\epsilon_e) = \begin{cases} 1, & \epsilon_e \ge \epsilon_\eta,\\ 0, & \epsilon_e < \epsilon_\eta. \end{cases}$$
Proof.
If there exists $\eta$-KRR, then its privacy budget is $\epsilon_\eta$. Since every check-in state is perturbed with the same budget, it satisfies $\epsilon_\eta$-LDP. By Definition 4, when $\epsilon_e \ge \epsilon_\eta$, every POI's actual budget satisfies $\epsilon_i = \epsilon_\eta \le \epsilon_e$, so $B(\epsilon_e) = 1$; when $\epsilon_e < \epsilon_\eta$, no POI satisfies the EPP, so $B(\epsilon_e) = 0$. Hence, the result follows. □
Thus, $\eta$-KRR either cannot provide the EPP at all (when $\epsilon_e < \epsilon_\eta$) or provides it with 100% satisfaction (when $\epsilon_e \ge \epsilon_\eta$). What KRR achieves is therefore two distinct jumps of the EPP guarantee, with or without; that is, it is not a good B-DP mechanism.
5.2. B-DP Mechanism Based on EXP
Since relative error is used as the utility metric for the check-in data distribution in this paper, and the privacy budget of DP usually determines the absolute error, which is the numerator of the relative error, the privacy budget of every POI should vary with its probability in the distribution of check-ins: the privacy budget should be reduced when the corresponding probability in the data distribution becomes larger, and increased when it becomes smaller. In this way, POIs with few check-ins can still satisfy the EDU, while POIs with many check-ins can satisfy the EPP in priority, so as to better realize B-DP. A toy sketch of this budget-assignment idea follows, and we then define the perturbation mechanism EXP.
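This sketch is illustrative only: the paper's actual assignment is given by Formulas (18)–(20) in Definition 10, and the bounds eps_min and eps_max below are placeholders we introduce. It shows a monotone map from probabilities to budgets:

```python
import numpy as np

def illustrative_budgets(P, eps_min=0.5, eps_max=4.0):
    """Monotone map: frequently visited POIs (large P_i) get a small
    budget (strong protection); rare POIs get a large budget, whose
    absolute error is small enough to keep the relative error bounded."""
    P = np.asarray(P, dtype=float)
    rank = (P - P.min()) / max(P.max() - P.min(), 1e-12)
    return eps_max - rank * (eps_max - eps_min)

print(illustrative_budgets([0.6, 0.25, 0.1, 0.05]))
# -> [0.5, 2.73, 3.68, 4.0]: the hottest POI gets the smallest budget
```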
Definition 10
(Perturbation mechanism EXP). Given the data distribution $P = (P_1, \ldots, P_n)$ with $\sum_{i=1}^{n} P_i = 1$, the randomized algorithm with matrix $Q$ is called the perturbation mechanism EXP if $Q$ satisfies Formulas (18)–(20), where $q_{ij}$ is the probability that the check-in state is perturbed from $x_i$ to $x_j$, $\lambda$ is the privacy setting parameter, and $\rho$ is the parameter of the privacy protection intensity change point. Formulas (18)–(20) specify $q_{ij}$ case by case, according to how the corresponding probability in $P$ compares with the change point.
Theorem 4.
In the perturbation mechanism EXP, $Q$ satisfies the following properties: in each of the three cases of Formulas (18)–(20), the rows of $Q$ are normalized so that $\sum_{j=1}^{n} q_{ij} = 1$, with a normalization factor that depends on the case.
Proof.
According to Definition 10, $q_{ij}$ is the probability that the check-in state of $POI_i$ is perturbed to that of $POI_j$. Moreover, since each row of $Q$ must sum to 1, the result of Theorem 4 is easy to obtain. □
Definition 11.
($\eta$-EXP). EXP that just satisfies the EDU $\eta$ is called $\eta$-EXP, where $\epsilon_i$ denotes the actual privacy budget of each check-in state $x_i$, as determined by $Q$.
Theorem 5.
Based on the definition of $\eta$-EXP:
(1) if there exists $\eta$-EXP, then it satisfies $(\max_i \epsilon_i)$-LDP;
(2) when $\eta$ is fixed and the point belief degree of $\eta$-EXP is $B(\epsilon_e) = \sum_{i=1}^{n} P_i\, \mathbb{1}(\epsilon_i \le \epsilon_e)$, where $\epsilon_i$ is the actual privacy budget of each $x_i$, $\eta$-EXP is the approximately optimal $(\eta, \epsilon_e)$-Best-B-DP, with the indicator function as in Definition 4;
(3) if there exists $\eta$-EXP and its point belief degree is $B(\epsilon_e)$, then it satisfies $(\epsilon_e, 1 - B(\epsilon_e))$-LDP.
Proof.
See Appendix A. □
5.3. Implementation of B-DP Mechanism
For the check-in scenario, two B-DP mechanisms, based on KRR and EXP, are proposed and realized in this paper. KRR is a classical LDP mechanism, but it cannot realize B-DP well. EXP is newly proposed in this paper; it not only provides approximately optimal $(\eta, \epsilon_e)$-Best-B-DP protection, but also provides relaxed LDP protection while satisfying the EDU. The pseudo code of the two B-DP mechanisms is given in Algorithms 1 and 2, respectively.
| Algorithm 1 B-DP mechanism based on KRR. |
| Input: Probability distribution P, sample size m and expected data utility (EDU) $\eta$ |
| Output: Privacy budget $\epsilon_\eta$ and perturbation probability matrix Q |
|
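Since the body of Algorithm 1 is not reproduced above, the following sketch shows its likely shape under two stated assumptions: the expected maximum relative error is estimated by Monte Carlo (the paper computes it analytically), and the budget that just meets the EDU is found by bisection, which is valid because utility improves monotonically in the budget:

```python
import numpy as np

def krr_matrix(n, eps):
    """Row-stochastic KRR perturbation matrix."""
    p = np.exp(eps) / (np.exp(eps) + n - 1)
    q = 1.0 / (np.exp(eps) + n - 1)
    return np.full((n, n), q) + (p - q) * np.eye(n)

def expected_mre(P, m, eps, delta, trials=200, seed=2):
    """Monte Carlo stand-in for the analytic expectation of the maximum
    relative error under KRR with m users and true distribution P."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    Q = krr_matrix(len(P), eps)
    R = np.linalg.inv(Q)
    true_counts = m * P
    errs = []
    for _ in range(trials):
        pert = rng.multinomial(m, P @ Q)      # perturbed counts
        est = pert @ R                        # unbiased reconstruction
        errs.append(np.max(np.abs(est - true_counts)
                           / np.maximum(true_counts, delta)))
    return float(np.mean(errs))

def fit_eps_to_edu(P, m, eta, delta, lo=0.01, hi=20.0, iters=25):
    """Bisection for the budget eps_eta at which the EDU eta is just met."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_mre(P, m, mid, delta) > eta:
            lo = mid          # still too noisy: a larger budget is needed
        else:
            hi = mid          # utility met: try a smaller budget
    return hi
```

Algorithm 1 would then output this budget together with the matrix krr_matrix(n, eps_eta).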
| Algorithm 2 B-DP mechanism based on EXP. |
| Input: Probability distribution P, sample size m, expected data utility (EDU) $\eta$, expected privacy budget $\epsilon_e$ (or expected privacy protection region $W = [\epsilon_l, \epsilon_r]$) |
| Output: Privacy setting parameter $\lambda$, the parameter of the privacy protection intensity change point $\rho$, perturbation probability matrix Q and actual privacy budget $\epsilon_i$ of $x_i$ for $i = 1, \ldots, n$ |
|
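Analogously, a sketch of the state-dependent-budget matrix at the heart of Algorithm 2. This is only in the spirit of EXP: each row behaves like KRR run with that state's own budget, whereas the paper's actual Q uses the distribution-aware normalization of Formulas (18)–(20):

```python
import numpy as np

def exp_like_matrix(eps_per_state):
    """Row i keeps state i with probability p_i computed from that
    state's own budget eps_i and spreads the rest uniformly; hot states
    get small budgets, rare states large ones (cf. Section 5.2)."""
    eps = np.asarray(eps_per_state, dtype=float)
    n = len(eps)
    Q = np.empty((n, n))
    for i in range(n):
        p_i = np.exp(eps[i]) / (np.exp(eps[i]) + n - 1)
        Q[i, :] = (1.0 - p_i) / (n - 1)
        Q[i, i] = p_i
    return Q

Q = exp_like_matrix([0.5, 2.0, 3.5])
print(Q.sum(axis=1))    # each row sums to 1, as Theorem 4 requires
```

The privacy setting parameter would then be searched, as in the KRR sketch above, until the EDU is just met.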
5.4. Case Analysis of Point Belief Degree and Regional Average Belief Degree
The above discussion theoretically analyzed the two metrics, the point belief degree and the regional average belief degree, for the two B-DP mechanisms based on KRR and EXP. In order to show the two metrics more clearly, the rest of this section uses three data distributions with typical non-uniformity for analysis. For simplicity, in the rest of this section, the KRR-based B-DP mechanism is denoted KRR and the EXP-based B-DP mechanism is denoted EXP, including in the figures.
(1) Three data distributions with typical non-uniformity.
The data distribution in this section is set as a Pareto distribution, whose discrete case assigns probabilities following a power law in the POI rank with parameter $\theta$. Three data distributions with $\theta = 1.55$, $1.17$ and $0.52$ are shown in Figure 2 and are denoted $P_a$, $P_b$ and $P_c$, respectively. Figure 2 shows both ordered and disordered cases of the Pareto distribution, where the disordered case illustrates that the identification of scenic spots is independent of the order of probability. It also shows the corresponding Gini coefficients of $P_a$, $P_b$ and $P_c$, calculated according to the method of the Gini mean difference [52]. The Gini coefficient indicates the degree of unevenness of a data distribution, and there is a quantitative relationship between the Pareto parameter $\theta$ and the Gini coefficient, shown in Table 3. As shown in Figure 2, the data distribution $P_a$ is pretty uneven, $P_b$ is moderately uneven, and $P_c$ is relatively even.
Figure 2.
Pareto distribution.
Table 3.
Gini coefficient vs. Pareto parameter $\theta$.
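The following sketch generates a discrete power-law distribution of this kind and computes its Gini coefficient via the Gini mean difference; the exact discrete Pareto parameterization used in the paper is an assumption here, not quoted:

```python
import numpy as np

def discrete_pareto(n, theta):
    """Truncated discrete Pareto-style distribution: weights i^(-theta)
    over the ranks i = 1..n, normalized to sum to 1."""
    w = np.arange(1, n + 1, dtype=float) ** (-theta)
    return w / w.sum()

def gini(P):
    """Gini coefficient via the Gini mean difference: the mean absolute
    pairwise difference divided by twice the mean."""
    P = np.asarray(P, dtype=float)
    return float(np.abs(P[:, None] - P[None, :]).mean() / (2 * P.mean()))

for theta in (1.55, 1.17, 0.52):       # the three settings of Figure 2
    print(theta, round(gini(discrete_pareto(12, theta)), 3))
```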
(2) Point belief degree
For the point belief degree of KRR, let the expected privacy budget be $\epsilon_\eta$ ($\epsilon_\eta$ is used as the expected privacy budget, or as a basis for dividing the expected privacy protection region, just for a better comparison between KRR and EXP), where $\epsilon_\eta$ is determined by $\eta$-KRR (see Definition 9 and Algorithm 1 for details) and equals the x-coordinate of the jump point shown by the dotted line in Figure 3. The same $\eta$ and $\epsilon_\eta$ are also used to determine the perturbation probabilities and related parameters of EXP (see Algorithm 2 for details). For example, under a given EDU $\eta$, the point belief degree of KRR and EXP is shown in Figure 3.
Figure 3.
Point belief degree under a fixed EDU $\eta$.
From Figure 3, it can be seen that under the same EDU, if the expected privacy budget of a data provider is at least $\epsilon_\eta$, KRR provides DP with a belief degree of 1; if it is below $\epsilon_\eta$, the belief degree is 0. In EXP, by contrast, the belief degree is small when the expected privacy budget is close to 0 and reaches 1 only when the expected privacy budget is large enough; conversely, even when the expected privacy budget is somewhat below $\epsilon_\eta$, EXP can still partially satisfy the EPP, and the guarantee degree grows as the expected privacy budget approaches $\epsilon_\eta$. Therefore, in the case of EDU first, EXP can provide a privacy guarantee degree between 0 and 1 for the EPP, while KRR can only provide either 0 or 1. Moreover, EXP can provide more privacy protection than KRR, especially when the EDU and the EPP are contradictory and the EPP of the data providers is not fully satisfied.
(3) Regional average belief degree
For the regional average belief degree of KRR, maximizing $\bar{B}(W)$ is equivalent to maximizing the point belief degree, and hence the procedure is the same as Algorithm 1. According to the approximately optimal expected privacy budget $\epsilon_\eta$ under the EDU $\eta$, the data provider's expected privacy protection region can be roughly divided into three categories: regions entirely below $\epsilon_\eta$, regions containing $\epsilon_\eta$, and regions entirely above $\epsilon_\eta$.
Similarly, EXP can provide different levels of optimal privacy protection for the three categories of expected privacy protection region (see Algorithm 2 for details). Generally speaking, the regional average belief degree of EXP in regions above $\epsilon_\eta$ is less than or equal to that of KRR, while in regions below $\epsilon_\eta$ it is greater than or equal to that of KRR. For regions containing $\epsilon_\eta$, there may be a contradiction between the EPP and the EDU. Figure 4 shows the regional average belief degree of both mechanisms under the data distributions $P_a$, $P_b$ and $P_c$, where $\epsilon_\eta$ is determined by $\eta$-KRR (see the dotted line in Figure 4, where the first expected privacy budget whose belief degree is non-zero equals $\epsilon_\eta$).
Figure 4.
Regional average belief degree under a fixed EDU $\eta$.
As can be seen from Figure 4, under the same EDU and the same expected privacy protection region, EXP is more capable than KRR of offering data providers a certain degree of privacy protection.
6. B-DP Dynamic Collection and Publishing Algorithm Design
Algorithms 1 and 2 implement KRR and EXP under a known data distribution, and the point belief degree and the regional average belief degree under B-DP were analyzed above. In the real world, there is often no prior data distribution at the beginning, or an accurate prior data distribution cannot be obtained. This means that Algorithms 1 and 2 cannot be directly applied to the collection and publishing of continuous check-in data with relative error as the utility metric. Therefore, this paper designs an iterative update algorithm that adaptively updates the data distribution in order to realize the two B-DP mechanisms, so as to adaptively realize B-DP dynamic collection and publishing of continuous check-in data. See the pseudo code of Algorithm 3 for more details; therein, Algorithm 1 or 2 is the main component of Algorithm 3.
| Algorithm 3 B-DP dynamic collection and publishing of check-in data algorithm—(KRR/EXP). |
|
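Since only the header of Algorithm 3 survives above, the following skeleton shows the shape of the loop as we read Section 6; the paper's update rule additionally involves the modified estimate parameter w and an update threshold, which this sketch simplifies to a plain refresh:

```python
import numpy as np

def dynamic_publish(batches, n, build_mechanism, seed=3):
    """Skeleton of the dynamic loop: start from a uniform working
    distribution; for each batch, fit the mechanism to the current
    distribution so the EDU is met first (Algorithm 1 or 2), collect
    perturbed states, reconstruct, publish and refresh the prior."""
    rng = np.random.default_rng(seed)
    P = np.full(n, 1.0 / n)             # no prior knowledge: uniform
    for states in batches:              # one batch per publication
        Q = build_mechanism(P)          # e.g. KRR fitted to the EDU
        R = np.linalg.inv(Q)
        counts = np.zeros(n)
        for x in states:
            counts[rng.choice(n, p=Q[x])] += 1
        est = np.maximum(counts @ R, 0.0)   # clip negative estimates
        P = est / est.sum()             # updated working distribution
        yield P                         # published distribution
```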
Since the original data distribution is assumed to be uniform during initialization, the privacy setting parameter that satisfies the EDU $\eta$ can be calculated with a closed-form expression, as shown in the example with EXP below. According to Corollary A1 of Appendix A, in the case of a uniform data distribution, EXP degenerates into KRR, so the probabilities of $Q$ are $p$ on the diagonal and $q$ off the diagonal.
The inverse matrix $R$ of $Q$ can then be written in closed form, and the values of $p$, $q$ and the privacy budget $\epsilon_\eta$ that satisfy the EDU $\eta$ can be calculated.
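As a worked check of the uniform-initialization case (our reconstruction, assuming the row-stochastic KRR form of Q), the inverse matrix can be written out explicitly:

```latex
% Uniform initialization: EXP degenerates into KRR (Corollary A1), so
% Q = (p - q) I + q J with p + (n - 1) q = 1 and J the all-ones matrix.
\[
  Q = (p - q)\,I + q\,J, \qquad
  p = \frac{e^{\epsilon}}{e^{\epsilon} + n - 1}, \qquad
  q = \frac{1}{e^{\epsilon} + n - 1}.
\]
% Using J^2 = nJ and p + (n - 1) q = 1, one verifies Q R = I for
\[
  R = Q^{-1} = \frac{1}{p - q}\,\bigl(I - q\,J\bigr),
\]
% i.e. diagonal entries (1 - q)/(p - q) and off-diagonal entries -q/(p - q).
```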
7. Experimental Evaluation of B-DP Dynamic Collection and Publishing Algorithm
In this paper, the check-in data uses relative error as its utility metric, and the implementation of the two B-DP mechanisms based on KRR and EXP relies on the data distribution. Therein, the number of domain values of both KRR and EXP is more than 2, and moreover, the randomized algorithms based on them take one value as input and produce one value as output. Thereby, KRR and EXP fit the check-in perturbation model considered in this paper. In this section, we evaluate the performance of the dynamic algorithm based on the two B-DP mechanisms in terms of validity and robustness as well as privacy and utility. For simplicity, in the rest of this section, we use KRR and EXP to denote the B-DP mechanism based on KRR and the B-DP mechanism based on EXP in the dynamic algorithm, respectively, including in the figures.
7.1. Experimental Settings
(1) Datasets
Two datasets with real-world data from location-based social networking platforms are used to verify the algorithms.
Brightkite [6]: It contains 4,491,143 check-ins over the period of April 2008–October 2010. In this paper, we used the check-ins of June 2008–September 2008, September 2009–December 2009 and January 2010–April 2010 from Brightkite to construct three types of check-in distributions with different degrees of uniformity according to a unified longitude and latitude division method, abbreviated as B1, B2 and B3, respectively; the number of regions is 12.
Gowalla [6]: It contains 6,442,890 check-ins over the period of February 2009–October 2010. We used the check-ins of January 2010–April 2010 of Gowalla to construct three types of check-in distributions with different degrees of uniformity according to different partitioning methods of longitude and latitude, abbreviated as G1, G2 and G3, respectively; the number of regions is 25.
The average data distribution and the corresponding Gini coefficient of the data are shown in Figure 5 and Table 4, respectively. Therein, Gini coefficient is used to indicate the degree of unevenness of data distribution, which is calculated according to the method of Gini mean difference [52].
Figure 5.
The average data distribution of Brightkite VS. Gowalla.
Table 4.
Gini coefficient of data in Brightkite and Gowalla.
Figure 5 and Table 4 both show that the daily check-in data in the two datasets fluctuate greatly, indicating a high diversity. We verify the effectiveness of our algorithms on these real-world datasets in our experiments.
(2) Utility/Privacy Metrics
Utility metrics: The utility uses the maximum relative error as its metric (see Section 3.3 for details). In this paper, the mean and deviation of the maximum relative error are used to verify that KRR and EXP satisfy the same EDU in the dynamic algorithm.
Privacy metrics: The privacy uses the two new metrics, the point belief degree and the regional average belief degree (see Definitions 4 and 5 for details). In this paper, we compare the privacy guarantee degree of the expected privacy protection (EPP) under the same expected data utility (EDU) between KRR and EXP in the dynamic algorithm using these two privacy metrics.
(3) Parameter Settings
We evaluate our solutions through experiments on the two real-world datasets. The experiments are performed in MATLAB on an Intel Core 2.50-GHz CPU Windows 10 machine equipped with 8 GB of main memory. In the experiments, the total check-in amount for statistical validity is fixed, and three settings of the EDU $\eta$ and of the expected privacy protection region $W$ are used. The modified estimate parameter $w$ is set as in Table 5. The update threshold parameter is fixed, and the remaining relevant parameters are set to 0.5 and 0.005, respectively.
Table 5.
All kinds of modified estimate parameter w used in the dynamic algorithm.
7.2. Validity and Robustness Evaluation
The performance of validity and robustness of the corresponding dynamic algorithm with KRR and EXP is examined through the dynamic statistics process with two real-world datasets.
Figures 6 and 7 show the mean values and deviations of the maximum relative error under the three settings of the EDU, computed from the statistics of the corresponding data subsets B1, B2, B3, G1, G2 and G3 at a frequency of once a day. The daily frequencies differ, and each result is repeated 10 times. In both figures, the horizontal axis of each graph represents the number of consecutive time slices, and the vertical axis represents the maximum relative error between the original and the estimated data distribution. As can be seen from the left-hand graphs of Figures 6 and 7, the dynamic algorithm with KRR and with EXP converges quickly and maintains a stable converged state under the different data distributions of B1, B2, B3, G1, G2 and G3. This verifies that the dynamic algorithm has good validity and robustness.
Figure 6.
The mean and deviation of the maximum relative error on the B1, B2 and B3 subsets of Brightkite. Three settings of the EDU are compared. (a) B1; (b) B2; (c) B3.
Figure 7.
The mean and deviation of the maximum relative error on the G1, G2 and G3 subsets of Gowalla. Three settings of the EDU are compared. (a) G1; (b) G2; (c) G3.
7.3. Utility and Privacy Evaluation
The performance in terms of utility and privacy of the dynamic algorithm with the two B-DP mechanisms based on KRR and EXP is also examined through the dynamic statistics process on the two real-world datasets. As can be seen from the right-hand graphs of Figures 6 and 7, which enlarge part of the left-hand graphs, the dynamic algorithm satisfies the utility requirement even during the dynamic process.
In addition, Figures 8 and 9 show the point belief degree and the regional average belief degree of each subset of the two datasets under the three settings of the EDU. In Figures 8 and 9, the horizontal axis of each graph represents the EPP with different expected privacy budgets $\epsilon_e$, and the vertical axis represents the guarantee degree of the EPP. The guarantee degree of the EPP varies with the data distribution and the EDU. For example, from the point belief degree in the left-hand graphs of Figures 8 and 9, the guarantee degree of the EPP increases to 1 as the expected privacy budget grows, and a more even distribution can support the EPP with a smaller $\epsilon_e$, providing better privacy protection under the same EDU (as shown in Figure 10). A looser EDU, i.e., a lower required utility, can generally support the EPP with a smaller $\epsilon_e$, providing better privacy protection under the same data distribution (as shown in Figure 11).
Figure 8.
The belief degree on B1, B2 and B3 subsets of Brightkite, where the belief degree includes the point belief degree and the regional average belief degree. (a) B1; (b) B2; (c) B3.
Figure 9.
The belief degree on G1, G2 and G3 subsets of Gowalla, where the belief degree includes the point belief degree and the regional average belief degree. (a) G1; (b) G2; (c) G3.
Figure 10.
The minimum $\epsilon_e$ with a non-zero point belief degree, based on the same EDU and different data distributions, where the point belief degree $B(\epsilon_e)$ measures the EPP of $\epsilon_e$; three settings of the EDU are compared.
Figure 11.
The minimum $\epsilon_e$ with a non-zero point belief degree, based on the same data distribution and different settings of the EDU, where the point belief degree $B(\epsilon_e)$ measures the EPP of $\epsilon_e$; three settings of the EDU are compared.
In Figure 10, for EXP on G1, G2 and G3 with a given EDU, the minimum $\epsilon_e$ with a non-zero belief degree is clearly the smallest on G1 and the largest on G3. According to Table 4, G3 is pretty uneven, while G1 is relatively even. The same holds for EXP on B1, B2 and B3, and for KRR on B1, B2 and B3 as well as on G1, G2 and G3. In Figure 11, for EXP with a given data distribution (such as G1), the minimum $\epsilon_e$ with a non-zero belief degree is clearly the smallest under the loosest EDU and the largest under the tightest EDU. Similar trends can be observed for EXP on B1, B2 and B3, and for KRR on B1, B2 and B3 as well as on G1, G2 and G3.
For the regional average belief degree, similar results can be concluded from the right-hand graphs of Figures 8 and 9. Moreover, Figure 12 shows the maximum difference between $\bar{B}(W)$ in EXP and that in KRR for different regions $W$ on each subset. The more uneven the data distribution is, the bigger this maximum difference becomes, which means that EXP adapts to uneven data distributions better than KRR.
Figure 12.
The maximum difference between $\bar{B}(W)$ in EXP and in KRR for different regions $W$, where $\bar{B}(W)$ is the regional average belief degree on the region $W$; three settings of the EDU are compared.
Furthermore, for a more objective evaluation of the privacy performance of KRR and EXP, we extend the comparison to the privacy metric of DP itself, comparing $\epsilon_\eta$ on each subset, as shown in Table 6, where $\epsilon_\eta$ refers to the privacy budget of a DP mechanism that just satisfies the EDU $\eta$. As can be seen from Table 6, except for two EDU settings on B3, all the $\epsilon_\eta$ values of EXP are slightly greater than those of KRR, which means that EXP provides slightly weaker DP than KRR. However, EXP provides a better B-DP from the new perspective of the preference for privacy and utility, offering a good trade-off between them.
Table 6.
$\epsilon_\eta$ on each subset under the three settings of the EDU.
8. Discussions and Conclusions
This paper proposes the concept of best-effort differential privacy (B-DP), in which the expected data utility (EDU) is satisfied first and the expected privacy protection (EPP) is then satisfied as much as possible, and designs two new metrics, the point belief degree and the regional average belief degree, to measure the guarantee degree of the EPP. Moreover, we provide implementation algorithms, including the dynamic algorithm built on two B-DP mechanisms based on KRR and on the newly constructed mechanism EXP. Extensive experiments on two real-world check-in datasets verify the effectiveness of the concept of B-DP. They also verify that the dynamic algorithm has good validity and robustness and satisfies the utility requirement even during the dynamic process. Besides, EXP adapts better to uneven data distributions and satisfies a better B-DP than KRR, providing a good trade-off between privacy and utility.
Specifically, the point belief degree measures the guarantee degree of privacy protection for any single expected privacy budget, and the regional average belief degree measures the average guarantee degree of the EPP over a region containing multiple expected privacy budgets. By comparison, $(\epsilon,\delta)$-DP can measure only the single EPP whose expected privacy budget equals $\epsilon$ and cannot directly measure the average guarantee degree of the EPP over a region. In addition, many real-world applications can only provide an approximate value of $\epsilon$ as their EPP, so a neighborhood interval around $\epsilon$ should be regarded as their EPP. Therefore, the regional average belief degree introduced in this paper is very necessary.
Moreover, the two B-DP mechanisms, based on KRR and on the newly constructed EXP, are applied to the dynamic collection and publishing of check-in data with relative error as the utility metric. Therein, KRR itself does not depend on the data distribution, but the dynamic collection and publishing algorithm with the KRR-based B-DP mechanism does, since the privacy setting parameter has to be adjusted according to the data distribution to guarantee the utility first in real time. In addition, EXP itself depends on the data distribution, so that some of its outputs receive strong privacy protection and some weak, unlike KRR, which provides a uniform privacy protection intensity. Thus, the dynamic collection and publishing algorithm based on these two B-DP mechanisms depends on the data distribution, and it therefore faces the challenges of algorithm validity and robustness under an unknown data distribution. Fortunately, the experimental results verify that the algorithm meets both challenges and is promising for the typical application of check-in data.
Besides, if scenic spots use EXP for privacy protection, data providers may be more inclined to visit the spots with a large number of visitors, because the regions where these spots are located may enjoy stronger privacy protection. Algorithms that realize B-DP with existing DP mechanisms of uniform privacy protection intensity, such as KRR in this paper, may not achieve the EPP at all, whereas the algorithm based on the newly proposed EXP can at least partly achieve the EPP; that is, EXP satisfies a better B-DP and provides a good trade-off between privacy and utility.
In a word, although the B-DP dynamic collection and publishing algorithm based on KRR or EXP is not necessarily perfect, it fully demonstrates the feasibility of the concept of B-DP. It is not only a step forward for the basic theory of DP, but also provides two feasible solutions for the implementation of DP in practical applications. The two solutions take check-in data as an example but are not limited to it; they can also be applied to other categorical data for privacy protection where the perturbation model has one input and one output. In future work, we will further discuss other LDP mechanisms based on binary encodings, whose perturbation model supports one input perturbed to multiple outputs, such as RAPPOR, and design them to achieve a better B-DP. Moreover, correlated B-DP is an interesting open problem.
Author Contributions
The problem was conceived by Y.C. and Z.X. The theoretical analysis and experimental verification were performed by Y.C., Z.X. and J.C. Y.C. and J.C. wrote the paper. S.J. reviewed the writing on grammar and structure of the paper. All authors have read and agreed to the published version of the manuscript.
Funding
Natural Science Foundation of China (No.41971407), China Postdoctoral Science Foundation (No.2018M633354), Natural Science Foundation of Fujian Province, China (Nos.2020J01571, 2016J01281) and Science and Technology Innovation Special Fund of Fujian Agriculture and Forestry University (No.CXZX2019119S).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Two publicly available datasets were analyzed in this study. Both datasets can be found here: http://snap.stanford.edu/data/loc-gowalla.html and http://snap.stanford.edu/data/loc-brightkite.html (accessed on 5 March 2022).
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proof of Theorem 5
Proof.
(1) From Definition 11, the actual privacy budgets of $\eta$-EXP are $\epsilon_1, \ldots, \epsilon_n$. According to the definition of LDP, $\eta$-EXP therefore satisfies $(\max_i \epsilon_i)$-LDP.
To prove (2) and (3) of Theorem 5, we first need the following theorems and corollary.
Theorem A1.
In perturbation mechanism EXP, where and , it has
(a) ; if , and , then .
(b) ; if , and , then .
(c) for .
(d) for , where are the distribution probabilities of and after perturbation, respectively.
Proof.
See Appendix B. □
Theorem A2.
In perturbation mechanism EXP, it satisfies the following inequalities, where .
For ,
(i) if and , then ;
(ii) if , and , then ;
(iii) if and , then .
For ,
(i) if and , then ;
(ii) if and , then .
For ,
(i) if and , then ;
(ii) if and , then .
Proof.
See Appendix C. □
From Theorems A1 and A2, it has following Corollary A1.
Corollary A1.
For any $x_i$, let $\epsilon_i$ be the actual privacy budget provided by EXP for $x_i$; it has the following properties, case by case as in Theorem A2:
(1) For the first case: (i) if …, then …; (ii) if … and …, then …; (iii) if …, then ….
(2) For the second case: (i) if …, then …; (ii) if …, then ….
(3) For the third case: (i) if …, then …; (ii) if …, then ….
Let , where is the privacy setting parameter of EXP that satisfies the EDU . According to Theorem A2, the actual privacy budget for each can be set to , which is a function of and . From Theorem A2 and Corollary A1, it is easy to see that is a monotonically non-decreasing function for a fixed .
Therefore, it can be proved as follows that -EXP satisfies (2) and (3) of Theorem 5.
(2) When is fixed and the point belief degree of -EXP is , if is maximized, then -EXP satisfies -Best-B-DP. According to Corollary A1, the larger is, the smaller is, indicating that within the same , can satisfy first, and the proportion that it satisfies is . According to Theorem A2, when is larger, is always larger as well. Hence, is also maximized under a fixed .
(3) According to Definition 10 and Corollary A1, in EXP, the data distribution satisfies and the actual privacy budget satisfies . Suppose ; then the probability of satisfying the EPP under the EDU is Let ; then . Hence, for any user u and , we have
Therefore, it is easy to obtain
for any user u and .
Thus, according to Definitions 2 and 3 and Theorem A2, -EXP with the point belief degree satisfies -LDP. □
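Since the point belief degree measures, under the check-in data distribution, the probability that the actual privacy budget provided for a point does not exceed the expected privacy budget, the following Python sketch illustrates estimating a regional average belief degree numerically; the points, the distribution, and the function actual_budget are illustrative assumptions standing in for the concrete piecewise function given by Corollary A1.

```python
def belief_degree(points, probs, actual_budget, expected_eps):
    # Sum the probability mass of the points whose actual privacy
    # budget does not exceed the expected privacy budget (EPP target).
    return sum(p for x, p in zip(points, probs)
               if actual_budget(x) <= expected_eps)

# Toy example: four points, a skewed distribution, and a
# monotonically non-decreasing actual-budget function.
points = [0.1, 0.3, 0.6, 0.9]
probs = [0.4, 0.3, 0.2, 0.1]
print(belief_degree(points, probs, lambda x: 2 * x, expected_eps=1.0))
```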
Appendix B. Proof of Theorem A1
Proof.
Under the perturbation mechanism EXP, the check-in data distribution satisfies . In accordance with and , three cases need to be discussed. For , if the perturbation mechanism EXP satisfies Theorem A1, then the other two cases, and , can be proved by the same method.
For , it can be discussed as follows based on Theorem 4.
For any , the probability that the check-in state of is perturbed to that of satisfies the following cases.
For and , we have . Moreover, for , and , we have .
For and , we have . Moreover, for , , and , we have .
For , , and , we have .
From the above discussions (i–iii), the property in this theorem holds for and .
For , the probability that the check-in state of is perturbed to that of satisfies the following cases.
For and , we have
For , and , we have
For , and , we have
For , and , we have
From the above discussions (i–iv), it can be seen that for .
For , and , the probability that the check-in state of is perturbed to that of and the probability that the check-in state of is perturbed to that of satisfy the following cases.
For and , we have
For and , we have
For , and , we have
From the above discussions (i–iii) of , when , for , and , we have . When , for , and , a similar method can be used to obtain . Then we have for , and .
Hence, from , when , the part of Theorem A1 holds.
According to the property and its proof process, it is easy to obtain , i.e., holds for and .
Since for we have , which implies , Formula (A10) follows. According to the property and Formula (A10), it can be seen that .
To prove that always holds for , it suffices to prove . Similarly, we only discuss the case ; the other two cases, and , can be proved by the same method.
For , we have
For and , we also obtain
For , we also get
From the above discussions , the part in Theorem A1 holds.
Therefore, from , the result follows. □
Appendix C. Proof of Theorem A2
Proof.
We only deal with case (1) here; similar arguments apply to cases (2) and (3). From Theorem A1, it can be seen that for and we have for , and . Hence, the following cases need to be discussed for .
If and , then we have Formula (A14).
Therefore, according to Formula (A14), to show , it suffices to show , i.e., that
always holds. Let , where and . Taking the derivative of with respect to x gives , i.e., decreases monotonically with x. This implies that always holds. Therefore, the inequality always holds for . From the above discussion, it can be seen that
Similarly, we have
For , and , we have
Hence, according to Formula (A18), to show , it suffices to show that , i.e., that
always holds. Let , where and . Taking the derivative of with respect to x gives , that is, increases monotonically with x. Then always holds. Therefore, the inequality always holds for .
Similarly, we have
For and , we have
Similarly, we have
From the above discussion of , it can be seen that the part of Theorem A2 holds. □
References
- Patil, S.; Norcie, G.; Kapadia, A.; Lee, A.J. Reasons, rewards, regrets: Privacy considerations in location sharing as an interactive practice. In Proceedings of the 8th Symposium on Usable Privacy and Security (SOUPS), Washington, DC, USA, 11–13 July 2012; pp. 1–15.
- Patil, S.; Norcie, G.; Kapadia, A.; Lee, A. “Check out where I am!”: Location-sharing motivations, preferences, and practices. In Proceedings of the Extended Abstracts on Human Factors in Computing Systems (CHI), Austin, TX, USA, 5–10 May 2012; pp. 1997–2002.
- Lindqvist, J.; Cranshaw, J.; Wiese, J.; Hong, J.; Zimmerman, J. I’m the mayor of my house: Examining why people use foursquare - a social-driven location sharing application. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), Vancouver, BC, Canada, 7–12 May 2011; pp. 2409–2418.
- Guha, S.; Birnholtz, J. Can you see me now?: Location, visibility and the management of impressions on foursquare. In Proceedings of the 15th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI), Munich, Germany, 27–30 August 2013; pp. 183–192.
- Gruteser, M.; Grunwald, D. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys), San Francisco, CA, USA, 5–8 May 2003; pp. 31–42.
- Cho, E.; Myers, S.A.; Leskovec, J. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Diego, CA, USA, 21–24 August 2011; pp. 1082–1090.
- Huo, Z.; Meng, X.; Zhang, R. Feel free to check-in: Privacy alert against hidden location inference attacks in GeoSNs. In Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), Wuhan, China, 22–25 April 2013; pp. 377–391.
- Naghizade, E.; Bailey, J.; Kulik, L.; Tanin, E. How private can I be among public users? In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp), Osaka, Japan, 7–11 September 2015; pp. 1137–1141.
- Rossi, L.; Williams, M.J.; Stich, C.; Musolesi, M. Privacy and the city: User identification and location semantics in location-based social networks. In Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM), Oxford, UK, 26–29 May 2015; pp. 387–396.
- Sweeney, L. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzz. 2002, 10, 557–570.
- Machanavajjhala, A.; Kifer, D.; Gehrke, J.; Venkitasubramaniam, M. l-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 2007, 1, 3–54.
- Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference (TCC), New York, NY, USA, 4–7 March 2006; pp. 265–284.
- Hay, M.; Rastogi, V.; Miklau, G.; Suciu, D. Boosting the accuracy of differentially private histograms through consistency. arXiv 2010, arXiv:0904.0942v5.
- Xiao, X.; Wang, G.; Gehrke, J. Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 2011, 23, 1200–1214.
- Rastogi, V.; Nath, S. Differentially private aggregation of distributed time-series with transformation and encryption. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD), Indianapolis, IN, USA, 6–10 June 2010; pp. 735–746.
- Dwivedi, A.D.; Singh, R.; Ghosh, U.; Mukkamala, R.R.; Tolba, A.; Said, O. Privacy preserving authentication system based on non-interactive zero knowledge proof suitable for internet of things. J. Amb. Intel. Hum. Comp. 2021, in press.
- Dwork, C. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages, and Programming (ICALP), Venice, Italy, 10–14 July 2006; pp. 1–12.
- Xiao, X.; Bender, G.; Hay, M.; Gehrke, J. iReduct: Differential privacy with reduced relative errors. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD), Athens, Greece, 12–16 June 2011; pp. 229–240.
- Liu, H.; Wu, Z.; Zhou, Y.; Peng, C.; Tian, F.; Lu, L. Privacy-preserving monotonicity of differential privacy mechanisms. Appl. Sci. 2018, 8, 2081.
- Dwork, C.; Kenthapadi, K.; McSherry, F.; Mironov, I.; Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), St. Petersburg, Russia, 28 May–1 June 2006; pp. 486–503.
- Tang, J.; Korolova, A.; Bai, X.; Wang, X.; Wang, X. Privacy loss in Apple’s implementation of differential privacy on MacOS 10.12. arXiv 2017, arXiv:1709.02753.
- Dwork, C. Differential privacy: A survey of results. In Proceedings of the International Conference on Theory and Applications of Models of Computation (TAMC), Xi’an, China, 25–29 April 2008; pp. 1–19.
- Liu, H.; Wu, Z.; Peng, C.; Tian, F.; Lu, H. Adaptive Gaussian mechanism based on expected data utility under conditional filtering noise. KSII Trans. Internet Inf. 2018, 12, 3497–3515.
- Kairouz, P.; Bonawitz, K.; Ramage, D. Discrete distribution estimation under local privacy. In Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; pp. 2436–2444.
- Kairouz, P.; Oh, S.; Viswanath, P. Extremal mechanisms for local differential privacy. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), Cambridge, MA, USA, 8–13 December 2014; pp. 2879–2887.
- Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Local privacy and statistical minimax rates. In Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS), Berkeley, CA, USA, 26–29 October 2013; pp. 429–438.
- Hale, M.T.; Egerstedt, M. Differentially private cloud-based multi-agent optimization with constraints. In Proceedings of the 2015 American Control Conference (ACC), Chicago, IL, USA, 1–3 July 2015; pp. 1235–1240.
- Erlingsson, U.; Pihur, V.; Korolova, A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM Conference on Computer and Communications Security (CCS), Scottsdale, AZ, USA, 3–7 November 2014; pp. 1054–1067.
- Chen, R.; Li, H.; Qin, A.K.; Kasiviswanathan, S.P.; Jin, H. Private spatial data aggregation in the local setting. In Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, Finland, 16–20 May 2016; pp. 289–300.
- Ligett, K.; Neel, S.; Roth, A.; Waggoner, B.; Wu, Z.S. Accuracy first: Selecting a differential privacy level for accuracy-constrained ERM. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), Red Hook, NY, USA, 4–9 December 2017; pp. 2563–2573.
- Shoaran, M.; Thomo, A.; Weber, J. Differential privacy in practice. In Proceedings of the Workshop on Secure Data Management (SDM), Istanbul, Turkey, 27 August 2012; pp. 14–24.
- Bassily, R.; Smith, A. Local, private, efficient protocols for succinct histograms. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), Portland, OR, USA, 14–17 June 2015; pp. 127–135.
- Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy; Now Publishers: Norwell, MA, USA, 2014; pp. 28–64.
- Liu, H.; Wu, Z.; Zhang, L. A differential privacy incentive compatible mechanism and equilibrium analysis. In Proceedings of the 2016 International Conference on Networking and Network Applications (NaNA), Hakodate, Hokkaido, Japan, 23–25 July 2016; pp. 260–266.
- Warner, S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965, 60, 63–69.
- Bloom, B.H. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 1970, 13, 422–426.
- Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 1996, 58, 267–288.
- Blum, A.; Ligett, K.; Roth, A. A learning theory approach to noninteractive database privacy. J. ACM 2013, 60, 1–25.
- Huang, H.; Zhang, D.; Xiao, F.; Wang, K.; Gu, J.; Wang, R. Privacy-preserving approach PBCN in social network with differential privacy. IEEE Trans. Netw. Serv. Man. 2020, 17, 931–945.
- Hu, X.; Zhu, T.; Zhai, X.; Zhou, W.; Zhao, W. Privacy data propagation and preservation in social media: A real-world case study. IEEE Trans. Knowl. Data Eng. 2021, in press.
- Shin, H.; Kim, S.; Shin, J.; Xiao, X. Privacy enhanced matrix factorization for recommendation with local differential privacy. IEEE Trans. Knowl. Data Eng. 2018, 30, 1770–1782.
- Huang, W.; Zhou, S.; Zhu, T.; Liao, Y. Privately publishing internet of things data: Bring personalized sampling into differentially private mechanisms. IEEE Internet Things 2022, 9, 80–91.
- Ou, L.; Qin, Z.; Liao, S.; Hong, Y.; Jia, X. Releasing correlated trajectories: Towards high utility and optimal differential privacy. IEEE Trans. Depend. Secur. Comput. 2020, 17, 1109–1123.
- Ren, X.; Yu, C.M.; Yu, W.; Yang, S.; Yang, X.; McCann, J.A.; Yu, P.S. LoPub: High-dimensional crowdsourced data publication with local differential privacy. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2151–2166.
- Chamikara, M.A.P.; Bertok, P.; Khalil, I.; Liu, D.; Camtepe, S.; Atiquzzaman, M. Local differential privacy for deep learning. IEEE Internet Things 2020, 7, 5827–5842.
- Ye, D.; Zhu, T.; Cheng, Z.; Zhou, W.; Yu, P.S. Differential advising in multiagent reinforcement learning. IEEE Trans. Cybern. 2020, in press.
- Ying, C.; Jin, H.; Wang, X.; Luo, Y. Double insurance: Incentivized federated learning with differential privacy in mobile crowdsensing. In Proceedings of the 2020 International Symposium on Reliable Distributed Systems (SRDS), Shanghai, China, 21–24 September 2020; pp. 81–90.
- McSherry, F.; Talwar, K. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Providence, RI, USA, 21–23 October 2007; pp. 94–103.
- McSherry, F.D. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD), New York, NY, USA, 29 June–2 July 2009; pp. 19–30.
- Garofalakis, M.; Kumar, A. Wavelet synopses for general error metrics. ACM Trans. Database Syst. 2005, 30, 888–928.
- Vitter, J.S.; Wang, M. Approximate computation of multidimensional aggregates of sparse data using wavelets. ACM SIGMOD Rec. 1999, 28, 193–204.
- Gini, C. Measurement of inequality of incomes. Econ. J. 1921, 31, 124–126.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).