Article

B-DP: Dynamic Collection and Publishing of Continuous Check-In Data with Best-Effort Differential Privacy

1 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
2 College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
3 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
* Author to whom correspondence should be addressed.
Entropy 2022, 24(3), 404; https://doi.org/10.3390/e24030404
Submission received: 11 February 2022 / Revised: 5 March 2022 / Accepted: 8 March 2022 / Published: 14 March 2022
(This article belongs to the Special Issue Adversarial Intelligence: Secrecy, Privacy, and Robustness)

Abstract: Differential privacy (DP) has become a de facto standard to achieve data privacy. However, the utility of DP solutions that give privacy priority is often unacceptable in real-world applications. In this paper, we propose best-effort differential privacy (B-DP) to promise the preference for utility first, and we design two new metrics, the point belief degree and the regional average belief degree, to evaluate its privacy from a new perspective of the preference for privacy. Therein, the preference for privacy and the preference for utility are referred to as expected privacy protection (EPP) and expected data utility (EDU), respectively. We also investigate how to realize B-DP with an existing DP mechanism (KRR) and a newly constructed mechanism (EXP Q ) in the dynamic collection and publishing of continuous check-in data. Extensive experiments on two real-world check-in datasets verify the effectiveness of the concept of B-DP. Our newly constructed EXP Q also satisfies a better B-DP than KRR, providing a good trade-off between privacy and utility.

1. Introduction

With the explosive progress of the mobile Internet and location technology, LBS (Location-Based Service) applications, including Brightkite, Gowalla, Facebook and other social network platforms, generate a large amount of check-in data every day. Check-in data generally include information such as time, location, PoI (Point of Interest) attributes, mood and comments, and hence check-in data have become a carrier of a user's life trajectory and interest tendencies [1,2,3,4]. However, a data analyst's mining and analysis of the check-in data may directly or indirectly expose the sensitive information of a data provider [5,6,7,8,9]. Many privacy protection methods exist [10,11,12,13,14,15,16]: some of them [10,11] rely on specific attack assumptions and background knowledge, and some [12,13,14,15] are based on differential privacy (DP) [17]. DP provides provable privacy protection that is independent of the background knowledge and computational power of an attacker. The protection level of DP is evaluated by the privacy budget [17]: when the privacy budget is relatively small, the privacy protection is strong, but the utility is often poor [17]. With the gradual integration of DP into practical applications, utility has become the bottleneck of its development and popularization.
In general, there is a contradiction between privacy and utility, and a trade-off between them is necessary [18,19]. In [19], the authors discussed a monotone trade-off in the semi-honest model: when the utility becomes worse, the privacy protection becomes stronger, and conversely, when the utility gets better, the privacy protection gets weaker. Many other theoretical DP studies, including strict ϵ-DP [17] and relaxed ( ϵ , δ )-DP [20], give privacy priority and then make the data as available as possible, which is a trade-off that satisfies utility as much as possible under the privacy guarantee. Unfortunately, real-world applications of DP do not seem to follow this principle completely. One of the best examples is the four applications in Apple's MacOS Sierra (version 10.12), i.e., Emojis, New words, Deeplinks and Lookup Hints. When they collect data, the privacy budget is set to only 1 or 2 per datum, but the overall privacy budget for the four applications is as high as 16 per day [21]. Furthermore, Apple renews the available privacy budget every day, which results in a potential privacy loss of 16 times the number of days that a user participates in DP data collection for the four applications [21]. This is far beyond the reasonable protection scope of DP [22].
Based on the above facts, when there is a contradiction between privacy and utility, privacy is no longer the priority suggested by the DP theoretical studies; the most desirable way is instead to balance the preference for privacy and the preference for utility, which are referred to as expected privacy protection (EPP) and expected data utility (EDU), respectively. However, few researchers have proposed solutions that reasonably balance the EPP and the EDU, except for the authors of [23]. They proposed an adaptive DP and its mechanisms in a rational model, which can achieve a balance between the approximate EDU and the EPP by adding conditional filtering noise [23]. If the privacy protection intensity under the balance of the approximate EDU that satisfies the data analyst is not what the data provider expects, then it still cannot meet the EPP of the data provider. In addition, the absolute value range of the conditional filtering noise belongs to (0.5, 1.5), which makes it vulnerable to background knowledge attacks. Therefore, this paper proposes best-effort differential privacy (B-DP), which satisfies the EDU first and then satisfies the EPP as much as possible. We face at least the following two basic challenges.
  • If the EDU is to be satisfied first, then privacy protection may no longer be guaranteed by DP. How can the guarantee degree of satisfying the EPP as much as possible be evaluated under B-DP?
  • If there is a reasonable metric for the guarantee degree of satisfying the EPP as much as possible under B-DP, does there exist an implementation mechanism (or algorithm) that realizes B-DP?
With the above challenges of B-DP in mind, this paper explores a typical application, the dynamic collection and publishing of continuous check-in data, where the check-in scenario is a semi-honest model with an honest but curious data collector. Each check-in user visiting a POI generates a check-in state and perturbs it before sending it to a POI Center (the data collector) for privacy protection. The POI Center calculates the frequency of check-in users from the received check-in states; this frequency approximates the check-in data distribution and is published to data analysts. We assume that one check-in state is perturbed to exactly one check-in state, and that each publication in the dynamic publishing is required to satisfy the EDU first and then satisfy the EPP as much as possible; moreover, the privacy to be protected is the check-in state of a user, and the utility to be realized is the distribution of the check-in data with relative error as its metric (see Section 4.1 for more details). In fact, since the relative error is used as the metric of the published distribution, a distribution-dependent privacy protection mechanism (or implementation) is needed to satisfy the EPP as much as possible under the constraint of the EDU. In addition, since each publication must satisfy the EDU first and then the EPP as much as possible, an algorithm is needed to keep the privacy protection under the EDU constraint satisfied as much as possible throughout the dynamic publishing. Therefore, this mechanism (or algorithm) will be proposed from a new perspective that differs from the existing methods in the literature.

1.1. Our Contributions

The main contributions of this paper are summarized as follows.
  • A privacy protection concept, B-DP, and two metrics of privacy guarantee degree are put forward. The B-DP discussed in this paper is an expansion of the concept of DP: it satisfies the EDU first and then provides the EPP as much as possible, making it useful for real-world applications. It uses two new metrics, the point belief degree (see Definition 4) and the regional average belief degree (see Definition 5), to quantify the degree of privacy protection for any expected privacy budget (see Section 4.2), rather than, as in DP itself, using the privacy budget ϵ to evaluate only the single EPP whose expected privacy budget equals ϵ. In addition, the regional average belief degree can be used as the average guarantee degree of the EPP over a region including multiple expected privacy budgets. To the best of our knowledge, this is a new discussion and definition of B-DP that differs from the existing literature, and it uses two new metrics to explore and analyze privacy performance from the new perspective of the preference for privacy.
  • An EXP Q mechanism is proposed (see Definition 10). The newly constructed EXP Q mechanism can be applied to categorical data for privacy protection; it smartly alters the privacy budget of each value based on its probability in the data distribution so as to realize a better B-DP than the existing KRR mechanism [24,25]. Thereby, it also verifies that B-DP can be realized to provide a good trade-off between privacy and utility.
  • A dynamic algorithm, together with the implementation algorithms of two perturbation mechanisms, is proposed to realize the dynamic collection and publishing of continuous check-in data while satisfying B-DP. The two perturbation mechanisms are the newly constructed EXP Q and the classical DP mechanism KRR [25,26] (a simple local differential privacy (LDP) mechanism). We take KRR as an example to show how to realize B-DP based on existing DP mechanisms for categorical data. Moreover, the number of domain values of both KRR and EXP Q is more than 2, and the randomized algorithms based on them take only one value as input and one value as output. In addition, the dynamic algorithm can also be applied to other social behavior applications besides check-in data.

1.2. Outline

The remainder of this paper is organized as follows: Section 2 summarizes the related work on trade-off methods, relative error as a utility metric and LDP mechanisms. Section 3 presents the conceptual background of DP and the details of the KRR mechanism and utility metrics. Section 4 introduces the system model, the relevant definitions of B-DP (including the two metrics of the guarantee degree) and the model symbolization of the check-in data. Section 5 introduces the design and implementation of the B-DP mechanisms, and Section 6 describes the design of the B-DP mechanism algorithm for dynamic collection and publishing. Section 7 provides the experimental evaluation of the dynamic collection and publishing algorithm based on the two B-DP mechanisms. Finally, we provide a discussion and conclusion in Section 8.

2. Related Work

DP has become a research hotspot in the field of privacy protection since Dwork [12] proposed it in 2006. The model of DP starts from the traditional centralized setting [15,18], gradually grows to be distributed [27], and develops toward localization [24,28] and even personalized localization [29]. This is not only the evolution process of the DP technique, but also a comprehensive embodiment of its gradual integration with real-world applications. However, no matter how it evolves, the two themes running through DP are privacy and utility [18], which are also the focus of this paper. Table 1 summarizes the main related work from the perspective of privacy versus utility priority, together with the metrics used, the privacy mechanism employed and the problem addressed with respect to the EPP and EDU. We divide it into three categories below.
  • Trade-off model with utility first. The majority of DP research is based on the trade-off model with privacy first, and there is little work on the trade-off model with utility first. Therein, Katrina et al. [30] proposed a generalized "noise reduction" framework based on the modified "Above Threshold" algorithm [33] to minimize the privacy cost of empirical risk minimization (ERM) under the premise of utility priority, but the scheme is only applicable to the ERM framework, and the minimized privacy may not meet the EPP. Liu et al. first showed that DP satisfies a monotonic trade-off between privacy and utility, with an associated bounded monotone trade-off, under the semi-honest model; they also showed that there is no trade-off under the rational model, where a unilateral trade-off could lead to a utility disaster or a privacy disaster [18,23,34]. They further presented an adaptive DP and its mechanisms under the rational model, which can realize the trade-off between the approximate EDU and the EPP by adding conditional filtering noise [23], but the mechanisms may fail to meet the data provider's expectation for privacy protection and are easily attacked with background knowledge because of the added conditional filtering noise. Most importantly, neither of the above two utility-first studies [23,30] provides quantitative metrics of the unmet privacy protection, i.e., the unmet degree of the EPP, whereas this paper presents two detailed quantitative metrics, the point belief degree and the regional average belief degree, to evaluate privacy from a new perspective of the preference for privacy.
  • Relative error as a utility metric. Maryam et al. [31] presented DP in real-world applications, discussing how to add Laplace noise [12] from the viewpoint of utility. They studied the relationship between the cumulative probability of the noise and the privacy level in the Laplace mechanism, combined with the relative error metric, to discuss how to use a DP mechanism reasonably without losing the established utility. However, that work does not delve into how the guarantee degree of privacy protection changes when the utility is satisfied. Xiao et al. [18] presented a DP publishing algorithm for batch queries that uses a resampling technique with correlated noise to reduce the added noise and improve data utility. However, when the algorithm picks the priority items each time, it relies on noisy intermediate results, which do not sufficiently reflect the original order of the data. This biases the adjustment of the privacy budget allocation, so that query items that should be optimized may not be, thus affecting the utility of the published data. Moreover, that work is a classical example of optimizing utility with privacy first, which runs counter to the theme of this paper. In addition, the above two schemes are essentially based on central DP and use the continuous Laplace mechanism, which differ from the LDP (discrete) data statistics and release required by the check-in application in this paper. Therefore, these schemes cannot be directly applied to the applications considered here.
  • LDP mechanisms. In 1965, Warner first proposed the randomized response technique (W-RR) to collect statistical data on sensitive topics while keeping the sensitive data of contributing individuals confidential [35]. Although W-RR can strictly satisfy ϵ-LDP [25] for one survey statistic, multiple collections from the same survey individuals weaken the privacy protection intensity [12]. Therefore, Erlingsson et al. [28] used a double perturbation scheme combining a permanent randomized response with an instantaneous randomized response, namely RAPPOR, to expand the application of W-RR; it has been used by Google in the Chrome browser to collect users' behavior data. In addition, RAPPOR uses Bloom filter technology [36] as its encoding method, mapping the statistical attributes into a binary vector; the mapping relation and the Lasso regression method [37] are then combined to reconstruct the frequency statistics corresponding to the original attribute strings. Due to the high communication cost of RAPPOR, Bassily et al. [32] proposed the S-Hist method, in which each user first encodes his attributes, then randomly selects one of the bits and perturbs it with the randomized response technique, and finally sends the perturbed result to the data collector, thereby reducing the communication cost. Chen et al. [29] proposed a PCEP mechanism and designed with it a PLDP (personalized LDP) scheme applied to spatial data, aiming to protect users' location information while counting the number of users in an area. Therein, the privacy budget of the scheme is determined by the users' personalization, and hence the utility depends on the users' individual behavior settings. In addition, the mechanism combines the S-Hist method [32] and adopts the random projection technique [38]; although it can greatly reduce the communication cost, it still suffers from unstable query precision. For the check-in application with multiple check-in spots in this paper, the KRR mechanism [24,25] easily fits the setting with no prior knowledge of the data distribution, but it is not very good for B-DP. In addition, DP has already been studied in applications such as social networks [39,40], recommender systems [41], data publishing [42,43,44], deep learning [45], reinforcement learning [46] and federated learning [47].

3. Preliminaries

In this section, the key notations used in this paper are given in Table 2.

3.1. Differential Privacy (DP)

Differential privacy (DP), broadly speaking, is a privacy protection technique that does not depend on an attacker's background knowledge or computational power [17,20,48]. It can generally be divided into central DP and LDP, depending on whether it is based on a trusted data collector [33]. The formal definitions of these two types of DP are given as follows.
Definition 1
(( ϵ , δ )-(Central) DP [17,20]). Given a randomized algorithm M and the set S of all possible outputs of M, for a given dataset D and any adjacent dataset D′ that differ in at most one record, if M satisfies the following inequality, then M is said to satisfy ( ϵ , δ )-(central) DP:
$$P[M(D) \in S] \le e^{\epsilon} \times P[M(D') \in S] + \delta,$$
where P[·] represents the risk of privacy disclosure and is controlled by the randomness of the algorithm M, the parameter ϵ is called the privacy budget and represents the level of privacy protection, and δ represents the probability of failing to satisfy ϵ-(central) DP. When δ = 0, M satisfies ϵ-(central) DP.
Definition 2
(( ϵ , δ )-LDP [25,26]). A randomized algorithm K, for a given domain χ, any x, x′ ∈ χ and any y ∈ Range(K), is said to satisfy ( ϵ , δ )-LDP if K satisfies
$$P[K(x) = y] \le e^{\epsilon} \times P[K(x') = y] + \delta,$$
where P[·], ϵ and δ have meanings similar to those in Definition 1.
In the check-in application of this paper, the POI Center is an honest but curious data collector: even if the POI Center or other attackers obtain the check-in state submitted by a user, they cannot conclusively infer the user's original check-in state. If K is to satisfy ( ϵ , δ )-LDP to protect the check-in state of the user, then it needs to meet the following definition.
Definition 3
(Check-in state of ( ϵ , δ )-LDP). A user u generates a check-in at a POI, whose check-in state variable is denoted as s_u with s_u ∈ {S_1, S_2, …, S_n}. Assume that the original check-in state of u is S_j or S_{j′} for j, j′ ∈ [1, n], and that S_j and S_{j′} each generate the same check-in state S_i for i ∈ [1, n] after being perturbed by a randomized algorithm K, with the perturbed check-in state variable denoted as s̃_u, where s̃_u ∈ {S_1, S_2, …, S_n}. If there exists an ϵ ∈ R+ such that K satisfies the following constraint for all i, j, j′ ∈ [1, n],
$$P(\tilde{s}_u = S_i \mid s_u = S_j) \le e^{\epsilon} P(\tilde{s}_u = S_i \mid s_u = S_{j'}) + \delta,$$
where P(s̃_u = S_i | s_u = S_j) and P(s̃_u = S_i | s_u = S_{j′}) are the probabilities of perturbing the original check-in states S_j and S_{j′}, respectively, to the check-in state S_i, then K enables the check-in state to satisfy ( ϵ , δ )-LDP. When δ = 0, K satisfies ϵ-LDP.
Property 1
(Parallel composition [49]). Assume that the randomized algorithms K_1, K_2, …, K_n have privacy budgets ϵ_1, ϵ_2, …, ϵ_n, respectively. Then, for disjoint datasets D_1, D_2, …, D_n, the algorithm K(K_1(D_1), K_2(D_2), …, K_n(D_n)) provides max(ϵ_i)-(local) DP; the level of privacy protection it provides depends on the largest privacy budget.

3.2. KRR Mechanism

KRR is an LDP mechanism [24,25] that satisfies the following probability distribution:
$$P(y \mid x) = \begin{cases} \dfrac{e^{\epsilon}}{e^{\epsilon} + k - 1}, & y = x \\ \dfrac{1}{e^{\epsilon} + k - 1}, & y \ne x, \end{cases}$$
where x, y ∈ χ and |χ| = k.
KRR is a more general form of the W-RR randomized response mechanism; when k = 2, KRR degenerates into W-RR.
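For concreteness, the following Python sketch (ours, not taken from [24,25]) builds the KRR perturbation matrix and perturbs a single categorical value; the function names and the 0-based state encoding are our own conventions.

```python
import numpy as np

def krr_matrix(k: int, eps: float) -> np.ndarray:
    """k x k KRR perturbation matrix; entry [y, x] = P(output y | input x)."""
    Q = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))
    np.fill_diagonal(Q, np.exp(eps) / (np.exp(eps) + k - 1))
    return Q

def krr_perturb(x: int, k: int, eps: float, rng) -> int:
    """Perturb one value x in {0, ..., k-1} by k-ary randomized response."""
    if rng.random() < np.exp(eps) / (np.exp(eps) + k - 1):
        return x                      # keep the true value
    y = int(rng.integers(k - 1))      # otherwise pick uniformly among the other k-1 values
    return y if y < x else y + 1

rng = np.random.default_rng(0)
print(krr_matrix(4, 1.0))             # each column sums to 1
print(krr_perturb(2, 4, 1.0, rng))
```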

3.3. Utility Metrics

In this paper, the worst relative error of the POIs in the check-in statistics is used to measure the overall utility of the check-in application, where the relative error of POI_i is calculated as
$$err(r_i, \hat{r}_i) = \frac{|r_i - \hat{r}_i|}{\max\{r_i, \phi\}},$$
where r̂_i is the estimated result of the POI_i check-in statistics after LDP protection, r_i is the real check-in result of POI_i, and the parameter ϕ is a constant that prevents the denominator from being 0 when r_i = 0 or from being too small when r_i is small [18,50,51]. For convenience of analysis, this paper approximates the utility metric by the relative root mean square error:
$$err(r_i, \hat{r}_i) = \frac{\sqrt{\xi(r_i, \hat{r}_i)}}{\max\{r_i, \phi\}},$$
where ξ(r_i, r̂_i) is the expectation of the squared error between the real statistical result r_i and the statistical estimate r̂_i after LDP protection, and ϕ is defined as above. The maximum relative error over the n POIs is then calculated as
$$err(r, \hat{r}) = \max_i\big(err(r_i, \hat{r}_i)\big) = \max_i\left(\frac{\sqrt{\xi(r_i, \hat{r}_i)}}{\max\{r_i, \phi\}}\right).$$
As above, r_i and r̂_i can represent not only the data distribution but also frequencies or counts.
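As a minimal illustration of these metrics (our sketch; the helper names are ours), the per-POI relative error of Formula (5) and the maximum relative error of Formula (7) can be computed as follows.

```python
import numpy as np

def relative_errors(r, r_hat, phi: float = 1.0):
    """Per-POI relative error |r_i - r^_i| / max{r_i, phi} (Formula (5))."""
    r, r_hat = np.asarray(r, float), np.asarray(r_hat, float)
    return np.abs(r - r_hat) / np.maximum(r, phi)

def worst_relative_error(r, r_hat, phi: float = 1.0) -> float:
    """Maximum relative error over the n POIs (Formula (7))."""
    return float(np.max(relative_errors(r, r_hat, phi)))

# phi = 1 keeps the zero-count POI from blowing up the denominator
print(worst_relative_error([100, 50, 0], [90, 55, 3]))   # 3.0
```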

4. Problem Formulations

4.1. System Model

As shown in Figure 1, there are three types of participants in the check-in model: check-in users (data providers), the POI Center (data collector) and data analysts (for example, POI managers). Each check-in user visiting a POI generates a check-in state and sends it to the POI Center through a terminal with the check-in app, where each check-in state corresponds to a count and belongs to categorical data. The POI Center calculates the counts and frequencies of check-in users visiting POIs according to the received check-in states, where the frequency approximates the check-in data distribution and is published to data analysts. In addition, it is assumed that the check-in users are independent of each other and that each check-in user submits only one check-in state per publication. It is also assumed that the check-in scenario is a semi-honest model in which the POI Center is an honest but curious data collector, and that the check-in state of a user is sensitive. Hence, a user adopts a perturbation mechanism (for example, an LDP mechanism) to perturb his check-in state for privacy protection before sending it to the POI Center. Therein, it is assumed that one check-in state is perturbed to exactly one check-in state.
In this paper, we focus on the dynamic collection and publishing of continuous check-in data with both privacy and utility requirements, where the privacy to be protected is the check-in state of a user and the utility to be realized is the distribution of the check-in data with relative error as its metric. Therein, privacy refers to the EPP, the preference for privacy of a user, and utility refers to the EDU, the preference for utility of a data analyst. Moreover, each publication is required to satisfy the EDU first and then satisfy the EPP as much as possible in the dynamic publishing. Thereby, we adopt B-DP based on the LDP model, including perturbation, aggregation, reconstruction and publishing, and we also need at least a process of initializing or updating the perturbation mechanism K so that every publication satisfies the EPP as much as possible with the EDU satisfied first, as shown in Figure 1.

4.2. The Related Concepts of B-DP

In the concept of best-effort differential privacy (B-DP), there are an expected privacy protection (EPP) and an expected data utility (EDU). When the two cannot be satisfied simultaneously, the EDU should be satisfied first and the EPP should be satisfied as much as possible. Since the protection level of DP is evaluated by the privacy budget [17], the preference for privacy in B-DP also refers to the preference for the privacy budget. Hence, the EPP refers to a data provider's preference for the privacy budget, and we define this privacy budget as the expected privacy budget, symbolized as ϵ_e. We use Region(ϵ_e) to symbolize the expected privacy protection region, which refers to a data provider's preference for a region including multiple expected privacy budgets.
We use η to symbolize the EDU. In this paper, the expectation of the maximum relative error of Formula (7) is used to measure data utility. When this expectation is less than or equal to η, the EDU is satisfied; when it is equal to η, the EDU is just satisfied. The privacy budget of a DP mechanism that just satisfies the EDU η is symbolized as ϵ_η.
Definition 4
(C_{ϵ_e}-Point belief degree). The guarantee degree of the EPP under the expected privacy budget ϵ_e that can be provided by an ϵ_η-DP mechanism is defined as the point belief degree, denoted C_{ϵ_e}. Moreover, $C_{\epsilon_e} = \sum_{i=1}^{n} \tilde{p}_i \chi(\epsilon_i, \epsilon_e)$, where n is the number of POIs in the check-in application, p̃_i is the probability of POI_i after perturbation by the ϵ_η-DP mechanism, ϵ_i is the actual privacy budget of POI_i, and χ(ϵ_i, ϵ_e) is an indicator function of whether the EPP is satisfied, defined as
$$\chi(\epsilon_i, \epsilon_e) = \begin{cases} 0, & \epsilon_i > \epsilon_e \\ 1, & \epsilon_i \le \epsilon_e. \end{cases}$$
Definition 5
(C_{Region(ϵ_e)}-Regional average belief degree). The average guarantee degree of the EPP over the expected privacy protection region Region(ϵ_e) that can be provided by an ϵ_η-DP mechanism is defined as the regional average belief degree, denoted C_{Region(ϵ_e)}. When Region(ϵ_e) = {ϵ_e^1, ϵ_e^2, …, ϵ_e^K} with ϵ_e^1 < ϵ_e^2 < … < ϵ_e^K for K ≥ 2, it is defined as
$$C_{Region(\epsilon_e)} = \frac{1}{\epsilon_e^K - \epsilon_e^1} \sum_{k=1}^{K-1} (\epsilon_e^{k+1} - \epsilon_e^k) C_{\epsilon_e^k},$$
where C_{ϵ_e^k} follows the definition of the point belief degree.
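To make the two metrics concrete, the sketch below (our illustration; the inputs are hypothetical) computes the point belief degree of Definition 4 and the regional average belief degree of Definition 5 from the perturbed distribution p̃, the actual per-POI budgets ϵ_i and an expected budget or region.

```python
import numpy as np

def point_belief_degree(p_tilde, eps_actual, eps_e: float) -> float:
    """C_{eps_e} = sum_i p~_i * 1[eps_i <= eps_e] (Definition 4)."""
    p_tilde, eps_actual = np.asarray(p_tilde), np.asarray(eps_actual)
    return float(p_tilde[eps_actual <= eps_e].sum())

def regional_average_belief_degree(p_tilde, eps_actual, region) -> float:
    """Width-weighted average of point belief degrees over the region (Definition 5)."""
    region = np.sort(np.asarray(region, float))
    c = [point_belief_degree(p_tilde, eps_actual, e) for e in region[:-1]]
    return float(np.sum(np.diff(region) * c) / (region[-1] - region[0]))

p_tilde = [0.5, 0.3, 0.2]     # perturbed distribution (hypothetical)
eps_i = [0.8, 1.5, 2.5]       # actual per-POI budgets (hypothetical)
print(point_belief_degree(p_tilde, eps_i, 1.0))                        # 0.5
print(regional_average_belief_degree(p_tilde, eps_i, np.linspace(1, 4, 3001)))
```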
Definition 6
(( ϵ_η , C_{ϵ_e} )-B-DP). A DP mechanism that just satisfies the EDU η with the point belief degree C_{ϵ_e} for the expected privacy budget ϵ_e is defined as ( ϵ_η , C_{ϵ_e} )-B-DP. The ( ϵ_η , C_{ϵ_e} )-B-DP whose point belief degree C_{ϵ_e} is maximal is defined as ( ϵ_η , C_{ϵ_e} )-Best-B-DP.
Definition 7
(( ϵ_η , C_{Region(ϵ_e)} )-B-DP). A DP mechanism that just satisfies the EDU η with the regional average belief degree C_{Region(ϵ_e)} for the expected privacy protection region Region(ϵ_e) is defined as ( ϵ_η , C_{Region(ϵ_e)} )-B-DP. The ( ϵ_η , C_{Region(ϵ_e)} )-B-DP whose regional average belief degree C_{Region(ϵ_e)} is maximal is defined as ( ϵ_η , C_{Region(ϵ_e)} )-Best-B-DP.
Note that, in general, B-DP includes both central B-DP and local B-DP, depending on whether it is based on a trusted data collector, just as with DP. This paper focuses on local B-DP.

4.3. Model Symbolization

Let POI_i with i ∈ [1, n] represent the n POIs in the check-in scenario, and let the check-in state space be S = {S_1, S_2, …, S_n}, where S_i is the check-in state of POI_i. Let s_u, s̃_u, ŝ_u ∈ S be the variables of the original, perturbed and estimated check-in state of user u, respectively. Let p, p̃ and p̂ be the probability distributions of the original, perturbed and estimated check-ins, respectively, where p = [p_1, p_2, …, p_n]^T, p̃ = [p̃_1, p̃_2, …, p̃_n]^T and p̂ = [p̂_1, p̂_2, …, p̂_n]^T. Assume that all users follow the same probability law, that is, p_i = P(s_u = S_i), p̃_i = P(s̃_u = S_i) and p̂_i = P(ŝ_u = S_i) for any i ∈ [1, n] and any u. Finally, h(S) = [h(S_1), h(S_2), …, h(S_n)]^T, h̃(S) = [h̃(S_1), h̃(S_2), …, h̃(S_n)]^T and ĥ(S) = [ĥ(S_1), ĥ(S_2), …, ĥ(S_n)]^T represent the original, perturbed and estimated check-in counts vectors with m ∈ N+ users, respectively.
Definition 8
(Random perturbation and perturbation probability matrix Q). The process for any user u to change check-in state from S j to S i with a certain perturbation probability is called random perturbation, and the perturbation probability is denoted as q i j = P ( s ˜ u = S i | s u = S j ) with S i , S j S . The matrix composed of q i j for any i , j [ 1 , n ] is called the perturbation probability matrix Q, where Q = ( q i j ) n × n and i = 1 n q i j = 1 for any j [ 1 , n ] .
Therefore, the perturbed probability distribution p ˜ , the original probability distribution p and the perturbation probability matrix Q have the following relationship
p ˜ = Q p .
From Equation (10), it can be seen that p̃_i = Σ_{j=1}^n q_ij p_j for any i ∈ [1, n]. Obviously, p̃_i and p_i are not always equal, and hence the result of the perturbation is biased. Assume that Q is always invertible and define its inverse matrix as R = Q^{−1} = (r_ij)_{n×n}. We can then obtain the following theorem.
Theorem 1.
The check-in counts vector h ( S ) is perturbed by the perturbation probability matrix Q to obtain the perturbed check-in counts vector h ˜ ( S ) and then it is corrected by the inverse matrix R. The estimated check-in counts vector h ^ ( S ) = R h ˜ ( S ) satisfies E [ h ^ ( S ) ] = h ( S ) .
Proof. 
Since E [ h ^ ( S ) ] = E [ R h ˜ ( S ) ] = E [ R Q h ( S ) ] , and R Q = I , it has E [ h ^ ( S ) ] = E [ h ( S ) ] = h ( S ) . Therefore, the result follows. □
Theorem 1 states that the estimated check-in counts vector h ^ ( S ) obtained after the correction of the inverse matrix R is an unbiased estimate of the original check-in counts vector h ( S ) .
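A minimal simulation of Theorem 1, with KRR chosen as the perturbation matrix purely for illustration: the corrected counts R h̃(S) concentrate around the original counts h(S), reflecting the unbiasedness of the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eps = 5, 200_000, 1.0
p = np.array([0.4, 0.25, 0.15, 0.12, 0.08])            # original distribution

Q = np.full((n, n), 1.0 / (np.exp(eps) + n - 1))        # KRR perturbation matrix
np.fill_diagonal(Q, np.exp(eps) / (np.exp(eps) + n - 1))

states = rng.choice(n, size=m, p=p)                     # original check-in states s_u
keep = rng.random(m) < np.exp(eps) / (np.exp(eps) + n - 1)
other = rng.integers(n - 1, size=m)
other = np.where(other < states, other, other + 1)      # uniform over the other n-1 states
perturbed = np.where(keep, states, other)
h_tilde = np.bincount(perturbed, minlength=n)           # perturbed counts h~(S)

R = np.linalg.inv(Q)
h_hat = R @ h_tilde                                     # h^(S) = R h~(S)
print(np.round(h_hat / m, 3))                           # close to p, since E[h^(S)] = h(S)
```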
Here, the relative root mean square error err(h(S_i), ĥ(S_i)) between the original check-in counts h(S_i) and the estimated check-in counts ĥ(S_i), with n POIs and m users, can be calculated for POI_i as
$$err(h(S_i), \hat{h}(S_i)) = \frac{\sqrt{Var[\hat{h}(S_i)]}}{E[\hat{h}(S_i)]},$$
where ϕ = 1 and Var[ĥ(S_i)] can be calculated as
$$Var[\hat{h}(S_i)] = \sum_{j=1}^{n} r_{ij}^2 \left( \sum_{k=1}^{n} q_{jk} h(S_k) \right) - h(S_i).$$
Theorem 2.
The relative root mean square error between the original probability p_i of POI_i check-ins and the estimated probability p̂_i of POI_i check-ins is $err(p_i, \hat{p}_i) = \frac{\sqrt{Var[\hat{h}(S_i)]}}{E[\hat{h}(S_i)]}$, where i ∈ [1, n].
Proof. 
The relative root mean square error between the original probability p_i of POI_i check-ins and the estimated probability p̂_i of POI_i check-ins can be represented as
$$err(p_i, \hat{p}_i) = \frac{\sqrt{E[(p_i - \hat{p}_i)^2]}}{p_i} = err(h(S_i), \hat{h}(S_i)).$$
Therefore, $err(p_i, \hat{p}_i) = \frac{\sqrt{Var[\hat{h}(S_i)]}}{E[\hat{h}(S_i)]}$, where i ∈ [1, n]. □
According to Formulas (11)–(13), err(p_i, p̂_i) is determined jointly by Q and p.
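This dependence can be packaged as a small helper implementing Formulas (11)–(13) with ϕ = 1 (our sketch; it is reused by the algorithm sketches below).

```python
import numpy as np

def expected_worst_relative_error(Q: np.ndarray, p, m: int) -> float:
    """max_i sqrt(Var[h^(S_i)]) / E[h^(S_i)] per Formulas (11)-(13), phi = 1.

    Var[h^(S_i)] = sum_j r_ij^2 (sum_k q_jk h(S_k)) - h(S_i), with h(S_k) = m p_k.
    """
    R = np.linalg.inv(Q)
    h = m * np.asarray(p, float)
    var = (R ** 2) @ (Q @ h) - h          # Formula (12), vectorized
    return float(np.max(np.sqrt(var) / h))
```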
If max(err(p_i, p̂_i)) = η means that the EDU is just satisfied, then, according to B-DP, Q should satisfy the following constraints:
$$\max_i\big(err(p_i, \hat{p}_i)\big) \le \eta, \; i \in [1,n]; \qquad q_{ij} \le e^{\epsilon_i} q_{ij'}, \; \epsilon_i \ge 0, \; i, j, j' \in [1,n]; \qquad \sum_{i=1}^{n} q_{ij} = 1, \; j \in [1,n].$$
Assume that there is a randomized algorithm K with perturbation probability matrix Q, which can provide the expected privacy budget ϵ_e with point belief degree $C_{\epsilon_e} = \sum_{i=1}^{n} \tilde{p}_i \chi(\epsilon_i, \epsilon_e)$. Then, if K is to satisfy (ϵ_η = max(ϵ_i), C_{ϵ_e})-Best-B-DP, it also needs to maximize C_{ϵ_e}. Therefore, K should solve the following optimization problem:
$$\underset{Q}{\text{maximize}} \; C_{\epsilon_e} \quad \text{s.t.} \quad \max_i\big(err(p_i, \hat{p}_i)\big) \le \eta; \;\; q_{ij} \le e^{\epsilon_i} q_{ij'}, \; \epsilon_i \ge 0; \;\; \sum_{i=1}^{n} q_{ij} = 1, \quad i, j, j' \in [1,n].$$
Similarly, assume that K can provide the expected privacy protection region Region(ϵ_e) = {ϵ_e^1, ϵ_e^2, …, ϵ_e^K} with regional average belief degree $C_{Region(\epsilon_e)} = \frac{1}{\epsilon_e^K - \epsilon_e^1} \sum_{k=1}^{K-1} (\epsilon_e^{k+1} - \epsilon_e^k) C_{\epsilon_e^k}$, where C_{ϵ_e^k} is the point belief degree of the expected privacy budget ϵ_e^k. Then, if K is to satisfy (ϵ_η = max(ϵ_i), C_{Region(ϵ_e)})-Best-B-DP, it also needs to maximize C_{Region(ϵ_e)}. Therefore, K should solve the following optimization problem:
$$\underset{Q}{\text{maximize}} \; C_{Region(\epsilon_e)} \quad \text{s.t.} \quad \max_i\big(err(p_i, \hat{p}_i)\big) \le \eta; \;\; q_{ij} \le e^{\epsilon_i} q_{ij'}, \; \epsilon_i \ge 0; \;\; \sum_{i=1}^{n} q_{ij} = 1, \quad i, j, j' \in [1,n].$$
From the optimization problems (15) and (16) above, each problem contains n² unknown variables in the perturbation probability matrix Q, n³ inequality constraints, n equality constraints and one EDU constraint η. Directly solving such an optimization problem is therefore a huge challenge, especially for a large domain size n. Hence, two simplified models are considered in this paper, and we present their details one by one in the following section.

5. Design and Implementation of B-DP Mechanism

This section presents the design of two B-DP mechanisms and their implementation algorithms: one is based on the classical LDP mechanism KRR, and the other is based on the newly constructed EXP Q mechanism of this paper. Both mechanisms have a domain with more than two values. Moreover, we combine three typically non-uniform data distributions with the two B-DP mechanisms to directly show and analyze the two metrics proposed in this paper, namely the point belief degree and the regional average belief degree.

5.1. B-DP Mechanism Based on KRR

Without prior knowledge of the data distribution, we assume that it is uniform. Setting ϵ_i = ϵ_j = ϵ_η, q_ii = p and q_ij = q for i, j ∈ [1, n] with j ≠ i and q_ii ≥ q_ij, we have $p = \frac{e^{\epsilon_\eta}}{e^{\epsilon_\eta}+n-1}$, $q = \frac{1}{e^{\epsilon_\eta}+n-1}$ and err(p, p̂) = η. Therefore, the privacy budget of KRR here is not arbitrary: it is constrained by the EDU requirement err(p, p̂) = η.
Definition 9
( ϵ η -KRR). KRR that just meets the EDU η is called ϵ η -KRR.
Here, it can also derive the following theorem.
Theorem 3.
If there exists ϵ_η-KRR, then it satisfies ϵ = ϵ_η-LDP. Moreover, the point belief degree of ϵ_η-KRR is
$$C_{\epsilon_e} = \begin{cases} 0, & \epsilon_e < \epsilon_\eta \\ 1, & \epsilon_e \ge \epsilon_\eta. \end{cases}$$
Proof. 
If there exists ϵ_η-KRR, then $p = \frac{e^{\epsilon_\eta}}{e^{\epsilon_\eta}+n-1}$ and $q = \frac{1}{e^{\epsilon_\eta}+n-1}$. Since ϵ = ln(p/q) = ϵ_η, it satisfies ϵ = ϵ_η-LDP. According to $C_{\epsilon_e} = \sum_{i=1}^{n} \tilde{p}_i \chi(\epsilon_i, \epsilon_e)$: when ϵ_e < ϵ_η, χ(ϵ_i, ϵ_e) = χ(ϵ_η, ϵ_e) = 0, so C_{ϵ_e} = 0; when ϵ_e ≥ ϵ_η, χ(ϵ_i, ϵ_e) = χ(ϵ_η, ϵ_e) = 1, so C_{ϵ_e} = 1. Hence, the result follows. □
Thus, ϵ_η-KRR either cannot provide the EPP ϵ_e at all or provides it with 100% satisfaction. In other words, what KRR achieves is two distinct jumps between no guarantee and full guarantee of the EPP; that is, it is not a good B-DP mechanism.

5.2. B-DP Mechanism Based on EXP Q

Since the relative error over the check-in data distribution is used as the utility metric in this paper, and the privacy budget of DP usually determines the absolute error, which is the numerator of the relative error, the privacy budget of each POI should vary with its probability in the check-in distribution: the privacy budget should be reduced when the corresponding probability in the data distribution becomes larger, and increased when it becomes smaller. In this way, POIs with few check-ins can still satisfy the EDU, while POIs with many check-ins can satisfy the EPP with priority, so as to better realize B-DP. We define the following perturbation mechanism EXP Q .
Definition 10
(Perturbation mechanism EXP Q ). Given the data distribution p = [p_1, p_2, …, p_n]^T with p_1 ≥ p_2 ≥ … ≥ p_n, the randomized algorithm with matrix Q is called the perturbation mechanism EXP Q if Q satisfies $q_{ij} \propto e^{\gamma u(j,i)}$, where q_ij is the probability that the check-in state is perturbed from S_j to S_i for i, j ∈ [1, n], γ ≥ 0 is the privacy setting parameter, u(j, i) satisfies Formulas (18)–(20), and κ_n ∈ [0, n] is the parameter of the privacy protection intensity change point.
(i) When κ_n = 0,
$$u(j,i) = \begin{cases} 0, & i = j \\ -(1 + p_{n-i+1}), & i \ne j. \end{cases}$$
(ii) When κ_n ∈ [1, n−1],
$$u(j,i) = \begin{cases} 0, & i = j \\ -(1 - p_i), & i \ne j \text{ and } i \le \kappa_n \\ -(1 + p_{n-i+\kappa_n+1}), & i \ne j \text{ and } i > \kappa_n. \end{cases}$$
(iii) When κ_n = n,
$$u(j,i) = \begin{cases} 0, & i = j \\ -(1 - p_i), & i \ne j. \end{cases}$$
Theorem 4.
In the perturbation mechanism EXP Q , Q satisfies the following properties.
(i) When κ_n = 0 and the normalization factor for perturbing from POI_j is $\Omega_j = 1 + \sum_{k=1, k \ne j}^{n} e^{-\gamma(1+p_{n-k+1})}$ for i, j ∈ [1, n],
$$q_{ij} = \begin{cases} \frac{1}{\Omega_j}, & i = j \\ \frac{e^{-\gamma(1+p_{n-i+1})}}{\Omega_j}, & i \ne j. \end{cases}$$
(ii) When κ_n ∈ [1, n−1] and the normalization factor for perturbing from POI_j is $\Omega_j = 1 + \sum_{k=1, k \ne j}^{\kappa_n} e^{-\gamma(1-p_k)} + \sum_{k=\kappa_n+1, k \ne j}^{n} e^{-\gamma(1+p_{n-k+\kappa_n+1})}$ for i, j ∈ [1, n],
$$q_{ij} = \begin{cases} \frac{1}{\Omega_j}, & i = j \\ \frac{e^{-\gamma(1-p_i)}}{\Omega_j}, & i \ne j \text{ and } i \le \kappa_n \\ \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{\Omega_j}, & i \ne j \text{ and } i > \kappa_n. \end{cases}$$
(iii) When κ_n = n and the normalization factor for perturbing from POI_j is $\Omega_j = 1 + \sum_{k=1, k \ne j}^{n} e^{-\gamma(1-p_k)}$ for i, j ∈ [1, n],
$$q_{ij} = \begin{cases} \frac{1}{\Omega_j}, & i = j \\ \frac{e^{-\gamma(1-p_i)}}{\Omega_j}, & i \ne j. \end{cases}$$
Proof. 
According to Definition 10, $q_{ij} \propto e^{\gamma u(j,i)}$ is the probability that the check-in state of POI_j is perturbed to that of POI_i for i, j ∈ [1, n]. Moreover, since $\sum_{k=1}^{n} q_{kj} = 1$, the result of Theorem 4 follows directly. □
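Assuming the sign conventions reconstructed above, a compact sketch of the Q construction of Theorem 4 might look as follows (0-based indices; expq_matrix is our own helper name).

```python
import numpy as np

def expq_matrix(p, gamma: float, kappa: int) -> np.ndarray:
    """Perturbation matrix Q of EXP Q (Theorem 4), 0-based indices.

    p must be sorted in descending order; column j is the output
    distribution for original state j, normalized by Omega_j.
    """
    p = np.asarray(p, float)
    n = len(p)
    w = np.empty(n)                       # off-diagonal weight of output state i
    for i in range(n):
        if i < kappa:                     # the paper's "i <= kappa_n" (1-based)
            w[i] = np.exp(-gamma * (1.0 - p[i]))
        else:                             # the paper's "i > kappa_n": tail probability
            w[i] = np.exp(-gamma * (1.0 + p[n - 1 - i + kappa]))
    Q = np.tile(w[:, None], (1, n))
    np.fill_diagonal(Q, 1.0)              # u(j, j) = 0, so the diagonal weight is e^0 = 1
    return Q / Q.sum(axis=0)              # each column sums to 1 (divide by Omega_j)

p = np.array([0.4, 0.25, 0.15, 0.12, 0.08])
print(expq_matrix(p, gamma=2.0, kappa=2).round(3))
```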
Definition 11.
( ϵ_η -EXP Q ). EXP Q that just satisfies the EDU η is called ϵ_η-EXP Q , where $\epsilon_\eta = \max_{1 \le i \le n}(\epsilon_i)$ and ϵ_i is the actual privacy budget of each POI_i, that is, $e^{-\epsilon_i} q_{ij'} \le q_{ij} \le e^{\epsilon_i} q_{ij'}$ for j, j′ ∈ [1, n].
Theorem 5.
Based on the definition of ϵ_η-EXP Q :
(1) if there exists ϵ_η-EXP Q , then it satisfies ϵ = ϵ_η-LDP;
(2) when κ_n is fixed and the point belief degree of ϵ_η-EXP Q is $C_{\epsilon_e} = \sum_{i=1}^{n} \tilde{p}_i \chi(\epsilon_i, \epsilon_e)$, where ϵ_i is the actual privacy budget of each POI_i, ϵ_η-EXP Q is the approximately optimal ( ϵ_η , C_{ϵ_e} )-Best-B-DP, where the indicator function χ(ϵ_i, ϵ_e) is
$$\chi(\epsilon_i, \epsilon_e) = \begin{cases} 0, & \epsilon_i > \epsilon_e \\ 1, & \epsilon_i \le \epsilon_e; \end{cases}$$
(3) if there exists ϵ_η-EXP Q with point belief degree C_{ϵ_e}, then it satisfies ( ϵ_e , 1 − C_{ϵ_e} )-LDP.
Proof. 
See Appendix A. □

5.3. Implementation of the B-DP Mechanisms

For the check-in scenario, two B-DP mechanisms based on KRR and EXP Q are proposed and realized in this paper. KRR is one of the classical LDP mechanisms, but it cannot realize B-DP well. EXP Q is newly proposed in this paper; it not only provides approximately optimal ( ϵ_η , C_{ϵ_e} )-Best-B-DP protection, but also provides relaxed ( ϵ_e , 1 − C_{ϵ_e} )-LDP protection while satisfying the EDU. The pseudocode of the two B-DP mechanisms is given in Algorithms 1 and 2, respectively.
Algorithm 1 B-DP mechanism based on KRR.
Input: Probability distribution p = [p_1, p_2, …, p_n]^T, sample size m and expected data utility (EDU) η
Output: Privacy budget ϵ_η and perturbation probability matrix Q
1: Initialize ϵ_η > 0, the iteration step size Δϵ_η > 0 and the worst utility MaxRE = 1;
2: while MaxRE > η do
3:  Construct Q by KRR with ϵ_η;
4:  According to the relative error formula, obtain the current worst relative error $CurrentMaxRE = \max_{i \in [1,n]} \frac{\sqrt{\sum_{j=1}^{n} r_{ij}^2 (\sum_{k=1}^{n} q_{jk} p_k) m - p_i m}}{p_i m}$, where R = (r_ij)_{n×n} = Q^{−1} and Q = (q_ij)_{n×n}; see Formulas (11) and (12) and Theorems 1 and 2 for details;
5:  if CurrentMaxRE < MaxRE then
6:   MaxRE = CurrentMaxRE;
7:  end if
8:  if MaxRE > η then
9:   ϵ_η = ϵ_η + Δϵ_η;
10: end if
11: end while
12: return ϵ_η and Q
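A condensed Python sketch of Algorithm 1 (ours; the MaxRE bookkeeping of steps 5–10 is folded into a single loop, and the error expression mirrors step 4 with ϕ = 1):

```python
import numpy as np

def fit_krr_to_edu(p, m: int, eta: float, eps0: float = 0.5, step: float = 0.005):
    """Grow eps_eta until KRR just meets the EDU eta (sketch of Algorithm 1)."""
    p = np.asarray(p, float)
    n, eps = len(p), eps0
    while True:
        Q = np.full((n, n), 1.0 / (np.exp(eps) + n - 1))      # KRR with budget eps
        np.fill_diagonal(Q, np.exp(eps) / (np.exp(eps) + n - 1))
        R, h = np.linalg.inv(Q), m * p
        max_re = np.max(np.sqrt((R ** 2) @ (Q @ h) - h) / h)  # step 4
        if max_re <= eta:
            return eps, Q
        eps += step

eps_eta, Q = fit_krr_to_edu([0.4, 0.25, 0.15, 0.12, 0.08], m=100_000, eta=0.1)
print(round(eps_eta, 3))
```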
Algorithm 2 B-DP mechanism based on EXP Q .
Input: Probability distribution p = [p_1, p_2, …, p_n]^T, sample size m, expected data utility (EDU) η, expected privacy budget ϵ_e (or expected privacy protection region Region(ϵ_e) = {ϵ_e^1, ϵ_e^2, …, ϵ_e^K} with ϵ_e^1 < ϵ_e^2 < … < ϵ_e^K)
Output: Privacy setting parameter γ_η, privacy protection intensity change point parameter κ_n, perturbation probability matrix Q and actual privacy budget ϵ_i of POI_i for i ∈ [1, n]
1: Initialize the privacy setting parameter γ_0 > 0 and the iteration step size Δγ_η > 0;
2: Initialize κ_n = n and tag = 0, where tag identifies whether a comparatively reasonable result exists;
3: while κ_n ≥ 0 do
4:  Initialize γ_η = γ_0, ϵ_i = 0 for i ∈ [1, n], the worst utility MaxRE = 1 and the best belief degree C*_{ϵ_e} = 0 (when the region input is used, the best regional average belief degree C*_{Region(ϵ_e)} is also uniformly recorded as C*_{ϵ_e} = 0);
5:  while MaxRE > η do
6:   Construct Q by EXP Q with p, γ = γ_η and κ_n, where a row represents the perturbed check-in state and a column represents the original check-in state;
7:   According to Q, update each value ϵ_i by $\epsilon_i = \ln\frac{\max_j(q_{ij})}{\min_{j'}(q_{ij'})}$, where Q = (q_ij)_{n×n} and i, j, j′ ∈ [1, n];
8:   According to the relative error formula, obtain the current worst relative error $CurrentMaxRE = \max_{i \in [1,n]} \frac{\sqrt{\sum_{j=1}^{n} r_{ij}^2 (\sum_{k=1}^{n} q_{jk} p_k) m - p_i m}}{p_i m}$, where R = (r_ij)_{n×n} = Q^{−1}; see Formulas (11) and (12) and Theorems 1 and 2 for details;
9:   if CurrentMaxRE < MaxRE then
10:   MaxRE = CurrentMaxRE;
11:  end if
12:  if MaxRE > η then
13:   γ_η = γ_η + Δγ_η;
14:  end if
15: end while
16: if the ϵ_i are not all zero for i ∈ [1, n] then
17:  Calculate the current point belief degree $C_{\epsilon_e} = \sum_{i=1}^{n} \tilde{p}_i \chi(\epsilon_i, \epsilon_e) = \sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} p_j \chi(\epsilon_i, \epsilon_e)$ (or the current regional average belief degree $C_{Region(\epsilon_e)} = \frac{1}{\epsilon_e^K - \epsilon_e^1} \sum_{k=1}^{K-1} (\epsilon_e^{k+1} - \epsilon_e^k) C_{\epsilon_e^k}$, where $C_{\epsilon_e^k} = \sum_{i=1}^{n} \tilde{p}_i \chi(\epsilon_i, \epsilon_e^k) = \sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} p_j \chi(\epsilon_i, \epsilon_e^k)$, and record it as C_{ϵ_e});
18:  if C*_{ϵ_e} < C_{ϵ_e} then
19:   Update tag = 1, C*_{ϵ_e} = C_{ϵ_e}, γ_opt = γ_η, κ_opt = κ_n, Q_opt = Q and ϵ_i^opt = ϵ_i (i ∈ [1, n]);
20:  end if
21: end if
22: Update κ_n = κ_n − 1;
23: end while
24: if tag = 0 then
25:  Update κ_n = 0, and update the values ϵ_i according to Step 7;
26: else
27:  Record γ_η = γ_opt, κ_n = κ_opt, Q = Q_opt and ϵ_i = ϵ_i^opt (i ∈ [1, n]);
28: end if
29: return γ_η, κ_n, Q and ϵ_i for i ∈ [1, n]
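A condensed sketch of Algorithm 2 for the point-belief-degree input (ours; the regional variant only swaps the score in the marked line, and expq_matrix is the helper from the sketch in Section 5.2):

```python
import numpy as np

def fit_expq_to_edu(p, m: int, eta: float, eps_e: float,
                    gamma0: float = 0.5, step: float = 0.005):
    """For each change point kappa_n, grow gamma until the EDU eta holds,
    then keep the candidate with the largest point belief degree C_{eps_e}."""
    p = np.asarray(p, float)
    n, h = len(p), m * np.asarray(p, float)
    best_c, best = -1.0, None
    for kappa in range(n, -1, -1):                       # steps 3 and 22
        gamma = gamma0
        while True:                                      # steps 5-15
            Q = expq_matrix(p, gamma, kappa)
            R = np.linalg.inv(Q)
            if np.max(np.sqrt((R ** 2) @ (Q @ h) - h) / h) <= eta:
                break
            gamma += step
        eps_i = np.log(Q.max(axis=1) / Q.min(axis=1))    # step 7: actual budgets
        c = float((Q @ p)[eps_i <= eps_e].sum())         # step 17 (swap for regional)
        if c > best_c:                                   # steps 18-19
            best_c, best = c, (gamma, kappa, Q, eps_i)
    return best
```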

5.4. Case Analysis of Point Belief Degree and Regional Average Belief Degree

The discussion above theoretically analyzed the two metrics, the point belief degree and the regional average belief degree, for the two B-DP mechanisms based on KRR and EXP Q . In order to show the two metrics more clearly, the rest of this section uses three typically non-uniform data distributions for analysis. For simplicity, in the rest of this section (including the figure descriptions), the KRR-based B-DP mechanism is denoted by KRR and the EXP Q -based B-DP mechanism by EXP Q .
(1) Three data distributions with typical non-uniformity.
The data distributions in this section are set as Pareto distributions, where the discrete Pareto distribution satisfies $p_j \propto \frac{1}{x_j^{\theta+1}}$ for j ∈ [1, n], θ > 0 and x_j > 0. Three data distributions with n = 20 and θ = 1.55, 1.17 and 0.52 are shown in Figure 2 and denoted P1, P2 and P3, respectively, where x_j = x_1 + (j−1)Δx, x_1 = 2 and Δx = 0.2. Figure 2 shows both the ordered and disordered cases of the Pareto distribution, where the disordered case illustrates that the identification of scenic spots is independent of the order of the probabilities. It also shows the Gini coefficients of P1, P2 and P3, calculated according to the Gini mean difference method [52]; the Gini coefficient indicates the degree of unevenness of a data distribution. A quantitative relationship between the Pareto distribution parameter and the Gini coefficient is given in Table 3. As shown in Figure 2, the data distribution with θ = 1.55 is very uneven, the distribution with θ = 1.17 is moderately uneven, and the distribution with θ = 0.52 is relatively even.
(2) Point belief degree
For the point belief degree C_{ϵ_e} of KRR, let ϵ_e = ϵ_η (ϵ_η is used as the expected privacy budget, or as a basis for dividing the expected privacy protection region, simply to allow a better comparison between KRR and EXP Q ) as determined by ϵ_η-KRR (see the definition of ϵ_η-KRR and Algorithm 1 for details); this equals the ϵ_e-coordinate of the jump point shown by the dotted line in Figure 3. The same ϵ_e and η are then used to determine the perturbation probabilities and related parameters of EXP Q (see Algorithm 2 for details). For example, when the EDU is η = 0.1, the point belief degrees C_{ϵ_e} of KRR and EXP Q are shown in Figure 3.
From Figure 3, it can be seen that, under the same EDU, if the expected privacy budget of a data provider is ϵ_e ≥ ϵ_η, KRR provides ϵ_e-DP with a belief degree of 1; if ϵ_e < ϵ_η, the belief degree is 0. In EXP Q , by contrast, if ϵ_e ≥ ϵ_η, the EPP cannot be fully satisfied when ϵ_e is close to ϵ_η, and ϵ_e-DP is provided with a belief degree of 1 only when ϵ_e is large enough. Conversely, if ϵ_e < ϵ_η, the EPP can still be partially satisfied, and the degree of providing ϵ_e-DP is greater the closer ϵ_e is to ϵ_η. Therefore, with the EDU satisfied first, EXP Q can provide a privacy guarantee degree between 0 and 1 for the EPP, while KRR can only provide either 0 or 1. Moreover, EXP Q can provide more privacy protection than KRR, especially when the EDU and the EPP are contradictory and the EPP of the data providers is only partially satisfied.
(3) Regional average belief degree
For the regional average belief degree C_{Region(ϵ_e)} of KRR, maximizing C_{Region(ϵ_e)} is equivalent to maximizing C_{ϵ_e}, and hence Algorithm 1 applies unchanged. According to the approximately optimal expected privacy budget ϵ_η that satisfies the EDU η, a data provider's expected privacy protection region Region(ϵ_e) can be roughly divided into three categories: regions with ϵ_e > ϵ_η for all ϵ_e ∈ Region(ϵ_e), regions with ϵ_e < ϵ_η for all ϵ_e ∈ Region(ϵ_e), and regions with {ϵ_η} ⊂ Region(ϵ_e).
Similarly, EXP Q can provide different levels of optimal privacy protection for the three categories of expected privacy protection region (see Algorithm 2 for details). Generally speaking, the regional average privacy protection degree of EXP Q in regions with ϵ_e > ϵ_η is less than or equal to that of KRR, while in regions with ϵ_e < ϵ_η it is greater than or equal to that of KRR. For regions with {ϵ_η} ⊂ Region(ϵ_e), a contradiction between the EPP and the EDU may exist. Figure 4 shows the regional average belief degree C_{Region(ϵ_e)} of both mechanisms with Region(ϵ_e) = {1, 1.001, 1.002, …, 4} and the data distributions P1, P2 and P3, respectively, where ϵ_η is determined by ϵ_η-KRR with η = 0.1 (see the dotted line in Figure 4, where the value of ϵ_e at which C_{Region(ϵ_e)} first becomes non-zero equals ϵ_η).
As can be seen from Figure 4, under the same EDU and the same expected privacy protection region, EXP Q is more capable than KRR of offering data providers a certain degree of privacy protection.

6. B-DP Dynamic Collection and Publishing Algorithm Design

Algorithms 1 and 2 implement KRR and EXP Q under a known data distribution, and the point belief degree and the regional average belief degree under B-DP were analyzed above. In the real world, however, there is often no prior data distribution at the beginning, or an accurate prior data distribution cannot be obtained. This means that the two B-DP mechanism implementations of Algorithms 1 and 2 cannot be directly applied to the collection and publishing of continuous check-in data with relative error as the utility metric. Therefore, this paper designs an iterative update algorithm that adaptively updates the data distribution used by the two B-DP mechanisms, so as to adaptively realize B-DP dynamic collection and publishing of continuous check-in data; see the pseudocode of Algorithm 3 for details. Therein, Algorithm 1 or 2 is a main component of Algorithm 3.
Algorithm 3 B-DP dynamic collection and publishing of check-in data (KRR/EXP Q ).
Initialization process: The data collector initializes the perturbation probability matrix Q and the estimated data distribution p̂(0) = [p̂_1(0), p̂_2(0), …, p̂_n(0)]^T, and Q is broadcast to the data providers.
 1: For KRR, initialize p_i = 1/n for i ∈ [1, n] and $\epsilon_\eta = \ln\frac{1 + (n-1)\sqrt{(n-1)/(m\eta^2 + n-1)}}{1 - \sqrt{(n-1)/(m\eta^2 + n-1)}}$; for EXP Q , initialize p_i = 1/n for i ∈ [1, n], κ_n = 0 and $\gamma = \frac{n}{n+1} \ln\frac{1 + (n-1)\sqrt{(n-1)/(m\eta^2 + n-1)}}{1 - \sqrt{(n-1)/(m\eta^2 + n-1)}}$;
 2: Construct Q according to KRR/EXP Q ;
 3: Initialize p̂_i(0) = 0 for i ∈ [1, n];
 4: Broadcast Q to the data providers.
Perturbation process:
 1: Each data provider uses Q to perturb his check-in data;
 2: The perturbed check-in data are sent to the data collector.
Statistics and update processes:
(1) Statistical process, including the aggregation and reconstruction procedures:
 1: After collecting the check-in data of a time slice T, the data collector carries out frequency statistics to obtain the perturbed data distribution p̃; assuming the current time slice is the t-th, it is recorded as p̃_t;
 2: From the inverse estimation formula p̂_t = Q^{−1} p̃_t, the collector obtains the estimated distribution p̂_t;
 3: Correcting the estimated data distribution p̂_t yields the corrected estimated data distribution p̂(t) of the t-th time slice (to be released):
   if p̂(t−1) ≠ 0 then
    p̂(t) = (1 − w) p̂(t−1) + w p̂_t, where w ∈ (0, 1) is the corrected-estimate parameter, a positive real number;
   else
    p̂(t) = p̂_t;
   end if
(2) Update process:
 1: The data collector calculates the maximum relative error $Re = \max\left(\left|\frac{\hat{p}(t) - \hat{p}(t-1)}{\hat{p}(t-1)}\right|\right)$ based on p̂(t) and p̂(t−1);
 2: Initialize Re_threshold ∈ (0, 1), the update threshold parameter, a positive real number;
   if Re > Re_threshold then
    2-1: Start a process of updating Q to obtain a new Q, as shown in Algorithm 1/Algorithm 2, where the input data distribution of Algorithm 1/Algorithm 2 is p̂(t);
    2-2: The data collector broadcasts the new Q to the data providers.
   end if
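One statistics-and-update round of Algorithm 3 can be sketched as follows (ours; the function and parameter names are hypothetical, and the rebuild of Q in steps 2-1 and 2-2 is only signaled, not performed, here).

```python
import numpy as np

def publish_round(h_tilde_t, Q, p_prev, w: float = 0.5, re_threshold: float = 0.02):
    """Sketch of one statistics-and-update round of Algorithm 3.

    h_tilde_t : perturbed check-in counts of the current time slice
    p_prev    : corrected estimate released for the previous slice, or None
    Returns the corrected estimate p^(t) and a flag telling the collector to
    rebuild Q via Algorithm 1/Algorithm 2 and re-broadcast it.
    """
    h_tilde_t = np.asarray(h_tilde_t, float)
    p_t = np.linalg.inv(Q) @ (h_tilde_t / h_tilde_t.sum())   # p^_t = Q^{-1} p~_t
    if p_prev is None:
        return p_t, False
    p_t = (1 - w) * p_prev + w * p_t                         # smoothing correction
    re = np.max(np.abs((p_t - p_prev) / p_prev))             # maximum relative change
    return p_t, bool(re > re_threshold)                      # trigger for updating Q
```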
Since the original data distribution is assumed to be uniform during initialization, it is possible to calculate the privacy setting parameter ϵ_η or γ that satisfies the EDU η with a closed-form expression, as shown for EXP Q below. According to Corollary A1 of Appendix A, in the case of a uniform data distribution, EXP Q degenerates into KRR. Let κ_n = 0; then the probabilities q_ij of Q can be calculated as follows, where γ_η is the γ that makes max(err(p_i, p̂_i)) = η hold:
$$q_{ij} = \begin{cases} \dfrac{e^{\gamma_\eta(1+\frac{1}{n})}}{e^{\gamma_\eta(1+\frac{1}{n})}+n-1}, & i = j \\ \dfrac{1}{e^{\gamma_\eta(1+\frac{1}{n})}+n-1}, & i \ne j. \end{cases}$$
Let $p = \frac{e^{\gamma_\eta(1+\frac{1}{n})}}{e^{\gamma_\eta(1+\frac{1}{n})}+n-1}$ and $q = \frac{1}{e^{\gamma_\eta(1+\frac{1}{n})}+n-1}$; then the inverse matrix R of Q can be expressed as
$$r_{ij} = \begin{cases} \dfrac{1-q}{p-q}, & i = j \\ \dfrac{-q}{p-q}, & i \ne j. \end{cases}$$
Therefore, the p, q and γ_η that satisfy the EDU η can be calculated. Since p ≥ q, we have $q = \frac{1}{n} - \frac{1}{n}\sqrt{\frac{n-1}{m\eta^2+n-1}}$ and $p = \frac{1}{n} + \frac{n-1}{n}\sqrt{\frac{n-1}{m\eta^2+n-1}}$. Hence, $\gamma_\eta = \frac{n}{n+1}\ln\frac{p}{q}$, that is, $\gamma = \gamma_\eta = \frac{n}{n+1}\ln\frac{1+(n-1)\sqrt{(n-1)/(m\eta^2+n-1)}}{1-\sqrt{(n-1)/(m\eta^2+n-1)}}$.
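A quick numeric check of this closed form (our sketch): for the experimental setting n = 20, m = 100,000 and η = 0.1, the Q built from the closed-form p and q attains a worst relative error of η under the uniform distribution.

```python
import numpy as np

n, m, eta = 20, 100_000, 0.1
a = np.sqrt((n - 1) / (m * eta ** 2 + n - 1))
q = (1 - a) / n                            # closed-form off-diagonal entry
p = (1 + (n - 1) * a) / n                  # closed-form diagonal entry
gamma_eta = n / (n + 1) * np.log(p / q)    # closed-form privacy setting parameter

Q = np.full((n, n), q)
np.fill_diagonal(Q, p)
h = np.full(n, m / n)                      # uniform counts
R = np.linalg.inv(Q)
worst = float(np.max(np.sqrt((R ** 2) @ (Q @ h) - h) / h))
print(round(worst, 4), eta)                # both ~= 0.1
print(round(float(gamma_eta), 4))
```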

7. Experimental Evaluation of B-DP Dynamic Collection and Publishing Algorithm

In this paper, the check-in data uses relative error as its utility metric, and the implementation of the two B-DP mechanisms based on KRR and EXP Q relies on the data distribution. The number of domain values of both KRR and EXP Q is more than 2, and moreover, both randomized algorithms based on them take only one value as input and one value as output. Thereby, KRR and EXP Q fit the check-in perturbation model considered in this paper. In this section, we evaluate the performance of the dynamic algorithm based on the two B-DP mechanisms in terms of validity and robustness as well as privacy and utility. For simplicity, in the rest of this section (including the figure descriptions), we use KRR and EXP Q to denote the B-DP mechanisms based on KRR and EXP Q in the dynamic algorithm, respectively.

7.1. Experimental Settings

(1) Datasets
Two datasets with real-world data from location-based social networking platforms are used to verify the algorithms.
Brightkite [6]: It contains 4,491,143 check-ins over the period April 2008–October 2010. In this paper, we used the check-ins of June–September 2008, September–December 2009 and January–April 2010 from Brightkite to construct three types of check-in distributions with different degrees of uniformity according to a unified longitude and latitude division method, abbreviated as B1, B2 and B3, respectively; the number of regions is 12.
Gowalla [6]: It contains 6,442,890 check-ins over the period February 2009–October 2010. We used the check-ins of January–April 2010 from Gowalla to construct three types of check-in distributions with different degrees of uniformity according to different longitude and latitude partitioning methods, abbreviated as G1, G2 and G3, respectively; the number of regions is 25.
The average data distribution and the corresponding Gini coefficient of the data are shown in Figure 5 and Table 4, respectively. Therein, Gini coefficient is used to indicate the degree of unevenness of data distribution, which is calculated according to the method of Gini mean difference [52].
Figure 5 and Table 4 show that the daily check-in data in the two datasets fluctuate greatly, indicating a high diversity. We verify the effectiveness of our algorithms on these real-world datasets in our experiments.
(2) Utility/Privacy Metrics
Utility metrics: The utility uses the maximum relative error as its metric (see Section 3.3 for details). In this paper, the mean and deviation of the maximum relative error are used to evaluate the same EDU between KRR and EXP Q in the dynamic algorithm.
Privacy metrics: The privacy uses the two new metrics, the point belief degree and the regional average belief degree (see Definitions 4 and 5 for details). In this paper, these two privacy metrics are used to compare the privacy guarantee degree of the expected privacy protection (EPP) under the same expected data utility (EDU) between KRR and EXP Q in the dynamic algorithm.
(3) Parameter Settings
We evaluate our solutions through experiments on the two real-world datasets. The experiments are performed in Matlab on an Intel Core 2.50-GHz Windows 10 machine with 8 GB of main memory. In the experiments, the total check-in amount for statistical validity is m = 100,000. Three kinds of EDU are used: η = 0.1, 0.08 and 0.05. The expected privacy protection region is Region(ϵ_e) = [1, 1.001, 1.002, …, 4] or Region(ϵ_e) = [1, 1.001, 1.002, …, 10]. The modified estimate parameter w is set as shown in Table 5. The update threshold parameter is Re_threshold = 0.02, and the remaining relevant parameters ϵ_0 and Δϵ_η are set to 0.5 and 0.005, respectively.

7.2. Validity and Robustness Evaluation

The performance of validity and robustness of the corresponding dynamic algorithm with KRR and EXP Q is examined through the dynamic statistics process with two real-world datasets.
Figure 6 and Figure 7 show the mean values and deviations of the maximum relative error err(p, p̂) under the three kinds of EDU (η = 0.1, 0.08 and 0.05), obtained from the statistics of the corresponding data subsets under B1, B2, B3, G1, G2 and G3 at a frequency of once a day. The check-in frequency differs from day to day, and each experiment is repeated 10 times. In both Figure 6 and Figure 7, the horizontal axis of each graph represents the number of consecutive time slices, and the vertical axis represents the maximum relative error err(p, p̂) = max_{i∈[1,n]} err(p_i, p̂_i) between the original data distribution and the estimated data distribution. As can be seen from the left small graphs of Figure 6 and Figure 7, the dynamic algorithm with KRR and with EXP Q converges quickly and maintains a uniformly convergent stable state under the different data distributions of B1, B2, B3, G1, G2 and G3. This verifies that the dynamic algorithm has good validity and robustness.

7.3. Utility and Privacy Evaluation

The performance of utility and privacy of the corresponding dynamic algorithm with the two B-DP mechanisms based on KRR and EXP Q is also examined through the dynamic statistics process with the two real-world datasets. The right small graphs of Figure 6 and Figure 7, which enlarge part of the corresponding left graphs, clearly show that the dynamic algorithm satisfies the utility requirement even during the dynamic process.
In addition, Figure 8 and Figure 9 show the point belief degree and the regional average belief degree of each subset of the two datasets under the three kinds of EDU (η = 0.1, 0.08 and 0.05). In Figure 8 and Figure 9, the horizontal axis of each graph represents the EPP with different expected privacy budgets ϵ_e, and the vertical axis represents the guarantee degree to which the EPP is satisfied. This guarantee degree varies with the data distribution and the EDU. For example, the point belief degree in all the left small graphs of Figure 8 and Figure 9 improves as the expected privacy budget grows, until its value reaches 1; under the same EDU, a more even distribution can support the EPP with a smaller ϵ_e and hence provide better privacy protection (as shown in Figure 10). A lower EDU requirement, i.e., a larger η and thus a lower utility, can generally support the EPP with a smaller ϵ_e under the same data distribution (as shown in Figure 11).
In Figure 10, for EXP Q on G1, G2 and G3 with a given EDU (such as η = 0.1), the minimum ϵ_e with C_{ϵ_e} > 0 is clearly the smallest on G1 and the largest on G3. According to Table 4, G3 is quite uneven, while G1 is relatively even. The same holds for EXP Q on B1, B2 and B3, and for KRR on B1, B2 and B3 as well as on G1, G2 and G3. In Figure 11, for EXP Q with a given data distribution (such as G1), the minimum ϵ_e with C_{ϵ_e} > 0 is clearly the smallest under the EDU with η = 0.1 and the largest under the EDU with η = 0.05. Similar trends can be observed for EXP Q on B1, B2 and B3, and for KRR on B1, B2 and B3 as well as on G1, G2 and G3.
For the regional average belief degree, similar results can be drawn from all the right small graphs of Figure 8 and Figure 9. Moreover, Figure 12 shows the maximum difference between C(Region(ϵ_e)) under EXP Q and C(Region(ϵ_e)) under KRR with different η on each subset. The more uneven the data distribution is, the larger this maximum difference is, which means that EXP Q is better adapted to uneven data distributions than KRR.
Furthermore, to evaluate the privacy performance of KRR and EXP Q more objectively, we also use the privacy metric of DP to compare ϵ_η on each subset, as shown in Table 6, where ϵ_η refers to the privacy budget of a DP mechanism that just satisfies the EDU η. As can be seen from Table 6, except for η = 0.08 and η = 0.1 on B3, all the ϵ_η values of EXP Q are slightly greater than those of KRR, which means that EXP Q provides slightly worse DP than KRR. However, from the new perspective of preference for privacy and utility, EXP Q provides a better B-DP than KRR, achieving a good trade-off between them.

8. Discussions and Conclusions

This paper proposes the concept of best-effort differential privacy (B-DP), which satisfies the expected data utility (EDU) first and then satisfies the expected privacy protection (EPP) as much as possible, and designs two new metrics, the point belief degree and the regional average belief degree, to measure the guarantee degree of satisfying the EPP. Moreover, we provide implementation algorithms, including the corresponding dynamic algorithm with two B-DP mechanisms based on KRR and on a newly constructed mechanism EXP Q. Extensive experiments on two real-world check-in datasets verify the effectiveness of the concept of B-DP. They also verify that the dynamic algorithm has good validity and robustness and can satisfy the utility requirement even during the dynamic process. Besides, EXP Q is better adapted to uneven data distributions and satisfies a better B-DP than KRR, providing a good trade-off between privacy and utility.
Specifically, the point belief degree measures the guarantee degree of privacy protection for any single expected privacy budget, and the regional average belief degree measures the average guarantee degree of the EPP over a region containing multiple expected privacy budgets. In comparison, (ϵ, δ)-DP can measure only one EPP, with the expected privacy budget equal to ϵ, and cannot directly measure the average guarantee degree of the EPP; that is, (ϵ, δ)-DP can only measure the guarantee degree of the EPP when ϵ_e = ϵ, namely 1 − δ. In addition, many real-world applications can only provide an approximate value of ϵ_e as their EPP, and hence a neighborhood interval around ϵ_e can be regarded as their EPP. Therefore, the regional average belief degree introduced in this paper is very necessary.
Moreover, the two B-DP mechanisms based on KRR and on the newly constructed EXP Q are applied in this paper to the dynamic collection and publishing of check-in data with the relative error as the utility metric. Therein, KRR itself does not depend on the data distribution, but the dynamic collection and publishing algorithm with the B-DP mechanism based on KRR does: its privacy setting parameter has to be adjusted according to the data distribution to guarantee the utility first in real time. In addition, EXP Q itself depends on the data distribution, so that some of its outputs enjoy strong privacy protection and some weak, in contrast to KRR, which provides a consistent privacy protection intensity. Thus, the dynamic collection and publishing algorithm based on these two B-DP mechanisms needs to depend on the data distribution, and it therefore faces the challenges of algorithm validity and robustness under an unknown data distribution. Fortunately, the experimental results have verified that the algorithm overcomes both challenges and is promising for the typical application of check-in data.
Besides, if scenic spots adopt EXP Q for privacy protection, data providers may be more inclined to visit the spots with a large number of visitors, because the regions where these spots are located may enjoy stronger privacy protection. Compared with the algorithms that realize B-DP with existing DP mechanisms of consistent privacy protection intensity, such as KRR in this paper, which may not achieve the EPP at all, the algorithm based on the newly proposed EXP Q can at least partly achieve the EPP; that is, EXP Q can satisfy a better B-DP to provide a good trade-off between privacy and utility.
In summary, although the B-DP dynamic collection and publishing algorithm based on KRR or EXP Q is not necessarily perfect, it fully demonstrates the feasibility of the concept of B-DP. It is not only a step forward for the basic theory of DP, but also provides two feasible solutions for the implementation of DP in practical applications. The two solutions take check-in data as an example, but are not limited to it; they can also be applied to other categories of data for privacy protection where the perturbation model takes one value as input and one value as output. In future work, we will further discuss other LDP mechanisms whose perturbation model supports perturbing one input into multiple outputs, such as RAPPOR, and design them to achieve a better B-DP. Correlated B-DP is also an interesting open problem.

Author Contributions

The problem was conceived by Y.C. and Z.X. The theoretical analysis and experimental verification were performed by Y.C., Z.X. and J.C. Y.C. and J.C. wrote the paper. S.J. reviewed the writing on grammar and structure of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

Natural Science Foundation of China (No.41971407), China Postdoctoral Science Foundation (No.2018M633354), Natural Science Foundation of Fujian Province, China (Nos.2020J01571, 2016J01281) and Science and Technology Innovation Special Fund of Fujian Agriculture and Forestry University (No.CXZX2019119S).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Two publicly available datasets were analyzed in this study. Both datasets can be found here: http://snap.stanford.edu/data/loc-gowalla.html and http://snap.stanford.edu/data/loc-brightkite.html (accessed on 5 March 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 5

Proof. 
(1) From Definition 11, ϵ_η = max_{1≤i≤n}(ϵ_i). According to the definition of LDP, it can be seen that ϵ_η-EXP Q satisfies LDP with ϵ = ϵ_η.
To prove (2) and (3) of Theorem 5, we first need the following theorems and corollary.
Theorem A1.
In the perturbation mechanism EXP Q, where i, j ∈ [1, n] and κ_n ∈ [0, n], it has
(a) q_{jj} ≥ q_{ij}; if i_1, i_2 ∈ [1, n], i_1 ≤ i_2 and i_1, i_2 ≠ j, then q_{i_1 j} ≥ q_{i_2 j}.
(b) q_{ii} ≥ q_{ij}; if j_1, j_2 ∈ [1, n], j_1 ≤ j_2 and j_1, j_2 ≠ i, then q_{i j_1} ≥ q_{i j_2}.
(c) for j_1 ≤ j_2, q_{j_1 j_1} ≥ q_{j_2 j_2}.
(d) for i_1 ≤ i_2, p̃_{i_1} ≥ p̃_{i_2}, where p̃_{i_1} and p̃_{i_2} are the distribution probabilities of POI_{i_1} and POI_{i_2} after perturbation, respectively.
Proof. 
See Appendix B. □
Theorem A2.
The perturbation mechanism EXP Q satisfies the following inequalities, where i ∈ [1, n].
(1) For κ_n ∈ [1, n−1],
(i) if 1 ≤ i ≤ κ_n and j, j′ ∈ [1, n], then e^{−γ(1−p_{i+1})} ≤ q_{ij}/q_{ij′} ≤ e^{γ(1−p_{i+1})};
(ii) if κ_n < i ≤ n−1, κ_n ≤ n−2 and j, j′ ∈ [1, n], then e^{−γ(1+p_{n−i+κ_n})} ≤ q_{ij}/q_{ij′} ≤ e^{γ(1+p_{n−i+κ_n})};
(iii) if κ_n < i = n and j, j′ ∈ [1, n], then e^{−γ(1+p_{κ_n+1})} ≤ q_{ij}/q_{ij′} ≤ e^{γ(1+p_{κ_n+1})}.
(2) For κ_n = 0,
(i) if 1 ≤ i ≤ n−1 and j, j′ ∈ [1, n], then e^{−γ(1+p_{n−i})} ≤ q_{ij}/q_{ij′} ≤ e^{γ(1+p_{n−i})};
(ii) if i = n and j, j′ ∈ [1, n], then e^{−γ(1+p_1)} ≤ q_{ij}/q_{ij′} ≤ e^{γ(1+p_1)}.
(3) For κ_n = n,
(i) if 1 ≤ i ≤ n−1 and j, j′ ∈ [1, n], then e^{−γ(1−p_{i+1})} ≤ q_{ij}/q_{ij′} ≤ e^{γ(1−p_{i+1})};
(ii) if i = n and j, j′ ∈ [1, n], then e^{−γ(1−p_n)} ≤ q_{ij}/q_{ij′} ≤ e^{γ(1−p_n)}.
Proof. 
See Appendix C. □
From Theorems A1 and A2, we have the following Corollary A1.
Corollary A1.
For any i ∈ [1, n], let ϵ_i be the actual privacy budget provided by EXP Q for POI_i; then it has the following properties.
(1) For κ_n ∈ [1, n−1],
(i) if 1 ≤ i ≤ κ_n, then ϵ_i = γ(1 − p_{i+1});
(ii) if κ_n < i ≤ n−1 and κ_n ≤ n−2, then ϵ_i = γ(1 + p_{n−i+κ_n});
(iii) if κ_n < i = n, then ϵ_n = γ(1 + p_{κ_n+1}).
(2) For κ_n = 0,
(i) if 1 ≤ i ≤ n−1, then ϵ_i = γ(1 + p_{n−i});
(ii) if i = n, then ϵ_n = γ(1 + p_1).
(3) For κ_n = n,
(i) if 1 ≤ i ≤ n−1, then ϵ_i = γ(1 − p_{i+1});
(ii) if i = n, then ϵ_n = γ(1 − p_n).
Let γ = γ_η, where γ_η is the privacy setting parameter γ of EXP Q that satisfies the EDU err(p, p̂) = η. According to Theorem A2, the actual privacy budget for each POI_i can be set to ϵ_i = φ(γ_η, p), which is a function of γ_η and p. For a fixed κ_n, it is easy to see from Theorem A2 and Corollary A1 that ϵ_i is monotonically non-decreasing in i.
Therefore, ϵ_η = max_{1≤i≤n}(ϵ_i)-EXP Q satisfies (2) and (3) of Theorem 5, which can be proved as follows.
(2) When κ_n is fixed and the point belief degree of ϵ_η-EXP Q is C_{ϵ_e} = Σ_{i=1}^{n} p̃_i χ(ϵ_i, ϵ_e), if C_{ϵ_e} is maximized, then ϵ_η-EXP Q satisfies (ϵ_η, C_{ϵ_e})-Best-B-DP. According to Corollary A1, the larger p_i is, the smaller ϵ_i is, indicating that, for the same ϵ_e, the ϵ_i of the POIs with larger p_i satisfy ϵ_e first, and the proportion each such POI contributes is p̃_i. According to Theorem A1 (d), when p_i is larger, p̃_i is always larger too. Hence, C_{ϵ_e} is also maximized under a fixed κ_n.
(3) According to Definition 10 and Corollary A1, in EXP Q the data distribution p satisfies p_1 ≥ … ≥ p_i ≥ … ≥ p_n and the actual privacy budgets satisfy ϵ_1 ≤ … ≤ ϵ_i ≤ … ≤ ϵ_n. Suppose ϵ_i ≤ ϵ_e < ϵ_{i+1}; then the probability of satisfying the EPP ϵ_e under the EDU η is C_{ϵ_e} = Σ_{j=1}^{n} p̃_j χ(ϵ_j, ϵ_e) = Σ_{j=1}^{i} p̃_j. Let R_1 = {1, 2, …, i} and R_2 = {i+1, i+2, …, n}; then P(k ∈ R_1) = C_{ϵ_e} and P(k ∈ R_2) = 1 − C_{ϵ_e}. Hence, for any user u and k, j, j′ ∈ [1, n], it has
$$\begin{aligned} P\!\left(\left|\ln\frac{q_{kj}}{q_{kj'}}\right| \leq \epsilon_e\right) &= P\!\left(\left|\ln\frac{q_{kj}}{q_{kj'}}\right| \leq \epsilon_e \,\middle|\, k \in R_1\right)P(k \in R_1) + P\!\left(\left|\ln\frac{q_{kj}}{q_{kj'}}\right| \leq \epsilon_e \,\middle|\, k \in R_2\right)P(k \in R_2) \\ &= 1 \cdot C_{\epsilon_e} + 0 \cdot (1 - C_{\epsilon_e}) = C_{\epsilon_e}. \end{aligned}$$
Therefore, it is easy to obtain

$$ q_{kj} \leq e^{\epsilon_e}\, q_{kj'} + (1 - C_{\epsilon_e}) $$

for any user u and k, j, j′ ∈ [1, n]. Therefore, according to Definitions 2 and 3 and Theorem A2, ϵ_η-EXP Q with the point belief degree C_{ϵ_e} satisfies (ϵ_e, 1 − C_{ϵ_e})-LDP. □
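To make Corollary A1 and its use in parts (2) and (3) above concrete, the following Python sketch computes the per-POI actual budgets for κ_n ∈ [1, n−1]. The helper name is ours and the sign placement follows our reading of the corollary, so treat this as an illustrative reconstruction rather than the paper's implementation:

```python
import numpy as np

def actual_budgets(gamma, p, kappa):
    # Per-POI actual privacy budgets eps_i of EXP_Q for kappa in [1, n-1],
    # following Corollary A1; p must be sorted descending (p_1 >= ... >= p_n).
    # The loop index i is 1-based, as in the paper.
    n = len(p)
    eps = np.empty(n)
    for i in range(1, n + 1):
        if i <= kappa:
            eps[i - 1] = gamma * (1 - p[i])                  # gamma*(1 - p_{i+1})
        elif i <= n - 1:
            eps[i - 1] = gamma * (1 + p[n - i + kappa - 1])  # gamma*(1 + p_{n-i+kappa})
        else:
            eps[i - 1] = gamma * (1 + p[kappa])              # gamma*(1 + p_{kappa+1})
    return eps

eps = actual_budgets(gamma=2.0, p=np.array([0.4, 0.3, 0.2, 0.1]), kappa=2)
# eps = [1.4, 1.6, 2.4, 2.4]: non-decreasing in i, and POIs with larger p_i
# get smaller budgets (stronger protection), which is exactly the
# monotonicity used to compute C_{eps_e} as a prefix sum in part (3).
```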

Appendix B. Proof of Theorem A1

Proof. 
In the perturbation mechanism EXP Q, the check-in data distribution p = [p_1, p_2, …, p_n]^T satisfies p_1 ≥ p_2 ≥ … ≥ p_n. According to whether κ_n ∈ [1, n−1], κ_n = n or κ_n = 0, three cases can be distinguished. We prove Theorem A1 below for κ_n ∈ [1, n−1]; the other two cases, κ_n = n and κ_n = 0, can be proved by the same method.
For κ_n ∈ [1, n−1], it can be discussed as follows based on Theorem 4.
(a) For any i ∈ [1, n], the probability q_{ij} that the check-in state of POI_j is perturbed to that of POI_i satisfies the following cases.
(i) For i ≤ κ_n and i ≠ j, it has q_{jj} = 1/Ω_j ≥ q_{ij} = e^{−γ(1−p_i)}/Ω_j. Moreover, for i_1 ≤ i_2, i_1, i_2 ≠ j and i_1, i_2 ≤ κ_n, it has q_{i_1 j} = e^{−γ(1−p_{i_1})}/Ω_j ≥ q_{i_2 j} = e^{−γ(1−p_{i_2})}/Ω_j.
(ii) For i > κ_n and i ≠ j, it has q_{jj} = 1/Ω_j ≥ q_{ij} = e^{−γ(1+p_{n−i+κ_n+1})}/Ω_j. Moreover, for i_1, i_2 > κ_n, i_1 ≤ i_2 and i_1, i_2 ≠ j, it has q_{i_1 j} = e^{−γ(1+p_{n−i_1+κ_n+1})}/Ω_j ≥ q_{i_2 j} = e^{−γ(1+p_{n−i_2+κ_n+1})}/Ω_j.
(iii) For i_1 ≠ j, i_1 ≤ κ_n, i_2 ≠ j and i_2 > κ_n, it has q_{i_1 j} = e^{−γ(1−p_{i_1})}/Ω_j ≥ q_{i_2 j} = e^{−γ(1+p_{n−i_2+κ_n+1})}/Ω_j.
From the above discussions (i)–(iii), the property (a) in this theorem holds for i_1, i_2 ≠ j and κ_n ∈ [1, n−1].
(b.1) For j ∈ [1, n], the probability q_{ij} that the check-in state of POI_j is perturbed to that of POI_i satisfies the following cases.
(i) For i, j ≤ κ_n and j ≠ i, it has

$$\begin{aligned} q_{ii} = \frac{1}{\Omega_i} &= \frac{e^{-\gamma(1-p_i)}}{e^{-\gamma(1-p_i)}\,\Omega_i} = \frac{e^{-\gamma(1-p_i)}}{e^{-\gamma(1-p_i)}\Big(1+\sum_{k=1,k\neq i}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}\Big)} \\ &= \frac{e^{-\gamma(1-p_i)}}{e^{-\gamma(1-p_i)}\Big(1+e^{-\gamma(1-p_j)}+\sum_{k=1,k\neq i,k\neq j}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}\Big)} \\ &\geq \frac{e^{-\gamma(1-p_i)}}{1+\sum_{k=1,k\neq j}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} = \frac{e^{-\gamma(1-p_i)}}{\Omega_j} = q_{ij}. \end{aligned}$$

(ii) For j ≠ i, i ≤ κ_n and j > κ_n, it has

$$\begin{aligned} q_{ii} = \frac{1}{\Omega_i} &= \frac{e^{-\gamma(1-p_i)}}{e^{-\gamma(1-p_i)}\Big(1+\sum_{k=1,k\neq i}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}\Big)} \\ &= \frac{e^{-\gamma(1-p_i)}}{e^{-\gamma(1-p_i)}\Big(1+e^{-\gamma(1+p_{n-j+\kappa_n+1})}+\sum_{k=1,k\neq i}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}\Big)} \\ &\geq \frac{e^{-\gamma(1-p_i)}}{e^{-\gamma(2-p_i+p_{n-j+\kappa_n+1})}+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &\geq \frac{e^{-\gamma(1-p_i)}}{1+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} = \frac{e^{-\gamma(1-p_i)}}{\Omega_j} = q_{ij}. \end{aligned}$$

(iii) For j ≠ i, i > κ_n and j ≤ κ_n, it has

$$\begin{aligned} q_{ii} = \frac{1}{\Omega_i} &= \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{e^{-\gamma(1+p_{n-i+\kappa_n+1})}\Big(1+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq i}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}\Big)} \\ &= \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{e^{-\gamma(1+p_{n-i+\kappa_n+1})}\Big(1+e^{-\gamma(1-p_j)}+\sum_{k=1,k\neq j}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq i}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}\Big)} \\ &\geq \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{e^{-\gamma(2+p_{n-i+\kappa_n+1}-p_j)}+\sum_{k=1,k\neq j}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &\geq \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{1+\sum_{k=1,k\neq j}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} = \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{\Omega_j} = q_{ij}. \end{aligned}$$

(iv) For j ≠ i, i > κ_n and j > κ_n, it has

$$\begin{aligned} q_{ii} = \frac{1}{\Omega_i} &= \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{e^{-\gamma(1+p_{n-i+\kappa_n+1})}\Big(1+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq i}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}\Big)} \\ &\geq \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{e^{-\gamma(2+p_{n-i+\kappa_n+1}+p_{n-j+\kappa_n+1})}+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &\geq \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{1+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} = \frac{e^{-\gamma(1+p_{n-i+\kappa_n+1})}}{\Omega_j} = q_{ij}. \end{aligned}$$

From the above discussions (i)–(iv), it can be seen that q_{ii} ≥ q_{ij} for j ∈ [1, n].
(b.2) For j_1 ≤ j_2, j_1, j_2 ≠ i and j_1, j_2 ∈ [1, n], the probability q_{ij_1} that the check-in state of POI_{j_1} is perturbed to that of POI_i and the probability q_{ij_2} that the check-in state of POI_{j_2} is perturbed to that of POI_i satisfy the following cases.
(i) For i ≤ κ_n and j_1, j_2 ≤ κ_n, it has

$$\begin{aligned} q_{ij_1} = \frac{e^{-\gamma(1-p_i)}}{\Omega_{j_1}} &= \frac{e^{-\gamma(1-p_i)}}{1+\sum_{k=1,k\neq j_1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &= \frac{e^{-\gamma(1-p_i)}}{1+e^{-\gamma(1-p_{j_2})}+\sum_{k=1,k\neq j_1,k\neq j_2}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &\geq \frac{e^{-\gamma(1-p_i)}}{1+e^{-\gamma(1-p_{j_1})}+\sum_{k=1,k\neq j_1,k\neq j_2}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &= \frac{e^{-\gamma(1-p_i)}}{1+\sum_{k=1,k\neq j_2}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} = \frac{e^{-\gamma(1-p_i)}}{\Omega_{j_2}} = q_{ij_2}. \end{aligned}$$

(ii) For j_1, j_2 > κ_n and i ≤ κ_n, it has

$$\begin{aligned} q_{ij_1} = \frac{e^{-\gamma(1-p_i)}}{\Omega_{j_1}} &= \frac{e^{-\gamma(1-p_i)}}{1+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j_1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &= \frac{e^{-\gamma(1-p_i)}}{1+e^{-\gamma(1+p_{n-j_2+\kappa_n+1})}+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j_1,k\neq j_2}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &\geq \frac{e^{-\gamma(1-p_i)}}{1+e^{-\gamma(1+p_{n-j_1+\kappa_n+1})}+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j_1,k\neq j_2}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &= \frac{e^{-\gamma(1-p_i)}}{1+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j_2}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} = \frac{e^{-\gamma(1-p_i)}}{\Omega_{j_2}} = q_{ij_2}. \end{aligned}$$

(iii) For i ≤ κ_n, j_1 ≤ κ_n and j_2 > κ_n, it has

$$\begin{aligned} q_{ij_1} = \frac{e^{-\gamma(1-p_i)}}{\Omega_{j_1}} &= \frac{e^{-\gamma(1-p_i)}}{1+\sum_{k=1,k\neq j_1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &= \frac{e^{-\gamma(1-p_i)}}{1+e^{-\gamma(1+p_{n-j_2+\kappa_n+1})}+\sum_{k=1,k\neq j_1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j_2}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &\geq \frac{e^{-\gamma(1-p_i)}}{1+e^{-\gamma(1-p_{j_1})}+\sum_{k=1,k\neq j_1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j_2}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &= \frac{e^{-\gamma(1-p_i)}}{1+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq j_2}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} = \frac{e^{-\gamma(1-p_i)}}{\Omega_{j_2}} = q_{ij_2}. \end{aligned}$$
From the above discussions (i)–(iii) of (b.2), when i ≤ κ_n, q_{ij_1} ≥ q_{ij_2} for j_1, j_2 ≠ i, j_1 ≤ j_2 and j_1, j_2 ∈ [1, n]. When i > κ_n, a similar argument gives q_{ij_1} ≥ q_{ij_2} for j_1, j_2 ≠ i, j_1 ≤ j_2 and j_1, j_2 ∈ [1, n]. Then it has q_{ij_1} ≥ q_{ij_2} for i, j_1, j_2 ∈ [1, n], j_1 ≤ j_2 and j_1, j_2 ≠ i.
Hence, from (b.1)–(b.2), when κ_n ∈ [1, n−1], the part (b) of Theorem A1 holds.
(c) According to the property (b) and its proof process, it is easy to get 1/Ω_{j_1} ≥ 1/Ω_{j_2}, i.e., q_{j_1 j_1} ≥ q_{j_2 j_2} holds for j_1 ≤ j_2.
(d) Since p̃_i = Σ_{j=1}^{n} q_{ij} p_j for i ∈ [1, n], i_1 ≤ i_2 implies p_{i_1} ≥ p_{i_2}, and it has Formula (A10). According to the property (a), q_{i_1 j} ≥ q_{i_2 j}, and with Formula (A10) it can be seen that Σ_{j=1, j≠i_1, j≠i_2}^{n} (q_{i_1 j} − q_{i_2 j}) p_j ≥ 0.

$$\begin{aligned} \tilde{p}_{i_1} - \tilde{p}_{i_2} &= \sum_{j=1}^{n} q_{i_1 j}\,p_j - \sum_{j=1}^{n} q_{i_2 j}\,p_j \\ &= \sum_{j=1,j\neq i_1,j\neq i_2}^{n} (q_{i_1 j}-q_{i_2 j})\,p_j + q_{i_1 i_1}p_{i_1} + q_{i_1 i_2}p_{i_2} - q_{i_2 i_1}p_{i_1} - q_{i_2 i_2}p_{i_2} \\ &= \sum_{j=1,j\neq i_1,j\neq i_2}^{n} (q_{i_1 j}-q_{i_2 j})\,p_j + (q_{i_1 i_1}-q_{i_2 i_1})p_{i_1} - (q_{i_2 i_2}-q_{i_1 i_2})p_{i_2}. \end{aligned} \quad \mathrm{(A10)}$$

To prove that p̃_{i_1} ≥ p̃_{i_2} always holds for i_1 ≤ i_2, it thus suffices to prove (q_{i_1 i_1} − q_{i_2 i_1})p_{i_1} − (q_{i_2 i_2} − q_{i_1 i_2})p_{i_2} ≥ 0. Similarly, we only discuss the case κ_n ∈ [1, n−1] below; the other two cases, κ_n = 0 and κ_n = n, can be proved by the same method.
(i) For 1 ≤ i_1 ≤ i_2 ≤ κ_n, it has

$$\begin{aligned} (q_{i_1 i_1}-q_{i_2 i_1})p_{i_1} - (q_{i_2 i_2}-q_{i_1 i_2})p_{i_2} &= \frac{1-e^{-\gamma(1-p_{i_2})}}{\Omega_{i_1}}\,p_{i_1} - \frac{1-e^{-\gamma(1-p_{i_1})}}{\Omega_{i_2}}\,p_{i_2} \\ &\geq \frac{1-e^{-\gamma(1-p_{i_1})}}{\Omega_{i_2}}\,p_{i_1} - \frac{1-e^{-\gamma(1-p_{i_1})}}{\Omega_{i_2}}\,p_{i_2} = \frac{1-e^{-\gamma(1-p_{i_1})}}{\Omega_{i_2}}\,(p_{i_1}-p_{i_2}) \geq 0. \end{aligned}$$

(ii) For 1 ≤ i_1 ≤ κ_n and n ≥ i_2 > κ_n, it can also obtain

$$\begin{aligned} (q_{i_1 i_1}-q_{i_2 i_1})p_{i_1} - (q_{i_2 i_2}-q_{i_1 i_2})p_{i_2} &= \frac{1-e^{-\gamma(1+p_{n-i_2+\kappa_n+1})}}{\Omega_{i_1}}\,p_{i_1} - \frac{1-e^{-\gamma(1-p_{i_1})}}{\Omega_{i_2}}\,p_{i_2} \\ &\geq \frac{1-e^{-\gamma(1-p_{i_2})}}{\Omega_{i_1}}\,p_{i_1} - \frac{1-e^{-\gamma(1-p_{i_1})}}{\Omega_{i_2}}\,p_{i_2} \\ &\geq \frac{1-e^{-\gamma(1-p_{i_2})}}{\Omega_{i_2}}\,p_{i_1} - \frac{1-e^{-\gamma(1-p_{i_1})}}{\Omega_{i_2}}\,p_{i_2} \\ &\geq \frac{1-e^{-\gamma(1-p_{i_1})}}{\Omega_{i_2}}\,(p_{i_1}-p_{i_2}) \geq 0. \end{aligned}$$

(iii) For κ_n < i_1 ≤ i_2 ≤ n, it can also get

$$\begin{aligned} (q_{i_1 i_1}-q_{i_2 i_1})p_{i_1} - (q_{i_2 i_2}-q_{i_1 i_2})p_{i_2} &= \frac{1-e^{-\gamma(1+p_{n-i_2+\kappa_n+1})}}{\Omega_{i_1}}\,p_{i_1} - \frac{1-e^{-\gamma(1+p_{n-i_1+\kappa_n+1})}}{\Omega_{i_2}}\,p_{i_2} \\ &\geq \frac{1-e^{-\gamma(1+p_{n-i_1+\kappa_n+1})}}{\Omega_{i_1}}\,p_{i_1} - \frac{1-e^{-\gamma(1+p_{n-i_1+\kappa_n+1})}}{\Omega_{i_2}}\,p_{i_2} \\ &\geq \frac{1-e^{-\gamma(1+p_{n-i_1+\kappa_n+1})}}{\Omega_{i_2}}\,(p_{i_1}-p_{i_2}) \geq 0. \end{aligned}$$

From the above discussions (i)–(iii), the part (d) in Theorem A1 is true.
Therefore, from (a)–(d), the result follows. □

Appendix C. Proof of Theorem A2

Proof. 
It is only necessary to deal with case (1) here; a similar argument applies to cases (2) and (3). From Theorem A1, it can be seen that q_{ii} ≥ q_{ij} for i, j ∈ [1, n], and that q_{ij_1} ≥ q_{ij_2} for j_1 ≤ j_2, j_1, j_2 ∈ [1, n] and j_1, j_2 ≠ i. Hence, the following cases are discussed for κ_n ∈ [1, n−1].
(i) If i ≤ κ_n and j, j′ ∈ [1, n], then it has Formula (A14):

$$\begin{aligned} \frac{q_{ij}}{q_{ij'}} \leq \frac{q_{ii}}{q_{in}} &= \frac{1/\Omega_i}{e^{-\gamma(1-p_i)}/\Omega_n} = e^{\gamma(1-p_i)}\,\frac{\Omega_n}{\Omega_i} \\ &= e^{\gamma(1-p_i)}\,\frac{1+e^{-\gamma(1-p_i)}+\sum_{k=1,k\neq i}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq n}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}}{1+e^{-\gamma(1+p_{\kappa_n+1})}+\sum_{k=1,k\neq i}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq n}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &\leq e^{\gamma(1-p_{i+1})}\times\frac{1+e^{-\gamma(1-p_i)}+\sum_{k=1,k\neq i}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq n}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}}{e^{\gamma(p_i-p_{i+1})}+e^{\gamma(p_i-p_{i+1})}e^{-\gamma(1+p_{\kappa_n+1})}+\sum_{k=1,k\neq i}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq n}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}}. \end{aligned} \quad \mathrm{(A14)}$$

Therefore, according to Formula (A14), to show q_{ij}/q_{ij′} ≤ e^{γ(1−p_{i+1})}, it suffices to show 1 + e^{−γ(1−p_i)} ≤ e^{γ(p_i−p_{i+1})} + e^{γ(p_i−p_{i+1})} e^{−γ(1+p_{κ_n+1})}, i.e., that

$$ 1 + e^{-\gamma(1-p_i)} - e^{\gamma(p_i-p_{i+1})} - e^{\gamma(p_i-p_{i+1})}\,e^{-\gamma(1+p_{\kappa_n+1})} \leq 0 $$

always holds. Let f(x) = 1 + e^{−γ(1−x)} − e^{γ(x−p_{i+1})} − e^{γ(x−p_{i+1})} e^{−γ(1+p_{κ_n+1})}, where x ∈ [0, p_{κ_n}], γ > 0 and 0 ≤ p_{i+1} ≤ x. Taking the derivative of f(x) with respect to x gives f′(x) = γe^{−γ(1−x)} − γe^{γ(x−p_{i+1})} − γe^{−γ(1+p_{κ_n+1})} e^{γ(x−p_{i+1})} ≤ 0, i.e., f(x) decreases monotonically in x. This implies that f(x) ≤ f(0) = 0 always holds. Therefore, the inequality 1 + e^{−γ(1−p_i)} − e^{γ(p_i−p_{i+1})} − e^{γ(p_i−p_{i+1})} e^{−γ(1+p_{κ_n+1})} ≤ 0 always holds for p_i ∈ [0, p_{κ_n}]. From the above discussion, it can be seen that

$$ \frac{q_{ij}}{q_{ij'}} \leq \frac{q_{ii}}{q_{in}} \leq e^{\gamma(1-p_{i+1})}. $$

Similarly, it has

$$ \frac{q_{ij}}{q_{ij'}} \geq \frac{q_{ij}}{q_{ii}} \geq \frac{q_{in}}{q_{ii}} \geq e^{-\gamma(1-p_{i+1})}. $$
(ii) For n > i > κ_n, κ_n ≤ n−2 and j, j′ ∈ [1, n], it has Formula (A18):

$$\begin{aligned} \frac{q_{ij}}{q_{ij'}} \leq \frac{q_{ii}}{q_{in}} &= \frac{1/\Omega_i}{e^{-\gamma(1+p_{n-i+\kappa_n+1})}/\Omega_n} = e^{\gamma(1+p_{n-i+\kappa_n+1})}\,\frac{\Omega_n}{\Omega_i} \\ &= e^{\gamma(1+p_{n-i+\kappa_n+1})}\,\frac{1+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq n}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}}{1+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq i}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &\leq e^{\gamma(1+p_{n-i+\kappa_n})}\,\Big(1+e^{-\gamma(1+p_{n-i+\kappa_n+1})}+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq i,k\neq n}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}\Big) \\ &\quad\times\frac{1}{e^{\gamma(p_{n-i+\kappa_n}-p_{n-i+\kappa_n+1})}\big(1+e^{-\gamma(1+p_{\kappa_n+1})}\big)+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq i,k\neq n}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}}. \end{aligned} \quad \mathrm{(A18)}$$

Hence, to show q_{ij}/q_{ij′} ≤ e^{γ(1+p_{n−i+κ_n})} according to Formula (A18), it suffices to show 1 + e^{−γ(1+p_{n−i+κ_n+1})} ≤ e^{γ(p_{n−i+κ_n}−p_{n−i+κ_n+1})}(1 + e^{−γ(1+p_{κ_n+1})}), i.e., that

$$ 1 + e^{-\gamma(1+p_{n-i+\kappa_n+1})} - e^{\gamma(p_{n-i+\kappa_n}-p_{n-i+\kappa_n+1})}\big(1+e^{-\gamma(1+p_{\kappa_n+1})}\big) \leq 0 $$

always holds. Let f(x) = 1 + e^{−γ(1+x)} − e^{γ(p_{n−i+κ_n}−x)}(1 + e^{−γ(1+p_{κ_n+1})}), where x ∈ [p_n, p_{κ_n+1}] (i ∈ [κ_n+1, n]), γ > 0 and p_{n−i+κ_n} ≥ x. Taking the derivative of f(x) with respect to x gives f′(x) = −γe^{−γ(1+x)} + γe^{γ(p_{n−i+κ_n}−x)}(1 + e^{−γ(1+p_{κ_n+1})}) ≥ 0, that is, f(x) increases monotonically in x. Then f(x) ≤ f(p_{κ_n+1}) = 1 + e^{−γ(1+p_{κ_n+1})} − e^{γ(p_{n−i+κ_n}−p_{κ_n+1})}(1 + e^{−γ(1+p_{κ_n+1})}) ≤ 0 always holds. Therefore, the inequality 1 + e^{−γ(1+p_{n−i+κ_n+1})} − e^{γ(p_{n−i+κ_n}−p_{n−i+κ_n+1})}(1 + e^{−γ(1+p_{κ_n+1})}) ≤ 0 always holds for p_{n−i+κ_n+1} ∈ [p_n, p_{κ_n+1}) (i ∈ [κ_n+1, n)).
Similarly, it has

$$ \frac{q_{ij}}{q_{ij'}} \geq \frac{q_{ij}}{q_{ii}} \geq \frac{q_{in}}{q_{ii}} \geq e^{-\gamma(1+p_{n-i+\kappa_n})}. $$
(iii) For i = n and j, j′ ∈ [1, n], it has

$$\begin{aligned} \frac{q_{nj}}{q_{nj'}} \leq \frac{q_{nn}}{q_{n(n-1)}} &= \frac{1/\Omega_n}{e^{-\gamma(1+p_{\kappa_n+1})}/\Omega_{n-1}} = e^{\gamma(1+p_{\kappa_n+1})}\,\frac{\Omega_{n-1}}{\Omega_n} \\ &= e^{\gamma(1+p_{\kappa_n+1})}\,\frac{1+e^{-\gamma(1+p_{\kappa_n+1})}+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq n-1,k\neq n}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}}{1+e^{-\gamma(1+p_{\kappa_n+2})}+\sum_{k=1}^{\kappa_n}e^{-\gamma(1-p_k)}+\sum_{k=\kappa_n+1,k\neq n-1,k\neq n}^{n}e^{-\gamma(1+p_{n-k+\kappa_n+1})}} \\ &\leq e^{\gamma(1+p_{\kappa_n+1})}. \end{aligned}$$

Similarly, it has

$$ \frac{q_{nj}}{q_{nj'}} \geq \frac{q_{nj}}{q_{nn}} \geq \frac{q_{n(n-1)}}{q_{nn}} \geq e^{-\gamma(1+p_{\kappa_n+1})}. $$

From the above discussions (i)–(iii), it can be seen that the part (1) of Theorem A2 holds. □
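As a numerical sanity check of Theorem A2, the following Python sketch builds the EXP Q perturbation matrix under our reconstruction of the proof above (diagonal weight 1, row weight e^{−γ(1−p_i)} for i ≤ κ_n and e^{−γ(1+p_{n−i+κ_n+1})} for i > κ_n, each column j normalized by Ω_j; the function name and the toy numbers are ours) and checks that the log-ratio spread of each row stays within the budgets of Corollary A1:

```python
import numpy as np

def exp_q_matrix(p, gamma, kappa):
    # Column-stochastic EXP_Q matrix under our sign reconstruction:
    # diagonal weight 1; row weight exp(-gamma*(1 - p_i)) for i <= kappa
    # and exp(-gamma*(1 + p_{n-i+kappa+1})) for i > kappa (1-based indices);
    # column j is then normalized by Omega_j.
    n = len(p)
    w = np.empty(n)
    for i in range(1, n + 1):
        if i <= kappa:
            w[i - 1] = np.exp(-gamma * (1 - p[i - 1]))
        else:
            w[i - 1] = np.exp(-gamma * (1 + p[n - i + kappa]))
    Q = np.tile(w[:, None], (1, n))
    np.fill_diagonal(Q, 1.0)
    return Q / Q.sum(axis=0)

p = np.array([0.4, 0.3, 0.2, 0.1])
Q = exp_q_matrix(p, gamma=2.0, kappa=2)
# Max log-ratio per row; Corollary A1 predicts the per-POI bounds
# [1.4, 1.6, 2.4, 2.4] for gamma = 2 and kappa = 2.
spread = np.log(Q.max(axis=1) / Q.min(axis=1))
print(np.round(spread, 3))   # stays within the predicted bounds
```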

References

1. Patil, S.; Norcie, G.; Kapadia, A.; Lee, A.J. Reasons, rewards, regrets: Privacy considerations in location sharing as an interactive practice. In Proceedings of the 8th Symposium on Usable Privacy and Security (SOUPS), Washington, DC, USA, 11–13 July 2012; pp. 1–15.
2. Patil, S.; Norcie, G.; Kapadia, A.; Lee, A. "Check out where I am!": Location-sharing motivations, preferences, and practices. In Proceedings of the Extended Abstracts on Human Factors in Computing Systems (CHI), Austin, TX, USA, 5–10 May 2012; pp. 1997–2002.
3. Lindqvist, J.; Cranshaw, J.; Wiese, J.; Hong, J.; Zimmerman, J. I'm the mayor of my house: Examining why people use foursquare-a social-driven location sharing application. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), Vancouver, BC, Canada, 7–12 May 2011; pp. 2409–2418.
4. Guha, S.; Birnholtz, J. Can you see me now?: Location, visibility and the management of impressions on foursquare. In Proceedings of the 15th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI), Munich, Germany, 27–30 August 2013; pp. 183–192.
5. Gruteser, M.; Grunwald, D. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services (MOBISYS), San Francisco, CA, USA, 5–8 May 2003; pp. 31–42.
6. Cho, E.; Myers, S.A.; Leskovec, J. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Diego, CA, USA, 21–24 August 2011; pp. 1082–1090.
7. Huo, Z.; Meng, X.; Zhang, R. Feel free to check-in: Privacy alert against hidden location inference attacks in GeoSNs. In Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), Wuhan, China, 22–25 April 2013; pp. 377–391.
8. Naghizade, E.; Bailey, J.; Kulik, L.; Tanin, E. How private can I be among public users? In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UBICOMP), Osaka, Japan, 7–11 September 2015; pp. 1137–1141.
9. Rossi, L.; Williams, M.J.; Stich, C.; Musolesi, M. Privacy and the city: User identification and location semantics in location-based social networks. In Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM), Oxford, UK, 26–29 May 2015; pp. 387–396.
10. Sweeney, L. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzz. 2002, 10, 557–570.
11. Machanavajjhala, A.; Kifer, D.; Gehrke, J.; Venkitasubramaniam, M. l-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 2007, 1, 3–54.
12. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference (TCC), New York, NY, USA, 4–7 March 2006; pp. 265–284.
13. Hay, M.; Rastogi, V.; Miklau, G.; Dan, S. Boosting the accuracy of differentially private histograms through consistency. arXiv 2010, arXiv:0904.0942v5.
14. Xiao, X.; Wang, G.; Gehrke, J. Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 2011, 23, 1200–1214.
15. Rastogi, V.; Nath, S. Differentially private aggregation of distributed time-series with transformation and encryption. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD), Indianapolis, IN, USA, 6–10 June 2010; pp. 735–746.
16. Dwivedi, A.D.; Singh, R.; Ghosh, U.; Mukkamala, R.R.; Tolba, A.; Said, O. Privacy preserving authentication system based on non-interactive zero knowledge proof suitable for Internet of Things. J. Amb. Intel. Hum. Comp. 2021, in press.
17. Dwork, C. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages, and Programming (ICALP), Venice, Italy, 10–14 July 2006; pp. 1–12.
18. Xiao, X.; Bender, G.; Hay, M.; Gehrke, J. iReduct: Differential privacy with reduced relative errors. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD), Athens, Greece, 12–16 June 2011; pp. 229–240.
19. Liu, H.; Wu, Z.; Zhou, Y.; Peng, C.; Tian, F.; Lu, L. Privacy-preserving monotonicity of differential privacy mechanisms. Appl. Sci. 2018, 8, 2081.
20. Dwork, C.; Kenthapadi, K.; McSherry, F.; Mironov, I.; Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), St. Petersburg, Russia, 28 May–1 June 2006; pp. 486–503.
21. Tang, J.; Korolova, A.; Bai, X.; Wang, X.; Wang, X. Privacy loss in Apple's implementation of differential privacy on MacOS 10.12. arXiv 2017, arXiv:1709.02753.
22. Dwork, C. Differential privacy: A survey of results. In Proceedings of the International Conference on Theory and Applications of Models of Computation (TAMC), Xi'an, China, 25–29 April 2008; pp. 1–19.
23. Liu, H.; Wu, Z.; Peng, C.; Tian, F.; Lu, H. Adaptive Gaussian mechanism based on expected data utility under conditional filtering noise. KSII Trans. Internet Inf. 2018, 12, 3497–3515.
24. Kairouz, P.; Bonawitz, K.; Ramage, D. Discrete distribution estimation under local privacy. In Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; pp. 2436–2444.
25. Kairouz, P.; Oh, S.; Viswanath, P. Extremal mechanisms for local differential privacy. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), Cambridge, MA, USA, 8–13 December 2014; pp. 2879–2887.
26. Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Local privacy and statistical minimax rates. In Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS), Berkeley, CA, USA, 26–29 October 2013; pp. 429–438.
27. Hale, M.T.; Egerstedt, M. Differentially private cloud-based multi-agent optimization with constraints. In Proceedings of the 2015 American Control Conference (ACC), Chicago, IL, USA, 1–3 July 2015; pp. 1235–1240.
28. Erlingsson, U.; Pihur, V.; Korolova, A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM Conference on Computer and Communications Security (CCS), Scottsdale, AZ, USA, 3–7 November 2014; pp. 1054–1067.
29. Chen, R.; Li, H.; Qin, A.K.; Kasiviswanathan, S.P.; Jin, H. Private spatial data aggregation in the local setting. In Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, Finland, 16–20 May 2016; pp. 289–300.
30. Ligett, K.; Neel, S.; Roth, A.; Bo, W.; Wu, Z.S. Accuracy first: Selecting a differential privacy level for accuracy-constrained ERM. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), Red Hook, NY, USA, 4–9 December 2017; pp. 2563–2573.
31. Shoaran, M.; Thomo, A.; Weber, J. Differential privacy in practice. In Proceedings of the Workshop on Secure Data Management (SDM), Istanbul, Turkey, 27 August 2012; pp. 14–24.
32. Bassily, R.; Smith, A. Local, private, efficient protocols for succinct histograms. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), Portland, OR, USA, 14–17 June 2015; pp. 127–135.
33. Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy; Now Publishers: Norwell, MA, USA, 2014; pp. 28–64.
34. Liu, H.; Wu, Z.; Zhang, L. A differential privacy incentive compatible mechanism and equilibrium analysis. In Proceedings of the 2016 International Conference on Networking and Network Applications (NaNA), Hakodate, Hokkaido, Japan, 23–25 July 2016; pp. 260–266.
35. Warner, S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965, 60, 63–69.
36. Bloom, B.H. Space/time trade-offs in hash coding with allowable errors. ACM Commun. 1970, 13, 422–426.
37. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 1996, 58, 267–288.
38. Blum, A.; Ligett, K.; Roth, A. A learning theory approach to noninteractive database privacy. J. ACM 2013, 60, 1–25.
39. Huang, H.; Zhang, D.; Xiao, F.; Wang, K.; Gu, J.; Wang, R. Privacy-preserving approach PBCN in social network with differential privacy. IEEE Trans. Netw. Serv. Man. 2020, 17, 931–945.
40. Hu, X.; Zhu, T.; Zhai, X.; Zhou, W.; Zhao, W. Privacy data propagation and preservation in social media: A real-world case study. IEEE Trans. Knowl. Data Eng. 2021, in press.
41. Shin, H.; Kim, S.; Shin, J.; Xiao, X. Privacy enhanced matrix factorization for recommendation with local differential privacy. IEEE Trans. Knowl. Data Eng. 2018, 30, 1770–1782.
42. Huang, W.; Zhou, S.; Zhu, T.; Liao, Y. Privately publishing Internet of Things data: Bring personalized sampling into differentially private mechanisms. IEEE Internet Things 2022, 9, 80–91.
43. Ou, L.; Qin, Z.; Liao, S.; Hong, Y.; Jia, X. Releasing correlated trajectories: Towards high utility and optimal differential privacy. IEEE Trans. Depend. Secur. Comput. 2020, 17, 1109–1123.
44. Ren, X.; Yu, C.M.; Yu, W.; Yang, S.; Yang, X.; McCann, J.A.; Yu, P.S. LoPub: High-dimensional crowdsourced data publication with local differential privacy. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2151–2166.
45. Chamikara, M.A.P.; Bertok, P.; Khalil, I.; Liu, D.; Camtepe, S.; Atiquzzaman, M. Local differential privacy for deep learning. IEEE Internet Things 2020, 7, 5827–5842.
46. Ye, D.; Zhu, T.; Cheng, Z.; Zhou, W.; Yu, P.S. Differential advising in multiagent reinforcement learning. IEEE Trans. Cybern. 2020, in press.
47. Ying, C.; Jin, H.; Wang, X.; Luo, Y. Double insurance: Incentivized federated learning with differential privacy in mobile crowdsensing. In Proceedings of the 2020 International Symposium on Reliable Distributed Systems (SRDS), Shanghai, China, 21–24 September 2020; pp. 81–90.
48. McSherry, F.; Talwar, K. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Providence, RI, USA, 21–23 October 2007; pp. 94–103.
49. McSherry, F.D. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD), New York, NY, USA, 29 June–2 July 2009; pp. 19–30.
50. Garofalakis, M.; Kumar, A. Wavelet synopses for general error metrics. ACM Trans. Database Syst. 2005, 30, 888–928.
51. Vitter, J.S.; Wang, M. Approximate computation of multidimensional aggregates of sparse data using wavelets. ACM Sigm. Rec. 1999, 28, 193–204.
52. Gini, C. Measurement of inequality of incomes. Econ. J. 1921, 31, 124–126.
Figure 1. POI check-in model. Therein, S_i, S_j and S_k represent check-in states. h̃(S) and p̃ represent the check-in counts and the check-in frequency (data distribution) in the perturbation phase, respectively, while ĥ(S) and p̂ represent the check-in counts and the check-in frequency (data distribution) in the construction phase, respectively. K represents a perturbation mechanism. More details can be seen in Section 4.3.
Figure 2. Pareto distribution.
Figure 3. Point belief degree (C_{ϵ_e}) with η = 0.1.
Figure 4. Regional average belief degree (C_{Region(ϵ_e)}) with η = 0.1.
Figure 5. The average data distribution of Brightkite vs. Gowalla.
Figure 6. The mean and deviation of err(p, p̂) on the B1, B2 and B3 subsets of Brightkite. Three η settings of EDU, including 0.1, 0.08 and 0.05, are compared. (a) B1; (b) B2; (c) B3.
Figure 7. The mean and deviation of err(p, p̂) on the G1, G2 and G3 subsets of Gowalla. Three η settings of EDU, including 0.1, 0.08 and 0.05, are compared. (a) G1; (b) G2; (c) G3.
Figure 8. The belief degree on the B1, B2 and B3 subsets of Brightkite, where the belief degree includes the point belief degree and the regional average belief degree. (a) B1; (b) B2; (c) B3.
Figure 9. The belief degree on the G1, G2 and G3 subsets of Gowalla, where the belief degree includes the point belief degree and the regional average belief degree. (a) G1; (b) G2; (c) G3.
Figure 10. The minimum ϵ_e with C_{ϵ_e} > 0 based on the same EDU (η) and different data distributions, where C_{ϵ_e} is the point belief degree on the EPP of ϵ_e, and η = 0.1, 0.08 and 0.05 represent three kinds of EDU.
Figure 11. The minimum ϵ_e with C_{ϵ_e} > 0 based on the same data distribution and different EDU (η), where C_{ϵ_e} is the point belief degree on the EPP of ϵ_e, and η = 0.1, 0.08 and 0.05 represent three kinds of EDU.
Figure 12. The maximum difference of C(Region(ϵ_e)) under EXP Q minus C(Region(ϵ_e)) under KRR with different η, where C(Region(ϵ_e)) is the regional average belief degree on the region Region(ϵ_e) = [1, 1.001, 1.002, …, 4] or Region(ϵ_e) = [1, 1.001, 1.002, …, 10], and η = 0.1, 0.08 and 0.05 represent three kinds of EDU.
Table 1. Comparison of existing literature with the method proposed in this paper.
Selected Papers | Mechanism | Utility First | Privacy First | Privacy Metrics | Utility Metrics | EPP & EDU Involved
Katrina et al. [30] | Laplace | Yes | No | Central DP | Absolute error | No
Liu et al. [23] | Gauss with conditional filtering noise | Yes | No | Central DP | Relative error | Yes, but it may not provide EPP as much as possible
Maryam et al. [31] | Laplace | No | Yes | Central DP | Relative error | No
Xiao et al. [18] | Laplace | No | Yes | Central DP | Relative error | No
Kairouz et al. [25] | W-RR | No | Yes | LDP | KL divergence | No
Erlingsson et al. [28] | RAPPOR | No | Yes | LDP | Standard deviation | No
Bassily et al. [32] | S-Hist | No | Yes | LDP | Absolute error | No
Chen et al. [29] | PCEP | No | Yes | PLDP | KL divergence/relative error | No
Kairouz et al. [24,25] | KRR | No | Yes | LDP | KL divergence | No
Our paper | EXP Q | Yes | No | (Local) B-DP | Relative error | Yes, and it provides EPP as much as possible
Table 2. Notations.
Symbol | Description
EDU | Expected data utility
EPP | Expected privacy protection
ϵ_e | Expected privacy budget
Region(ϵ_e) | Expected privacy protection region around ϵ_e
η | Expected data utility
ϵ_η | The privacy budget of a differential privacy mechanism that just meets the expected data utility η
C_{ϵ_e} | Point belief degree of ϵ_e
C_{Region(ϵ_e)} | Regional average belief degree of Region(ϵ_e)
p | Original data distribution
p̃ | Perturbed data distribution
p̂ | Estimated data distribution
S | Check-in state space
h(S) | Original check-in counts vector
h̃(S) | Perturbed check-in counts vector
ĥ(S) | Estimated check-in counts vector
Q | Perturbation probability matrix
q_{ij} | The perturbation probability of the original check-in state S_j to the check-in state S_i
KRR | k-ary randomized response mechanism
EXP Q | Perturbation mechanism
γ | Privacy setting parameter
γ_η | Privacy setting parameter satisfying the expected data utility η
κ_n | The parameter of the privacy protection intensity change point
w | Modified estimate parameter
Re_threshold | Update threshold parameter
err(p, p̂) | The maximum relative error between p and p̂
Table 3. Gini coefficient vs. Pareto parameter θ.
Pareto Distribution | θ | Gini Coefficient
P1 | 1.55 | 0.4471
P2 | 1.17 | 0.3884
P3 | 0.52 | 0.2784
Table 4. Gini coefficient of data in Brightkite and Gowalla.
Datasets | Data Distributions | Gini Coefficient
Gowalla | G1 | 0.2357
Gowalla | G2 | 0.3488
Gowalla | G3 | 0.4465
Brightkite | B1 | 0.2849
Brightkite | B2 | 0.3488
Brightkite | B3 | 0.4628
Table 5. All kinds of modified estimate parameter w used in the dynamic algorithm (B1–B3: Brightkite; G1–G3: Gowalla).
Mechanisms | B1 | B2 | B3 | G1 | G2 | G3
EXP Q | 0.045 | 0.06 | 0.15 | 0.18 | 0.25 | 0.4
KRR | 0.035 | 0.055 | 0.15 | 0.18 | 0.25 | 0.4
Table 6. ϵ_η on each subset, where η = 0.1, 0.08 and 0.05 represent three kinds of EDU.
η | Mechanisms | B1 | B2 | B3 | G1 | G2 | G3
0.1 | EXP Q | 1.738 | 1.874 | 6.627 | 2.928 | 4.429 | 5.774
0.1 | KRR | 1.55 | 1.83 | 6.71 | 2.65 | 4.09 | 5.695
0.08 | EXP Q | 2.017 | 2.128 | 6.967 | 3.204 | 4.688 | 6.072
0.08 | KRR | 1.91 | 2.01 | 7.065 | 2.945 | 4.505 | 5.955
0.05 | EXP Q | 2.568 | 2.751 | 7.995 | 3.858 | 5.451 | 7.041
0.05 | KRR | 2.435 | 2.57 | 7.95 | 3.66 | 5.375 | 6.995
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
