Knowledge-Driven Location Privacy Preserving Scheme for Location-Based Social Networks

Zhu, Liang; Liu, Xiaowei; Jing, Zhiyong; Yu, Liping; Cai, Zengyu; Zhang, Jianwei

doi:10.3390/electronics12010070

by Liang Zhu ¹,*, Xiaowei Liu ¹, Zhiyong Jing ²,³,*, Liping Yu ¹, Zengyu Cai ¹ and Jianwei Zhang

¹ College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China

² Henan Key Laboratory of Network Cryptography Technology, Zhengzhou 450001, China

³ College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China

* Authors to whom correspondence should be addressed.

Electronics 2023, 12(1), 70; https://doi.org/10.3390/electronics12010070

Submission received: 29 October 2022 / Revised: 13 December 2022 / Accepted: 21 December 2022 / Published: 24 December 2022

(This article belongs to the Special Issue Advanced Edge Intelligence Collaborative Technology over Wireless Communications)

Abstract

Location privacy-preserving methods for location-based services in mobile communication networks have received great attention. Traditional location privacy-preserving methods mostly focus on analyzing location data in geographical space, and few studies consider the personalized features of users. In this paper, we present a Knowledge-Driven Location Privacy Preserving (KD-LPP) scheme that mines user preferences and provides customized location privacy protection for users. First, the UBPG algorithm is proposed to mine the basic portrait, and user familiarity and user curiosity are modelled to generate the psychological portrait. Then, a location transfer matrix based on the user portrait is built to transfer the real location to an anonymous location. To achieve customized privacy protection, the amount of privacy is modelled to quantify the privacy protection demand of the target user. Finally, experimental evaluation on two real datasets illustrates that our KD-LPP scheme can not only protect user privacy, but also achieve better accuracy of privacy protection.

Keywords:

knowledge mining; user portrait; customized privacy preserving; semantic location; location transfer matrix

1. Introduction

Mobile Internet has entered the era of the fifth generation of mobile communication (i.e., 5G). The 5G network, which is characterized by high speed, high reliability, low delay and a large number of connected terminals, has greatly changed the way users socialize. With the popularity of intelligent terminals and spatial and temporal sensors, massive and accurate location-related data (e.g., GPS data) are shared by users with the central server. The movement behavior of the target user can be mined by the central server through situational awareness, machine learning, information fusion, etc. Therefore, the central server intelligently recommends Location-Based Services (LBS) according to the differences in interests of different users or the differences in preferences of the same user, in order to meet the personalized needs of the target user [1,2,3].

Location-Based Social Networks (LBSNs) bring great convenience to users’ lives. However, they also create a risk of disclosing personal privacy (e.g., identity, location, or query information) [4,5,6]. In LBSNs, users need to publish their real location data to the central server. Through data cleaning and fusion, the central server can analyze the movement patterns and characteristics of users and then recommend location-based services. However, the central server is honest but curious, i.e., it not only performs the query and recommendation tasks rigorously, but also tries its best to mine the interests and preferences of the target user. If this sensitive information is stolen by attackers, it poses a serious threat to the personal privacy and safety of the target user.

The existing methods of location privacy protection are mainly based on false locations, spatial anonymity, encryption or other technologies. Due to problems such as leakage of sensitive user information, low data availability and lack of self-adaptation, these methods cannot meet the needs of diversified and personalized location service recommendation in LBSNs. Moreover, a unified privacy protection scheme does not consider user situational information and preference information, which seriously affects the availability of sensitive data and the performance of service recommendation. To solve these problems, it is necessary to study customizable and quantifiable privacy protection schemes based on a deep understanding of the rules of scenarios, users and services, in order to protect user privacy and improve data utility.

This paper draws on the ideas of new technologies such as user portraits and knowledge mining to propose the Knowledge-Driven Location Privacy Preserving (KD-LPP) scheme for Location-Based Social Networks (LBSNs). The contributions of our work can be divided into four aspects as follows:

(1) We mine the stay-points and locations based on the original trajectory sequence of users. Furthermore, each stay-point is tagged with semantic information, in order to generate the user portrait.

(2) We construct the user portrait by considering basic attributes and psychological attributes of target user. The UBPG algorithm is proposed to generate basic portrait. User familiarity and user curiosity are modelled to generate psychological portrait.

(3) We build a location transfer matrix to hide the real location of target user. The amount of privacy is modelled to quantize the demand of privacy protection, so as to provide customized privacy protection for the target user.

(4) We conduct an extensive experimental study to verify the functions and performance of the proposed KD-LPP scheme on two real datasets. The experimental results show that our KD-LPP scheme can provide customized services for users while preserving their privacy.

2. Related Works

2.1. Location Privacy Preserving Based on User Behavior Analysis

User behavior analysis covers a wide range of research areas, e.g., the detection of user motion and action. Data mining and big data approaches can be utilized to construct user behavior models by analyzing location-related data [7]. In [8], the authors proposed an interpretable user behavior recognition framework that supports movement pattern modelling of users. To better describe the movement pattern of users, the authors of [9] proposed an anonymous location information protection mechanism that constructs users’ movement patterns through a Markov chain. In addition, cooperative location is also an effective means of attacking location privacy. In [10], the authors first used associated location information to locate the user, mined the user’s mobile behavior pattern, and proposed an optimal joint location attack. Meanwhile, the authors of [11] simulated the problem of user privacy disclosure under location information sharing through experiments and proposed a location privacy protection method based on mobile behavior, which automatically learns users’ privacy preferences to achieve controllable privacy protection. In [12], the authors used machine learning to discover the motivation of users to share locations and proposed a motivation-based privacy protection method. In [13], the authors proposed a privacy-aware service recommendation method, which considers the spatial diversity of locations and selects the optimal stop point for users.

2.2. Trajectory Privacy Preserving

Trajectory privacy protection is a research hotspot in the area of privacy protection. Current trajectory privacy protection methods mainly include false trajectories, trajectory k-anonymity and suppression technology. In [14], the authors proposed a trajectory privacy protection scheme based on random mode and rotation mode, which uses the false trajectory method to generate dummy trajectories and upload anonymous trajectory data to the central server. Then, the authors of [15] proposed an anonymous method based on dummy elements by considering the stay behavior in the user’s moving trajectory. The authors of [16] proposed a time fuzzy technology that perturbs the time at which users release their queries, so as to protect their trajectory privacy. The trajectory k-anonymity method makes use of a Trusted Third Party (TTP) to generalize the actual trajectory data of the user with k-1 false trajectories, forming k trajectory sequences that are published to the central server. K. Gao et al. proposed a novel freshness-aware age optimization solution to satisfy the real-time demands of users [17]. In order to deploy cost-effective transcoding operations by distributing the computation-intensive workload among cloud, edge, and crowd, X. Chen et al. proposed a novel stochastic approach that jointly optimizes the usage of transmission resources and transcoding resources [18]. In [19], the authors proposed a TrPF framework to protect users’ trajectory privacy through a TTP. Suppression technology means that, when the user publishes real trajectory data to the central server, the mobile client withholds sensitive location information in the trajectory to protect the personal privacy of the user. In [20], the authors proposed a local suppression technique, whereby users can suppress the publication of their sensitive locations as needed.

2.3. Collaborative Mechanism between Privacy Preserving and Data Utility

In terms of balancing privacy protection and data utility, researchers have found that different users have different demands for privacy protection, and even the same user needs different intensities of privacy protection in different scenarios. How to provide users with configurable and quantifiable privacy protection mechanisms has become a hot topic of privacy protection research at home and abroad. In [21], the authors designed a privacy-preserving data aggregation scheme for edge computing, which not only preserves the privacy of uploaded data but also realizes batch operations. In order to reduce the loss of effective data in the process of privacy protection, the authors of [22] proposed an optimal geographical location indiscernibility mechanism to protect location privacy while weighing the relevant data utility. In [23], the authors proposed a recommendation system based on context information to provide users with an adaptive privacy protection strategy. In terms of privacy quantification, the authors of [24] proposed a privacy-aware method for generating high-quality maps, which balances privacy disclosure against map quality and quantifies the privacy disclosure risk of users’ data points. In [25], the authors analyzed privacy protection technology in the process of information exchange and formulated a privacy protection scheme for the privacy perception and measurement of multi-source environmental information and the scene definition. In [26], the authors proposed a privacy measurement method based on mutual information, which classifies the users of location data into different levels and releases location data with different degrees of perturbation to them, so as to protect users’ location privacy.

In recent years, differential privacy theory [27] has attracted extensive attention from researchers at home and abroad due to its strict and provable definition of privacy and has gradually become the mainstream solution to the problem of user privacy disclosure under location information sharing. In [28], the authors proposed a differential privacy protection algorithm based on the supermodel game and adopted personalized anonymous technology to protect users’ data privacy. In [29], the authors proposed a frequent itemset mining algorithm for differential privacy protection, which effectively compensated for data loss in the process of privacy protection by reducing the amount of noise added by data generalization. Meanwhile, the authors proposed a frequent sequence mining algorithm with differential privacy protection, which utilizes candidate sequence pruning technology of basic sampling to achieve better data utility [30]. In addition, the author proposed a personalized differential privacy protection model in the literature [31], which protects the privacy of a group of individuals in a data set and achieves less data distortion. In [32], the authors proposed a location recommendation framework based on probabilistic differential privacy to achieve personalized fine-grained location service recommendations supporting differential privacy.

3. Overview of KD-LPP Scheme

In this section, we give the problem definition, scheme design and attack hypothesis of KD-LPP scheme.

3.1. Problem Definition

Definition 1 (Stay-point).

Stay-points indicate that users stay in a certain geographic area for a period of time. Through the algorithm of stay-point detection, it can be inferred that the target user has done some meaningful activities in the area. Each stay-point can be formed by the triple $\langle lon, lat, \theta_{t} \rangle$, where $\langle lon, lat \rangle$ represents the longitude and latitude of the stay-point and $\theta_{t}$ represents the length of time the user stays at the stay-point.
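The paper does not spell out its stay-point detection algorithm here, so the following is only a minimal sketch of one common approach: a window of consecutive GPS points that stays within a distance threshold of its first point for at least a minimum duration is collapsed into one $\langle lon, lat, \theta_{t} \rangle$ triple. The function names and the thresholds `dist_m=200.0` and `min_stay_s=1200.0` are illustrative assumptions, not values taken from the paper.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def detect_stay_points(track, dist_m=200.0, min_stay_s=1200.0):
    """track: list of (lon, lat, unix_time) sorted by time.

    Returns a list of (lon, lat, theta_t) triples, where theta_t is the
    stay duration in seconds. Thresholds are assumed, not from the paper.
    """
    stays, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        # Extend the window while points remain within dist_m of anchor point i.
        while j < n and haversine_m(track[i][1], track[i][0],
                                    track[j][1], track[j][0]) <= dist_m:
            j += 1
        theta_t = track[j - 1][2] - track[i][2]
        if theta_t >= min_stay_s:
            window = track[i:j]
            lon = sum(p[0] for p in window) / len(window)
            lat = sum(p[1] for p in window) / len(window)
            stays.append((lon, lat, theta_t))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stays
```

With these assumed thresholds, three fixes taken at roughly the same spot over 20 min yield one stay-point whose coordinates are the mean of the window.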

Definition 2 (Location).

Location is the clustering of stay-points, i.e., the stay-points with the same semantic information are clustered into locations. Each location can be formed by the triple $\langle lon, lat, type \rangle$, where $\langle lon, lat \rangle$ represents the longitude and latitude of the location, and $type$ represents the corresponding semantic information.

Definition 3 (Semantic information).

Locations with different semantics have different access probabilities in different time periods. Furthermore, locations with similar semantics may also have different access probabilities. To prevent attackers from identifying less likely locations by user access time, the semantic information of a location unit is established by using the number of user visits in different time periods. The semantic information of location unit i can be expressed as $S_{i} = \{N_{1}, N_{2}, \dots, N_{24}\}$, where $N_{1}, N_{2}, \dots, N_{24}$ represent the access frequency of location unit i in the 24 time periods, respectively.
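As an illustration of Definition 3, the sketch below builds $S_{i}$ for one location unit by counting visits per hour of day. It assumes that the 24 time periods are the 24 clock hours, that $N_{1}$ corresponds to 00:00 to 00:59, and that timestamps are Unix seconds interpreted in UTC. The definition itself does not fix any of these choices.

```python
from datetime import datetime, timezone


def semantic_vector(visit_timestamps):
    """Return S_i as a list [N_1, ..., N_24] of visit counts per hour of day.

    visit_timestamps: iterable of Unix timestamps (seconds) of visits to one
    location unit. Hour binning in UTC is an assumption of this sketch.
    """
    counts = [0] * 24
    for ts in visit_timestamps:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        counts[hour] += 1
    return counts
```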

Definition 4 (User portrait).

User portrait is the characteristic model of users in the real world, which can better reflect the movement pattern and personalized preference of users in LBSNs. The user portrait of user u can be formed by $PA_{u} = \{pa^{1}, pa^{2}, \cdots, pa^{n}\}$, where each $pa^{i}$ represents a different preference characteristic of user u.

Definition 5 (Privacy requirements).

Privacy requirements reflect the personalized protection needs of different users. The privacy requirements can be formed by $PPD = (PA_{u}, |D|_{u})$, where $PPD$ represents the privacy protection demand of target user u, $PA_{u}$ represents the user portrait of user u, and $|D|_{u}$ represents the customized privacy parameter of user u.

3.2. Scheme Design

As shown in Figure 1, the workflow of KD-LPP scheme can be divided into four stages: initialization, semantic tagging, user portrait and transfer matrix. The detailed explanations of four stages are as follows.

① Initialization

In this stage, the mobile device collects GPS trajectory data locally as the original data of users. The original trajectory data is used to generate the target user’s stay-points through the stay-point detection algorithm to reflect stay behavior of user in a certain area. The real location of the user can be obtained by the stay-points clustering for the following location anonymity processing.

② Semantic tagging

In this stage, the generated stay-points are tagged with semantic information through the method of data fusion. Semantic information can well reflect the user’s meaningful activities in the stay area, so as to effectively mine the personalized preferences of users. In addition, some sensitive semantic locations of target user (e.g., home address, workplace, etc.) will be deleted to protect the privacy information of the user from being leaked.

③ User portrait

In this stage, the user portrait of each user is generated based on the semantic-tagged location data. The user portrait describes the user’s characteristics, preferences and so on. It is input into the construction of the transfer matrix as knowledge, which completes the establishment of the knowledge model.

④ Transfer matrix

In this stage, the location transfer matrix is generated according to the knowledge model constructed in the above three stages. When the target user inputs the real location, the transfer matrix generates the corresponding anonymous location according to the transfer probability and outputs it to the anonymous location set. Because the transfer matrix is generated based on the knowledge model of each user, it can generate anonymous locations for different users to meet their personalized and customized needs.

3.3. Attack Hypothesis

In this paper, it is assumed that the LBS server is an active attacker with background knowledge such as map location information, historical query probability and location semantic information. In the case of a snapshot query, privacy threats from active attackers can be divided into three aspects. First, locations in the anonymous location set have different query probabilities. The attacker is more likely to think that the location with the higher query probability in the anonymous location set is the real location. Second, an attacker with background knowledge of location semantic information can identify locations in the anonymous location set that are less likely to be accessed at a certain time. Third, if the locations in the anonymous location set are distributed in a certain area or around the real location, the attacker can infer the approximate real location of the target user.

4. Models and Algorithms

In this section, we give the models and algorithms of the KD-LPP scheme. The user portrait can be divided into the basic portrait and the psychological portrait. Basic portrait attributes, such as gender, date of birth, place of origin, occupation and education, are fixed over a long period of time or do not change throughout the life of a user. Unlike the basic portrait, the psychological portrait describes the personalized preferences of each user. Psychological characteristics also differ across times and locations, reflecting changes in user requirements in different situations.

4.1. Basic Portrait

Temporal characteristics, spatial characteristics and location knowledge of each user can be extracted from the Geolife dataset [33], which will be explained in Section 5.1. These three attributes are closely related to the basic portrait of users.

Time characteristics. Time characteristics refer to the user trajectory data in different time segments, which are strongly connected with the basic attributes of the user. For example, students commute between home and school at nearly the same time on working days. Office workers commute between their homes and offices during the morning and evening rush hours. Take-out couriers crisscross the city around lunch and dinner time.

Spatial characteristics. Spatial characteristics mean that the trajectory data of continuous movement of users are constrained by the physical distance in the real world. According to the analysis of each trajectory sequence, it is found that the distance of continuous movement of most users is within 20 km. For example, in the trajectory sequence, it is possible for a user to move from one place to another in the same city, but it is impossible to move from one city to another.

Location knowledge. Location knowledge refers to locations with specific semantic category information and is closely related to the basic attributes of users. For example, researchers usually gather in research institutes because these host a large number of research projects. Workers tend to be in factories because there are many production tasks. Teachers are usually in school during the day because they take on many teaching tasks.

In order to avoid disclosing sensitive information of the target user, sensitive location data (e.g., home address, workplace, etc.) are first identified through the time and spatial characteristics during the initialization stage. Then, the sensitive information is blocked in the semantic tagging stage. Finally, the geographic location information and semantic location information are combined to generate the basic attributes of users. In this paper, a three-layer classifier trained with strong supervision is used to improve classification. Figure 2 shows the workflow of the three-layer classifier. Algorithm 1 is the pseudo-code of the User Basic Portrait Generation (UBPG) algorithm. The detailed steps are as follows:

Algorithm 1 User Basic Portrait Generation

Input: Semantic information $S_{i} = \{N_{1}, N_{2}, \dots, N_{24}\}$; time characteristics $D_{holi}$, $D_{work}$, $H_{i}$
Output: User basic portrait $PA = \{pa_{1}, pa_{2}, \dots, pa_{n}\}$
1: Input the semantic information $S_{i}$ into the layer 1 classifier for the preliminary characterization of user attributes;
2: Obtain the result of the first classification $PA^{1} = \{pa_{1}^{1}, pa_{2}^{1}, \dots, pa_{n}^{1}\}$;
3: Input $PA^{1}$ into the layer 2 classifier;
4: Obtain the result of the second classification $PA^{2}$ through logistic regression;
5: Input the time characteristics $D_{holi}$, $D_{work}$, $H_{i}$ and $PA^{2}$ into the layer 3 classifier;
6: Return $PA$

In Algorithm 1, $S_{i}$ represents the semantic information of each stay-point, $D_{holi}$ represents the rest day, $D_{work}$ represents the working day, and $H_{i} = \{h_{1}, h_{2}, \dots, h_{24}\}$ represents the 24 h from $h_{1}$ to $h_{24}$ in a day. First, each location is tagged with semantic information in the semantic tagging stage. Then, the tagged semantic information is input to the layer 1 classifier to achieve the preliminary classification. The result of the first classification $PA^{1}$ is input to the layer 2 classifier to complete the fine-grained classification. To better describe the interests and preferences of users, the result of the second classification $PA^{2}$, combined with the time characteristics, is input to the layer 3 classifier. Finally, the basic portrait $PA$ of the target user is generated by the three-layer classifier. The basic attributes generated by the UBPG algorithm can then be used to support customized location privacy protection.
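The data flow of Algorithm 1 can be sketched as three chained classifier calls. The paper does not describe the classifiers' internals beyond naming logistic regression for layer 2, so `toy_layer1`, `toy_layer2` and `toy_layer3` below are hypothetical rule-based stand-ins for trained models. Treating $D_{holi}$ and $D_{work}$ as visit counts on rest days and working days is likewise an assumption made only for this illustration.

```python
def ubpg(s_i, d_holi, d_work, h_i, layer1, layer2, layer3):
    """Chain the three layers as in Algorithm 1 and return the basic portrait PA."""
    pa1 = layer1(s_i)                        # steps 1-2: preliminary classification
    pa2 = layer2(pa1)                        # steps 3-4: fine-grained classification
    return layer3(pa2, d_holi, d_work, h_i)  # steps 5-6: add time characteristics


# Hypothetical stand-ins for trained classifiers (not the authors' models).
def toy_layer1(s_i):
    peak = max(range(24), key=lambda h: s_i[h])  # busiest hour of day
    return ["daytime" if 8 <= peak < 18 else "nighttime"]


def toy_layer2(pa1):
    return [label + "-visitor" for label in pa1]


def toy_layer3(pa2, d_holi, d_work, h_i):
    day_type = "workday" if d_work >= d_holi else "holiday"
    return [f"{label}/{day_type}" for label in pa2]
```

Passing the layers as arguments keeps the pipeline independent of the model family, so real classifiers can replace the toy rules without changing `ubpg`.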

4.2. Psychological Portrait

The psychological portrait of users can be divided into user familiarity and user curiosity. The corresponding models are as follows.

(1) User Familiarity Model

The change of the user’s location over time generates the trajectory sequence $Seq = (L_{1}, L_{2}, \dots, L_{n})$, where $L_{i}$ represents the user’s location and n represents the number of times the user’s location changes. Each location $L_{i}$ has a longitude and latitude according to the triple $\langle lon, lat, type \rangle$. As shown in Figure 3, the next location $L_{n+1}$ is not determined before the user arrives there: $L_{n+1} \in (L_{d1}, L_{d2}, \cdots, L_{dm})$ indicates that the user may have m different choices when moving from $L_{n}$ to $L_{n+1}$.

According to the psychological characteristics of users, two factors need to be considered when a user moves to the next location. One is the familiarity of the next location itself to the target user. The other is the familiarity of the generated trajectory sequence to the target user. The greater the familiarity of the next location or the generated trajectory sequence, the greater the psychological preference of the user and the lower the privacy level required. Conversely, the less familiar the user is with a location, the stronger the privacy requirements for that location.

In order to dynamically select the next location, the probability matrix is modelled as follows:

$$NPM_{i L_{i}} \propto \frac{1}{(\# L_{i} + 1) \cdot (T_{L_{i}, L_{i+1}} + 1)}, \tag{1}$$

where $\# L_{i}$ represents the number of occurrences of location $L_{i}$ before the i-th movement in the trajectory sequence of the user, and $T_{L_{i}, L_{i+1}}$ represents the number of occurrences of the trajectory sequence from location $L_{i}$ to location $L_{i+1}$ before the i-th movement.
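A minimal sketch of Formula (1) follows. Because the formula is stated only up to proportionality, the scores are normalized here so that they sum to 1 over the candidate set; that normalization is an assumption. The sketch also assumes that the occurrence count is taken for each candidate next location, in line with the "familiarity of the next location itself" described above. The function name and the string location identifiers are hypothetical.

```python
def next_location_scores(history, current, candidates):
    """Normalized Formula (1) scores for each candidate next location.

    history: list of location ids visited so far, in order.
    current: id of the location the user is leaving.
    candidates: ids of the possible next locations.
    """
    pairs = list(zip(history, history[1:]))  # observed consecutive moves
    scores = {}
    for cand in candidates:
        visits = history.count(cand)               # occurrences of the location
        transitions = pairs.count((current, cand)) # occurrences of current -> cand
        scores[cand] = 1.0 / ((visits + 1) * (transitions + 1))
    total = sum(scores.values())
    return {cand: s / total for cand, s in scores.items()}
```

Under these assumptions a never-visited candidate receives the largest score, which matches the statement that less familiar locations carry stronger privacy requirements.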

(2) User Curiosity Model

In this section, we use user curiosity as a measure of whether users are willing to experience new things and take the risk of privacy disclosure. For example, a user with strong curiosity is more willing to explore new things, and that user will have a lower need for privacy protection. For more conservative users, the need for privacy protection should be increased. Location novelty can increase the user’s curiosity to access the location. The degree of location novelty is mainly determined by three factors, namely, the frequency of the user’s stay at the location, the length of the user’s stay at the location, and the similarity between the next location and the previous location. The degree of location novelty can be calculated by Formula (2).

$$Nov_{u,i}^{t} = \frac{1}{3} \times (SF_{u,i}^{t} + SR_{u,i}^{t} + DIS_{u,i}^{t}), \tag{2}$$

where $Nov_{u,i}^{t}$ represents the novelty of location $L_{i}$ for user u at time t, $SF_{u,i}^{t}$ represents the frequency that user u visited location $L_{i}$ before time t, $SR_{u,i}^{t}$ represents the time duration that user u visited location $L_{i}$, and $DIS_{u,i}^{t}$ represents the degree of difference between location $L_{i}$ and the historical locations of user u.

According to the attenuation function of human memory and response to things proposed by Bayesian, $SF_{u,i}^{t}$ can be calculated by Formula (3).

$$SF_{u,i}^{t} = e^{-\alpha \times |I_{u,i}^{t}|}, \tag{3}$$

where $\alpha$ represents the attenuation coefficient, whose value range is $(0, 1]$. The experimental results show that the model is stable when $\alpha = 1$. $|I_{u,i}^{t}|$ represents the number of times that user u visited location $L_{i}$ before time t. The more times user u visits location $L_{i}$, the greater the value of $|I_{u,i}^{t}|$ and the smaller the value of $SF_{u,i}^{t}$.

$SR_{u,i}^{t}$ can be calculated by Formula (4).

$$SR_{u,i}^{t} = e^{-(t - t(I_{u,i}^{-1}))^{-1}}, \tag{4}$$

where $t(I_{u,i}^{-1})$ represents the timestamp of the last access to location $L_{i}$ before time t for user u. The closer this latest timestamp is to the time t at which user u visits location $L_{i}$, the smaller the value of $SR_{u,i}^{t}$.

$DIS_{u,i}^{t}$ can be calculated by Formula (5).

$$DIS_{u,i}^{t} = \frac{1}{|2 \times Tags(i)|} \times \sum_{tag}^{Tags(i)} \left(e^{-\rho \times |I_{u,tag}|} + e^{-(t - t(I_{u,tag}^{-1}))^{-1}}\right), \tag{5}$$

where $Tags(i)$ represents the set of semantic information owned by location $L_{i}$, $|I_{u,tag}|$ represents the number of locations with the target tag visited by user u before time t, and $t(I_{u,tag}^{-1})$ represents the timestamp of the last visit of user u to a location with the target tag prior to time t. Moreover, the parameter $\rho$ is added to ensure that $e^{-\rho \times |I_{u,tag}|}$ and $e^{-(t - t(I_{u,tag}^{-1}))^{-1}}$ have the same order of magnitude. In our experiment, the value of $\rho$ is set as 0.05. According to Formula (5), the more similar a newly visited location is to the user’s historical locations, the smaller the difference and the smaller the value of $DIS_{u,i}^{t}$.
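Formulas (2) to (5) can be combined into a short numerical sketch. It assumes that every location and tag involved has at least one earlier visit (so that $t - t(I^{-1}) > 0$), that all times are expressed in one consistent unit, and that $|I_{u,i}^{t}|$ is a visit count. The function names and the `(count, last_visit_time)` pair layout are illustrative choices, not the authors' implementation.

```python
from math import exp


def sf(num_visits, alpha=1.0):
    """Formula (3): decays as the number of earlier visits grows."""
    return exp(-alpha * num_visits)


def sr(t, t_last):
    """Formula (4): small when the last visit is recent (requires t > t_last)."""
    return exp(-1.0 / (t - t_last))


def dis(t, tag_stats, rho=0.05):
    """Formula (5). tag_stats: one (count, last_visit_time) pair per tag of L_i."""
    total = sum(exp(-rho * count) + exp(-1.0 / (t - t_last))
                for count, t_last in tag_stats)
    return total / (2 * len(tag_stats))


def novelty(t, num_visits, t_last, tag_stats, alpha=1.0, rho=0.05):
    """Formula (2): mean of the three components."""
    return (sf(num_visits, alpha) + sr(t, t_last) + dis(t, tag_stats, rho)) / 3.0
```

Each component lies in (0, 1], so the resulting novelty also lies in (0, 1].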

Through the above formulas, the novelty of each location for the target user is acquired. Thus, the user portrait $PC = \{PC_{1}, PC_{2}, \dots, PC_{n}\}$ is built according to the basic portrait and psychological portrait, which can be used for the location transfer matrix construction.

4.3. Location Transfer Matrix

User characteristics such as job, preference and character can be described through the basic portrait and psychological portrait of the target user. In order to build the location transfer matrix, a privacy protection weight $s_{i}$ is given for each portrait feature $PC_{i}$. The privacy sensitivity is divided into $P$ grades. The weight set of privacy protection for all users can be constructed as $PS = (s_{1}, s_{2}, \cdots, s_{n})$, where $0 \leq s_{i} \leq 1$ $(1 \leq i \leq n)$. The vector $PS$ reflects the personalized location privacy protection of users.

For any portrait feature $PC_{i}$, the specific weight of privacy protection can be represented as $S_{PC_{i}}$, where $0 \leq S_{PC_{i}} \leq P - 1$. Thus, the strength of privacy protection $UP\_S_{PC_{i}}$ can be calculated by Formula (6).

$$UP\_S_{PC_{i}} = \ln (1 + (S_{PC_{i}} + \varphi) / P). \tag{6}$$

In Formula (6), $S_{PC_{i}} = 0$ means that the portrait feature $PC_{i}$ can be fully shared by the target user. To avoid numerical problems when $S_{PC_{i}}$ is 0, the parameter $\varphi$ is introduced, where $\varphi$ is a value arbitrarily close to 0.
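As a quick numerical illustration of Formula (6), the sketch below evaluates the protection strength for a given sensitivity grade. The default `phi=1e-9` is an assumed stand-in for "a value close to 0", and the function name is hypothetical.

```python
from math import log


def protection_strength(s_pc, p_grades, phi=1e-9):
    """Formula (6): ln(1 + (S_PC + phi) / P) for a grade 0 <= S_PC <= P - 1."""
    if not 0 <= s_pc <= p_grades - 1:
        raise ValueError("sensitivity grade must lie in [0, P - 1]")
    return log(1.0 + (s_pc + phi) / p_grades)
```

For example, with $P = 5$ the strength rises monotonically from about 0 at grade 0 to $\ln 1.8 \approx 0.588$ at grade 4.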

Let the vector $\pi_{PL_{i}} = (s_{PL_{i,1}}, s_{PL_{i,2}}, \cdots, s_{PL_{i,n}})$ be the weight of the privacy protection for the target user at location $L_{i}$. For any user, the location transfer matrix $\Pi$ can be calculated by Formula (7).

$$\begin{aligned} \Pi &= (\pi_{PL_{1}}, \pi_{PL_{2}}, \dots, \pi_{PL_{n}})^{T} \\ &= (UP\_S_{PL_{i}, PC_{j}})_{n \times k} \\ &= (UP\_S_{i,j})_{n \times k}. \end{aligned} \tag{7}$$

In order to quantify the strength of location transfer, let $|D|$ represent the amount of privacy for customized privacy protection, which can be calculated by Formula (8).

$$|D| = \frac{e^{|\Pi|_{F}} - e^{-|\Pi|_{F}}}{e^{|\Pi|_{F}} + e^{-|\Pi|_{F}}}. \tag{8}$$

As shown in Table 1, the privacy protection method can be selected according to the amount of privacy $|D|$. The privacy level k increases gradually as the amount of privacy increases, and different privacy protection methods correspond to different privacy levels. For example, no privacy protection is needed when the amount of privacy $|D| \in (0.0, 0.2]$, in which case the privacy level is A and $k \in \{1, 2, 3, 4, 5\}$. The published location needs to be suppressed when the amount of privacy $|D| \in (0.8, 1.0]$, in which case the privacy level is E and $k \in \{21, 22, 23, 24, 25\}$.
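Formula (8) is the hyperbolic tangent of the matrix norm, so the amount of privacy can be sketched in a few lines. The subscript F is read here as the Frobenius norm, which is an assumption. Only levels A and E are spelled out in the text above, so the equal-width bins for levels B to D in `privacy_level` are also an assumption. Both function names are hypothetical.

```python
from math import sqrt, tanh


def amount_of_privacy(matrix):
    """Formula (8): |D| = tanh(norm), with the norm taken as Frobenius (assumed)."""
    frob = sqrt(sum(v * v for row in matrix for v in row))
    return tanh(frob)


def privacy_level(d):
    """Map |D| in [0, 1] to a level A..E using assumed bins of width 0.2."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("|D| must lie in [0, 1]")
    for upper, level in zip((0.2, 0.4, 0.6, 0.8, 1.0), "ABCDE"):
        if d <= upper:
            return level
```

Because tanh maps any non-negative norm into [0, 1), the result always falls inside the range covered by the five levels.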

Thus, customized privacy protection can be achieved because the amount of privacy $|D|$ changes according to the basic portrait and psychological portrait of the target user. In the next section, the performance of the proposed KD-LPP scheme is evaluated.

5. Performance Evaluation

5.1. Datasets and Experimental Setup

In this paper, MATLAB is used to analyze the location data and verify the performance of the proposed KD-LPP scheme. The GPS trajectory dataset of the GeoLife project of Microsoft Research Asia is used as the original trajectory data of users [33]. From April 2007 to August 2012, the Geolife dataset collected trajectory data of 182 users, including 17,621 trajectories, with a total distance of more than 1.2 million kilometers and a total duration of more than 48,000 h. The Geolife dataset includes not only the daily activities of users (e.g., studying, going to work, and coming home), but also personalized activities (e.g., shopping, traveling, dining, and sports). Most of the data in the Geolife dataset are located in Beijing, with a small amount located in Europe or the United States. Since only location data in Beijing are considered, we first filter the Geolife dataset to retain only the data points with latitude 39.4~41.1 and longitude 115.4~117.6. Second, this area is divided into 200 × 200 location units of the same size for easy calculation.
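The filtering and gridding step can be sketched as follows. The bounding box and the 200 × 200 grid come from the text above. The choice to make the units equal in degrees (rather than in metres) and the function name `to_cell` are assumptions of this illustration, which is written in Python rather than the MATLAB used in the experiments.

```python
LAT_MIN, LAT_MAX = 39.4, 41.1    # Beijing latitude range from the text
LON_MIN, LON_MAX = 115.4, 117.6  # Beijing longitude range from the text
GRID = 200                       # 200 x 200 location units


def to_cell(lat, lon):
    """Return the (row, col) location unit of a point, or None if outside Beijing."""
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        return None
    # Points on the upper boundary are clamped into the last row/column.
    row = min(GRID - 1, int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * GRID))
    col = min(GRID - 1, int((lon - LON_MIN) / (LON_MAX - LON_MIN) * GRID))
    return row, col
```

Points for which `to_cell` returns `None` (for instance, the trajectories recorded in Europe or the United States) would simply be dropped before further processing.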

In order to add semantic tags to the locations, this paper uses the Beijing POI dataset, which records the location information (i.e., longitude and latitude coordinates) of most points of interest in Beijing. The original Beijing POI dataset is divided into 20 service types, as shown in Table 2, which are used for the subsequent semantic tagging.

5.2. Data Analysis and Function Realization

The geographic locations covered by each service type can be obtained from the POI dataset. By fusing the GeoLife dataset and the POI dataset, the service name, type, latitude and longitude of each geographic location in the GeoLife dataset can be identified. In this section, the TF-IDF model is used to tag the semantic information of each stay-point. TF-IDF is a classical weighting technique in information retrieval and data mining, where TF denotes the term frequency, i.e., how often a term occurs in a document, and IDF denotes the inverse document frequency, i.e., how rare the term is across all documents. The implementation of data cleaning, stay-point generation and location clustering is explained in detail in our previous work [1]. The semantic tagging and user portrait experiments in this paper extend reference [1]. Figure 4 shows the generated stay-points and the corresponding semantic tags of a target user. We can see that the user has visited seven kinds of semantic locations in the trajectory sequence, which can be used to build a user portrait model.
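To make the TF-IDF weighting concrete, the following Python sketch tags each stay-point with its highest-weighted service type. It is only an illustration under stated assumptions: each stay-point is treated as a document whose terms are the service types of nearby POIs, and the function name `tfidf_tags` and the toy input are hypothetical. The exact procedure is the one described in reference [1].

```python
import math
from collections import Counter


def tfidf_tags(stay_point_pois):
    """stay_point_pois: one list of nearby POI service types per stay-point.
    Returns the service type with the largest TF-IDF weight per stay-point."""
    n_docs = len(stay_point_pois)
    # Document frequency: number of stay-points in which each type occurs.
    df = Counter()
    for pois in stay_point_pois:
        df.update(set(pois))
    tags = []
    for pois in stay_point_pois:
        counts = Counter(pois)
        total = sum(counts.values())
        # TF = share of this type among nearby POIs; IDF = log(N / df).
        weights = {t: (c / total) * math.log(n_docs / df[t])
                   for t, c in counts.items()}
        tags.append(max(weights, key=weights.get) if weights else None)
    return tags


stay_points = [
    ["Food and beverage service", "Food and beverage service", "Shopping service"],
    ["Shopping service", "Science and education", "Science and education"],
    ["Shopping service", "Scenic spot"],
]
print(tfidf_tags(stay_points))
```

In this toy input, "Shopping service" appears near every stay-point, so its IDF is zero and it never becomes the tag. This is the intended effect of the inverse document frequency.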

In order to realize customized privacy protection, a system based on the KD-LPP scheme is built. Figure 5 shows an example of the functions realized by the proposed KD-LPP scheme. In Figure 5a, the privacy protection method is selected according to the demand of the target user, and the corresponding anonymous area, time and privacy level are generated. Figure 5b shows the real location and the generated dummy location, together with the semantic information of both locations. Figure 6 shows an example of the spatial cloaking method for the KD-LPP scheme. In Figure 6a, three candidate locations are recommended when the real location of the target user is taken as input. In Figure 6b, six candidate locations are recommended when a set of locations is taken as input.

These functions show that the KD-LPP scheme can realize customized privacy protection according to the demand and preference of the target user. Thus, data utility is improved, which provides a high Quality of Experience for the target user.

5.3. Experimental Results and Performance Analysis

In this part, we compare the performance of the proposed KD-LPP scheme with the k-NN scheme [22] and the VLBS scheme [34]. The following four metrics are used to evaluate the algorithms.

(1) Location set entropy

Location set entropy is used to measure the uncertainty of historical query probability among locations in the anonymous location set. The larger the location set entropy, the more similar the historical query probability between locations in the anonymous location set, the higher the uncertainty of the attacker to infer the real location, and the better the location privacy protection effect. The location set entropy can be calculated by Formula (9).

$$H(R) = -\sum_{i=1}^{k} p_{i} \cdot \log_{2} p_{i} \tag{9}$$

In Formula (9), $H(R)$ reaches its maximum value $\log_{2} k$ when all k locations in the anonymous location set have the same probability $p_{i} = 1/k$; in this case, the uncertainty in the anonymous location set is the highest.
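As an illustration of Formula (9), a minimal Python sketch is given below. The function name `location_set_entropy` is hypothetical, and the sketch assumes that the historical query probabilities are normalized so that they sum to 1 within the anonymous location set.

```python
import math


def location_set_entropy(query_probs):
    """Entropy H(R) of an anonymous location set, Formula (9).
    query_probs: historical query probabilities (or counts) of the k locations."""
    total = sum(query_probs)
    if total <= 0:
        raise ValueError("at least one location must have a positive probability")
    # Normalize within the set so that the p_i sum to 1 (assumption).
    probs = [q / total for q in query_probs]
    # Terms with p_i = 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)


uniform = location_set_entropy([0.25, 0.25, 0.25, 0.25])
skewed = location_set_entropy([0.7, 0.1, 0.1, 0.1])
print(uniform, skewed)
```

For k = 4, the uniform set attains the maximum $\log_{2} 4 = 2$, whereas the skewed set yields a smaller value, which means an attacker can single out the dominant location more easily.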

(2) Location distance entropy

Location distance entropy measures the distance from each location in the anonymous location set to the center of the set, and reflects the uniformity of the physical distribution of the locations in the anonymous location set. The smaller the distance entropy, the greater the differences among the distances from each location to the center of the anonymous location set, and the more uniform the physical distribution. The location distance entropy can be calculated by Formula (10).

$$H_{d} = -\sum_{i=1}^{k} d(R_{i}, l_{centre}) \cdot \log_{2} d(R_{i}, l_{centre}) \tag{10}$$

where $l_{centre}$ represents the central coordinate of the anonymous location set.
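Formula (10) can be sketched in Python as follows. Several choices here are assumptions made only for illustration and are not stated in the text: the center is taken as the arithmetic mean of the coordinates, the distance is Euclidean in grid units, and the distances are normalized to sum to 1 so that the result behaves like an entropy. The function name `location_distance_entropy` is hypothetical.

```python
import math


def location_distance_entropy(locations):
    """Distance entropy H_d of an anonymous location set, Formula (10).
    locations: list of (x, y) coordinates, e.g. grid indices."""
    k = len(locations)
    # Assumption: the set center is the mean of the coordinates.
    cx = sum(x for x, _ in locations) / k
    cy = sum(y for _, y in locations) / k
    dists = [math.hypot(x - cx, y - cy) for x, y in locations]
    total = sum(dists)
    if total == 0:
        return 0.0  # all locations coincide with the center
    # Assumption: distances are normalized to sum to 1 before taking logs.
    return -sum((d / total) * math.log2(d / total) for d in dists if d > 0)


square = location_distance_entropy([(0, 0), (0, 2), (2, 0), (2, 2)])
stretched = location_distance_entropy([(0, 0), (0, 1), (1, 0), (9, 9)])
print(square, stretched)
```

Under these assumptions, four locations at equal distance from the center give the maximum value of 2, and the value drops as the distances to the center become more unequal.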

(3) Average anonymous time

In LBSNs, service quality and privacy protection are equally important for the target user. The average anonymous time is the most intuitive factor for measuring the Quality of Experience (QoE). Therefore, provided that the quality of service and the effect of privacy protection are ensured, the smaller the average anonymous time, the better the QoE. In this paper, the average anonymous time is the average time needed to generate the anonymous location set, obtained by repeating the experiment many times for different values of k.

(4) Anonymous success rate

Anonymous success rate is used to measure the ability of the location privacy algorithm to resist attacks. The higher the anonymous success rate, the harder it is for an attacker to infer the real location from the anonymous location set. Because the k-NN and VLBS algorithms compared in the experiment achieve a location set entropy similar to that of the proposed KD-LPP algorithm, the anonymous success rate in this experiment considers only the difference between location semantics in the anonymous location set. The anonymous success rate can be calculated by Formula (11).

$$ASR = \frac{Count(SemSimilar(R_{i}, R_{j}) \in (\theta_{1}, \theta_{2})) - k}{k^{2} - k} \tag{11}$$

where $Count(SemSimilar(R_{i}, R_{j}) \in (\theta_{1}, \theta_{2}))$ represents the number of location pairs $(R_{i}, R_{j})$ whose semantic similarity lies between the lower limit $\theta_{1}$ and the upper limit $\theta_{2}$.
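Formula (11) can be illustrated with the following Python sketch. It takes a precomputed k × k semantic similarity matrix as input, so the definition of SemSimilar itself is not reproduced here. The function name `anonymous_success_rate` is hypothetical. The sketch assumes that the term "− k" in the numerator removes the k self-pairs $(R_{i}, R_{i})$, so the rate equals the fraction of ordered pairs with $i \neq j$ whose similarity lies in $(\theta_{1}, \theta_{2})$.

```python
def anonymous_success_rate(sim, theta1, theta2):
    """Anonymous success rate, Formula (11).
    sim: k x k matrix with sim[i][j] = SemSimilar(R_i, R_j)."""
    k = len(sim)
    if k < 2:
        raise ValueError("at least two locations are required")
    # Count ordered pairs (i, j), i != j, inside the open interval;
    # this corresponds to Count(...) - k when self-pairs are excluded.
    hits = sum(
        1
        for i in range(k)
        for j in range(k)
        if i != j and theta1 < sim[i][j] < theta2
    )
    return hits / (k * k - k)


sim = [
    [1.0, 0.5, 0.9],
    [0.5, 1.0, 0.3],
    [0.9, 0.3, 1.0],
]
asr = anonymous_success_rate(sim, 0.2, 0.8)
print(asr)
```

In this toy matrix, four of the six ordered pairs fall inside (0.2, 0.8), so the rate is 4/6.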

Figure 7a shows the location set entropy of the different methods when the privacy level k is changed. Under the ideal condition, the location set entropy of the anonymous locations reaches the maximum $H(R) = \log_{2} k$. In Figure 7a, we can see that the location set entropy of the k-NN method, the VLBS method and the proposed KD-LPP method is almost ideal for different values of k, whereas the location set entropy of the Random method is the lowest for all values of k.

Figure 7b shows the location distance entropy of the different methods when the privacy level k is changed. We can see that the location distance entropy of the proposed KD-LPP method is always smaller than that of the other three methods. This suggests that the KD-LPP method has better physical distribution uniformity. Therefore, compared with the other methods, the KD-LPP method is the most effective at resisting inference attacks from the perspective of historical query probability.

Figure 8a shows the average anonymous time of the different methods when the privacy level k is changed. We can see that the average anonymous time increases as the value of k increases. The average anonymous time of the k-NN method is the longest of the four methods, and the proposed KD-LPP method has a shorter average anonymous time than both the k-NN method and the VLBS method. When $k > 6$, the average anonymous time of the k-NN method grows noticeably faster than that of the VLBS method and the KD-LPP method. The reason is that the k-NN method does not consider the semantic similarity of the locations. In the process of anonymous location selection, the KD-LPP method proposed in this paper does not need to calculate the distance between locations in the anonymous location set in each round, which greatly reduces the time consumption.

Figure 8b shows the anonymous success rate of the different methods when the privacy level k is changed. In this paper, the user access frequency over 24 periods is used to quantify the location semantic information. Under this quantification, location units with similar query probability are more likely to have higher semantic similarity, which is also why the anonymous success rate of the Random method is the lowest of the four methods. The anonymous success rate of the k-NN method is lower than that of the VLBS method and the KD-LPP method because it does not consider the semantic similarity of the locations. Compared with the VLBS method, the KD-LPP method has a higher anonymous success rate, because it takes the user portrait into consideration when generating an anonymous location. Therefore, the KD-LPP method is the most effective at resisting background knowledge attacks among the compared methods.

In order to meet the demands of different users for location privacy in different scenarios, the proposed KD-LPP method allows users to customize the upper and lower limits of the amount of privacy in the anonymous location set. The anonymous success rate directly reflects the effect of privacy protection. Figure 9a,b shows the effect of the lower limit and the upper limit of the amount of privacy, respectively, on the anonymous success rate when the privacy level k is changed. In Figure 9a, we can see that the anonymous success rate gradually decreases as the lower limit of the amount of privacy increases. The reason is that the selected privacy protection methods (i.e., N/A, Fuzzy location) cannot effectively anonymize the location. In Figure 9b, the anonymous success rate gradually increases as the upper limit of the amount of privacy increases, because the privacy level is higher and the corresponding privacy protection methods (i.e., Spatial cloaking, Inhibition) can effectively anonymize the location. Under the conditions $|D| = 0.0$ and $|D| = 1.0$, the anonymous success rate of the KD-LPP method is always 1.0, because the value range of the amount of privacy is [0.0, 1.0]: no privacy protection method is needed when $|D| = 0.0$, and the Inhibition method disables the publication of the location when $|D| = 1.0$. Therefore, the anonymous location set is valid at any time.

6. Conclusions and Future Work

In this paper, we study the problem of customized location privacy protection under the user portrait model in LBSNs. First, we introduce LBSNs, which provide personalized location-based services for users. Then, we explore the possibility of designing a Knowledge-Driven Location Privacy Preserving (KD-LPP) scheme, which can dynamically select the privacy protection method according to the quantified amount of privacy. Experiments show that our KD-LPP scheme can provide customized privacy protection for the target user with a high anonymous success rate. In future work, we will extend the privacy protection scheme to distributed computing and edge computing environments, in order to further reduce the risk of privacy leakage.

Author Contributions

Conceptualization, L.Z.; methodology, L.Z.; software, X.L.; validation, L.Z.; formal analysis, Z.C.; investigation, L.Y.; resources, J.Z.; data curation, Z.J.; writing—original draft preparation, L.Z. and X.L.; writing—review and editing, L.Z. and X.L.; visualization, Z.C.; supervision, J.Z.; project administration, Z.J.; funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61902361, in part by the Henan Key Research Project of Higher Education Institutions under Grant No. 22B520046, the Henan Key Laboratory of Network Cryptography Technology under Grant No. LNCT2021-A15, the Henan Province Key Research and Development Special Project No. 221111210500, the Henan Provincial Science and Technology Department under Grant No. 212102210095.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of the study are available within the article.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

1. Zhu, L.; Xu, C.; Guan, J.; Zhang, H. SEM-PPA: A semantical pattern and preference-aware service mining method for personalized point of interest recommendation. J. Netw. Comput. Appl. 2017, 82, 35–46.
2. Xiao, H.; Xu, C.; Feng, Z.; Ding, R.; Yang, S.; Zhong, L.; Liang, J.; Muntean, G.M. A Transcoding-Enabled 360° VR Video Caching and Delivery Framework for Edge-Enhanced Next-Generation Wireless Networks. IEEE J. Sel. Areas Commun. 2022, 40, 1615–1631.
3. Liu, X.; Wang, G.; Bhuiyan, M. Re-ranking with multiple objective optimization in recommender system. Trans. Emerg. Telecommun. Technol. 2022, 33, e4398.
4. Xu, C.; Zhu, L.; Liu, Y.; Guan, J.; Yu, S. DP-LTOD: Differential Privacy Latent Trajectory Community Discovering Services over Location-Based Social Networks. IEEE Trans. Serv. Comput. 2021, 14, 1068–1083.
5. Zhu, L.; Xie, H.; Liu, Y.; Guan, J.; Liu, Y.; Xiong, Y. PTPP: Preference-Aware Trajectory Privacy-Preserving over Location-Based Social Networks. J. Inf. Sci. Eng. 2018, 34, 803–820.
6. Lu, J.; Zhang, Y.; Yang, L.; Jin, S. Inter-cloud secure data sharing and its formal verification. Trans. Emerg. Telecommun. Technol. 2022, 33, e4380.
7. Ding, X.; Gan, Q.; Bahrmi, S. A systematic survey of data mining and big data in human behavior analysis: Current datasets and models. Trans. Emerg. Telecommun. Technol. 2022, 33, e4574.
8. Jeyashree, G.; Padmavathi, S. IHAR—A fog-driven interpretable human activity recognition system. Trans. Emerg. Telecommun. Technol. 2022, 33, e4506.
9. Montazeri, Z.; Houmansadr, A.; Pirhro, N. Achieving Perfect Location Privacy in Wireless Devices Using Anonymization. IEEE Trans. Inf. Forensics Secur. 2017, 12, 2683–2698.
10. Olteanu, A.M.; Huguenin, K.; Shokri, R.; Humbert, M.; Hubaux, J.P. Quantifying Interdependent Privacy Risks with Location Data. IEEE Trans. Mob. Comput. 2017, 16, 829–840.
11. Li, H.; Zhu, H.; Du, S.; Liang, X.; Shen, X. Privacy Leakage of Location Sharing in Mobile Social Networks: Attacks and Defense. IEEE Trans. Dependable Secur. Comput. 2018, 15, 646–660.
12. Huguenin, K.; Bilogrevic, I.; Machado, J.S.; Mihaila, S.; Shokri, R.; Dacosta, I.; Hubaux, J.P. A Predictive Model for User Motivation and Utility Implications of Privacy Protection Mechanisms in Location Check-Ins. IEEE Trans. Mob. Comput. 2018, 17, 760–774.
13. He, X.; Jin, R.; Dai, H. Leveraging Spatial Diversity for Privacy-Aware Location Based Services in Mobile Networks. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1524–1534.
14. You, T.; Peng, W.; Lee, W. Protecting Moving Trajectories with Dummies. In Proceedings of the International Conference on Mobile Data Management, Beijing, China, 27–30 April 2008; pp. 278–282.
15. Kato, R.; Iwata, M.; Hara, T.; Suzuki, A.; Xie, X.; Arase, Y.; Nishio, S. A dummy-based anonymization method based on user trajectory with pauses. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, 7–9 November 2012; pp. 249–258.
16. Hwang, R.; Hsueh, Y.; Chung, H. A Novel Time-Obfuscated Algorithm for Trajectory Privacy Protection. IEEE Trans. Serv. Comput. 2013, 7, 126–139.
17. Gao, K.; Xu, C.; Ji, X.; Qin, J.; Yang, S.; Zhong, L.; Wu, D. Freshness-Aware Age Optimization for Multipath TCP Over Software Defined Networks. IEEE Trans. Netw. Sci. Eng. 2021. early access.
18. Chen, X.; Xu, C.; Wang, M.; Wu, Z.; Zhong, L.; Grieco, L.A. Augmented Queue-based Transmission and Transcoding Optimization for Livecast Services Based on Cloud-Edge-Crowd Integration. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4470–4484.
19. Gao, S.; Ma, J.; Shi, W.; Zhan, G.; Sun, C. TrPF: A Trajectory Privacy-Preserving Framework for Participatory Sensing. IEEE Trans. Inf. Forensics Secur. 2013, 8, 874–887.
20. Chen, R.; Fung, B.C.; Mohammed, N.; Desai, B.C.; Wang, K. Privacy-preserving trajectory data publishing by local suppression. Inf. Sci. 2013, 231, 83–97.
21. Zhao, O.; Liu, X.; Li, X.; Singh, P.; Wu, F. Privacy-preserving data aggregation scheme for edge computing supported vehicular ad hoc networks. Trans. Emerg. Telecommun. Technol. 2022, 33, e3952.
22. Yi, X.; Paulet, R.; Bertino, E.; Varadharajan, V. Practical Approximate k Nearest Neighbor Queries with Location and Query Privacy. IEEE Trans. Knowl. Data Eng. 2016, 28, 1546–1559.
23. Celdrán, A.H.; Pérez, M.G.; Clemente, F.J.G.; Perez, G.M. Precise: Privacy-aware recommender based on context information for cloud service environments. IEEE Commun. Mag. 2014, 52, 90–96.
24. Chen, X.; Wu, X.; Li, X.Y.; Ji, X.; He, Y.; Liu, Y. Privacy-aware High-Quality Map Generation with Participatory Sensing. IEEE Trans. Mob. Comput. 2014, 15, 719–732.
25. Hua, J.; Li, F.; Guo, Y.; Geng, K.; Niu, B. Research on privacy protection in the process of information exchange. Chin. J. Netw. Inf. Secur. 2016, 2, 28–38.
26. Zhang, W.; Liu, Q.; Zhu, H. Evaluation and protection of multi-level location privacy based on an information theoretic approach. Chin. J. Commun. 2019, 40, 51–59.
27. Dwork, C. Differential Privacy: A Survey of Results. In Theory and Applications of Models of Computation; Springer: Berlin/Heidelberg, Germany, 2008; pp. 1–19.
28. Fouad, M.; Elbassion, K.; Bertino, E. A Supermodularity-Based Differential Privacy Preserving Algorithm for Data Anonymization. IEEE Trans. Knowl. Data Eng. 2014, 26, 1591–1601.
29. Su, S.; Xu, S.; Cheng, X.; Li, Z.; Yang, F. Differentially Private Frequent Itemset Mining via Transaction Splitting. IEEE Trans. Knowl. Data Eng. 2015, 27, 1875–1891.
30. Xu, S.; Cheng, X.; Su, S.; Xiao, K.; Xiong, L. Differentially Private Frequent Sequence Mining. IEEE Trans. Knowl. Data Eng. 2016, 28, 2910–2926.
31. Soria-Comas, J.; Domingo-Ferrer, J.; Sánchez, D.; Megías, D. Individual Differential Privacy: A Utility-Preserving Formulation of Differential Privacy Guarantees. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1418–1429.
32. Zhan, J.; Chow, C. Enabling Probabilistic Differential Privacy Protection for Location Recommendations. IEEE Trans. Serv. Comput. 2018, 14, 426–440.
33. Zheng, Y.; Xie, X.; Ma, W. GeoLife: A Collaborative Social Networking Service among User, location and trajectory. IEEE Data Eng. Bull. 2010, 33, 32–40.
34. Shi, X.; Zhang, J.; Gong, Y. A dummy location generation algorithm based on the semantic quantification of location. In Proceedings of the IEEE International Conference on Artificial Intelligence and Computer Applications, Dalian, China, 28–30 June 2021; pp. 172–176.

Figure 1. The workflow of KD-LPP scheme.

Figure 2. The three-layer classifier.

Figure 3. The choice for the next location.

Figure 4. Stay-points and semantic tagging.

Figure 5. Example of KD-LPP scheme realization.

Figure 6. Example of spatial cloaking method.

Figure 7. Effect of location set entropy and location distance entropy.

Figure 8. Effect of average anonymous time and anonymous success rate.

Figure 9. Effect of anonymous success rate in different amount of privacy.

Table 1. Classified location privacy protection method.

Privacy Level (k)	Amount of Privacy ($|D|$)	Privacy Protection Method
A (01~05)	(0.0, 0.2]	N/A
B (06~10)	(0.2, 0.4]	Fuzzy location
C (11~15)	(0.4, 0.6]	Fake location
D (16~20)	(0.6, 0.8]	Spatial cloaking
E (21~25)	(0.8, 1.0]	Inhibition

Table 2. Service types of Beijing POI dataset.

Type	Service	Type	Service
1	Food and beverage service	11	Motorcycle service
2	Road ancillary	12	Auto service
3	Name address	13	Vehicle repair
4	Scenic spot	14	Car sales
5	Public facilities	15	Commercial housing
6	Companies	16	Life service
7	Shopping service	17	Sports leisure
8	Traffic facilities	18	Health care
9	Financial insurance	19	Government agencies
10	Science and education	20	Accommodation services

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhu, L.; Liu, X.; Jing, Z.; Yu, L.; Cai, Z.; Zhang, J. Knowledge-Driven Location Privacy Preserving Scheme for Location-Based Social Networks. Electronics 2023, 12, 70. https://doi.org/10.3390/electronics12010070
