Electronics
  • Article
  • Open Access

24 December 2022

Knowledge-Driven Location Privacy Preserving Scheme for Location-Based Social Networks

1
College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China
2
Henan Key Laboratory of Network Cryptography Technology, Zhengzhou 450001, China
3
College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China
*
Authors to whom correspondence should be addressed.
This article belongs to the Special Issue Advanced Edge Intelligence Collaborative Technology over Wireless Communications

Abstract

Location privacy-preserving methods for location-based services in mobile communication networks have received great attention. Traditional methods mostly focus on analyzing location data in geographical space, and few studies consider the personalized features of users. In this paper, we present a Knowledge-Driven Location Privacy Preserving (KD-LPP) scheme that mines user preferences and provides customized location privacy protection for users. First, the User Basic Portrait Generation (UBPG) algorithm is proposed to mine the basic portrait, and user familiarity and user curiosity are modelled to generate the psychological portrait. Then, a location transfer matrix based on the user portrait is built to transfer the real location to an anonymous location. To achieve customized privacy protection, the amount of privacy is modelled to quantify the privacy protection demand of the target user. Finally, experimental evaluation on two real datasets shows that the KD-LPP scheme can not only protect user privacy but also achieve better accuracy of privacy protection.

1. Introduction

The mobile Internet has entered the era of the fifth generation of mobile communication (i.e., 5G). The 5G network, which is characterized by high speed, high reliability, low delay and a large number of connected terminals, has greatly changed the way users socialize. With the popularity of intelligent terminals and of spatial and temporal sensors, massive and accurate location-related data (e.g., GPS data) are shared by users with the central server. The movement behavior of the target user can be mined by the central server through situational awareness, machine learning, information fusion, etc. The central server can therefore intelligently recommend Location-Based Services (LBS) according to the differences in interests between users or the differences in preferences of the same user, in order to meet the personalized needs of the target user [1,2,3].
Location-Based Social Networks (LBSNs) bring great convenience to the life of users. However, they also create a risk of disclosing personal private information (e.g., identity, location, or query information) [4,5,6]. In LBSNs, users need to publish their real location data to the central server. Through data cleaning and fusion, the central server can analyze the movement patterns and characteristics of users and then recommend location-based services to them. However, the central server is honest but curious, i.e., it not only performs the query and recommendation tasks rigorously, but also tries its best to mine the interests and preferences of the target user. If this sensitive information is stolen by attackers, it poses a serious threat to the personal privacy and safety of the target user.
The existing methods of location privacy protection are mainly based on false locations, spatial anonymity, encryption or other technologies. Because of user sensitive information leakage, low data availability and a lack of self-adaptation, these methods cannot meet the needs of diversified and personalized location service recommendation in LBSNs. Moreover, a unified privacy protection scheme does not consider user situational information and preference information, which seriously affects the availability of sensitive data and the performance of service recommendation. To solve these problems, it is necessary to study customizable and quantifiable privacy protection schemes that build on a deep understanding of the rules of scenarios, users and services, in order to protect user privacy and improve data utility.
This paper draws on the ideas of new technologies such as user portraits and knowledge mining to propose the Knowledge-Driven Location Privacy Preserving (KD-LPP) scheme for Location-Based Social Networks (LBSNs). The contributions of our work can be divided into four aspects as follows:
(1) We mine the stay-points and locations based on the original trajectory sequence of users. Furthermore, each stay-point is tagged with semantic information, in order to generate the user portrait.
(2) We construct the user portrait by considering basic attributes and psychological attributes of target user. The UBPG algorithm is proposed to generate basic portrait. User familiarity and user curiosity are modelled to generate psychological portrait.
(3) We build a location transfer matrix to hide the real location of target user. The amount of privacy is modelled to quantize the demand of privacy protection, so as to provide customized privacy protection for the target user.
(4) We conduct an extensive experimental study to verify the functions and performance of proposed KD-LPP scheme over two real datasets. The experiment results show that our KD-LPP scheme can privately provide customized services for users.

3. Overview of KD-LPP Scheme

In this section, we give the problem definition, scheme design and attack hypothesis of KD-LPP scheme.

3.1. Problem Definition

Definition 1 (Stay-point).
A stay-point indicates that a user stays in a certain geographic area for a period of time. From the stay-points found by a stay-point detection algorithm, it can be inferred that the target user has carried out some meaningful activity in the area. Each stay-point is described by the triple $\langle lon, lat, \theta_t \rangle$, where $(lon, lat)$ represents the longitude and latitude of the stay-point, and $\theta_t$ represents the length of time the user stays at the stay-point.
Definition 2 (Location).
A location is a cluster of stay-points, i.e., the stay-points with the same semantic information are clustered into a location. Each location is described by the triple $\langle lon, lat, type \rangle$, where $(lon, lat)$ represents the longitude and latitude of the location, and $type$ represents the corresponding semantic information.
Definition 3 (Semantic information).
Locations with different semantics have different access probabilities in different time periods. Furthermore, locations with similar semantics may also have different access probabilities. To prevent attackers from identifying less likely locations by user access time, the semantic information of a location unit is established from the number of user visits in different time periods. The semantic information of location unit $i$ can be expressed as $S_i = \{N_1, N_2, \ldots, N_{24}\}$, where $N_1, N_2, \ldots, N_{24}$ represent the access frequency of location unit $i$ in each of the 24 time periods.
Definition 4 (User portrait).
A user portrait is the characteristic model of a user in the real world, which reflects the movement pattern and personalized preferences of the user in LBSNs. The user portrait of user $u$ is described by $PA_u = \{pa_1, pa_2, \ldots, pa_n\}$, where each $pa_i$ represents a different preference characteristic of user $u$.
Definition 5 (Privacy requirements).
Privacy requirements reflect the personalized protection needs of different users. The privacy requirements are described by $PPD = \langle PA_u, D_u \rangle$, where $PPD$ represents the privacy protection demand of target user $u$, $PA_u$ represents the user portrait of user $u$, and $D_u$ represents the customized privacy parameter of user $u$.
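As an illustration of Definitions 1 to 3, the following minimal Python sketch represents stay-points and locations as small records and builds the 24-slot semantic vector of a location unit from visit timestamps. The class and function names are hypothetical, and mapping the 24 time periods to the hours of the day is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class StayPoint:
    lon: float
    lat: float
    theta_t: float  # stay duration; seconds is an assumed unit


@dataclass
class Location:
    lon: float
    lat: float
    type: str  # semantic information, e.g. a POI category


def semantic_vector(visit_times: List[datetime]) -> List[int]:
    """Return S_i = [N_1, ..., N_24]: visit counts per hour-of-day slot.

    Assumption: the 24 time periods are the 24 hours of a day.
    """
    counts = [0] * 24
    for ts in visit_times:
        counts[ts.hour] += 1
    return counts


# Illustrative visits to one location unit (made-up timestamps).
visits = [
    datetime(2009, 5, 1, 8, 15),
    datetime(2009, 5, 2, 8, 40),
    datetime(2009, 5, 2, 19, 5),
]
s_i = semantic_vector(visits)
```

Here `s_i` holds two visits in the 08:00 slot and one in the 19:00 slot, and zeros elsewhere.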

3.2. Scheme Design

As shown in Figure 1, the workflow of KD-LPP scheme can be divided into four stages: initialization, semantic tagging, user portrait and transfer matrix. The detailed explanations of four stages are as follows.
Figure 1. The workflow of KD-LPP scheme.
Initialization
In this stage, the mobile device collects GPS trajectory data locally as the original data of users. The original trajectory data is used to generate the target user’s stay-points through the stay-point detection algorithm to reflect stay behavior of user in a certain area. The real location of the user can be obtained by the stay-points clustering for the following location anonymity processing.
Semantic tagging
In this stage, the generated stay-points are tagged with semantic information through the method of data fusion. Semantic information can well reflect the user’s meaningful activities in the stay area, so as to effectively mine the personalized preferences of users. In addition, some sensitive semantic locations of target user (e.g., home address, workplace, etc.) will be deleted to protect the privacy information of the user from being leaked.
User portrait
In this phase, the user portrait of each user is generated based on the semantic-tagged location data. User portrait can better describe the user’s characteristics, preferences and so on. The user portrait will be input into the construction process of transfer matrix as knowledge, so as to complete the establishment of knowledge model.
Transfer matrix
In this stage, the location transition matrix is generated according to the knowledge model constructed in the above three stages. When target user inputs the real location, the transition matrix can generate the corresponding anonymous location according to transition probability, and output to the anonymous location set. Because the transition matrix is generated based on the knowledge model of different users, it can generate anonymous locations for different users to meet their personalized and customized needs.
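The initialization stage relies on a stay-point detection algorithm whose details are not given here. The sketch below shows one common threshold-based variant, under assumptions: a group of consecutive GPS points forms a stay-point if all of them lie within `dist_m` metres of the first point and the group spans at least `min_stay_s` seconds. Both thresholds and all names are hypothetical and are not taken from the KD-LPP scheme.

```python
import math
from typing import List, Tuple

# (lon, lat, unix_time_seconds)
GpsPoint = Tuple[float, float, float]


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_stay_points(track: List[GpsPoint],
                       dist_m: float = 200.0,       # assumed threshold
                       min_stay_s: float = 1200.0   # assumed threshold
                       ) -> List[Tuple[float, float, float]]:
    """Return stay-points as (lon, lat, theta_t) triples."""
    stays = []
    i, n = 0, len(track)
    while i < n:
        # Extend j while points stay within dist_m of the anchor point i.
        j = i + 1
        while j < n and haversine_m(track[i][0], track[i][1],
                                    track[j][0], track[j][1]) <= dist_m:
            j += 1
        duration = track[j - 1][2] - track[i][2]
        if duration >= min_stay_s:
            group = track[i:j]
            lon = sum(p[0] for p in group) / len(group)
            lat = sum(p[1] for p in group) / len(group)
            stays.append((lon, lat, duration))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stays


# Three nearby fixes over 30 minutes, then a jump to a distant point.
track = [
    (116.3000, 39.9800, 0.0),
    (116.3001, 39.9801, 600.0),
    (116.3000, 39.9800, 1800.0),
    (116.4000, 40.0500, 2400.0),
]
stays = detect_stay_points(track)
```

Clustering the resulting stay-points into locations is a separate step and is not shown.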

3.3. Attack Hypothesis

In this paper, it is assumed that the LBS server is an active attacker with background knowledge such as map location information, historical query probability and location semantic information. In the case of a snapshot query, privacy threats from active attackers can be divided into three aspects. First, locations in the anonymous location set have different query probabilities, and the attacker is more likely to regard the location with the highest query probability in the set as the real location. Second, an attacker with background knowledge of location semantic information can identify locations in the anonymous location set that are unlikely to be accessed at a certain time. Third, if the locations in the anonymous location set are concentrated in a certain area or around the real location, the attacker can infer the approximate real location of the target user.

4. Models and Algorithms

In this section, we give the models and algorithms of the KD-LPP scheme. The user portrait can be divided into the basic portrait and the psychological portrait. The attributes of the basic portrait, such as gender, date of birth, place of origin, occupation and education, are fixed over a long period of time or do not change throughout the life of a user. In contrast, the psychological portrait describes the personalized preferences of each user. The psychological characteristics also differ across times and locations, which reflects the changes of user requirements in different situations.

4.1. Basic Portrait

Temporal characteristics, spatial characteristics and location knowledge of each user can be extracted from the Geolife dataset [33], which will be explained in Section 5.1. These three attributes are closely related to the basic portrait of users.
Time characteristic. The time characteristic refers to the user trajectory data in different time segments, which has a strong connection with the basic attributes of the user. For example, students commute between home and school at nearly the same time on every working day. Office workers commute between their homes and offices during the morning and evening rush hours. Food delivery couriers crisscross the city around lunch and dinner time.
Spatial characteristics. Spatial characteristics mean that the trajectory data of continuous movement of users are constrained by physical distance in the real world. An analysis of each trajectory sequence shows that the distance of continuous movement of most users is within 20 km. For example, within one trajectory sequence, a user may move from one place to another in the same city, but is very unlikely to move from one city to another.
Location knowledge. Location knowledge refers to locations with specific semantic category information, and it is closely related to the basic attributes of users. For example, researchers usually gather in research institutes because these host a large number of research projects. Workers tend to be in factories because there are many production tasks. Teachers are usually in school during the day because they have many teaching tasks.
In order to avoid the disclosure of sensitive information of target user, the location data (e.g., home address, workplace, etc.) will be firstly distinguished through the time and spatial characteristic during the stage of initialization. Then, the sensitive information will be blocked in the stage of semantic tagging. Finally, the geographic location information and semantic location information are considered to generate the basic attributes of users. In this paper, the three-layer classifier is utilized to better classify the model through strong supervision. Figure 2 shows the workflow of the three-layer classifier. Algorithm 1 is the pseudo-code of User Basic Portrait Generation (UBPG) algorithm. The detailed steps are as follows:
Algorithm 1 User Basic Portrait Generation
Input: Semantic information $S_i = \{N_1, N_2, \ldots, N_{24}\}$, time characteristics $D_{holi}$, $D_{work}$, $H_i$
Output: User basic portrait $PA = \{pa_1, pa_2, \ldots, pa_n\}$
1: Input the semantic information $S_i$ into the layer 1 classifier for the preliminary characterization of user attributes;
2: Obtain the result of the first classification $PA^1 = \{pa_1^1, pa_2^1, \ldots, pa_n^1\}$;
3: Input $PA^1$ into the layer 2 classifier;
4: Obtain the result of the second classification $PA^2$ through logistic regression;
5: Input the time characteristics $D_{holi}$, $D_{work}$, $H_i$ and $PA^2$ into the layer 3 classifier;
6: Return $PA$
Figure 2. The three-layer classifier.
In Algorithm 1, $S_i$ represents the semantic information of each stay-point, $D_{holi}$ represents the rest day, $D_{work}$ represents the working day, and $H_i = \{h_1, h_2, \ldots, h_{24}\}$ represents the 24 h from $h_1$ to $h_{24}$ in a day. First, semantic information is tagged for each location in the semantic tagging stage. Then, the tagged semantic information is input to the layer 1 classifier to obtain the preliminary classification. The result of the first classification $PA^1$ is input to the layer 2 classifier to complete the fine-grained classification. To describe the interests and preferences of users well, the result of the second classification $PA^2$ is combined with the time characteristics and input to the layer 3 classifier. Finally, the basic portrait $PA$ of the target user is generated by the three-layer classifier. The basic attributes generated by the UBPG algorithm can then be used for customized location privacy protection.

4.2. Psychological Portrait

The psychological portrait of users can be divided into user familiarity and user curiosity. The corresponding models are as follows.
(1) User Familiarity Model
The change of the user’s location over time generates the trajectory sequence $Seq = \{L_1, L_2, \ldots, L_n\}$, where $L_i$ represents the user’s location and $n$ represents the number of times the user’s location changes. Each location $L_i$ has a longitude and latitude according to the triple $\langle lon, lat, type \rangle$. As shown in Figure 3, the next location is not determined before the user arrives at $L_{n+1}$, where $L_{n+1} \in \{L_{d_1}, L_{d_2}, \ldots, L_{d_m}\}$ indicates that there may be $m$ different choices when the user moves from $L_n$ to $L_{n+1}$.
Figure 3. The choice for the next location.
According to the psychological characteristics of users, two factors need to be considered when a user moves to the next location. One is the familiarity of the next location itself to the target user. The other is the familiarity of the generated trajectory sequence to the target user. The greater the familiarity of the next location or of the generated trajectory sequence, the greater the psychological preference of the user and the lower the privacy level required. Conversely, the less familiar the user is with a location, the stronger the privacy requirements for that location.
In order to dynamically select the next location, the probability matrix is modelled as follows:
$$NPM_i(L_i) \propto \frac{1}{(\#L_i + 1)\,(T_{L_i, L_{i+1}} + 1)}, \tag{1}$$
where $\#L_i$ represents the number of occurrences of location $L_i$ before the $i$-th movement in the trajectory sequence of the user, and $T_{L_i, L_{i+1}}$ represents the number of occurrences of the trajectory sequence from location $L_i$ to location $L_{i+1}$ before the $i$-th movement.
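The counting behind the familiarity model can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: it takes the weight of a candidate next location to be inversely proportional to (location count + 1) × (transition count + 1), evaluates the location count for each candidate, and normalises the weights over the candidate set. The function name and the string location identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def next_location_weights(history: List[str],
                          current: str,
                          candidates: List[str]) -> Dict[str, float]:
    """Weight each candidate next location by how unfamiliar it is.

    history    -- past trajectory as a list of location identifiers
    current    -- the location the user is leaving
    candidates -- possible next locations (L_d1, ..., L_dm)
    """
    loc_count = Counter(history)                       # occurrences of each location
    trans_count = Counter(zip(history, history[1:]))   # occurrences of each transition
    raw = {
        c: 1.0 / ((loc_count[c] + 1) * (trans_count[(current, c)] + 1))
        for c in candidates
    }
    total = sum(raw.values())
    # Normalising to sum to 1 is an assumption made for this sketch.
    return {c: w / total for c, w in raw.items()}


history = ["home", "office", "cafe", "office", "home", "office"]
weights = next_location_weights(history, "office", ["cafe", "home", "gym"])
```

In this example the never-visited `"gym"` receives the largest weight and the frequently visited `"home"` the smallest, which matches the statement that unfamiliar locations call for stronger privacy protection.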
(2) User Curiosity Model
In this section, we use user curiosity as a measure of whether users are willing to experience new things and take the risk of privacy disclosure. For example, a user with strong curiosity is more willing to explore new things, and that user will have a lower need for privacy protection. For more conservative users, the need for privacy protection should be increased. Location novelty can increase the user’s curiosity to access the location. The degree of location novelty is mainly determined by three factors, namely, the frequency of the user’s stay at the location, the length of the user’s stay at the location, and the similarity between the next location and the previous location. The degree of location novelty can be calculated by Formula (2).
$$Nov_{u,i}^{t} = \frac{1}{3} \times \left( SF_{u,i}^{t} + SR_{u,i}^{t} + DIS_{u,i}^{t} \right), \tag{2}$$
where $Nov_{u,i}^{t}$ represents the novelty of location $L_i$ for user $u$ at time $t$, $SF_{u,i}^{t}$ represents the frequency with which user $u$ visited location $L_i$ before time $t$, $SR_{u,i}^{t}$ represents the time duration for which user $u$ visited location $L_i$, and $DIS_{u,i}^{t}$ represents the degree of difference between location $L_i$ and the historical locations of user $u$.
According to the attenuation function of human memory and response to things proposed by Bayesian, $SF_{u,i}^{t}$ can be calculated by Formula (3).
$$SF_{u,i}^{t} = e^{-\alpha \times |I_{u,i}^{t}|}, \tag{3}$$
where $\alpha$ represents the attenuation coefficient, whose value range is $[0, 1]$. The experimental results show that the model is stable when $\alpha = 1$. $|I_{u,i}^{t}|$ represents the number of times that user $u$ visited location $L_i$ before time $t$. The more times user $u$ visits location $L_i$, the greater the value of $|I_{u,i}^{t}|$ and the smaller the value of $SF_{u,i}^{t}$.
$SR_{u,i}^{t}$ can be calculated by Formula (4).
$$SR_{u,i}^{t} = e^{-\left(t - t(I_{u,i}^{-1})\right)^{-1}}, \tag{4}$$
where $t(I_{u,i}^{-1})$ represents the timestamp of the last access of user $u$ to location $L_i$ before time $t$. The closer this latest timestamp is to the time $t$ at which user $u$ visits location $L_i$, the smaller $SR_{u,i}^{t}$.
$DIS_{u,i}^{t}$ can be calculated by Formula (5).
$$DIS_{u,i}^{t} = \frac{1}{|2 \times Tags(i)|} \times \sum_{tag \in Tags(i)} \left( e^{-\rho \times |I_{u,tag}|} + e^{-\left(t - t(I_{u,tag}^{-1})\right)^{-1}} \right), \tag{5}$$
where $Tags(i)$ represents the set of semantic information owned by location $L_i$, $|I_{u,tag}|$ represents the number of locations with the target tag visited by user $u$ before time $t$, and $t(I_{u,tag}^{-1})$ represents the timestamp of the last visit of user $u$ to a location with the target tag prior to time $t$. The parameter $\rho$ is added to ensure that $e^{-\rho \times |I_{u,tag}|}$ and $e^{-\left(t - t(I_{u,tag}^{-1})\right)^{-1}}$ have the same order of magnitude. In our experiment, the value of $\rho$ is set to 0.05. According to Formula (5), the more similar a newly visited location is to the historical locations of the user, the smaller the difference and the smaller $DIS_{u,i}^{t}$.
Through the above formulas, the novelty of each location for the target user is acquired. Thus, the user portrait $PC = \{PC_1, PC_2, \ldots, PC_n\}$ is built according to the basic portrait and the psychological portrait, and it can be used for the construction of the location transfer matrix.
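A minimal Python sketch of the novelty computation in Formulas (2) to (5) follows. It reads the exponents as negative (so that more visits and more recent visits lower the novelty, as the text states) and rests on several assumptions: timestamps are plain numbers in an arbitrary unit, every supplied timestamp is strictly earlier than `t_now`, and a location or tag that was never visited gets a recency term of 1.0. All function names are hypothetical.

```python
import math
from typing import Dict, List


def sf(visits: List[float], alpha: float = 1.0) -> float:
    """Formula (3): decays with the number of earlier visits."""
    return math.exp(-alpha * len(visits))


def sr(t_now: float, visits: List[float]) -> float:
    """Formula (4): exp(-1 / (t - t_last)); small when the last visit is recent.

    Assumption: with no earlier visit the term is 1.0 (limit of a very old visit).
    """
    if not visits:
        return 1.0
    return math.exp(-1.0 / (t_now - max(visits)))


def dis(t_now: float, tag_visits: Dict[str, List[float]], rho: float = 0.05) -> float:
    """Formula (5): average of count decay and recency decay over the tags of L_i.

    tag_visits maps each tag of the location to the timestamps at which the
    user visited any location carrying that tag; it must not be empty.
    """
    total = sum(math.exp(-rho * len(v)) + sr(t_now, v) for v in tag_visits.values())
    return total / (2 * len(tag_visits))


def novelty(t_now: float, loc_visits: List[float],
            tag_visits: Dict[str, List[float]],
            alpha: float = 1.0, rho: float = 0.05) -> float:
    """Formula (2): mean of the three components."""
    return (sf(loc_visits, alpha) + sr(t_now, loc_visits) + dis(t_now, tag_visits, rho)) / 3.0


# A never-visited location with an unseen tag versus a familiar one.
fresh = novelty(10.0, [], {"park": []})
familiar = novelty(10.0, [1.0, 2.0, 3.0, 9.5],
                   {"restaurant": [1.0, 2.0, 3.0, 4.0, 9.5]})
```

With these inputs the unvisited location reaches the maximum novelty, while the frequently and recently visited one scores clearly lower.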

4.3. Location Transfer Matrix

User characteristics such as job, preference and character can be described through the basic portrait and the psychological portrait of the target user. To build the location transfer matrix, a privacy protection weight $s_i$ is given for each portrait feature $PC_i$. The privacy sensitivity is divided into $P$ grades. The set of privacy protection weights over all portrait features can be constructed as $PS = (s_1, s_2, \ldots, s_n)$, where $0 \le s_i \le 1$ $(1 \le i \le n)$. The vector $PS$ reflects the personalized location privacy protection of users.
For any portrait feature $PC_i$, the specific weight of privacy protection can be represented as $S_{PC_i}$, where $0 \le S_{PC_i} \le P - 1$. Thus, the strength of privacy protection $UP\_S_{PC_i}$ can be calculated by Formula (6).
$$UP\_S_{PC_i} = \ln\left(1 + (S_{PC_i} + \varphi)/P\right). \tag{6}$$
In Formula (6), $S_{PC_i} = 0$ means that the portrait feature $PC_i$ can be fully shared by the target user. To prevent calculation problems when $S_{PC_i}$ is 0, the parameter $\varphi$ is introduced, where $\varphi$ is a positive value arbitrarily close to 0.
Let the vector $\pi_{PL_i} = (s_{PL_i,1}, s_{PL_i,2}, \ldots, s_{PL_i,n})$ be the privacy protection weight of the target user at location $L_i$. For any user, the location transfer matrix can be calculated by Formula (7).
$$\Pi = \left(\pi_{PL_1}, \pi_{PL_2}, \ldots, \pi_{PL_n}\right)^{T} = \left(UP\_S_{PL_i, PC_j}\right)_{n \times k} = \left(UP\_S_{i,j}\right)_{n \times k}. \tag{7}$$
To quantify the strength of location transfer, let $|D|$ represent the amount of privacy for customized privacy protection, which can be calculated by Formula (8).
$$|D| = \frac{e^{\|\Pi\|_F} - e^{-\|\Pi\|_F}}{e^{\|\Pi\|_F} + e^{-\|\Pi\|_F}}. \tag{8}$$
As shown in Table 1, the privacy protection method is selected according to the amount of privacy $|D|$. The privacy level $k$ increases gradually as the amount of privacy increases, and different privacy protection methods correspond to different privacy levels. For example, no privacy protection is needed when the amount of privacy $|D|$ lies between 0.0 and 0.2; the privacy level is then A, with $k \in \{1, 2, 3, 4, 5\}$. When the amount of privacy $|D|$ lies between 0.8 and 1.0, publication of the location needs to be suppressed; the privacy level is then E, with $k \in \{21, 22, 23, 24, 25\}$.
Table 1. Classified location privacy protection method.
Thus, customized privacy protection can be achieved because the amount of privacy $|D|$ changes according to the basic portrait and the psychological portrait of the target user. In the next section, the performance of the proposed KD-LPP scheme is evaluated.
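The chain from per-feature weights to a privacy level can be sketched in Python as below. The sketch reads Formula (8) as the hyperbolic tangent of the Frobenius norm of the transfer matrix, which is an assumption about the formula's form. The bands for levels A and E follow the text; the equal-width bands for B to D, the handling of band boundaries, the value of `phi` and all function names are assumptions made for illustration.

```python
import math
from typing import List


def protection_strength(weight: int, grades: int, phi: float = 1e-9) -> float:
    """Formula (6): ln(1 + (S + phi) / P) for a weight S with 0 <= S <= P - 1."""
    if not 0 <= weight <= grades - 1:
        raise ValueError("weight must lie in [0, P - 1]")
    return math.log(1.0 + (weight + phi) / grades)


def transfer_matrix(weights: List[List[int]], grades: int) -> List[List[float]]:
    """Formula (7): one row per location, one column per portrait feature."""
    return [[protection_strength(w, grades) for w in row] for row in weights]


def privacy_amount(matrix: List[List[float]]) -> float:
    """Formula (8), read as tanh of the Frobenius norm (assumption)."""
    frobenius = math.sqrt(sum(x * x for row in matrix for x in row))
    return math.tanh(frobenius)


def privacy_level(d: float) -> str:
    """Map |D| in [0, 1] to levels A..E using bands of width 0.2.

    Only the bands for A and E are stated in the text; B to D are assumed.
    """
    return "ABCDE"[min(int(d / 0.2), 4)]


P = 5  # number of sensitivity grades (illustrative)
open_user = transfer_matrix([[0, 0, 0], [0, 0, 0]], P)
strict_user = transfer_matrix([[4, 4, 4], [4, 4, 4], [4, 4, 4]], P)

d_open = privacy_amount(open_user)
d_strict = privacy_amount(strict_user)
level_open = privacy_level(d_open)
level_strict = privacy_level(d_strict)
```

A user who is willing to share every portrait feature ends up at level A, while a user with maximal weights everywhere ends up at level E.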

5. Performance Evaluation

5.1. Datasets and Experimental Setup

In this paper, MATLAB is used to analyze the location data and verify the performance of the proposed KD-LPP scheme. The GPS trajectory dataset of the GeoLife project of Microsoft Research Asia is used as the original trajectory data of users [33]. From April 2007 to August 2012, the Geolife dataset collected trajectory data of 182 users, including 17,621 trajectories with a total distance of more than 1.2 million kilometers and a total duration of more than 48,000 h. The Geolife dataset includes not only the daily activities of users (e.g., studying, going to work and coming home), but also personalized activities (e.g., shopping, traveling, dining and sports). Most of the data in the Geolife dataset is located in Beijing, with a small amount located in Europe or the United States. Since only location data in Beijing are considered, we first filter the Geolife dataset to retain only the data points with latitude between 39.4 and 41.1 and longitude between 115.4 and 117.6. Second, the retained area is divided into 200 × 200 location units of equal size for ease of calculation.
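The filtering and gridding step can be illustrated with a short Python sketch (the experiments themselves use MATLAB). The bounding box and the 200 × 200 grid come from the text above; treating "equal size" as equal spans in degrees and assigning points on the upper boundary to the last cell are assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple

LAT_MIN, LAT_MAX = 39.4, 41.1
LON_MIN, LON_MAX = 115.4, 117.6
GRID = 200  # 200 x 200 location units


def in_beijing(lon: float, lat: float) -> bool:
    """True if the point lies inside the bounding box used for Beijing."""
    return LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX


def to_cell(lon: float, lat: float) -> Tuple[int, int]:
    """Map a point inside the box to its (row, col) location unit."""
    if not in_beijing(lon, lat):
        raise ValueError("point lies outside the Beijing bounding box")
    col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * GRID), GRID - 1)
    row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * GRID), GRID - 1)
    return row, col


# Illustrative (lon, lat) points: two in Beijing, one far outside.
points: List[Tuple[float, float]] = [
    (116.30, 39.98),
    (116.41, 39.90),
    (-122.33, 47.61),
]
kept = [p for p in points if in_beijing(*p)]
cells = [to_cell(lon, lat) for lon, lat in kept]
```

Each retained GPS point is thereby reduced to the index of its location unit, which is the granularity at which access frequencies and semantic information are later counted.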
In order to add semantic tagging to the location, this paper uses the Beijing POI dataset, which records the location information contained by most points of interest in Beijing, namely, longitude and latitude coordinates. The original Beijing POI dataset is divided into 20 service types as shown in Table 2, which can be used for the next semantic tagging.
Table 2. Service types of Beijing POI dataset.

5.2. Data Analysis and Function Realization

The geographic location covered by each service type can be acquired from the POI dataset. By fusing the Geolife dataset and the POI dataset, the service name, type, latitude and longitude of each geographic location in the Geolife dataset can be effectively identified. In this section, the TF-IDF model is used to tag each stay-point with semantic information. TF-IDF is a classical weighting technique in information retrieval and data mining, where TF represents the term frequency, i.e., how often a term occurs in a document, and IDF represents the inverse document frequency, which lowers the weight of terms that occur in many documents. The implementation of data cleaning, stay-point generation and location clustering has been explained in detail in our previous research [1]. The experiments on semantic tagging and user portraits in this paper extend reference [1]. Figure 4 shows the generated stay-points and the corresponding semantic tags of a target user. We can see that the user has visited seven kinds of semantic locations in the trajectory sequence, which can be used to build a user portrait model.
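How TF-IDF can select a semantic tag is sketched below under assumptions, since the exact setup is not spelled out here: each stay-point is treated as a "document" whose "terms" are the types of the POIs found near it, and the type with the highest TF × IDF score becomes the tag. The POI type names, stay-point identifiers and function name are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_tags(bags: Dict[str, List[str]]) -> Dict[str, str]:
    """Pick, for each stay-point, the POI type with the highest TF-IDF score.

    bags maps a stay-point id to the (non-empty) list of POI types near it.
    """
    n_docs = len(bags)
    doc_freq: Counter = Counter()
    for types in bags.values():
        doc_freq.update(set(types))  # count each type once per stay-point

    tags = {}
    for sp_id, types in bags.items():
        tf = Counter(types)
        scores = {
            t: (count / len(types)) * math.log(n_docs / doc_freq[t])
            for t, count in tf.items()
        }
        tags[sp_id] = max(scores, key=scores.get)
    return tags


bags = {
    "sp1": ["restaurant", "restaurant", "shop"],
    "sp2": ["school", "shop"],
    "sp3": ["shop", "hospital", "hospital"],
}
tags = tfidf_tags(bags)
```

Because `"shop"` appears near every stay-point, its IDF is zero and it never wins, so the tags reflect what is distinctive about each stay-point rather than what is merely common.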
Figure 4. Stay-points and semantic tagging.
In order to realize the function of customized privacy protection, the system based on KD-LPP scheme is built. Figure 5 shows an example of the function realization of proposed KD-LPP scheme. In Figure 5a, the privacy protection method can be selected according to the demand of target user. The corresponding anonymous area, time and privacy level is generated. In Figure 5b, it shows the real location and the generated dummy location. The corresponding semantic information of two locations is also generated. Figure 6 shows an example of spatial cloaking method for KD-LPP scheme. In Figure 6a, three candidate locations are recommended when the real location of target user is taken as input. In Figure 6b, six candidate locations are recommended when the set of locations is taken as input.
Figure 5. Example of KD-LPP scheme realization.
Figure 6. Example of spatial cloaking method.
The above functions show that the KD-LPP scheme can realize customized privacy protection according to the demands and preferences of the target user. Thus, data utility can be improved, so as to provide a high Quality of Experience for the target user.

5.3. Experimental Results and Performance Analysis

In this part, we compare the performance of proposed KD-LPP scheme with k-NN scheme [22] and VLBS scheme [34]. There are four indexes to evaluate the three algorithms as follows.
(1) Location set entropy
Location set entropy is used to measure the uncertainty of historical query probability among locations in the anonymous location set. The larger the location set entropy, the more similar the historical query probability between locations in the anonymous location set, the higher the uncertainty of the attacker to infer the real location, and the better the location privacy protection effect. The location set entropy can be calculated by Formula (9).
$$H_R = -\sum_{i=1}^{k} p_i \log_2 p_i. \tag{9}$$
In Formula (9), $H_R$ reaches its maximum value $H_R = \log_2 k$ when all $k$ locations have the same query probability, in which case the uncertainty in the anonymous location set is the highest.
(2) Location distance entropy
Location distance entropy is the measure of the distance from each location in the anonymous location set to the center of the set. The location distance entropy can reflect the physical distribution uniformity of the locations in the anonymous location set. The smaller the distance entropy, the greater the difference of distance from each location and the center of the anonymous location set, and the more uniform the physical distribution. The location distance entropy can be calculated by Formula (10).
$$H_d = -\sum_{i=1}^{k} d(R_i, l_{centre}) \log_2 d(R_i, l_{centre}), \tag{10}$$
where $l_{centre}$ represents the central coordinate of the anonymous location set.
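The two entropy indexes can be computed as in the following Python sketch. It makes assumptions beyond the text: query probabilities are renormalised within the anonymous location set, the centre is the mean coordinate, distances are Euclidean in coordinate space, and distances are normalised to sum to 1 before the entropy is taken. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def location_set_entropy(query_probs: List[float]) -> float:
    """Formula (9) over the query probabilities of the k locations in the set."""
    total = sum(query_probs)
    return -sum((p / total) * math.log2(p / total) for p in query_probs if p > 0)


def location_distance_entropy(points: List[Tuple[float, float]]) -> float:
    """Formula (10) over normalised distances to the centre of the set."""
    k = len(points)
    cx = sum(x for x, _ in points) / k
    cy = sum(y for _, y in points) / k
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    total = sum(dists)
    if total == 0:
        return 0.0  # all points coincide with the centre
    return -sum((d / total) * math.log2(d / total) for d in dists if d > 0)


uniform = location_set_entropy([0.25, 0.25, 0.25, 0.25])   # equals log2(4)
skewed = location_set_entropy([0.85, 0.05, 0.05, 0.05])
square = location_distance_entropy([(0, 0), (2, 0), (0, 2), (2, 2)])
```

For four equally likely locations the set entropy equals $\log_2 4 = 2$, the ideal value for $k = 4$, while a set dominated by one likely location scores lower.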
(3) Average anonymous time
In LBSNs, the service quality and privacy protection are equally important for target user. The average anonymous time is the most intuitive factor to measure the Quality of Experience (QoE). Therefore, under the condition of ensuring the quality of service and the effect of privacy protection, the smaller the average anonymous time, the better the QoE. In this paper, the average anonymous time is the average time to generate the anonymous location set by repeating the experiment many times for different values of k.
(4) Anonymous success rate
Anonymous success rate is used to measure the ability of the location privacy algorithm to resist attacks by attackers. The higher the anonymous success rate, the harder it is for an attacker to infer the real location from the anonymous location set. Considering that the k-NN algorithm and VLBS algorithm compared in the experiment have similar location set entropy, as proposed in the KD-LPP algorithm, the anonymous success rate in this experiment only considers the difference between location semantics in the anonymous location set. The anonymous success rate can be calculated by Formula (11).
$$ASR = \frac{Count\left(SemSimilar(R_i, R_j) \in [\theta_1, \theta_2]\right) - k}{k^2 - k}, \tag{11}$$
where $Count\left(SemSimilar(R_i, R_j) \in [\theta_1, \theta_2]\right)$ represents the number of location pairs whose semantic similarity lies between the lower limit $\theta_1$ and the upper limit $\theta_2$.
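A possible computation of the anonymous success rate is sketched below. It assumes that Formula (11) normalises the number of ordered pairs of distinct locations whose semantic similarity falls within $[\theta_1, \theta_2]$ by the $k^2 - k$ such pairs. The similarity measure is not specified in this section, so cosine similarity between semantic vectors is used purely as a stand-in; the function names are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; a stand-in for SemSimilar (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def anonymous_success_rate(vectors: List[Sequence[float]],
                           theta1: float, theta2: float) -> float:
    """Share of ordered pairs (i != j) with similarity in [theta1, theta2]."""
    k = len(vectors)
    if k < 2:
        raise ValueError("the anonymous location set needs at least two locations")
    hits = sum(
        1
        for a, b in permutations(vectors, 2)
        if theta1 <= cosine(a, b) <= theta2
    )
    return hits / (k * k - k)


# Shortened semantic vectors (real ones would have 24 slots).
anon_set = [[1, 0, 0], [1, 0, 0], [0, 1, 0]]
asr = anonymous_success_rate(anon_set, 0.5, 1.0)
```

In this toy set only the two identical vectors are mutually similar, so 2 of the 6 ordered pairs qualify.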
Figure 7a shows the location set entropy of the different methods as the privacy level $k$ changes. Under ideal conditions, the location set entropy of the anonymous locations reaches the maximum $H_R = \log_2 k$. In Figure 7a, we can see that the location set entropy of the k-NN method, the VLBS method and the proposed KD-LPP method is almost ideal for all values of $k$, while the location set entropy of the Random method is the lowest for all values of $k$.
Figure 7. Effect of location set entropy and location distance entropy.
Figure 7b shows the location distance entropy of the different methods as the privacy level $k$ changes. We can see that the location distance entropy of the proposed KD-LPP method is always smaller than that of the other three methods, which suggests that the KD-LPP method has better physical distribution uniformity. Therefore, the KD-LPP method has the best effect of resisting inference attacks from the perspective of historical query probability compared to the other methods.
Figure 8a shows the average anonymous time of the different methods as the privacy level $k$ changes. We can see that the average anonymous time increases as the value of $k$ increases. The average anonymous time of the k-NN method is the longest of the four methods, and the proposed KD-LPP method has a shorter average anonymous time than both the k-NN method and the VLBS method. When $k > 6$, the average anonymous time of the k-NN method grows noticeably faster than that of the VLBS method and the KD-LPP method. The reason is that the k-NN method does not consider the semantic similarity of the locations. In the process of anonymous location selection, the KD-LPP method proposed in this paper does not need to calculate the distance between locations in the anonymous location set in each round, which greatly reduces the time consumption.
Figure 8. Effect of average anonymous time and anonymous success rate.
Figure 8b shows the anonymous success rate of the different methods as the privacy level k changes. In this paper, the user access frequency over 24 periods is used to quantify the location semantic information. Under this quantification, location units with similar query probability are more likely to have higher semantic similarity, which is also why the Random method has the lowest anonymous success rate of the four methods. The anonymous success rate of the k-NN method is lower than that of the VLBS and KD-LPP methods because it does not consider the semantic similarity of the locations. Compared with the VLBS method, the KD-LPP method has a higher anonymous success rate, because it takes the user portrait into consideration when generating an anonymous location. Therefore, the KD-LPP method resists background knowledge attacks better than the other methods.
To meet the demands of different users for location privacy in different scenarios, the proposed KD-LPP method allows users to customize the upper and lower limits of the amount of privacy in the anonymous location set. The anonymous success rate directly reflects the effect of privacy protection. Figure 9a,b shows the effect of the lower limit and the upper limit of the amount of privacy, respectively, on the anonymous success rate as the privacy level k changes. In Figure 9a, the anonymous success rate gradually decreases as the lower limit of the amount of privacy increases. The reason is that the selected privacy protection methods (i.e., N/A, Fuzzy location) cannot effectively anonymize the location. In Figure 9b, the anonymous success rate gradually increases as the upper limit of the amount of privacy increases, because the privacy level is higher and the corresponding privacy protection methods (i.e., Spatial cloaking, Inhibition) can effectively anonymize the location. When $D = 0.0$ or $D = 1.0$, the anonymous success rate of the KD-LPP method is always 1.0, because the value range of the amount of privacy is [0.0, 1.0]: no privacy protection method is needed when $D = 0.0$, and the Inhibition method disables the publication of the location when $D = 1.0$. Therefore, the anonymous location set is valid at any time.
Figure 9. Effect of anonymous success rate in different amount of privacy.
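The relation between the amount of privacy D and the selected protection method can be illustrated with a minimal sketch. Only the two endpoints are stated in the text (no method at D = 0.0, Inhibition at D = 1.0); the function name `select_protection_method` and the interior threshold `fuzzy_max = 0.5` are hypothetical and chosen purely for illustration, not taken from the KD-LPP scheme.

```python
def select_protection_method(d, fuzzy_max=0.5):
    """Map an amount of privacy d in [0.0, 1.0] to a protection method.

    Assumption: the interior cut-off `fuzzy_max` is a hypothetical
    threshold; only the behaviour at d = 0.0 and d = 1.0 is described
    in the text.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("amount of privacy must lie in [0.0, 1.0]")
    if d == 0.0:
        return "N/A"               # no protection method is needed
    if d == 1.0:
        return "Inhibition"        # publication of the location is disabled
    if d <= fuzzy_max:
        return "Fuzzy location"    # lower amounts of privacy
    return "Spatial cloaking"      # higher amounts of privacy


for d in (0.0, 0.3, 0.8, 1.0):
    print(d, select_protection_method(d))
```

In this sketch the weaker methods are tied to small values of D and the stronger ones to large values, which mirrors the trend in Figure 9: raising the lower limit makes the weaker methods insufficient, while raising the upper limit brings the stronger methods into play.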

6. Conclusions and Future Work

In this paper, we study the problem of customized location privacy protection under the user portrait model in LBSNs. First, we introduce LBSNs, which provide personalized location-based services for users. Then, we explore the possibility of designing a Knowledge-Driven Location Privacy Preserving (KD-LPP) scheme, which can dynamically select the privacy protection method according to the quantified amount of privacy. Experiments show that our KD-LPP scheme can provide customized privacy protection for the target user with a high location anonymous success rate. In future work, we will extend the privacy protection scheme to distributed computing and edge computing, in order to further reduce the risk of privacy leakage.

Author Contributions

Conceptualization, L.Z.; methodology, L.Z.; software, X.L.; validation, L.Z.; formal analysis, Z.C.; investigation, L.Y.; resources, J.Z.; data curation, Z.J.; writing—original draft preparation, L.Z. and X.L.; writing—review and editing, L.Z. and X.L.; visualization, Z.C.; supervision, J.Z.; project administration, Z.J.; funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61902361, and in part by the Henan Key Research Project of Higher Education Institutions under Grant No. 22B520046, the Henan Key Laboratory of Network Cryptography Technology under Grant No. LNCT2021-A15, the Henan Province Key Research and Development Special Project No. 221111210500, and the Henan Provincial Science and Technology Department under Grant No. 212102210095.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used to support the findings of the study are available within the article.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

  1. Zhu, L.; Xu, C.; Guan, J.; Zhang, H. SEM-PPA: A semantical pattern and preference-aware service mining method for personalized point of interest recommendation. J. Netw. Comput. Appl. 2017, 82, 35–46.
  2. Xiao, H.; Xu, C.; Feng, Z.; Ding, R.; Yang, S.; Zhong, L.; Liang, J.; Muntean, G.M. A Transcoding-Enabled 360° VR Video Caching and Delivery Framework for Edge-Enhanced Next-Generation Wireless Networks. IEEE J. Sel. Areas Commun. 2022, 40, 1615–1631.
  3. Liu, X.; Wang, G.; Bhuiyan, M. Re-ranking with multiple objective optimization in recommender system. Trans. Emerg. Telecommun. Technol. 2022, 33, e4398.
  4. Xu, C.; Zhu, L.; Liu, Y.; Guan, J.; Yu, S. DP-LTOD: Differential Privacy Latent Trajectory Community Discovering Services over Location-Based Social Networks. IEEE Trans. Serv. Comput. 2021, 14, 1068–1083.
  5. Zhu, L.; Xie, H.; Liu, Y.; Guan, J.; Liu, Y.; Xiong, Y. PTPP: Preference-Aware Trajectory Privacy-Preserving over Location-Based Social Networks. J. Inf. Sci. Eng. 2018, 34, 803–820.
  6. Lu, J.; Zhang, Y.; Yang, L.; Jin, S. Inter-cloud secure data sharing and its formal verification. Trans. Emerg. Telecommun. Technol. 2022, 33, e4380.
  7. Ding, X.; Gan, Q.; Bahrmi, S. A systematic survey of data mining and big data in human behavior analysis: Current datasets and models. Trans. Emerg. Telecommun. Technol. 2022, 33, e4574.
  8. Jeyashree, G.; Padmavathi, S. IHAR—A fog-driven interpretable human activity recognition system. Trans. Emerg. Telecommun. Technol. 2022, 33, e4506.
  9. Montazeri, Z.; Houmansadr, A.; Pirhro, N. Achieving Perfect Location Privacy in Wireless Devices Using Anonymization. IEEE Trans. Inf. Forensics Secur. 2017, 12, 2683–2698.
  10. Olteanu, A.M.; Huguenin, K.; Shokri, R.; Humbert, M.; Hubaux, J.P. Quantifying Interdependent Privacy Risks with Location Data. IEEE Trans. Mob. Comput. 2017, 16, 829–840.
  11. Li, H.; Zhu, H.; Du, S.; Liang, X.; Shen, X. Privacy Leakage of Location Sharing in Mobile Social Networks: Attacks and Defense. IEEE Trans. Dependable Secur. Comput. 2018, 15, 646–660.
  12. Huguenin, K.; Bilogrevic, I.; Machado, J.S.; Mihaila, S.; Shokri, R.; Dacosta, I.; Hubaux, J.P. A Predictive Model for User Motivation and Utility Implications of Privacy Protection Mechanisms in Location Check-Ins. IEEE Trans. Mob. Comput. 2018, 17, 760–774.
  13. He, X.; Jin, R.; Dai, H. Leveraging Spatial Diversity for Privacy-Aware Location Based Services in Mobile Networks. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1524–1534.
  14. You, T.; Peng, W.; Lee, W. Protecting Moving Trajectories with Dummies. In Proceedings of the International Conference on Mobile Data Management, Beijing, China, 27–30 April 2008; pp. 278–282.
  15. Kato, R.; Iwata, M.; Hara, T.; Suzuki, A.; Xie, X.; Arase, Y.; Nishio, S. A dummy-based anonymization method based on user trajectory with pauses. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, 7–9 November 2012; pp. 249–258.
  16. Hwang, R.; Hsueh, Y.; Chung, H. A Novel Time-Obfuscated Algorithm for Trajectory Privacy Protection. IEEE Trans. Serv. Comput. 2013, 7, 126–139.
  17. Gao, K.; Xu, C.; Ji, X.; Qin, J.; Yang, S.; Zhong, L.; Wu, D. Freshness-Aware Age Optimization for Multipath TCP Over Software Defined Networks. IEEE Trans. Netw. Sci. Eng. 2021. early access.
  18. Chen, X.; Xu, C.; Wang, M.; Wu, Z.; Zhong, L.; Grieco, L.A. Augmented Queue-based Transmission and Transcoding Optimization for Livecast Services Based on Cloud-Edge-Crowd Integration. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4470–4484.
  19. Gao, S.; Ma, J.; Shi, W.; Zhan, G.; Sun, C. TrPF: A Trajectory Privacy-Preserving Framework for Participatory Sensing. IEEE Trans. Inf. Forensics Secur. 2013, 8, 874–887.
  20. Chen, R.; Fung, B.C.; Mohammed, N.; Desai, B.C.; Wang, K. Privacy-preserving trajectory data publishing by local suppression. Inf. Sci. 2013, 231, 83–97.
  21. Zhao, O.; Liu, X.; Li, X.; Singh, P.; Wu, F. Privacy-preserving data aggregation scheme for edge computing supported vehicular ad hoc networks. Trans. Emerg. Telecommun. Technol. 2022, 33, e3952.
  22. Yi, X.; Paulet, R.; Bertino, E.; Varadharajan, V. Practical Approximate k Nearest Neighbor Queries with Location and Query Privacy. IEEE Trans. Knowl. Data Eng. 2016, 28, 1546–1559.
  23. Celdrán, A.H.; Pérez, M.G.; Clemente, F.J.G.; Perez, G.M. Precise: Privacy-aware recommender based on context information for cloud service environments. IEEE Commun. Mag. 2014, 52, 90–96.
  24. Chen, X.; Wu, X.; Li, X.Y.; Ji, X.; He, Y.; Liu, Y. Privacy-aware High-Quality Map Generation with Participatory Sensing. IEEE Trans. Mob. Comput. 2014, 15, 719–732.
  25. Hua, J.; Li, F.; Guo, Y.; Geng, K.; Niu, B. Research on privacy protection in the process of information exchange. Chin. J. Netw. Inf. Secur. 2016, 2, 28–38.
  26. Zhang, W.; Liu, Q.; Zhu, H. Evaluation and protection of multi-level location privacy based on an information theoretic approach. Chin. J. Commun. 2019, 40, 51–59.
  27. Dwork, C. Differential Privacy: A Survey of Results. In Theory and Applications of Models of Computation; Springer: Berlin/Heidelberg, Germany, 2008; pp. 1–19.
  28. Fouad, M.; Elbassion, K.; Bertino, E. A Supermodularity-Based Differential Privacy Preserving Algorithm for Data Anonymization. IEEE Trans. Knowl. Data Eng. 2014, 26, 1591–1601.
  29. Su, S.; Xu, S.; Cheng, X.; Li, Z.; Yang, F. Differentially Private Frequent Itemset Mining via Transaction Splitting. IEEE Trans. Knowl. Data Eng. 2015, 27, 1875–1891.
  30. Xu, S.; Cheng, X.; Su, S.; Xiao, K.; Xiong, L. Differentially Private Frequent Sequence Mining. IEEE Trans. Knowl. Data Eng. 2016, 28, 2910–2926.
  31. Soria-Comas, J.; Domingo-Ferrer, J.; Sánchez, D.; Megías, D. Individual Differential Privacy: A Utility-Preserving Formulation of Differential Privacy Guarantees. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1418–1429.
  32. Zhan, J.; Chow, C. Enabling Probabilistic Differential Privacy Protection for Location Recommendations. IEEE Trans. Serv. Comput. 2018, 14, 426–440.
  33. Zheng, Y.; Xie, X.; Ma, W. GeoLife: A Collaborative Social Networking Service among User, location and trajectory. IEEE Data Eng. Bull. 2010, 33, 32–40.
  34. Shi, X.; Zhang, J.; Gong, Y. A dummy location generation algorithm based on the semantic quantification of location. In Proceedings of the IEEE International Conference on Artificial Intelligence and Computer Applications, Dalian, China, 28–30 June 2021; pp. 172–176.