An Efﬁcient Approach for LBS Privacy Preservation in Mobile Social Networks

: With the rapid development of smart handheld devices, wireless communication, and positioning technologies, location-based service (LBS) has been gaining tremendous popularity in mobile social networks (MSN). Users’ daily lives are facilitated by the applications of LBS, but users’ privacy leaking hinders the further development of LBS. In order to solve this problem, techniques such as k-anonymity and l-diversity have been widely adopted. However, most papers that combine with k-anonymity and l-diversity focus on the security of users’ privacy with little consideration of service efﬁciency. In this paper, we ﬁrstly treat the relationship between k-anonymity and l-diversity in the clustering process from a dynamic and global perspective. Then a service category table based algorithm (SCTB) is designed to identify and calculate l-diversity securely and efﬁciently, which promotes the cooperative efﬁciency of users in LBS query, especially when the preference privacy that users request in the clustering process have similarities. Finally, theoretical performance analysis and extensive experimental studies are performed to validate the effectiveness of our SCTB algorithm.


Introduction
Due to the rapid advance of mobile communication technologies and modern smart mobiles, location-based service (LBS) has become ubiquitous in recent years.With positioning technologies and applications loaded on smart mobiles, users could query location-based service providers (LBSP) for personalized services according to their locations, such as finding the nearest restaurant, getting a travel guide, or searching for a nearby market, etc.
Although location-based service brings convenience to users and plays an increasingly important role in daily life, people are also aware that their private information may be revealed along with the services provided by LBSP.Hence, the problem of privacy leakage has become a main factor that hinders the further development of LBS.Therefore, how to protect users' privacy information while providing conventional services to users has become a topic of major interest.
Generally speaking, current popular applications on smart phones can be roughly divided into two categories.One is to hide the "who" aspect completely for LBSP, and just uses the service with "what" and "where", such as AMAP.Another is not to hide the "who" for LBSP, and uses the service with "who", "what", and "where", such as Dianping.Here, "who" means the identity of a user, "where" means the location of a user, and "what" means the preference of a user.For the wide application of LBS, users' sensitive information (i.e., "who", "where", "what") should be protected well while interacting with LBSP.For the aspect of "who", security techniques such as pseudo-ID [1] and pseudonym are usually used to make users' real identity indistinguishable, but it is insufficient to protect users' sensitive information [2].For the aspect of "what", exposure of a user's preference may result in the leakage of his sensitive information.For example, if Bob often sends some requests about the knowledge of coronary disease to LBSP, then LBSP or adversary could infer that Bob or his family members may have a health problem of coronary disease.For the aspect of "where", if Bob often reveals his locations close to a hospital when sending his requests to LBSP, adversary or LBSP may also infer that Bob may have some health problems.If this sensitive information is leaked, some commercial organizations or adversary may send drug ads to Bob frequently.To address the problem of leaking users' privacy caused by location information, k-anonymity as a classic scheme was proposed.k-anonymity is an anonymity set which is composed of users with different locations, and makes one user's location indistinguishable from the locations of at least k-1 other users at some point [3].For example, there are k users simultaneously requesting LBSP.
But k-anonymity is insufficient to guarantee users' preference privacy when they request the same preference service in LBS query.There are at least two attacks, the Homogeneity Attack and Background Knowledge Attack, that can be used to compromise a anonymity set [4].In order to overcome this deficiency, the concept of l-diversity, which ensures at least l well-represented values in each equivalence class, was proposed to protect users' privacy.Figure 1 shows the way to protect users' privacy by k-anonymity and l-diversity.There are 6 users (i.e., k = 6) who are distributed in different locations requesting 4 different services (i.e., l = 4), and LBSP cannot link a specific service to a specific user.There are many studies adopt both k-anonymity and l-diversity to protect personal privacy in LBS query.For example, the authors [5] proposed a Dummy-Location-Selection (DLS) algorithm to protect location privacy and preference privacy.The paper [6] used Snet-Hierarchy to protect personal privacy by k-anonymity and l-diversity.A personalized spatial cloaking scheme named TTcloak was proposed in paper [7], users' location privacy and preference privacy were protected by k-anonymity and l-diversity respectively in Mobile Social Networks.Currently, most of the existing works concerning the LBS query mainly focus on the security of users' privacy with little consideration of service efficiency.Based on k-anonymity and l-diversity, some current research work, such as paper [8], solved the problem that how to calculate the number of l-diversity without a trusted anonymizer server under a distributed architecture.But the algorithm, proposed to protect preference privacy in the clustering process, has a high degree of complexity.The authors [9] mentioned that users' service should be classified to different service categories, but did not consider how to identify the classified service categories, and it is inevitable to incur a huge time cost if using the way of semantic recognition.Therefore, under the system of distributed architecture, how to identify and calculate l-diversity securely and efficiently in the clustering process is still a challenge.As far as we know, there is no research on this problem.Furthermore, in real applications, it is unreasonable to judge the attribute of a user's location in LBS query from a traditional and static viewpoint.For example, restaurants and schools are often seen as sensitive or dense locations because of the dense population from the traditional point of view [5,9].However, as we all know, if the time is not for meal or school day, it is unreasonable for restaurants and schools to be classified as sensitive or dense locations.Thus, the schemes based on the static viewpoint are not very practical.
In this paper, aiming at addressing the above challenges, we propose service category table based (SCTB) algorithm to improve clustering efficiency while protecting users' preference privacy.Therefore, under the system of distributed architecture, we rigorously explore the calculation procedure of l-diversity in LBS query, and introduce the service category table into our SCTB algorithm to make SCTB algorithm not only suitable for a single request but also for continuous request.Different from previous solutions based on the straightforward static scenario (e.g., a restaurant is unified as sensitive or dense locations), the scheme in this paper focuses on the process of security and efficient clustering from a dynamic and global perspective.The main contributions of this paper are as follows.
1. To overcome the vulnerabilities of preference privacy in LBS query, we introduce the service category table to represent different kinds of preference privacy so that the process of identifying the kinds of l-diversity can be fast, secure, and computationally inexpensive.
2. Unlike previous works that focus on location privacy and preference privacy independently, we firstly study the relationships between k and l in the clustering process.Then based on the relationships, the concepts of relatively dense scenario and relatively sparse scenario from a dynamic perspective are proposed.Finally, according to different scenarios mentioned above, we give the corresponding algorithm to improve the success rate of clustering from a global perspective.
3. We propose service category table based (SCTB) algorithm which includes request aggregation protocol, variety judgment protocol, filtering aggregation protocol, and representative aggregation protocol to reduce the time cost of clustering process and improve the success rate of clustering groups while protecting preference privacy of users.We also study the circumstance when users' preferences in the clustering process have similarities.
4. We carry out extensive simulations to evaluate the performance of our SCTB algorithm.

Related Work
In this section, we mainly focus on the current situation of LBS privacy protection.There have been many kinds of research focus on how to deal with LBS privacy protection, and we interpret them from three aspects: Technology, architecture, and continuity.

Technologies of LBS Privacy Preservation
From the perspective of technologies that most recent works proposed for protecting privacy in LBS, they can be roughly classified into two main categories: Obfuscation-based approach and cryptographic-based approach.Obfuscation-based approach is mainly used to confuse LBSP.For this purpose, one way is to distort information based on the original and real users themselves [10][11][12], such as adding noise in the contents of users' requests or expanding the area of users' real locations.Another way is to use some methods to confuse LBSP through a combination with other users, such as Mix zone [13], spatial cloaking [3], temporal cloaking [14,15], and Dummy [16,17].Cryptographic-based approach is mainly used to make users' information (include location, preference, etc.) invisible for adversaries when users send requests to LBSP.Recently, Homomorphic Encryption for privacy protection is a research hotspot, which can prevent the disclosure of users' information and achieve a more stringent privacy protection.For example, in paper [18], the POI (i.e., the point of interest) could be calculated by LBSP without actually knowing users' locations.The BV Homomorphic Encryption (BVHE) was used in paper [8] to calculate users' request service vector in aggregation process.However, though a stringent privacy protection can be provide by BVHE, service efficiency cannot be ignored.Thus, how to improve service efficiency while protecting users' privacy (i.e., location privacy, preference privacy) is still an important problem that needs to be addressed.

Architectures of LBS Privacy Preservation
There are two main categories proposed by most recent works for protecting privacy in LBS from the perspective of the system architecture: TTP-based approach [19] and TTP-free approach [20].TTP-based approach takes an anonymizer as a trusted third part (TTP), and the anonymizer usually plays a role to confuse LBSP.The anonymizer often adopts some technologies, such as Cloaking and Dummy, to protect users' information before sending the requests to LBSP [21,22].TTP-free approach does not rely on a trusted third part, users usually adopt some methods, such as P2P [23,24] or encounter scheme [5,25], to achieve k-anonymity, and users' information is often exchanged to confuse LBSP.It is really hard to say which of the two approaches mentioned above is better.TTP-based approach can be applied to almost all scenarios no matter users' distribution.However, the disadvantage is also obvious, that is the third part can be the single point of failure and the bottleneck of the whole system.The amount of private information exposed to a third part can be decreased by adopting TTP-free approach.However, due to the limitation of communication range of users' mobile devices, TTP-free approach may not satisfy the requirement for every mobile user.But with the development of smart mobile and communication technology, the application of TTP-free approach would be more extensive due to the advantages of distributed architecture.

Continuity of LBS Privacy Preservation
Sporadic privacy protection [8] and continuous privacy protection [26,27] are two important research respects.Sporadic privacy protection is often used for the case that users do not frequently use LBS services for a period of time.The schemes of sporadic privacy protection mainly focus on users' single process of privacy protection with LBSP, and apply to the situations when the degree of association between the two requests is low.Such as paper [28], a scheme named precision-based LPPM was proposed to reduce the precision of users' locations sent to LBSP.Continuous privacy protection is represented by trajectory privacy protection, which treats the problem of privacy protection from the view of users' trajectory [29].The schemes of continuous privacy protection usually take a user's locations of a certain period as the object of protection, thus an adversary could use the spatial and temporal correlations on users' locations to infer their private information.

Preliminaries
In this section, we give the basic concepts and meaning of symbols.

BV Homomorphic Encryption
BV homomorphic encryption (BVHE) is based on the Ring Learning With Errors (RLWE) problem [30], and supports both additive homomorphisms and multiplicative homomorphisms.The main algorithms of BVHE include key generation (KEYGEN), encryption (ENC), evaluate (EVAL), and decryption (DEC).The details are as follows.
• KEYGEN: For some given parameters n, prime q, error parameter δ, and f (x) = x n + 1 .n is the power of 2, and q ≡ 1 mod 2n .The Ring R = Z[X]/ f (x) and the Ring R q = Z q [x]/ f (x) can be selected by using n, q and f (x).A discrete Gaussian error distribution χ = D Z n ,δ can be defined by taking δ as the variance, where the integer D > 0 .The message space R t = Z t [x]/ f (x) can also be defined by choosing prime t ∈ Z * q .Under the setting above, a public-private key pair (pk, sk) can be generated by randomly choosing (s i , e i ) from χ and a 1 from R q , where sk i = s i , pk i = (a 0 , a 1 ), and a 0 = −(a 1 s i + te i ).
• ENC: For a plaintext m ∈ R q that needs to be encrypted, the encryption process (i.e., C = E p k(m) ) is as follows: The ciphertext C = (c 0 , c 1 ) can be calculated by choosing samples u ← χ, and f , g ← χ, where c 0 = a 0 u + tg + m and c 1 = a 1 u + t f .

Service Category
In this paper, the service requests of users are classified into different kinds of service category.These different kinds of service category are distinct and have no semantic relationship, which means they are not semantically similar [31].Such as banking, restaurant, bus station, post office, and supermarket, etc.But different from the paper [8,9] and others, we use the service category table to denote the various kinds of service category.The details of the service category table are as follows.
One kind of service category, which is symbolized by L i in this paper, is represented by a binary number of M bits.According to the kinds of service category, the binary number of M bits are divided into different units, which is symbolized by CU.Each CU is composed of B bits, in which the last bit is 1, and the other bits are 0.Under the setting above, the total kinds of service category, which is symbolized by L, can be represented in a service category table, and L also can be computed as L = M/B.B must be divisible by M, which means L must be an integer, and the relation between L and B should meet the condition that 2 B > L. Note that L is usually used to represent a unified threshold of l-diversity (i.e., th l ) in a region.For example, when the total kinds of service category L = 7, it can be represented by a service category table as shown in Figure 2. The CU in the service category table above is 001, which is composed of 3-bit binary numbers.The maximum value of the CU is 111, which means the redundancy of the CU is 7, and it allows 7 of the same kinds of service category to add.Of course, when the total kinds of service category that need to be represented are bigger than 7, which means the unified threshold th l > 7 in a region.We only need to redefine the table structure according to the nature of binary numbers.For example, if L = 10, then the corresponding CU of this service category table is 0001.Obviously, the CU of this service category table could be 00001 or 000001 and so on, but we follow the principle of minimization.

Calculation of Service Categories
Under the setting above, we explain how to figure out the different kinds of service category.Here, we also take the total kinds of service category L = 7 as an example.
When all bits of a CU are 1, it represents the arithmetic element of a L i .The arithmetic element is symbolized by AE i in this paper.Figure 3 shows the corresponding AE 1 of L 1 when L = 7. Assume that a random sum SL i = ∑ k i=1 L i , which is obtained by adding some different L i together when L = 7.As shown in Figure 4.Then, we can know whether there is an ith kind of service category by SC i = SL i AE i .If the bits in the SC i are all 0, it means there is no ith kind of service category in SL i .Otherwise, the count number, which is symbolized by NSC and used for counting the kinds of service category, should be added by 1.Of course, the initial value of NSC is 0. Though the final result of NSC, we can figure out how many different kinds of service category in SL i .From the Figure 4, we can know the final result of NSC is 6, which means there are 6 kinds of service category (i.e., l = 6).Note that, the number of different L i can also be figured out from Figure 4, which is 14.Therefore, in sporadic privacy protection, under the premise that one user sends one kind of service category in a clustering process, we could compare the number of L i with the number of users in a clustering group, then whether there is an occurrence of overflow can be determined (i.e., the number of L i equals the number of users normally).

The Relationship of k and l
As we all know, k-anonymity and l-diversity are classical techniques used to protect privacy in LBS.Many studies and privacy approaches were proposed based on k-anonymity and l-diversity in sporadic privacy protection.The combination of k-anonymity and l-diversity is to ensure that k users request at least l services in a clustering process of LBS query, which is shown as Figure 5.But the relationships between k and l are needed to be rigorously explored.
Theorem 1.In a clustering process, the number of diversity is not more than the number of anonymity: l ≤ k.
Proof of Theorem 1.In sporadic privacy protection, with the premise that one user sends one kind of service category in a clustering process.As shown in Figure 5.One user corresponds to one kind of service category, and some users may have the same kind of service category.Therefore, the number of diversity is not more than the number of anonymity (i.e., the number of users) in a clustering group.Theorem 2. In a clustering process, k-anonymity is easier to be satisfied than l-diversity.
Proof of Theorem 2. In sporadic privacy protection, with the premise that one user sends one kind of service category in a clustering process.Under the system architecture of TTP-free, assume that there already has a clustering group which does not yet achieve the conditions of clustering success (i.e., k < th k and l < th l ).The condition th k is easier to be satisfied than the condition th l , the reason is that finding a new user is easier than finding a new user who must be with a different kind of service category (i.e., different l) from that clustering group.Theorem 3.Under the same condition th k , bigger th l can provide better protection in LBS query.
Proof of Theorem 3. By comparing Figures 5 and 6, we can see that the condition th k is both 5, the condition th l of Figure 5 is 3, and the condition th l of Figure 6 is 4. Assume that the kinds of service category l a , l b , l c are the same in Figures 5 and 6, if an attacker owns some background knowledge, and he can ensure Alice absolutely not request l a and l b .Then the attacker may deduce which is Alice from Figure 5 by using his background knowledge.But under the same setting, he cannot ensure which is Alice from Figure 6.

Motivation
As we all know, according to the characteristic of homomorphic encryption, if there are too many multiplications on ciphertext, the computational complexity will be high and the accuracy of the original message will be affected by the noise generated during multiplications process.Different from paper [8], which adopted multiplicative homomorphism to calculate the kinds of service category in the clustering process.In this paper, we adopt additive homomorphism to calculate the kinds of service category, the noise as well as the computational complexity can be reduced.
In real applications, under the condition of specific time and space, users usually have common characteristics about interests or demands, so the kinds of service category that users request for LBS may be similar or even identical.For example, suppose that the site is an office building and the period is just off work time, the kinds of service category that users request are more likely as a restaurant or a supermarket rather than a post office or a bank (i.e., these departments or places are no longer within normal working hours).In addition, there is almost not only one clustering group in a region under normal circumstances.From a global perspective, the clustering process should be selective to improve the whole success rate of clustering in this region.The followings are two different scenarios that often appear.
Scenario 1: As shown in Figure 7, assume that there are many clustering groups in region LA 0 , the th k is 6 and the th l is 4. If the clustering group CP 1 already contains 6 users and the kinds of service category are 3, which means k ≥ th k but l < th l .When the new user U a wants to participate in the clustering process of CP 1 , U a 's kind of service category is the same as one of those kinds of service category in CP 1 .We can know that the participating of U a cannot help CP 1 to complete clustering, on the contrary, it just increases the load of CP 1 and influences the success rate of other clustering groups in region LA 0 .This scenario is what we define as a relatively dense scenario, which is more likely happen in a region with a dense population of specific time and space, rather than in an area with dense population inevitably.Scenario 2: As shown in Figure 8. Assume that there are two clustering groups in region LA 1 , the th k is 4 and the th l is 3.We also assume that the user U a is the initiator of the clustering group CP 1 , which contains 3 users, and the kinds of service category are 2.The user U d is the initiator of the clustering group CP 2 , which contains 2 users, and the kinds of service category are 2.If there are no other users who want to join in the two clustering groups above (i.e., CP 1 and CP 2 ) within a certain period of time (i.e., the waiting time before the termination time), both the two clustering process cannot complete clustering, and only wait for the termination time of the two clustering groups.This scenario is what we define as a relatively sparse scenario, which is more likely happen in an area with a sparse population of specific time and space, rather than in an area with sparse population inevitably.

System Model
The system model of this paper is mainly composed of three parts: Service Category Server, Mobile Users, and Location-Based Service Provider.The framework is shown as Figure 9.The system architecture we adopt in this paper is TTP-free, which means there is no trusted anonymizer as an intermediary in our system model, and users can communicate with each other within a distance of communication.The details of our model are as follows.

Service Category Server (SCS)
It has four missions in this paper.Firstly, SCS should compute and prepare pseudo-IDs PID i = {PID 1 , PID 2 , ...} for users registration, the details can be referred to the paper [8].Secondly, SCS generates the parameters (n, q, δ, R, R q ) of BVHE, which enable users to generate a public-private key pair (pk, sk).Thirdly, according to the different th l in different regions, SCS generates the corresponding service category table.Fourthly, SCS regularly updates the corresponding relationship between the kind of service category and binary number in the table, and users' preference privacy can be well protected in LBS query by this way.Note that how to make reasonable value th l in different regions is out of our discussion, the reasonable value th l in a region may be obtained from the analysis of big data or other approaches, but in this paper, we should make sure that there are only one pair thresholds (i.e., th k , th l ) in one region.

Mobile Users
If a user wants to obtain service from the LBSP in a region, he must register himself to the SCS, and there are also three missions for the user.Firstly, he gets his pseudo-ID PID i from SCS, and the pseudo-ID PID i is unique and different from other users.Secondly, according to the BVHE parameters (n, q, δ, R, R q ), the user generates his own public-private key pair (pk i , sk i ) as explained in Section 3.1.Thirdly, the user should ask for the corresponding kind of service category according to his preference from SCS at this moment.Note that, in this paper, there is not an anonymizer as a trusted intermediary who passes information between users and LBSP, and no dummy in the clustering process.Therefore, if a user wants to obtain service from the LBSP, he should firstly aggregate other users' requests to meet the conditions of k-anonymity and l-diversity.Then these requests would be sent together to the LBSP by their representative.In the clustering process of these users, when the clustering group finally achieves the condition of success (i.e., k ≥ th k and l ≥ th l ), the representative of the clustering group firstly will inform these users to prepare their request packets.Then he will repackage their packets as an aggregated package.Finally, he will send the aggregated package to the LBSP of this region.

Location Based Service Provider (LBSP)
Without loss of generality, the main role of LBSP is to process users' request packets and return a list of results to users.Note that there are two main ways to return the results back to these users.One way is to return the results to the representative user, then he distributes the results to his member users.The other way is to return the results to his member users directly by their pseudo-IDs, which is adopted in this paper to reduce the load of the representative user.
In our system model, all roles must follow the relevant regulations and agreements in the LBS system, which means they are all honest − but − curious.For LBSP, it is curious and wants to deduce users' locations and preferences (i.e., their kinds of service category).For a user, he also has curiosity and hopes to deduce other users' request contents including their locations and preferences.But the collusion among users and the collusion between users and LBSP are out of the scope of discussion in this paper.

Algorithm Design
In this section, we firstly describe the scheme of SCTB.Then SCTB proposed in this paper is explained, which includes request aggregation protocol, variety judgment protocol, filtering aggregation protocol, and representative aggregation protocol.Lastly, we make a summary of our service category table based (SCTB) algorithm.

The Algorithm Scheme
For the security parameters th k and th l in a region, the service category server bootstraps the LBS system and initializes it.If a user wants to obtain service from the LBS system, he must register himself to the LBS system.Before sending a request to the LBSP, he should firstly unite with other k − 1 users to become a clustering group by using the request aggregation protocol.Then the kinds of service category l in the clustering group can be calculated by the variety judgment protocol.The representative user of the clustering group firstly should judge the relationship between l and th l .If l ≥ th l , the representative user will judge the relationship between k and th k .Then there will be two results: If k ≥ th k , which means the clustering group has met the conditions of clustering success, the clustering process is ended.If k < th k , which means the clustering group has not met the condition of clustering success, the clustering group will wait for new users to join to meet the condition (i.e., k ≥ th k ) until timeout.If l < th l , which means the clustering group has not met the condition of clustering success, the representative user will choose filtering aggregation protocol or representative aggregation protocol by judging the relationship between k and th k for improving the success rate of clustering.Note that, if a clustering group does not meet the conditions (i.e., k ≥ th k and l ≥ th l ) and runs into a timeout, the clustering process of this group is a failure, and no service request will be sent to LBSP.The scheme of our SCTB algorithm is shown as Figure 10.

Request Aggregation Protocol
Generally speaking, assume that a user U 1 is not in a clustering group of other users and wants to launch a request to the LBSP in his region.U 1 should unite with other k − 1 users to form a clustering group.Algorithm 1 describes the pseudo code of the request aggregation protocol.The detailed explanation is as follows.
User U 1 firstly broadcasts the request aggregating message and sets the termination time T, and T starts counting down.If there are some users who want to join in the clustering process of U 1 , these users should return their pseudo-IDs to U 1 .Then U 1 will count the number k of users who have responded.If the number (include U 1 himself) k ≥ 2, U 1 takes himself as the representative of the clustering group, which is symbolized by PID R 1 , he names the clustering group as PID CP1 1 .When the number of sums SL i is obtained by one clustering group (i.e., this sum is obtained only from PID CP1 1 ), the number of groups NCG PID R 1 = 1.After the setting above, PID R 1 firstly uses his public key pk 1 to encrypt his kind of service category L 1 i .Then PID R 1 gives his pk 1 and pk 1 (L 1 i ) to members of this clustering group.Finally, the members use pk 1 to encrypt their kinds of service category and return the result pk 1 1 names this result of the clustering group as pk 1 (SL i ) CP1 , which is the encrypted sum of kinds of service category in PID CP1 1 .PID R 1 firstly uses his private key sk 1 to decrypt pk 1 (SL i ) and gets SL i .Then PID R 1 uses variety judgment protocol to calculate the kinds of service category l in PID CP1  1 .Note that here the kinds of service category l ≥ 2. Otherwise, U 1 would know other users' kinds of service category in the current clustering group.There will be four cases as follows, and here assume that T > 0. Case 1: k ≥ th k and l ≥ th l .U 1 firstly notifies the members in PID CP1 1 to prepare their inputs based on their kinds of service category.Then he repackages their packets and gets an aggregated package, which is symbolized by Ag.As the respective, U 1 sends Ag to the LBSP.
Case 2: k < th k and l ≥ th l .As explained in Section 3.4, k-anonymity is easier to be satisfied than l-diversity.So waiting for new users to join is a better choice for PID CP1  1 .Under this situation, U 1 does not need to judge new users' kinds of service category, and just wait for new users' joining to meet the condition th k .
Case 3: k ≥ th k and l < th l .We know this case is the same as scenario 1, the filtering aggregation protocol will be carried out.The reasons why we choose it will be explained when we describe it.
Case 4: k < th k and l < th l .We know this case is the same as scenario 2, the representative aggregation protocol will be carried out.The reasons why we choose it will be also explained when we describe it.

Input: Initiator, Service category table, Users
Output: Representative, l, k 1: Initiator U 1 broadcasts the request aggregating message, and sets the termination time T; 2: if T > 0 then 3: {PID 1 ,PID 2 ,...,PID k } → U 1 ; 4: U 1 gets the number of users k; The representative = PID R 1 , the group name = PID CP1 1 , NCG PID R 1 = 1; 7: The clustering process is a failure; 9: end if 10: 11: PID R 1 names pk 1 (SL i ) as pk 1 (SL i ) CP1 , and uses sk 1 to get SL i ; 12: PID R 1 uses variety judgment protocol, and gets l; The clustering process is a success, the remaining work will be performed; Filtering aggregation protocol is adopted; Representative aggregation protocol is adopted; The clustering process is a failure; 28: end if 29: end if

Variety Judgment Protocol
The purpose of this variety judgment protocol is to figure out the kinds of service category l in the clustering process.When there is only one sum SL i in a clustering group, we know the NCG PID R 1 = 1 just as the illustration in request aggregation protocol.After PID R 1 gets SL i from PID CP1 1 , he would use the way as explained in Section 3.3 to figure out the result of NSC, which represents the kinds of service category l in the clustering process of PID CP1 1 at this moment.When there is not only one sum SL i in this clustering group, such as two sums, then the representative sets NCG PID R 1 = 2, and this case will be explained in representative aggregation protocol.After PID R 1 gets the combined sum SL i from two clustering groups, he also uses the way as explained in Section 3.3 to figure out the result of NSC.Algorithm 2 describes the pseudo code of the variety judgment protocol.

Algorithm 2 Variety judgment
PID R 1 figures out the result of NSC from combined sum SL i ; 6: end if

Filtering Aggregation Protocol
According to the given security parameters th k and th l in a region, when a clustering group has already met the condition th k but not met th l .If the success of this clustering group (i.e., meet the conditions th k and th l ) needs to be promoted, the clustering group should make the new users who want to join in the clustering group have different kinds of service category rather than those kinds of service category that the clustering group already has.Therefore, the purpose of our filtering aggregation protocol is to choose new users selectively.There are two reasons for adopting this method: Firstly, it is no use for increasing the success probability of the clustering group when the new users have the same kinds of service category as the clustering group already has, on the contrary, it just increases the load of the clustering group.For example, suppose that the th k = 8 and th l = 5 in a region, and a clustering group now under the conditions that k = 8 and l = 3. Assume that the clustering group meets the conditions th k and th l finally, k = 15 and l = 5 may be the final result of the clustering group, which obviously generates more traffic and computation load than the result k = 10 and l = 5 by using our filtering aggregation protocol.And it is more effective especially when users' kinds of service category in LBS query are similar or even identical under a certain time and space, which will be proved in the Section 6.Secondly, it will influence the global success rate of clustering groups in this region.In the condition of the same th k and th l , if a clustering group contains too many users, obviously, other clustering groups cannot cluster successfully due to lack of users.
From the variety judgment protocol, we can know the number of l and k in PID CP1 1 .Assume that the case is k ≥ th k and l < th l , when a new user PID new wants to join in PID CP1  1 .The representative PID R 1 firstly gives his pk 1 and pk 1 (SL i ) to PID new .Then PID new uses pk 1 to encrypt his kind of service category L new i and obtains the intermediate result pk inter 1 1 , and PID R 1 recalculates the current value of NSC, which is symbolized by NSC inter .If NSC inter > NSC, we know the new user's kind of service category is different from the kinds of service category that PID CP1 1 already has, and it is helpful to promote the success of PID CP1  1 .Therefore, the representative PID R 1 allows the new user PID new to join.Otherwise, PID R 1 refuses PID new .Finally, if the current l ≥ th l , the clustering process is a success, or else PID CP1 1 waits for more new users to join.Algorithm 3 describes the pseudo code of the filtering aggregation protocol.
Security analysis: It will be explained from two sides in the clustering process.

1.
From the perspective of the single newly-joining user who wants to join in PID CP1 1 : Because PID R 1 sends the encrypted result pk 1 (SL i ) to the single newly-joining user PID new , PID new cannot infer the existing kinds of service category contained in PID R 1 unless he can decrypt pk 1 (SL i ).

2.
From the perspective of the clustering group: If the user PID new submits the kind of service category which is different from those kinds of service category that PID CP1 1 already has (i.e., NSC inter > NSC), the representative U 1 cannot know the exact kind of service category because of the different L i .
If the user PID new submits the kind of service category which is the same as one of these kinds of service category that the PID CP1 1 already has (i.e., NSC inter = NSC), there will be two cases as follows.
(a) If the kind of service category of user PID new is same as the representative U 1 , U 1 will know the kind of service category of user PID new because of the same L i ; (b) If the kind of service category of user PID new is not the same as the representative U 1 , which means it is the same as one of other users in PID CP1 1 , the representative U 1 cannot know the exact kind of service category of user PID new because of the different L i .
But due to the regulation of the filtering aggregation protocol, if NSC inter = NSC, the representative U 1 has to refuse PID new .Therefore, he cannot do further exploration for user PID new .Because the corresponding relationship between the kind of service category and binary number is not unchangeable, although the representative U 1 has deduced the meaning of L i of PID new in this clustering process, he cannot make sure that the meaning of L i is the same as previous or subsequent clustering process.

Algorithm 3 Filtering aggregation
Input: New users, pk 1 (SL i ) Output: New l 1: while T > 0 and l < th l do 2: PID R 1 continue broadcasting the request aggregating message, the new user PID new wants to join; 3: The join of PID new is agreed; 6: The join of PID new is refused; 8: end if 9: end while

Representative Aggregation Protocol
According to the given security parameters th k and th l in a region, both th k and th l are not met in an original clustering group.If the success of this clustering group needs to be promoted, the method is to combine with other clustering groups that have the same conditions (i.e., k < th k , l < th l ).The reason is that the cooperation among clustering groups could have a higher success probability to promote the success of the original clustering group than just waiting for new users to join.Besides, if other clustering groups nearby are found finally, the time cost of clustering will also be reduced.When the users' kinds of service category are similar or even identical in a certain time and space, for the original clustering group, newly-joining users are more likely to contribute to the value of k rather than the value of l.Note that our method is not to prevent new users from joining, but to find other clustering groups as the first choice.
From the variety judgment protocol, we can know the number of l and k in PID CP1 1 .Assume that the case is k < th k and l < th l .PID R 1 begins to broadcast the representative aggregating message, but different from request aggregating message, the representative aggregating message can only be responded by representatives.Suppose that there is a clustering group PID CP2 b with the same conditions (i.e., k < th k , l < th l ) as PID CP1  1 , and its representative PID R b responds to PID R 1 .Firstly, PID R 1 will send his pk 1 to PID R b , then PID R b sends his current set of pseudo-IDs and encrypted sum pk 1 (SL i ) back to PID R 1 .Note that this SL i is from decrypted pk b (SL i ) CP2 .Then, PID R 1 will set NCG PID R 1 = 2 and recalculate the value of NSC.Finally, the current NSC is figured out by the combined sum which is obtained by adding SL i of PID CP1 1 and SL i of PID CP2 b .According to the relationship of current k and l between th k and th l , PID R 1 chooses the different protocol mentioned above to finish the subsequent process.Algorithm 4 describes the pseudo code of the representative aggregation protocol.

Algorithm 4 Representative aggregation
Input: New clustering groups, pk 1 (SL i ) Output: New l, New k 1: while T > 0 , k < th k and l < th l do 2: PID R 1 broadcasts the representative aggregating message, then the PID R b responds; 3: The pseudo-IDs set of PID R b and pk 1 (SL i ) → PID R 1 ; 5: The clustering process is a success, the remaining work will be performed; if k ≥ th k and l < th l then 13: Filtering aggregation protocol is adopted; if k < th k and l < th l then 16: Representative aggregation protocol is adopted; 17: end if 18: end while

Service Category Table Based Algorithm
Algorithm 5 summarizes the strategies of the service category table based (SCTB) algorithm in this paper.If the user U i wants to send a request to the LBSP.Firstly, he should aggregate other k − 1 users by using the request aggregation protocol.If there are some users who agree to form a clustering group together with the user U i , he will become the representative.Secondly, the representative uses variety judgment protocol to figure out the kinds of service category in the clustering group.Finally, from the comparison between (k,l) and (th k ,th l ), U i chooses different strategies according to the following four kinds of situations: (i) k ≥ th k and l ≥ th l , the clustering process is a success; (ii) k < th k and l ≥ th l , waiting for new users to join; (iii) k ≥ th k and l < th l ; using filtering aggregation protocol for the subsequent processing; (iv) k < th k and l < th l , using representative aggregation protocol for the subsequent processing.The clustering process is a a success, the remaining work will be performed; Filtering aggregation protocol is adopted; Representative aggregation protocol is adopted; 16: end if 17: end while

Evaluation
In this section, the efficiency and effectiveness of our SCTB algorithm are experimentally evaluated.Firstly, we analyze the computational cost of the PLAM [8] and our SCTB on figuring out the kinds of service category in the clustering process.Then we describe the simulation environment and give the simulation results.

Performance Analysis
Assume that there is a clustering process which contains k c users and a data-pool which contains l c different kinds of service category, k c and l c are both random constant.Because PLAM, LLB, and our SCTB adopt the system architecture of TTP-free, and the kinds of service category in a clustering group are both figured out by the representative of the clustering group, the two algorithms (i.e., PLAM and LLB algorithm) are used for comparison.We assume that all conditions are the same and meet the premise that one user sends one kind of service category in a clustering process.For example, user Bob only sends one request about finding a restaurant in one clustering process, but he cannot send the request that contains both finding a restaurant and a hospital at the same time.We now give the computational cost of our SCTB and PLAM about the process of identifying and calculating l-diversity.Claim 1.In a clustering process, ignoring the computational complexity of decryption, the computational complexity of SCTB on figuring out l-diversity is O(k c E) + O(l c ).
Proof of Claim 1. Firstly, the k c users should add together to form pk 1 (SL i ), and pk 1 is the public key of the representative.We can know the computational complexity is O(k c E) [32].Then we decrypt pk 1 (SL i ) and get SL i , the NSC can be figured out by SC i = SL i AE i .Because the number of AE i is equal to l c , and l c is the total kinds of service category that can be represented.Therefore, the computational complexity is O(l c ). Ignoring the computational complexity of decryption, the computational complexity of this process is O(k c E) + O(l c ).
Claim 2. In a clustering process, ignoring the computational complexity of decryption, the computational complexity of PLAM on figuring out l-diversity is k c O(l c ) + l c O(k c E k c ).
Proof of Claim 2. Under the same settings as above.PLAM adopts a set a i = (a i1 , a i2 , ..., a in ) to denote different kinds of service category that available to a user i , and the set also means the total kinds of service category that can be represented is n, where the number n is equivalent to l c .Firstly, the representative should calculate pk( āi ) = (pk(1 − a i1 ), pk(1 − a i2 ), ..., pk(1 − a in )), and we know the computational complexity is k c O(l c ).Then users cooperatively exchange pk( āi ) and calculate pk(1 − a in )), and we know the computational complexity of Ignoring the computational complexity of decryption, the computational complexity of this process is According to above-mentioned analysis, we can know that the computational complexity of our SCTB is smaller than the PLAM.Therefore, it is helpful to finish the clustering process as well as reduce the communication overhead among users.

Performance Metrics
In order to evaluate the performance and effectiveness of the proposed algorithm, some metrics of this paper are explained in this subsection.
Clustering e f f iciency is a comprehensive evaluation for clustering groups in a region, and it is mainly composed of response time and success rate in this paper.
Response time is used to measure the time taken for a clustering group from initiating clustering to meet the conditions of k-anonymity and l-diversity (i.e., th k and th l ) in a region.The smaller the response time, the more efficiency of a clustering group.It also means the faster a clustering group can send the aggregated package to LBSP.
Success rate is used to measure the ratio of the number of clustering groups that meet the conditions th k and th l to the number of all clustering groups in a region [3].The higher the success rate, the more efficiency of cluster groups.It also means more users' requests can be sent to LBSP.

Simulation and Results
The essence of the SCTB algorithm is to improve clustering efficiency while protecting privacy.Therefore, the extensive simulations are conducted for evaluating the performance of our algorithm.In this section, we describe the details of our simulation environment, then give the simulation results and analysis.

Simulation Setup
We use Matlab R2018a to conduct our simulations, and run all algorithms on a local machine with an Intel Core-i5 2.5 GHz, 6 GB RAM, and Microsoft Windows 7 OS.Assume that there is a region A of size 10 km × 10 km with 100 × 100 locations.We construct a full-mesh network consisting of 100 × 100 nodes to simulate the locations in region A. To evaluate the effectiveness of our SCTB algorithm, we introduce the concept of service category similarity, which is symbolized by L S in this paper.Suppose that under normal circumstances, the kind of service category of user A is not related to the kind of service category of user B, then the service category similarity between user A and user B is L S = 0. Assume that under certain space-time conditions, the probability of user A's kind of service category is half the same as user B's, then we denote the service category similarity between user A and user B is L S = 0.5.We perform the simulation in the following four different scenarios and average performance results were obtained for running each scenario for 30 min.
Scenario-1: There are 50 × 100 users with transmission radius of tr = 50 m randomly distributed in region A. Assume that the total kinds of service category in the data-pool are 16.All users should join in the clustering process, if a user U a receives a request aggregating massage when he wants a LBS query, he will response for it and join in that clustering group.Otherwise, U a will broadcast the request aggregating message as an initiator, and wait for other users to join.A certain number of users who send requests at a time interval.The ratio of the certain number of users to the total number of users is 1/10, and the time interval is 0.1s.The rate of th l to th k is th l = th k /2.We set the service category similarity L S = 0, which means the probability of service category similarity among users is 0%.
Scenario-2: We set the service category similarity L S = 0.5, which means the probability of service category similarity among users is 50%.The rest of the conditions are the same as Scenario-1.
Scenario-3: We set the service category similarity L S = 0.75, which means the probability of service category similarity among users is 75%.The rest of the conditions are the same as Scenario-1.
Scenario-4: Users move independently with the same velocity v = 1m/s, and a representative U a sends clustering requests continuously in the 60s under different L S .The rest of the conditions are the same as Scenario-1.

Simulation Results
We compare the other two algorithms (i.e., PLAM algorithm and LLB algorithm) with our SCTB algorithm.Figure 11a shows the simulation results of response time under Scenario-1.It can be seen that with the number of users in a clustering group increases, the response time increases no matter which kind of algorithm is used.It is because more user leads more time to be spent.It can also be known that the response time of our SCTB is less than PLAM and LLB.That is because the computational complexity of our SCTB is smaller than PLAM and LLB.The reason why the response time of LLB is less than PLAM is the combination of some clustering groups when some certain condition is satisfied.And the condition is as follows: When the number of users in clustering group A is less than th k /2, the clustering group A can join in another clustering group B if and only if the clustering group B is within the communication range of A and the number of users in B is bigger than th k /2 [9].
Figure 11b presents the relationship between the number of users and the success rate under Scenario-1.It can be seen that the success rate of our SCTB algorithm is higher than PLAM and LLB.That is because our SCTB algorithm leads the clustering process to choose new users selectively, and makes the clustering groups more reasonable by using the filtering aggregation protocol as analyzed in Section 5.4.Besides, the representative aggregation protocol promotes the success of clustering groups as well.The reason why the success rate of LLB is higher than PLAM is also the combination of some clustering groups.From Figure 11b, it also can be known that no matter which kind of algorithm mentioned above is used in the clustering process, the success rate decreases as the number of users in a clustering group increases.The reason is that with the number of users in a clustering group increases (i.e., th k ), the clustering group has to spend more time to find users to meet the condition th k , which leads to the decrease of the success rate.a trend similar to Figure 11a, that is, the response time increases with the number of users in a clustering group increases.The response time of PLAM algorithm and LLB algorithm have almost the same performance in Scenario-1 and Scenario-2, which is influenced by the set ratio th l = th k /2.However, in Figure 13a, with the number of users in a clustering group increases, the response time of PLAM and LLB increases faster than Figures 11a and 12a.The reason is, for a clustering group, the higher service category similarity of users leads a clustering group has to find more users to meet the condition th l , which will lead more time to be spent.
Figures 12b and 13b present the success rate under Scenario-2 and Scenario-3 respectively.It can be seen that Figures 12b and 13b show a trend similar to Figure 11b, that is, the success rate decreases as the number of users in a clustering group increases.The success rate of PLAM algorithm and LLB algorithm nearly has the same performance in Scenario-1 and Scenario-2 due to the ratio th l = th k /2.However, as can be seen from Figure 13b, with the higher service category similarity of users, our SCTB still has a better performance than PLAM and LLB.This is because our SCTB chooses new users selectively during the clustering process, and makes the clustering groups more reasonable in a region.Figure 14 shows the performance of the SCTB under Scenario-1, Scenario-2, and Scenario-3. Figure 14a shows the response time of SCTB under different scenarios.It can be seen that the response time increases with the number of users in a clustering group increases.In the case of the same th k , the higher the service category similarity, the longer the response time.Figure 14b presents the success rate of SCTB algorithm under different scenarios.It can be seen that the success rate decreases with the number of users in a clustering group increases.In the case of the same th k , the higher the service category similarity, the lower the success rate.Figure 15 shows the number of clustering times when th k = 10 under Scenario-4.It can be seen that the representative U a could complete more number of clustering times by using our SCTB than PLAM and LLB.The reason is that the computational complexity of our SCTB is smaller than PLAM and LLB when U a calculates the kinds of service category (i.e., l) in a clustering group.Therefore, it can allow U a to spend less time to complete more clustering times.Figure 15 also shows that with the increase of service category similarity, no matter which algorithm mentioned above is used, the number of clustering times decreases.This is because the higher service category similarity means more users are needed to meet the condition th l .Therefore, more time will be spent.However, we can see that our SCTB algorithm is still has a better performance than other two algorithms, so it is also useful in the scenario of a continuous request.

Limitations of Current Work
There are some limitations of our current work.Firstly, limitations of our simulation.We assume that the mobile devices of users have sufficient computing resources to complete calculations and communications, and the factors such as communication delays among devices are not considered as well.Secondly, limitations of our SCTB algorithm.In our paper, we assume that the kinds of service category are distinct and semantically dissimilar, which means they have no semantic relationship.Therefore, how to apply our approach to semantically similar scenarios more practically is still a challenge.However, our simulations prove that our algorithm is useful for the clustering process.Besides, the results match to theoretical development which was presented between Sections 3 and 6.Testing those algorithms (i.e., PLAM, LLB, and our SCTB) in a real environment is still a challenge.

Figure 1 .
Figure 1.Privacy preservation by k-anonymity and l-diversity.

Figure 4 .
Figure 4.A random sum of L i when L = 7.

Figure 10 .
Figure 10.The scheme of service category table based algorithm (SCTB).

13 : if l ≥ 2 then 14 :
if k ≥ th k and l ≥ th l then15:

Figures 12 and 13
Figures 12 and 13  show the simulation results under Scenario-2 and Scenario-3 respectively.Figures12a and 13apresent the response time under Scenario-2 and Scenario-3 respectively.It can be seen that Figures 12a and 13a reflect a trend similar to Figure11a, that is, the response time increases with the number of users in a clustering group increases.The response time of PLAM algorithm and LLB algorithm have almost the same performance in Scenario-1 and Scenario-2, which is influenced by the set ratio th l = th k /2.However, in Figure13a, with the number of users in a clustering group increases, the response time of PLAM and LLB increases faster than Figures11a and 12a.The reason is, for a clustering group, the higher service category similarity of users leads a clustering group has to find more users to meet the condition th l , which will lead more time to be spent.Figures12b and 13bpresent the success rate under Scenario-2 and Scenario-3 respectively.It can be seen that Figures12b and 13bshow a trend similar to Figure11b, that is, the success rate decreases as the number of users in a clustering group increases.The success rate of PLAM algorithm and LLB algorithm nearly has the same performance in Scenario-1 and Scenario-2 due to the ratio th l = th k /2.However, as can be seen from Figure13b, with the higher service category similarity of users, our SCTB still has a better performance than PLAM and LLB.This is because our SCTB chooses new users selectively during the clustering process, and makes the clustering groups more reasonable in a region.
Figures 12 and 13  show the simulation results under Scenario-2 and Scenario-3 respectively.Figures12a and 13apresent the response time under Scenario-2 and Scenario-3 respectively.It can be seen that Figures 12a and 13a reflect a trend similar to Figure11a, that is, the response time increases with the number of users in a clustering group increases.The response time of PLAM algorithm and LLB algorithm have almost the same performance in Scenario-1 and Scenario-2, which is influenced by the set ratio th l = th k /2.However, in Figure13a, with the number of users in a clustering group increases, the response time of PLAM and LLB increases faster than Figures11a and 12a.The reason is, for a clustering group, the higher service category similarity of users leads a clustering group has to find more users to meet the condition th l , which will lead more time to be spent.Figures12b and 13bpresent the success rate under Scenario-2 and Scenario-3 respectively.It can be seen that Figures12b and 13bshow a trend similar to Figure11b, that is, the success rate decreases as the number of users in a clustering group increases.The success rate of PLAM algorithm and LLB algorithm nearly has the same performance in Scenario-1 and Scenario-2 due to the ratio th l = th k /2.However, as can be seen from Figure13b, with the higher service category similarity of users, our SCTB still has a better performance than PLAM and LLB.This is because our SCTB chooses new users selectively during the clustering process, and makes the clustering groups more reasonable in a region.

Figure 14 .
Figure 14.The simulation results of SCTB algorithm under different scenarios.
and PID R 1 gets current k and l from combined sum SL i ; 6:if k ≥ th k and l ≥ th l then 7: < th k and l ≥ th l then

Table Based (
SCTB) algorithm 1: Broadcast the aggregating message; 2: Aggregate uers' requests by using request aggregation protocol; 3: Figure out the current k and l by using variety judgment protocol; 4: if k ≥ th k and l ≥ th l then < th k and l ≥ th l then ≥ th k and l < th l then < th k and l < th l then