Privacy-Preserving Method for Trajectory Data Publication Based on Local Preferential Anonymity

Abstract: With the rapid development of mobile positioning technologies, location-based services (LBSs) have become more widely used. The amount of user location information collected and applied has increased, and if these datasets are directly released, attackers may infer other unknown locations through partial background knowledge in their possession. To solve this problem, a privacy-preserving method for trajectory data publication based on local preferential anonymity (LPA) is proposed. First, the method considers suppression, splitting, and dummy trajectory adding as candidate techniques. Second, a local preferential (LP) function based on the analysis of location loss and anonymity gain is designed to effectively select an anonymity technique for each anonymous operation. Theoretical analysis and experimental results show that the proposed method can effectively protect the privacy of trajectory data and improve the utility of anonymous datasets.


Introduction
With the rapid development of modern device awareness systems and mobile positioning technologies, such as global positioning systems (GPS) and radio frequency identification devices (RFID), location-based services (LBSs) are becoming increasingly widely used [1,2]. While enjoying convenient services, users leave their location records, which constitute the trajectory data. Many trajectories are released and used for various applications, such as urban planning, advertisement placement, and shop bidding [3-5]. However, the direct publication and application of trajectory data may lead to the leakage of users' information [6,7]. Attackers can infer other unknown locations or identify users' full trajectories based on some of the background knowledge they own [8].
For example, an electronic card company provides payment services for two chain stores A and B and stores user transaction records. Chain stores A and B store the transaction records of users consuming at their stores, and the transaction records generated in physical locations may contain important spatiotemporal information (e.g., that a user was in a specific location at a certain time). After preprocessing, the transaction records in some chain stores can be presented as the trajectories in Table 1. As shown in Table 1, assume that Alice generates the record a 5 → b 4 → a 1 , which represents that she consumes at stores a 5 and a 1 of A and store b 4 of B, respectively. Considering a publication of the transaction records by the company for analysis purposes, A can be regarded as an attacker who owns and tracks all the locations a i ; he owns the partial record a 5 → a 1 of Alice and can infer that Alice has visited b 4 in addition to a 5 and a 1 . This is because only the record t 1 goes through a 5 and then through a 1 , so it can be determined that the complete record of Alice is t 1 . In this case, the user's consumption preferences and other information may be leaked, resulting in a breach of user privacy.

Table 1. An example of trajectory set T.

ID Trajectory
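The inference described in the example above can be sketched in a few lines of Python. Only t 1 is specified in the text; the contents of the other trajectories below are hypothetical filler, used solely to show that the attack succeeds when exactly one trajectory matches the attacker's observation.

```python
# Sketch of the attack in the example above. Only t1 is specified in the
# text; t2 and t3 are hypothetical filler trajectories.
trajectories = {
    "t1": ["a5", "b4", "a1"],
    "t2": ["a3", "b4"],
    "t3": ["a1", "a2", "b1"],
}
L_A = {"a1", "a2", "a3", "a4", "a5"}  # background knowledge of attacker A

def projection(traj, known):
    """Subsequence of traj visible to an attacker who tracks `known`."""
    return [loc for loc in traj if loc in known]

# A observed Alice at a5 and then a1; find every trajectory matching that.
observed = ["a5", "a1"]
candidates = [tid for tid, t in trajectories.items()
              if projection(t, L_A) == observed]
# Only t1 projects to a5 -> a1, so A learns Alice also visited b4.
inferred = [loc for loc in trajectories[candidates[0]] if loc not in L_A]
print(candidates, inferred)  # ['t1'] ['b4']
```

Because the observation a 5 → a 1 matches a single trajectory, the unknown location b 4 is disclosed with certainty.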
To achieve the safe publication of trajectory data in the above scenarios, existing works usually anonymize the trajectory dataset before publication [8-11]. Specifically, the data holder processes the trajectory dataset to be published when the background knowledge mastered by attackers is known, to prevent the attackers from inferring user information. Under the premise of meeting users' privacy requirements, such methods continuously process trajectory sequences to reduce the probability of unknown location information being inferred. This data processing can be divided into two steps. The first step is to identify the privacy threats; in other words, all the problematic sequences in the dataset should be found. The second step is to eliminate those privacy threats by processing some trajectory sequences. The algorithms [8] based on suppression and splitting can effectively achieve privacy preservation, but the time consumption for identifying privacy threats is relatively large, and the utility of the anonymous dataset after suppression is not very great. Two new tree-based structures [9,10] were proposed to optimize time performance. However, similar to the above methods, these methods are also implemented based on suppression. Due to the implementation characteristics of suppression, a large amount of information loss also often occurs.
The above methods can effectively achieve safe publication of trajectory data while meeting users' privacy requirements. However, the methods eliminate privacy threats by suppression, which may lead to large information loss [9-11]. Specifically, the global suppression and local suppression algorithms calculate the anonymity gain value and then delete some specific locations directly. Therefore, anonymous datasets usually lose many locations, and the number of lost locations gradually increases with increasing dataset size. This characteristic may result in the excessive loss of information in the anonymous dataset, which is not conducive to analysis and application of the data after publication.
To address the above problem, in other words, to realize the safe publication of trajectory data and improve the utility of trajectory data after publication, we propose a privacy-preserving method for trajectory data publication based on local preferential anonymity (LPA). We adopt a tree-based storage structure [10] and a simplified calculation method to extract problematic nodes; consider suppression, splitting, and dummy trajectory adding as candidate techniques; and select a specific technique based on the analysis of location loss and anonymity gain.
The main contributions of this paper are as follows: (1) We propose a privacy-preserving method based on LPA, which implements suppression, splitting, and dummy trajectory adding based on the trajectory projection tree. (2) We design a local preferential (LP) function based on the analysis of location loss and anonymity gain to select the final technique. Selecting an appropriate technique for problematic nodes can effectively achieve privacy preservation and reduce information loss in the process of privacy preservation. (3) We conduct theoretical analysis and a set of experiments to show the feasibility of privacy preservation. Experimental results show that compared with existing methods, the LPA algorithm can effectively achieve privacy preservation while reducing the information loss of anonymous datasets.

Related Work
As trajectory datasets are published, user privacy has increasingly attracted attention from all walks of life. Researchers have done much work on privacy preservation of trajectory data and have proposed many privacy-preserving methods [1,7]. In this section, we investigate the existing work in this field. The methods for anonymizing locations can be divided into the following categories.

Dummy Trajectory
The method based on dummy trajectories adds synthetic dummy trajectories according to the characteristics of the original trajectories to reduce the probability that a real trajectory is inferred. Luper et al. [12] applied the dummy trajectory technique to trajectory data publication. Effectively generating dummy trajectories is a main problem. On the one hand, a dummy trajectory should be similar to the motion pattern of the real trajectory. The trajectory synthesis method proposed by Lei et al. [13] is widely used: when synthesizing spatiotemporal trajectory data, some spatiotemporal points are selected from the real trajectory, and these points are rotated about a centre. Wu et al. [14] considered the gravity movement model to generate dummy trajectories and proposed a method to protect trajectory data in continuous LBSs and trajectory publication. Considering the spatiotemporal constraints of the geographical environment of the users, Lu et al. [15] proposed a trajectory privacy protection method based on dummy trajectories; this method hides the real trajectories among dummy trajectories and can effectively protect user privacy. On the other hand, it is necessary to avoid adding too many dummy trajectories.
However, existing works rarely apply dummy trajectories to privacy protection for location sequences, as our work does. The method proposed in this paper is applied to an environment where the attackers have already mastered some background knowledge, and the dummy trajectory is used to reduce the availability of that background knowledge.

Clustering and Partition
The method based on clustering and partitioning divides the original trajectory dataset into small groups and then performs anonymization in each group to make the trajectories within each group indistinguishable. These methods are mostly used for spatiotemporal trajectories. Samarati [16] and Sweeney [17] implemented privacy protection based on k-anonymity, which processes the data so that each record has exactly the same quasi-identifier attribute values as at least k-1 other records in the same group. Abul et al. [18] proposed (k, δ)-anonymity based on k-anonymity, proposed the never walk alone (NWA) algorithm, and proposed achieving k-anonymity through trajectory clustering. Domingo-Ferrer et al. [19,20] pointed out that the NWA algorithm may still leak privacy, and they proposed the SwapLocations algorithm to classify trajectories by general microaggregation and then replace the locations in the clusters. This method may also lose much information. Dong et al. [21] proposed a trajectory privacy preservation method based on frequent paths, which applied the concept of frequent paths to trajectory privacy preservation for the first time. They first found frequent paths to divide candidate groups and then selected representative trajectories in the candidate groups. This method can effectively achieve privacy protection but does not consider the different privacy requirements of users. Considering users' privacy requirements, Kopanaki et al. [22] proposed personalized privacy parameters based on the research of Domingo-Ferrer et al. [19] and proposed trajectory division in the preprocessing stage. To further improve the utility of anonymous datasets, Chen et al.
[23] considered multiple factors of spatiotemporal trajectories and proposed a method of privacy preservation for trajectory data publication based on 3D cell division to improve the problem of excessive information loss. The method divides cells in the trajectory preprocessing stage and performs suppression or location exchange within each cell for anonymity, which improves data utility.
Unlike our work, the above methods are often used for spatiotemporal trajectory data publication, and the partitioning of trajectories is implemented in the preprocessing stage. The advantage of trajectory division is that no locations are lost. Terrovitis et al. [8] proposed a method of splitting trajectories. The trajectories are split in the anonymization stage, and a trajectory may be split into two trajectories in their method, which can achieve effective anonymity and reduce the loss of locations. However, it may not work very well if only splitting is implemented in trajectory anonymization, because splitting trajectories may affect the behavioral patterns of the users.

Generalization and Suppression
The privacy-preserving methods for location sequences are mostly based on generalization and suppression. Generalization replaces the content of information with a more ambiguous concept and generalizes the locations on a trajectory to an enlarged area, which causes the locations in this area to be indistinguishable. Suppression selectively deletes specific sensitive locations before releasing trajectory data to achieve trajectory privacy preservation. Nergiz et al. [24] first proposed a generalization-based privacy preservation method for trajectory data publication, which introduced k-anonymity into the generalization-based method to achieve anonymity. Then, Yarovoy et al. [25] proposed a spatial generalization-based k-anonymity approach to anonymously publish moving object databases. Terrovitis et al. [26] defined a k m -anonymity model, which is a new version of k-anonymity and is mainly used to protect privacy in the publication of set-valued data. To effectively protect data utility, Poulis et al. [27] applied k m -anonymity to trajectory data and developed three anonymity algorithms based on apriori principles. Terrovitis et al. [28] proposed a method to suppress partial location information to achieve privacy preservation. To solve the problem of attackers inferring unknown locations through partial information, they used partial trajectory knowledge as the quasi-identifier of the remaining locations and transformed the trajectory database into a safe counterpart.
For the above problem, Terrovitis et al. [8] continued applying suppression and splitting to develop four algorithms. Among those four algorithms, they proposed two algorithms based on suppression, G sup and L sup , which delete specific locations according to different selection methods. The proposed algorithms can effectively achieve privacy protection, but there are still some problems in the processing. One is the linear storage of the original trajectories and the idea of the greedy algorithm, which result in a large amount of computation time. The other is the large information loss of the anonymous dataset. Lin et al. [10] proposed four algorithms based on suppression, which were implemented on tree structures. They proposed a method of anonymity gain measurement, which was optimized in terms of computation time. Although the computation time is significantly improved, the information loss is basically the same as that of the algorithms in [8]. In addition, the global suppression algorithm based on the tree structure cannot perform the correct update operation in some cases.
To reduce the information loss of the anonymous dataset, we adopt a tree-based storage structure [10] to store the original data and implement suppression, splitting, and dummy trajectory adding based on the tree structure, which transforms the processing of privacy problems into the processing of problematic nodes. We consider suppression, splitting, and dummy trajectory adding as candidate techniques and design an LP function based on location loss and anonymity gain to select the final anonymity technique.

Preliminaries
To address the problem of privacy preservation for trajectory data publication, some specific locations in the dataset to be published should be processed, and a corresponding safe dataset with minimal information loss and good privacy preservation will be obtained. In this section, we introduce some basic definitions and formulas that will be used throughout this paper, and the storage structure used in our method will also be described.

Problem Definition
We propose a privacy protection method for trajectory data publication in this paper.
In our method, it should be noted that the trajectories studied in our work are semantic trajectories and are unidirectional; in other words, there are no repeated locations in a trajectory. The relevant definitions of the trajectory data are introduced as follows.
Let l i be a location of special interest on a map (e.g., a hospital, bank, or store). Then, L is a set of locations denoted by L = {l 1 , l 2 , . . . , l n }.

Definition 1 (Trajectory). A trajectory t is defined as an ordered list of locations, t = l 1 → l 2 → . . . → l n , where l i ∈ L, 1 ≤ i ≤ n. The length of a trajectory t, denoted as |t|, represents the number of locations in t. The set of m trajectories is presented as T = {t 1 , t 2 , . . . , t m }.

Example 1. Table 1 shows eight trajectories, which are defined on the set of locations L = {a 1 , a 2 , a 3 , a 4 , a 5 , b 1 , b 2 , b 3 , b 4 }, and T = {t 1 , t 2 , t 3 , t 4 , t 5 , t 6 , t 7 , t 8 }. The length of trajectory t 1 = a 5 → b 4 → a 1 is 3.

Definition 2 (Subtrajectory). Given two trajectories defined on the set of locations L, denoted as t and t', t' is a subtrajectory of t if every location of t' is contained in t and the locations of t' preserve their order in t.

To achieve the safe publication of trajectory data, we anonymize the trajectory dataset before publication [8-11]. Therefore, we need to infer the partial information of trajectories owned by potential attackers Adv, which is called the background knowledge.

Definition 3 (Background Knowledge). For a set of trajectories T defined on L, an attacker A in Adv owns a set of locations L A ⊂ L, and L A is called the background knowledge owned by A. Attacker A can track any user visiting the locations in L A .
In this paper, we assume that any attacker owns background knowledge and that the background knowledge mastered by any two attackers is not shared. In other words, for any attackers A, B ∈ Adv with A ≠ B, L A ∩ L B = ∅. If our method is applied to a situation where attackers share background knowledge, the trajectory data publisher needs to regroup the attackers according to disjoint background knowledge.
Example 3. In Table 1, attackers A and B own the sets of locations L A = {a 1 , a 2 , a 3 , a 4 , a 5 } and L B = {b 1 , b 2 , b 3 , b 4 }, respectively, and L A ∩ L B = ∅.

Definition 4 (Trajectory Projection). Let A be an attacker owning L A , and let there be a trajectory t = l 1 → l 2 → . . . → l n . The trajectory projection of attacker A is denoted as PT A (t) and is a subtrajectory of trajectory t; in other words, all locations in PT A (t) belong to both L A and trajectory t. The set of trajectory projections of all trajectories t ∈ T about attacker A is denoted as T A .

Example 4. In Table 1, the trajectory projections of t 1 to attackers A and B are PT A (t 1 ) = a 5 → a 1 and PT B (t 1 ) = b 4 , respectively. The trajectory projection set of all trajectories t ∈ T is T A , as shown in Table 2.
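As a minimal sketch of Definition 4, the projection of a trajectory onto an attacker's background knowledge keeps only the locations the attacker can track, in their original order. The trajectory contents below beyond t 1 are not given in the paper and are hypothetical.

```python
# Sketch of Definition 4 (trajectory projection). Only t1 is taken from the
# paper; t2 is hypothetical.
L_A = {"a1", "a2", "a3", "a4", "a5"}
L_B = {"b1", "b2", "b3", "b4"}
T = {"t1": ["a5", "b4", "a1"], "t2": ["a3", "b4"]}

def PT(traj, L_adv):
    # Keep only the locations the attacker can track, preserving order.
    return [l for l in traj if l in L_adv]

T_A = {tid: PT(t, L_A) for tid, t in T.items()}
T_B = {tid: PT(t, L_B) for tid, t in T.items()}
print(T_A["t1"], T_B["t1"])  # ['a5', 'a1'] ['b4']
```

This reproduces Example 4: PT A (t 1 ) = a 5 → a 1 and PT B (t 1 ) = b 4 .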
The attacker A may be able to infer the trajectories that contain the trajectory projection; we say that the set of these trajectories is a trajectory projection support set.
Definition 5 (Trajectory Projection Support Set). Let T* be a trajectory dataset to be published, and let t A be a trajectory in T A . If t A = PT A (t), we say that t supports the trajectory projection t A . The trajectory projection support set of t A in T A about T* is denoted as ST* (t A ).

Example 5. In Table 1, ST* (a 3 ) = {t 2 , t 8 }, which represents that the attacker A may be able to infer that a 3 is the trajectory projection of either t 2 or t 8 . For these scenarios, A can also infer that a 3 may be associated with b 4 . Let ST* (t A , l) be the trajectory set that contains all trajectories in ST* (t A ) that also contain a location l ∉ L A .

Definition 6 (Inference Probability).
The inference probability is denoted by P r (t A , l), which represents the probability of associating a location l ∉ L A with trajectory projection t A , and is calculated as Equation (1):

P r (t A , l) = |ST* (t A , l)| / |ST* (t A )| (1)

Example 6. After Example 5, ST* (a 3 , b 4 ) = {t 2 }. The inference probability of associating b 4 with trajectory projection a 3 is P r (a 3 , b 4 ) = 1/2.

Definition 7 (Problematic Pair). Let P t be a probability threshold defined by the user, which represents the user's privacy requirements. If P r (t A , l) > P t , then the pair (t A , l) is a problematic pair, and t A is defined as problematic. Moreover, the number of problems of (t A , l) is |ST* (t A , l)|, and the total number of problems is denoted as N.
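Equation (1) can be sketched directly over a toy dataset. Only the neighbourhood of a 3 below matches the paper's Example 6; the full contents of t 2 and t 8 are hypothetical.

```python
# Hedged sketch of Definition 6 / Equation (1):
# P_r(tA, l) = |ST*(tA, l)| / |ST*(tA)|.
# The contents of t2 and t8 beyond a3 and b4 are hypothetical.
L_A = {"a1", "a2", "a3", "a4", "a5"}
T = {"t2": ["a3", "b4"], "t8": ["a3", "b2"]}

def PT(traj, known):
    return tuple(l for l in traj if l in known)

def support(T, tA, known):
    """ST*(tA): trajectories whose projection equals tA."""
    return {tid for tid, t in T.items() if PT(t, known) == tA}

def inference_prob(T, tA, l, known):
    st = support(T, tA, known)
    st_l = {tid for tid in st if l in T[tid]}
    return len(st_l) / len(st)

p = inference_prob(T, ("a3",), "b4", L_A)
print(p)  # 0.5, matching Example 6
```

With P t = 0.5, the pair (a 3 , b 4 ) is not problematic here, since P r = 0.5 is not strictly greater than the threshold.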
A trajectory dataset is called unsafe if it has at least one problematic pair. The aim of our work is to eliminate all problems.

Definition 8 (Problem Definition).
Given a trajectory dataset T, a user's privacy requirements threshold P t , and a set of potential attackers Adv, construct a safe dataset T* corresponding to T with minimum information loss.

Trajectory Projection Tree
In the existing trajectory privacy preservation methods, some storage structures were proposed in the trajectory preprocessing stage, such as linear structures [8], graph structures [21,29], and tree structures [9,10]. In this paper, we choose the tree structure (TP-tree) [10] to store the original trajectory dataset T. The reason is that the tree-based storage structure can simplify the calculation of the inference probability and reduce the computational cost.

Example 7.
With Table 1 as the original trajectory dataset T and the attacker set Adv = {A, B} as the inputs, let us execute the TP-treeConstruction algorithm [10] to construct a TP-tree corresponding to T. The TP-tree of Table 1 is shown in Figure 1. The details of the TP-tree are introduced as follows:
1. The 2nd-level nodes of the TP-tree correspond to projection trajectories, and the 3rd-level nodes correspond to locations that may be inferred.
Firstly, obtain the trajectory projections of the different attackers; the trajectory projections are inserted as 2nd-level nodes. For trajectory t 1 = a 5 → b 4 → a 1 , the trajectory projections of A and B are PT A (t 1 ) = a 5 → a 1 and PT B (t 1 ) = b 4 , respectively. The 2nd-level nodes {a 5 → a 1 : (t 1 , 13)} and {b 4 : (t 1 , 2)} should be inserted into the TP-tree.
Secondly, for each attacker's trajectory projection, all locations in the trajectory but not in the trajectory projection are the locations that may be inferred. For the tree node PT A (t 1 ) = a 5 → a 1 , the location b 4 in t 1 but not in PT A (t 1 ) may be inferred by A, so the 3rd-level node {b 4 : (t 1 , 2)} of the 2nd-level node {a 5 → a 1 : (t 1 , 13)} should be generated. Similarly, for the tree node PT B (t 1 ) = b 4 , the locations a 5 and a 1 in t 1 but not in PT B (t 1 ) may be inferred by B, and the 3rd-level nodes {a 5 : (t 1 , 1)} and {a 1 : (t 1 , 3)} of the 2nd-level node {b 4 : (t 1 , 2)} should be generated.
It should be noted that, to facilitate writing, only the trajectory projection and the location representation contained in a node are written in the subsequent descriptions of nodes. For example, the 2nd-level node {a 5 → a 1 : (t 1 , 13)} can be written as {a 5 → a 1 }.
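The three-level layout described above can be sketched with a simplified dict-based stand-in for the TP-tree of [10]: the root branches per attacker, 2nd-level nodes hold trajectory projections with their supporting trajectory IDs, and 3rd-level children hold the locations that may be inferred. The node encodings (t 1 , 13), position codes, and tree internals of the real TP-tree are not modeled.

```python
# Simplified dict-based stand-in for the TP-tree: root -> projections
# (2nd level) -> inferable locations (3rd level), each with supporting IDs.
from collections import defaultdict

def build_tp_tree(T, attackers):
    tree = {}
    for name, known in attackers.items():
        level2 = defaultdict(lambda: {"trajs": set(),
                                      "children": defaultdict(set)})
        for tid, traj in T.items():
            proj = tuple(l for l in traj if l in known)  # PT_A(t)
            if not proj:
                continue
            node = level2[proj]
            node["trajs"].add(tid)
            for l in traj:  # 3rd level: locations that may be inferred
                if l not in known:
                    node["children"][l].add(tid)
        tree[name] = level2
    return tree

T = {"t1": ["a5", "b4", "a1"]}
attackers = {"A": {"a1", "a2", "a3", "a4", "a5"},
             "B": {"b1", "b2", "b3", "b4"}}
tree = build_tp_tree(T, attackers)
print(tree["A"][("a5", "a1")]["children"]["b4"])  # {'t1'}
```

For t 1 , the 2nd-level node a 5 → a 1 of attacker A gains the 3rd-level child b 4 , matching the construction described above.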

Anonymity Gain Measurement
When processing location sequences, the trajectories will be changed, and the total number of problems N may also be changed. Taking both aspects into consideration, a set of gain measurement formulas for suppression and splitting were proposed [8]. To measure the change in N, gain N is defined as Equation (2):

gain N = N − N' (2)

where N (N') represents the number of problems before (after) anonymity.

Let t A m and t A n (t A n ⊂ t A m ) be trajectory projections owned by an attacker A. When at least one of t A m and t A n is problematic, we use t A n to unify t A m ; in other words, the locations in t A m but not in t A n must be deleted from t A m and from the original trajectories that contain t A m . To measure the information loss, the ploss of suppression is defined as Equation (3):

ploss(t, t') = (|t| − |t'|) / |t| (3)

where t (t') represents the trajectory before (after) suppression and |t| (|t'|) represents the length of trajectory t (t'). The anonymity gain measurement of suppression is defined as G gain , which is shown as Equation (4), where S represents the set of trajectories that are affected by the change in suppression, S ⊆ T.
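The two measures above can be sketched in a few lines. Note that gain N = N − N' and ploss = (|t| − |t'|)/|t| are assumptions consistent with the surrounding text, since the equation bodies are not reproduced in this copy of the paper; the combined gain formulas G gain , S gain , and F gain are not modeled here.

```python
# Hedged sketch of the change-in-problems and suppression-loss measures.
# gain_N = N - N' and ploss = (|t| - |t'|)/|t| are assumptions consistent
# with the text; the combined gain formulas are not modeled.
def gain_N(n_before, n_after):
    # Reduction in the total number of problems caused by an operation.
    return n_before - n_after

def ploss(t, t_suppressed):
    # Fraction of locations removed from trajectory t by suppression.
    return (len(t) - len(t_suppressed)) / len(t)

t = ["a5", "b4", "a1"]
t_sup = [l for l in t if l != "b4"]  # suppress location b4
print(gain_N(13, 12), ploss(t, t_sup))
```

Suppressing one of three locations yields ploss = 1/3; a larger ploss means more information is sacrificed for the same reduction in problems.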
Let l be the splitting location in a trajectory t. Trajectory t is split into two subtrajectories (t', t'') at location l. The ploss of splitting is defined as Equation (5), where t (t', t'') represents the trajectory before (after) splitting, and |t| (|t'|, |t''|) represents the length of the trajectory t (t', t''). The anonymity gain measurement of splitting is defined as S gain , shown as Equation (6), where S represents the set of trajectories affected by the change in splitting, S ⊆ T.
For a problematic trajectory projection t A , a dummy trajectory identical to t A will be generated and defined as t'. The ploss of the dummy trajectory is defined as Equation (7), where t (t') represents the trajectory before (after) adding the dummy trajectory, and |t| (|t'|) represents the length of trajectory t (t'). However, there is no original trajectory that corresponds to t', so the value of ploss(t, t') must be 1. The anonymity gain measurement of the dummy trajectory is defined as F gain , shown as Equation (8).

Methods
In this section, we propose a privacy-preserving method based on LPA to protect the privacy of trajectory publication. An overview of the proposed algorithm is shown in Figure 2. First, we construct the TP-tree [10] to store the original trajectory dataset T. Then, we use the problematic nodes finding (PNF) algorithm to obtain the set of problematic nodes P in the TP-tree and the total number of problems N. Afterwards, we measure the anonymity gain of each anonymity technique and obtain the anonymity gains G gain (t A m , t A n ), S gain (l, t), and F gain (t'). Then, the LP algorithm is used to select a specific technique to process the problematic nodes. The proposed method is described in detail in the rest of this section.

Definition 9 (Problematic Node). A problematic node is a 2nd-level node of the TP-tree containing a problem, which needs to be eliminated. A tree node of the TP-tree containing t A is a problematic node if and only if t A is problematic.

Problematic Nodes Finding
Based on the TP-tree constructed in the preprocessing stage, we use the PNF algorithm to identify the privacy threats of the trajectory dataset. Then, we find all problematic nodes in the tree under the user's privacy requirements. The PNF algorithm is depicted as Algorithm 1.

Algorithm 1: Problematic_Nodes_Finding (PNF)
Input: TP-tree, user's privacy requirements threshold P t
Output: Problematic pairs set Q, problematic nodes set P, the total number of problems N
1: Initialize Q ← ∅, P ← ∅, N ← 0
2: for each Sec in the 2nd level of the TP-tree do
3:   for each Thi node of Sec do
4:     Get the tra.num of Sec, named Sec.num
5:     Get the tra.num of Thi, named Thi.num
6:     Calculate P r = Thi.num / Sec.num
7:     if P r > P t then
8:       Insert pair (Sec, Thi) into Q
9:       if Sec not in P then
10:        Insert Sec into P
11:      N = N + Thi.num
12: return Q, P, N

Algorithm 1 takes the TP-tree and the user's privacy requirements threshold P t as the inputs and returns the set of problematic pairs Q, the set of problematic nodes P, and the total number of problems N. After initialization, the algorithm scans all the 2nd-level nodes Sec and the corresponding child nodes Thi of Sec and obtains the trajectory numbers of Sec and Thi, denoted as Sec.num and Thi.num, respectively (Lines 2 to 5). Then, the algorithm calculates P r = Thi.num / Sec.num and checks whether P r is greater than P t . If P r is greater than P t , (Sec, Thi) is inserted into set Q. Moreover, if Sec is not in set P, Sec is inserted into set P. Finally, the Thi.num of Thi is accumulated into the total number of problems N (Lines 6 to 11).
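The PNF procedure can be sketched as a runnable function over a simplified two-level structure, where each 2nd-level entry maps a projection to its supporting trajectory set and each 3rd-level child maps an inferable location to its supporting set. The real TP-tree node encodings are not modeled; the sample data reflect the a 3 /b 4 situation of Example 6.

```python
# Runnable sketch of Algorithm 1 (PNF) over a simplified structure.
def pnf(level2, Pt):
    Q, P, N = [], [], 0
    for sec, node in level2.items():                     # 2nd-level nodes
        sec_num = len(node["trajs"])                     # Sec.num
        for thi, thi_trajs in node["children"].items():  # 3rd-level nodes
            thi_num = len(thi_trajs)                     # Thi.num
            Pr = thi_num / sec_num
            if Pr > Pt:
                Q.append((sec, thi))
                if sec not in P:
                    P.append(sec)
                N += thi_num
    return Q, P, N

# Projection a3 is supported by t2 and t8; only t2 also contains b4.
level2 = {("a3",): {"trajs": {"t2", "t8"}, "children": {"b4": {"t2"}}}}
print(pnf(level2, 0.4))  # b4 is inferable with Pr = 0.5 > 0.4
```

With the stricter threshold P t = 0.4, the pair (a 3 , b 4 ) is reported as problematic and contributes Thi.num = 1 to N.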
Example 8. After constructing the TP-tree of T, Algorithm 1 is executed to find the problematic nodes of the TP-tree. In our example, the user's privacy requirements threshold P t is defined as 0.5. After executing Algorithm 1, the problematic nodes are shown in the dotted box in Figure 3.


Anonymity Gain Measurement
For a problematic node, we calculate the anonymity gain when applying suppression (G gain (t A m , t A n )), splitting (S gain (l, t)), and the dummy trajectory (F gain (t')). Then, we select the final technique and obtain an array denoted as FinalGain, which is a row of PreGain. The PreGain array is shown in Table 3, where each column of the PreGain array corresponds to the trajectory projection contained in the problematic node that needs to be processed, the trajectory IDs contained in the node, the final operation, the relevant operation information, and the anonymity gain.


Splitting
We design the splitting operation based on the TP-tree. The idea of the intuitive splitting technique is to directly divide a trajectory into two trajectories at the splitting location. The splitting location is selected based on the trajectory projection contained in the problematic node, and all the locations in the trajectory projection contained in the problematic node can be defined as the splitting location l. The location l is an indivisible location if and only if l is the last location of the original trajectory. For a problematic node, there may be several cases of splitting, so selecting the final splitting location l requires calculating the splitting anonymity gain of all potential locations except the indivisible one and then selecting the best one as the final splitting anonymity gain. The details of the TP-tree changes are described in Example 10.
Example 10. Considering the problematic node b 1 → b 2 , splitting based on the TP-tree is used, and the anonymity gain is obtained. The local change in the tree is shown in Figure 5. First, the trajectories that are involved in the problematic node are found. The trajectory involved is t 7 , and the splitting location is b 1 or b 2 , where b 2 is an indivisible location. Thus, t 7 is split at b 1 to obtain t 7 ' and t 7 ''. Then, we construct a new TP-tree* for the trajectories t 7 ' and t 7 '', delete all node information of t 7 in the original TP-tree, and merge the TP-tree* of t 7 ' and t 7 '' into the original TP-tree. After splitting, the number of problems N is 12; the length of t 7 is 6; the lengths of the trajectories t 7 ' and t 7 '' are 2 and 4, respectively; and S gain (b 1 , t 7 ) is 0.47.
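The split itself can be sketched as follows. The contents of t 7 are not given in the paper, so the six locations below are hypothetical; the convention that the splitting location stays at the end of the first subtrajectory is an assumption chosen to match the lengths 2 and 4 reported in Example 10.

```python
# Sketch of the splitting operation. The contents of t7 are hypothetical;
# keeping the splitting location in the first part is an assumed convention.
def split(traj, l):
    """Split traj after the splitting location l into (t', t'')."""
    i = traj.index(l)
    return traj[:i + 1], traj[i + 1:]

# Hypothetical 6-location trajectory with the splitting location b1 in
# second position, so the parts have lengths 2 and 4 as in Example 10.
t7 = ["a2", "b1", "b2", "a4", "b3", "a1"]
t_a, t_b = split(t7, "b1")
print(len(t_a), len(t_b))  # 2 4
```

No locations are lost by splitting; only the link between the two subtrajectories is removed, which is why its ploss is measured differently from suppression.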


Dummy Trajectory
The key to the dummy trajectory technique based on the TP-tree is adding dummy trajectories; in this paper, we add a dummy trajectory based on the trajectory projection contained in the problematic node. In other words, a dummy trajectory identical to the trajectory projection contained in the problematic node is added to the TP-tree. The details of the TP-tree changes are described in Example 11.
Example 11. Considering the problematic node b 1 → b 2 , the dummy trajectory based on the TP-tree is used, and the anonymity gain is obtained. A dummy trajectory t 7 = b 1 → b 2 is added to the TP-tree, and Figure 6 shows the local change of the tree by using the dummy trajectory. Afterwards, the problem number N is 12, and F gain (b 1 → b 2 ) is 0.25.

Local Preferential Selection
We consider suppression, splitting, and dummy trajectory adding as candidate techniques to achieve privacy preservation based on the tree structure, solving problematic nodes step by step. To select the final technique for a specific problematic node, we propose an LP function based on the analysis of the location loss and the anonymity gain.
Let P gain and R gain be the first and second anonymity gains, respectively. Suppression is the final selection if one of the following conditions holds: (1) P gain = G gain (t A m , t A n ) and Del.loc = 1; (2) P gain = G gain (t A m , t A n ) and G gain (t A m , t A n ) − R gain > P t . In other cases, the operation that corresponds to R gain must be finally selected.
Analysis. In this paper, we consider suppression, splitting, and dummy trajectory adding as candidate techniques. However, suppression can delete several locations at once to quickly reduce the number of problems, which biases the algorithm towards suppression, and many locations may be deleted. Therefore, we construct an LP function to counter this tendency, protecting privacy while reducing information loss. We define suppression as the best operation only when its anonymity gain is the largest and the number of locations lost is 1. We set the value of P t to the same value as the user's privacy requirements threshold.
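The selection rule above can be read as the following sketch, where the gain values and Del.loc are assumed to be precomputed (the dict-based interface is ours, not the paper's):

```python
def select_operation(gains, del_loc, p_t):
    """gains maps an operation name to its anonymity gain, e.g.
    {"suppress": 0.5, "split": 0.47, "dummy": 0.25}.

    Suppression wins only when it has the top gain AND it either
    deletes a single location or beats the runner-up by more than
    the threshold P_t; otherwise the runner-up is selected.
    """
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    (first_op, p_gain), (second_op, r_gain) = ranked[0], ranked[1]
    if first_op == "suppress":
        if del_loc == 1 or p_gain - r_gain > p_t:
            return "suppress"
        return second_op   # fall back to the second-ranked operation
    return first_op        # a non-suppression operation with top gain
```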
The LP algorithm is shown as Algorithm 2. The LP algorithm takes a problematic node c, the user's privacy requirements threshold P t , and the TP-tree as the inputs and returns an array FinGain, shown in Table 3. Initially, for a problematic node c, the LP algorithm calculates the anonymity gain of suppression G gain (t A m , t A n ); the node pair (mt A m , mt A n ) is set as the pair with the maximum anonymity gain (Lines 2 to 4). Then, the LP algorithm scans all the 2nd-level nodes of the TP-tree, finds all the trajectories that the problematic node contains, and calculates the splitting anonymity gain S gain (l, t) of all trajectories at each splitting location; (ml, mt) denotes the pair for which S gain (l, t) is largest (Lines 5 to 7). Finally, the LP algorithm adds the same dummy trajectory as the trajectory projection t A c and calculates the anonymity gain of the dummy trajectory F gain (t) (Line 8). Afterwards, the LP algorithm selects the final operation for the problematic node and returns FinGain (Lines 9 to 17).

Trajectory Anonymization
In this subsection, we introduce our algorithm of privacy preservation for trajectory data publication based on LPA. The algorithm iteratively solves the problematic nodes of the TP-tree until the trajectory dataset satisfies the release condition. The LPA algorithm is shown as Algorithm 3.
Algorithm 3 takes an original trajectory dataset T, a set of attackers Adv, and a user's privacy requirements threshold P t as the inputs and returns a safe trajectory dataset T * . Initially, the LPA algorithm constructs the TP-tree [10] (Line 1) and then calls the PNF algorithm to obtain the set of problematic nodes P and the total number of problems N (Line 2). If there are problematic nodes in the TP-tree (N > 0), c is set to be a problematic node, and the LPA algorithm selects the final operation according to the LP algorithm (Line 5). Afterwards, the returned final operation and related information are used to solve the problematic node c and update the TP-tree. Specifically, if the final operation is suppression, the corresponding node updates of t A m are merged into t A n (Lines 6 to 11). If the final operation is splitting, the LPA algorithm finds all the trajectories that are involved in the problematic node and constructs a dataset T s . All node information about each trajectory t ⊂ T s must be deleted from the original TP-tree. After that, each trajectory t ⊂ T s is split from the splitting location l, and a new dataset T m is reconstructed. Finally, the TP-tree* of the dataset T m is merged into the TP-tree (Lines 13 to 18). If the final operation is to add a dummy trajectory, then the same dummy trajectory t as the problematic node is added and merged into the TP-tree (Lines 20 to 21). After the above steps, the new number of problems N is calculated, and the LPA algorithm iteratively eliminates the problems until N is 0. Then, the corresponding safe trajectory dataset T * is obtained.
Proof. To derive a corresponding safe dataset T * of trajectory dataset T, there must be no problem pairs in T * and no problematic nodes in the final TP-tree, and the number of problems N must be 0.
In our method, the LPA algorithm first constructs a TP-tree of T, so each entire trajectory is stored in the tree. Then, LPA calls the PNF algorithm to find the problematic nodes P in the TP-tree. Afterwards, the LPA algorithm selects the anonymity operation for each problematic node based on the LP algorithm and then solves the problematic nodes one by one until P is empty and N is 0. The steps of Algorithm 3 are as follows:
1: TP-tree = TPtreeConstruction (T, Adv)
2: (Q, P, N) = Problematic_Nodes_Finding (TP-tree, P t )
3: while N > 0 do
4:   Let a pending problematic node in P be c
5:   FinGain = Local_Preferential (TP-tree, c, P t )
6:   if FinOp is suppress then  //FinOp: Final Operation in FinGain
7:     Let Sec (Sec′) be the node at the 2nd level of the TP-tree having projection t A m (t A n )
8:     Update Sec′.tra as Sec.tra ∪ Sec′.tra and update Sec′.loc
9:     Insert each child node h of Sec into Sec′
10:    Delete Sec from the TP-tree
11:    Delete the corresponding child node h of Sec′
12:  else
13:  if FinOp is split then
14:    Reconstruct the trajectory dataset as T s
15:    Remove all node information about t ⊂ T s from the TP-tree
16:    Split each t in T s and reconstruct dataset T m
17:    TP-tree* = TPtreeConstruction (T m , Adv)
18:    Update TP-tree as TP-tree ∪ TP-tree*
19:  else
20:    //FinOp is dummy
21:    Update TP-tree as TP-tree ∪ t
22:  (Q, P, N) = Problematic_Nodes_Finding (TP-tree, P t )
23: return T *
For example, for the original dataset in Table 1, after executing the LPA algorithm, we obtain a dataset T * , as shown in Table 4. Then, we execute the PNF algorithm for identification; the problem pairs set Q and the problematic nodes set P are empty, and the number of problems N is 0. Therefore, T * is a safe dataset.
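The overall loop of the LPA algorithm can be sketched as follows; the four callables are placeholders for TPtreeConstruction, PNF, LP, and the per-operation tree updates, whose signatures are our assumptions rather than the paper's:

```python
def lpa(dataset, attackers, p_t, build_tree, find_problems,
        local_pref, apply_op):
    """Iteratively resolve problematic nodes until none remain.

    build_tree, find_problems, local_pref, and apply_op stand in for
    the paper's TPtreeConstruction, PNF, LP, and tree-update routines.
    """
    tree = build_tree(dataset, attackers)
    problems, n = find_problems(tree, p_t)
    while n > 0:
        node = problems[0]                 # a pending problematic node c
        fin = local_pref(tree, node, p_t)  # choose suppress/split/dummy
        apply_op(tree, node, fin)          # update the TP-tree
        problems, n = find_problems(tree, p_t)
    return tree                            # a tree with no problems left
```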

Complexity Analysis
In order to present the algorithm complexity results more clearly, let Adv be the set of potential attackers, N ta be the maximum number of different projections of trajectory dataset T for an attacker, and N loc be the maximum number of different locations contained in the trajectories of a projection. N is the total number of problems.
The cost of the TPtreeConstruction algorithm depends on inserting the trajectories in T into the TP-tree. For one attacker and one trajectory, the cost of inserting all locations into the tree is O(N ta × N loc ). Therefore, the cost of inserting all locations of all trajectories for all attackers is O(|T| × |Adv| × N ta × N loc ). The most expensive part of the PNF algorithm is scanning the 2nd-level nodes and the corresponding 3rd-level nodes of each 2nd-level node at the same time. The worst case involves the maximum number of 2nd-level projection nodes, so the cost of the PNF algorithm is O(|Adv| × N ta × N loc ).
The LP algorithm mainly measures the anonymity gains of suppression, splitting, and dummy trajectory adding for a problematic node in the problematic node set of the TP-tree and selects the best value in the FinGain array. The most expensive cost of suppression involves scanning the 2nd-level nodes of the TP-tree once, finding all the trajectory projection pairs, and then scanning the TP-tree again for the calculation. The cost is O(2 × |Adv| × N ta × N loc ).
The cost of splitting involves scanning the trajectories contained in the problematic node and the corresponding trajectory projection of the problematic node and then analyzing the different locations of this trajectory projection, with a cost of O(N loc × |Adv| × N ta ). The cost of dummy trajectory adding involves scanning the TP-tree for the calculation, which requires O(|Adv| × N loc × N ta ) time. In total, the cost of the LP algorithm is O(|Adv| × N loc × N ta ).
The time cost of the LPA algorithm mainly depends on the total number of problems N in the TP-tree and the calls to the above algorithms. The worst case is that all N problems need to be iterated, with a cost of O(N × |Adv| × N ta × N loc ).
Discussion. Time performance analysis of the proposed LPA algorithm and the compared algorithms [8,10].
The storage structures of G sup and L sup [8] are arrays that store the trajectory dataset and the set of trajectory projections, so each calculation of the inference probability needs to traverse the trajectory projection set first and then the trajectory dataset, which results in lower time performance. To solve this problem, a tree-based storage structure is adopted in the algorithms IG SUP * and IL SUP * [10], which avoids wasting time traversing the trajectory dataset when calculating the inference probability; a number of experiments verified the significant improvement in time performance. In our method, the tree-based storage structure [10] is also applied to store the trajectory dataset, so the time performance of problematic node finding is better than that of G sup and L sup and similar to that of IG SUP * and IL SUP * . In the stage of solving the problematic nodes, we consider three candidate techniques for a specific node, which takes more time than IG SUP * and IL SUP * , but the traversal performance is still better than that of G sup and L sup because of the storage structure. Although the time performance of the LPA algorithm is worse than that of IG SUP * and IL SUP * , the data utility of the anonymous dataset constructed by the LPA algorithm is significantly better than that of the comparison algorithms. A detailed analysis of data utility is presented in Section 5.

Experiment
The LPA algorithm was implemented in Python 3, and the other algorithms were implemented in Java. All experiments were run on a PC with an AMD Ryzen 7 at 2.90 GHz and 16 GB of RAM. We compared our algorithm with the algorithms G sup and L sup [8], both of which apply global and local suppression. The algorithms IG SUP * and IL SUP * [10], which are based on suppression and pruning techniques, are also compared. Our experiments mainly measure the data utility of the anonymous dataset; the programming language of the algorithm does not affect the final data results.

Dataset
We use the City80K [30,31] dataset in our experiments. City80K contains 80,000 trajectories. After trajectory preprocessing, the data take the form shown in Table 1. In this paper, only the location information from users' check-ins was used.

Metrics
The key to measuring the performance of a trajectory privacy preservation algorithm is evaluating the quality of the anonymous trajectory datasets, which mainly includes the information remaining in the anonymous datasets and their utility. We focus on the changes in the trajectory dataset before and after anonymization. The utility metrics are as follows.
Average trajectory remaining ratio. The first metric measures the remaining ratio of the locations of each trajectory. We define TR i as the trajectory remaining ratio of trajectory t i and calculate it as in Equation (9). We compare each trajectory step by step and compute the average trajectory remaining ratio for the anonymous dataset, denoted as TR(avg); the higher the value is, the better the effect. The formula for TR(avg) is given in Equation (10).
Average location appearance ratio. We use the average location appearance ratio, denoted as AR(avg), to measure changes in the number of location appearances before and after anonymization; we calculate the change in the number of appearances of each location between the original trajectory dataset and the anonymous dataset. For each location l ∈ L in the trajectory dataset, we define AR(l) as the ratio of the number of appearances of l in the anonymity dataset, n l * , to that in the original trajectory dataset, n l . The formulas are shown in Equations (11) and (12), where the value of AR(avg) is between 0 and 1; the higher the value is, the better the effect.
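Since Equations (9)-(12) did not survive extraction, the following is a plausible reconstruction of the two metrics (assuming a per-trajectory correspondence between the original and anonymous datasets; the function names are ours):

```python
from collections import Counter

def tr_avg(original, anonymous):
    """Average trajectory remaining ratio: mean fraction of each
    original trajectory's locations that survive anonymization.

    Assumes original[i] and anonymous[i] refer to the same trajectory.
    """
    ratios = [sum(1 for loc in t_o if loc in t_a) / len(t_o)
              for t_o, t_a in zip(original, anonymous)]
    return sum(ratios) / len(ratios)

def ar_avg(original, anonymous):
    """Average location appearance ratio: per-location ratio of
    appearance counts after vs. before anonymization, capped at 1
    so the average stays in [0, 1] even if dummies inflate a count."""
    n_before = Counter(loc for t in original for loc in t)
    n_after = Counter(loc for t in anonymous for loc in t)
    ratios = [min(n_after[l], n_before[l]) / n_before[l] for l in n_before]
    return sum(ratios) / len(ratios)
```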
Frequent sequential pattern remaining ratio. In general, the more frequent sequential patterns that are preserved after anonymization, the better the data utility. We use the PrefixSpan algorithm to obtain the frequent sequential patterns of the original dataset and the anonymity dataset. Then, we calculate the average percentage of frequent sequential patterns of the original dataset that are observed in its safe counterpart, denoted as FSP(avg).
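PrefixSpan itself is out of scope here, but FSP(avg) can be illustrated with a brute-force miner that enumerates short subsequences; this naive stand-in is only practical for tiny datasets and is our own sketch, not the paper's implementation:

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(dataset, min_support=2, max_len=2):
    """Naive stand-in for PrefixSpan: collect every subsequence of up
    to max_len locations and keep those supported by at least
    min_support trajectories."""
    support = Counter()
    for traj in dataset:
        seen = set()
        for k in range(1, max_len + 1):
            seen.update(combinations(traj, k))  # order-preserving
        support.update(seen)
    return {p for p, s in support.items() if s >= min_support}

def fsp_avg(original, anonymous, min_support=2):
    """Fraction of the original dataset's frequent patterns that
    survive in the anonymous dataset."""
    before = frequent_patterns(original, min_support)
    after = frequent_patterns(anonymous, min_support)
    return len(before & after) / len(before) if before else 1.0
```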

Result and Analysis
Considering the parameter settings of the compared algorithms, we set the experimental parameters as follows: (1) The user's privacy requirements threshold P t : P t is 0.5 by default, and the range varies from 0.4 to 0.7.
(2) The average length of a trajectory, denoted by |t|: |t| is 6 by default, and the range varies from 4 to 7.
(3) The size of the trajectory dataset, denoted by |T|: |T| is 300 by default, and the range varies between 150 and 400.
Since our algorithm has a certain randomness in the processing order and the selection of the problematic node c, each group of experiments is run ten times and the results are averaged.

Average Trajectory Remaining Ratio
Figure 7 shows the average trajectory remaining ratio TR(avg) for the variable of the user's privacy requirements threshold P t , the average length of a trajectory |t|, and the size of the trajectory dataset |T|.

As shown in Figure 7a, the highest TR(avg) value is always achieved by the LPA algorithm, which is much better than G SUP , L SUP , IG SUP * , and IL SUP * . The trend of the LPA algorithm initially decreases and then increases because the LP function used in the LPA algorithm can balance the use of the three techniques with a certain randomness. In Figure 7b, the TR(avg) values decrease as |t| increases. This occurs because if the trajectory is longer, the number of locations of a trajectory that are owned by the attackers will also be larger, which results in an increase in the number of problems N. We can see that the TR(avg) values of the LPA algorithm are on average 0.32 and 0.25 higher than those of G SUP and IG SUP * , respectively, and 0.16 higher than those of L SUP and IL SUP * . Figure 7c presents the result when the value of |T| increases. The TR(avg) values of the LPA algorithm fluctuate within the range of approximately 0.88 to 0.93. Additionally, the TR(avg) values are always much higher than those of the other algorithms, which indicates that the scalability of our algorithm is better. The main reason for this result is that the LP function used in LPA balances the selection of the three techniques.

Average Location Appearance Ratio
Figure 8 shows the average location appearance ratio AR(avg) for the user's privacy requirements threshold P t , the average length of a trajectory |t|, and the size of the trajectory dataset |T|.

As shown in Figure 8a, with the increase of P t , the AR(avg) values of all algorithms show an upwards trend, and the AR(avg) values of the LPA algorithm are always significantly higher than those of the others. In Figure 8b, with the increase of |t|, the AR(avg) values of the four algorithms gradually decrease, but the AR(avg) values of the LPA algorithm take 6 as the critical point and show a trend of first decreasing and then increasing. Moreover, the AR(avg) value of the LPA algorithm is always higher than that of the other algorithms. Figure 8c shows that as |T| increases, the AR(avg) values of the LPA algorithm fluctuate within the range of approximately 0.80 and are always higher than those of the other four algorithms. The reason for this phenomenon is that the LPA algorithm considers three techniques as candidates, and the application of the LP function reduces the frequency of suppression and the information loss of the anonymity processing.

Frequent Sequential Pattern Mining
We present the average frequent sequential pattern remaining ratio FSP(avg) for the variables of the user's privacy requirements threshold P t , the average length of a trajectory |t|, and the size of the trajectory dataset |T|. In our experiments, the frequent pattern threshold is defined as 2. The results are shown in Figure 9. For each parameter change, the LPA algorithm has the highest FSP(avg) values.

As shown in Figure 9a, for LPA, L SUP , and IL SUP * , the general trend is that the FSP(avg) values increase as P t increases, while the trends of G SUP and IG SUP * have no obvious upwards motion. Furthermore, the FSP(avg) values of the LPA algorithm are significantly better than those of the other algorithms because the LP function of the LPA algorithm gradually processes the problematic nodes and merges the nodes, which improves the FSP(avg). In Figure 9b, the FSP(avg) values gradually increase with an increasing |t|, and the FSP(avg) value of the LPA algorithm is always higher than that of the other algorithms. The reason for this result is that the longer the average trajectory length is, the higher the number of problems. Similar to the previous two experimental results, Figure 9c shows that as |T| increases, the FSP(avg) values show an overall upwards trend, and the values of LPA are overall higher than those of the other algorithms. It can be concluded that the overall performance of FSP(avg) in LPA is better than that of the other algorithms.

To summarize, compared with the other algorithms [8,10], the performance of the LPA algorithm is better in terms of data utility. This result indicates that the LPA algorithm achieves higher data utility under anonymization and effectively balances data utility and privacy preservation.

Discussion
We mainly focus on privacy protection methods for trajectory data publishing through trajectory data processing. The data type is semantic location (such as user transaction data). Considering potential attackers, we identify privacy threats to determine the unknown locations of users that attackers may infer. A new privacy protection method is proposed to solve the problem of privacy disclosure in the analysis and reuse of such data. We conducted an experimental evaluation on a public dataset, and the experimental results show that our method can effectively achieve trajectory privacy protection, reduce information loss, and improve data utility. The preprocessing part and the tree storage structure of the first stage can reduce the running time of identifying privacy threats, which has been proved in other work [10]. In the second stage, we consider the mixed use of multiple techniques to improve the utility of the anonymous data. This method can be further extended to a privacy protection method for data types with different kinds of sensitive attributes in semantic locations, which is our next research topic.

Conclusions
In this paper, we proposed a novel privacy-preserving method for trajectory data publication based on LPA, which defends against attackers who infer other unknown locations through the partial background knowledge in their possession. We designed the splitting and dummy trajectory operations based on the tree structure and considered suppression, splitting, and dummy trajectory adding as candidate techniques. Then, we proposed an LP function based on the analysis of the location loss and anonymity gain to select the final operation for each problematic node. The empirical results illustrated that the LPA algorithm reduces information loss and improves data utility to a certain extent. There is also room for improvement in the selection function, and we will continue to study this problem and consider the relevance of different types of sensitive attributes in semantic locations in our ongoing work.

Patent
A patent of the same name has been applied for based on the relevant research content of this manuscript (No. CN202210178099.9), which entered the substantive examination stage on 27 May 2022.

Figure 1 .
Figure1.The TP-tree in Table1.The 2nd-level nodes of the TP-tree correspond to projection trajectories, and the 3rd-level nodes correspond to locations that may be inferred.

Figure 2 .
Figure 2. Overview of the proposed methods (LPA).N is the total number of problems.

Example 8 .
After constructing the TP-tree of T, Algorithm 1 is executed to find the problematic nodes of the TP-tree. In our example, the user's privacy requirements threshold P t is defined as 0.5. After executing Algorithm 1, as Figure 3 demonstrates, the problematic nodes are shown in the dotted box; the problematic set P is obtained, and the total number of problems N is 16.

Figure 4 .
Figure 4. Updating of the TP-tree by using suppression.The number 1 represents the location b 1 was removed from 2nd-level node b 1 → b 2 , and updated the remaining nodes, and 2 represents 3rd-level node b 1 and also be deleted.

Figure 5 .
Figure 5. Updating of the TP-tree by using splitting.The number 1 represents t 7 is split from b 1 to obtain t 7 and t 7 , and 2 represents the construction of a new TP-tree* about the trajectory t 7 and t 7 .

Figure 6 .
Figure 6.Updating of the TP-tree by using a dummy trajectory.The number 1 represents a dummy trajectory t 7 = b 1 → b 2 is added.

Example 12 .
According to the LP function, the final operation for the problematic node b 1 → b 2 is suppression.

4.5. Algorithms Analysis
4.5.1. Trajectory Privacy Preservation Capability
Consider an original trajectory dataset T, and let T * be the anonymized trajectory dataset of T.
Theorem 1. The LPA algorithm can derive a corresponding safe dataset T * of the original trajectory dataset T.

Algorithm 3 :
Local_Preferential_Anonymity (LPA)Input: Original trajectory dataset T, attackers set Adv, user's privacy requirements threshold P t Output: A corresponding safe dataset T * 1:

Author Contributions:
Conceptualization, X.Z. and Y.L.; methodology, X.Z.; validation, X.Z., L.X. and Z.L.; investigation, L.X. and Z.L.; resources, Q.Y.; data curation, X.Z.; writing-original draft preparation, X.Z.; writing-review and editing, Y.L. and Q.Y.; visualization, Z.L.; supervision, Y.L.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, grant number 62272006, and the University Collaborative Innovation Project of Anhui Province, grant number GXXT-2019-040. The APC was funded by the National Natural Science Foundation of China, grant number 62272006.

Table 2 .
Trajectory projection set T A of T.

Table 3 .
Array of PreGain.
We introduce the operation of suppression based on the TP-tree. For a problematic node, we first find two nodes of the same attacker from the 2nd level of the TP-tree to form a node pair (t A m , t A n ). Notably, t A n is a subtrajectory of t A m . The idea of suppression is to use t A n to unify t A m ; in other words, the locations that exist in t A m but not in t A n are deleted. The details of the tree updating are described in Example 9.
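As a minimal sketch of this unification (the list representation and the function name are our own assumptions):

```python
def suppress(t_m, t_n):
    """Unify projection t_m with its subtrajectory t_n by deleting the
    locations of t_m that do not occur in t_n."""
    keep = set(t_n)
    return [loc for loc in t_m if loc in keep]
```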

Algorithm 2: Local_Preferential (LP) Input:
Pending problematic node c, user's privacy requirements threshold P t , TP-tree
Output: Array of final operation FinGain

Table 4 .
A corresponding safe dataset T * of T.