Article

A Trajectory Privacy Protection Scheme Based on the Replacement of Stay Points

School of Cyber Security and Computer, Hebei University, Baoding 071000, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(3), 1391; https://doi.org/10.3390/app16031391
Submission received: 15 December 2025 / Revised: 24 January 2026 / Accepted: 27 January 2026 / Published: 29 January 2026

Abstract

Location-based services generate a large amount of location and trajectory data, which contain rich spatiotemporal and semantic information. Publishing these data without proper protection can seriously threaten users’ trajectory privacy. Existing trajectory privacy protection schemes generally fail to consider the dependency between a stay point and its preceding location and also overlook the relationship between the semantic information of location and privacy. Moreover, they often suffer from issues such as over-protection. Therefore, this paper proposes a trajectory privacy protection scheme based on the replacement of stay points. First, a stay point extraction algorithm is proposed, which extracts users’ stay points by setting distance and time thresholds based on the principle of the sliding window. Then, this paper proposes a location perturbation algorithm based on the vector indistinguishability mechanism and introduces different protection strategies for ordinary stay points and long-duration stay points, respectively. Finally, the perturbed trajectory is adjusted by generating a certain number of location points near the replacement points to maintain the temporal continuity and integrity of the trajectory. The experimental results indicate that it is necessary to provide more meticulous protection for long-duration stay points. Compared with similar schemes, the proposed scheme in this paper achieves higher data utility while ensuring privacy.

1. Introduction

With the widespread adoption of mobile devices and the rapid development of positioning technologies, location-based services (LBSs) have garnered increasing attention and usage [1]. They provide services and convenience for people in various aspects, such as location navigation, nearby searches, and urban computing. However, while location-based services bring great convenience to people, they also generate a large amount of location and trajectory data. These data contain rich spatiotemporal information and are closely related to users’ privacy. If these data are not protected, malicious attackers can use various data mining techniques and background knowledge to extract sensitive information related to users, such as their home addresses and workplaces. They may even be able to track users’ movements. Therefore, how to balance data utility and privacy quality in trajectory privacy protection is an important scientific issue [2].
In recent years, researchers have proposed a variety of privacy protection schemes for trajectory privacy. These schemes can be broadly classified into three categories based on the techniques they use: cryptographic techniques [3], anonymization techniques [4], and differential privacy techniques [5]. Cryptographic techniques are effective methods for protecting data privacy. However, methods based on cryptographic techniques have limitations in the context of location-sharing services, as encrypted location and trajectory data are difficult to directly search and utilize. The main idea of anonymization techniques is to generalize a user’s true location to an anonymous region that contains at least k users [6]. This makes the probability of an attacker successfully identifying the user’s true location 1/k, thereby achieving the goal of privacy protection. However, there is still no clear quantitative standard for the size of the generalized area. An overly large generalized area can affect the quality of location-based services, while an overly small area can easily lead to location privacy leakage. Some researchers have proposed variants of k-anonymity [7], such as l-diversity and t-closeness [8], but these still have limitations. Moreover, single anonymization techniques cannot defend against attacks with background knowledge, such as deFinetti attacks [9] and composite attacks. Differential privacy, proposed by Dwork [10], is a general privacy protection strategy with a rigorous mathematical model. It prevents attackers from determining whether a single record has been modified or deleted based on changes in the output results, thereby defending against attacks that utilize background knowledge. However, the original differential privacy technique is only applicable to one-dimensional data and cannot be used for location data on a two-dimensional plane [11].
Trajectory data is composed of a sequence of location data, which can be categorized into moving points and stay points [12]. Moving points reflect the locations that a user passes through, while stay points indicate the user’s sensitive locations. By analyzing stay points, one can identify the locations that a user frequently visits. From this information, it is possible to infer the user’s daily activities, health conditions, and even religious beliefs. Therefore, compared with moving points, stay points are more likely to reveal users’ sensitive information. Protecting these sensitive stay points not only ensures users’ privacy but also minimizes the disruption to the original trajectory, so a good balance between data utility and privacy protection can be achieved. Existing schemes for protecting stay points fail to consider the dependency between a stay point and its preceding location. They simply select a location near the stay point using the planar Laplace mechanism for replacement. This approach may result in the replacement location coinciding with the preceding location or lying in the opposite direction from it. Not only does this reduce data utility, but it also makes it easier for attackers to recognize the replacement location as implausible. Some schemes neglect the moving points. For instance, the scheme in [13] extracts the semantic information of stay points in trajectories and perturbs the sensitive information. The advantage of this scheme is that it can effectively protect stay points. However, it ignores the moving points and thereby reduces data utility: when multiple users pass through the same road without stopping, the road is ignored by the scheme, even though such a road is evidently important.
In most schemes, trajectory data is regarded as a sequence of locations in Euclidean space, considering only the spatiotemporal attributes of the trajectory while neglecting the semantic attributes of the locations [14]. Schemes that do take semantic attributes into account almost entirely overlook the duration of users’ stays at those semantic locations [15]. Generally speaking, the longer the duration of stay at a location, the more important that location is to the user. For hospitals, if a user is a doctor, their stay duration is likely to exceed three hours or even more, while if they are a regular patient, their stay duration is generally not more than one hour. Many schemes employ a uniform protection strategy for stay points, neglecting the fact that long-duration stay points are more indicative of user privacy [16]. The duration of stay at a point can be calculated after the trajectory is published, and protection for long-duration stay points should be more meticulous than that for ordinary stay points. Moreover, stay points of different users may also have potential connections and can form hot stay areas. Considering these hot stay areas when formulating protection strategies for stay points can help better conceal the stay points.
In addition, existing perturbation-based schemes perturb every position of the entire trajectory [17]. This strategy may not fully protect the user’s sensitive positions, while over-protecting non-sensitive positions that do not need protection. This is not only inefficient but also reduces the availability of data. The core targets of attackers are sensitive locations such as users’ stay points, while the privacy information carried by non-sensitive locations is relatively limited, so enhancing the privacy of these locations is a redundant operation. Redundant protection can significantly increase computational overhead. For instance, the computational overhead of adding noise to an entire trajectory containing thousands of sampling points is much higher than that of processing only sensitive positions such as its stay points. Meanwhile, excessive noise superposition will severely distort the original motion characteristics of the trajectory, leading to a decline in the accuracy of tasks using this trajectory and ultimately reducing the practical value of the data [18].
As Table 1 shows, while prior works [12,13,17] have made notable contributions, they often fail to address the dependency between a stay point and its preceding location, leading to implausible abrupt changes in the trajectory. Additionally, they lack mechanisms to ensure semantic rationality and often suffer from over-protection due to uniform processing of all locations. In contrast, this paper proposes an integrated approach leveraging vector indistinguishability and location semantic trees to preserve both structural and semantic integrity. Moreover, we refine the stay point replacement strategy by implementing differentiated protection for ordinary versus long-duration stay points, effectively mitigating the over-protection issue and optimizing the overall utility. The main contributions of this paper are as follows:
  • First, this paper proposes a stay point extraction algorithm that identifies users’ stay points by setting distance and time thresholds based on the sliding window principle.
  • Then, this paper proposes a location perturbation algorithm based on the vector indistinguishability mechanism and introduces distinct protection strategies for ordinary stay points and long-duration stay points, respectively.
  • Finally, the perturbed trajectory is adjusted by generating a certain number of location points near the replacement points to maintain the temporal continuity and integrity of the trajectory.
The remainder of this paper is organized as follows: Section 2 introduces the relevant background knowledge; Section 3 details the proposed scheme in this paper; Section 4 describes the experimental analysis of the scheme; Section 5 describes the limitations and future work; and Section 6 concludes the paper.

2. Preliminaries

2.1. Trajectory Semantics Related

Definition 1.
(Trajectory data TR) [19]: A trajectory is defined as a sequence of position-time pairs and is represented as $TR = \{(p_1, t_1), (p_2, t_2), \ldots, (p_n, t_n)\}$, where $p_i$ denotes the latitude and longitude coordinates $(x_i, y_i)$, $t_i$ represents the time corresponding to the position $p_i$, and $|TR|$ indicates the length of the trajectory sequence.
Definition 2.
(Stay Point): A stay point is characterized by a set of consecutive points and can be represented as $sp = \{p_i, p_{i+1}, \ldots, p_j\}$, satisfying $Dist(p_i, p_j) \le D_{th}$ and $Int(p_i, p_j) \ge T_{th}$, where $Dist(p_i, p_j)$ denotes the Euclidean distance between two positions, and $Int(p_i, p_j)$ denotes the time difference between the sampling times of the two points.
Definition 3.
(Location Semantic Tree) [17]: The semantic attribute set of a trajectory can be represented as $S = \{a_1, a_2, \ldots, a_m\}$. Based on the semantic attribute set $S$, a location semantic tree can be constructed, denoted as $T = (V, E, g)$, where $V$ is the set of nodes representing different basic semantic attribute categories, $E$ is the set of edges between nodes representing the relationships between them, and $g$ is a labeling function that assigns some semantic attributes from $S$ to each node in the set $V$. In a location semantic tree, the leaf nodes are composed of the semantic attributes of the locations.
Definition 4.
(Hot Stay Area): During a specific time window $T_w$, such as 9:00–18:00 on weekdays, a geographical area where the density of user stay points exceeds the threshold $\rho$ and the average stay duration exceeds the time threshold $t_w$ is defined as a hot stay area.

2.2. Differential Privacy Related

Definition 5.
($\varepsilon$-Differential Privacy) [20]: For any two adjacent trajectory datasets $D$ and $D'$ (datasets that differ in at most one record) and any output set $S \subseteq Range(M)$, a randomized algorithm $M$ is said to satisfy $\varepsilon$-differential privacy if
$$\Pr[M(D) \in S] \le e^{\varepsilon} \times \Pr[M(D') \in S],$$
where $M(D)$ represents the result obtained by using $D$ as the input to the algorithm $M$, and $\Pr[M(D) \in S]$ represents the probability that the output $M(D)$ falls in $S$.
Definition 6.
(Vector Indistinguishability) [21]: A randomized mechanism $K$ satisfies $\epsilon$-vector indistinguishability if and only if, for any two input vectors $u, u' \in \mathcal{U}$, where $\mathcal{U} \subseteq \mathbb{R}^n$, the following condition holds:
$$V_{d_\chi}(K(u), K(u')) \le \epsilon \, d_{\mathcal{U}}(u, u')$$
$$d_{\mathcal{U}} = \|u - u'\|$$
$$u = x_t - x_{t-1}$$
where $\mathcal{U}$ is the set of possible vectors in a continuous metric space. The function $d_{\mathcal{U}}(\cdot, \cdot)$ measures the distance between two vectors and satisfies the metric axioms, such as the identity of indiscernibles. $V_{d_\chi}$ is the distinguishability level between $u$ and $u'$, where a smaller $V_{d_\chi}$ indicates a stronger level of privacy. A larger difference between $u$ and $u'$ means that their difference vector is larger in terms of distance, direction, or both.
Definition 7.
(Distance Indistinguishability) [21]: A randomized mechanism $K_M$ satisfies $\epsilon_m$-distance indistinguishability if and only if, for the magnitudes $|u|, |u'|$ of any two vectors $u, u' \in \mathcal{U}$ and any output $|v|$, the following condition holds:
$$K_M(|u|)(|v|) \le e^{\epsilon_m d_{distance}} \, K_M(|u'|)(|v|)$$
$$d_{distance} = \big|\, |u| - |u'| \,\big|$$
$$d_{\mathcal{U}} = d_{distance} \cdot \frac{u}{\|u\|}, \quad \frac{u}{\|u\|} = const$$
where $d_{distance}$ is the difference between $|u|$ and $|u'|$, and $d_{\mathcal{U}}$ is obtained by multiplying $d_{distance}$ with the unit vector $\frac{u}{\|u\|}$. If the vectors $u$ and $u'$ have the same direction and their magnitudes $|u|$ and $|u'|$ are very close, then the multiplier $e^{\epsilon_m d_{distance}}$ will be close to 1. In this case, it is almost impossible to distinguish whether the output $|v|$ is generated from $|u|$ or $|u'|$.
Definition 8.
(Direction Indistinguishability) [21]: A randomized mechanism $K_D$ satisfies $\epsilon_d$-direction indistinguishability if and only if, for any two directions $\alpha$ and $\alpha'$ of two vectors $u$ and $u'$, and any output angle $\theta$, the following condition holds:
$$K_D(\alpha)(\theta) \le e^{\epsilon_d d_{angle}} \, K_D(\alpha')(\theta)$$
$$d_{angle} = |\alpha - \alpha'|$$
$$d_{\mathcal{U}} = cos_{const} \cdot \left( \|u\| - \|u'\| \cdot \cos d_{angle} \right) + sin_{const} \cdot \|u'\| \cdot \sin(d_{angle})$$
where $\|u\| = \|u'\| = const$, and $d_{angle}$ is the difference between $\alpha$ and $\alpha'$. Here $d_{\mathcal{U}}$ is a specific function of $d_{angle}$, and $cos_{const}$ and $sin_{const}$ are a pair of orthogonal basis vectors. When $\alpha$ and $\alpha'$ are close in direction, their output distributions $K_D(\alpha)(\cdot)$ and $K_D(\alpha')(\cdot)$ are similar.

3. Trajectory Privacy Protection Scheme

3.1. Overview

Existing trajectory privacy protection schemes generally fail to consider the dependency between a stay point and its preceding location, overlook the relationship between the semantic information of location and privacy, and also suffer from issues such as over-protection. To address these issues, this paper proposes a trajectory privacy protection scheme based on the replacement of stay points, as shown in Figure 1. The scheme takes a user trajectory as input. First, it extracts stay points using the sliding window principle. Then, it perturbs the stay points using hot stay areas, vector indistinguishability, and the location semantic tree. Finally, it reconstructs the trajectory to obtain a publishable version.

3.2. Symbols

In this section, we define the symbols used in this paper and explain their meanings, as shown in Table 2 below.

3.3. Detail

3.3.1. Extract Stay Points

In this section, to extract users’ stay points, this paper proposes a stay point extraction algorithm based on the sliding window principle, as shown in Algorithm 1.
Algorithm 1: Extract Stay Points
Input: TR, T_stay, D_stay
Output: SPset
    SPset = ∅
    l = 1
    while l < |TR|:
        r = l + 1
        sp = [p_l]
        while r < |TR| and Dis(p_l, p_r) ≤ D_stay:
            sp.append(p_r)
            r = r + 1
        if sp.t ≥ T_stay:
            Compute sp.x, sp.y, sp.t
            SPset.append(sp)
        l = r
    return SPset
The algorithm takes the user’s trajectory sequence TR, time threshold $T_{stay}$, and distance threshold $D_{stay}$ as inputs, and outputs a set of stay points SPset. Specifically, the algorithm initializes the left pointer $l$ and enters a loop that continues as long as $l$ is less than the trajectory length $|TR|$. In each iteration, a temporary right pointer $r$ is initialized to $r = l + 1$ and a temporary stay point is created containing $p_l$. The right pointer then advances, and each trajectory point $p_r$ that satisfies $Dis(p_l, p_r) \le D_{stay}$ is added to the temporary stay point; the expansion stops as soon as this condition no longer holds. If the stay time of the temporary stay point is at least the time threshold $T_{stay}$, its coordinates and duration are computed and it is added to SPset. The left pointer is then updated to $l = r$, and the loop continues until the end of the trajectory. Finally, the algorithm outputs the set of stay points SPset for the user’s trajectory.
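The sliding-window extraction described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Point` and `StayPoint` containers are hypothetical, coordinates are assumed to be already projected to a planar system in metres, and, because the paper does not state how sp.x, sp.y, and sp.t are computed, the sketch assumes the centroid of the window and the time span between its first and last points. It uses 0-based indexing and lets the window reach the final trajectory point.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    x: float  # planar x coordinate (assumed: metres after projection)
    y: float  # planar y coordinate
    t: float  # timestamp in seconds


@dataclass
class StayPoint:
    x: float       # assumed: centroid x of the window
    y: float       # assumed: centroid y of the window
    t: float       # assumed: stay duration in seconds
    members: list  # raw trajectory points folded into this stay point


def extract_stay_points(tr, t_stay, d_stay):
    """Sliding-window stay point extraction in the spirit of Algorithm 1."""
    sp_set = []
    l = 0
    while l < len(tr):
        r = l + 1
        window = [tr[l]]
        # Grow the window while points stay within d_stay of the anchor p_l.
        while r < len(tr) and math.dist((tr[l].x, tr[l].y), (tr[r].x, tr[r].y)) <= d_stay:
            window.append(tr[r])
            r += 1
        duration = window[-1].t - window[0].t
        if duration >= t_stay:
            cx = sum(p.x for p in window) / len(window)
            cy = sum(p.y for p in window) / len(window)
            sp_set.append(StayPoint(cx, cy, duration, window))
        l = r  # continue after the window, as in the pseudocode
    return sp_set
```

Because the left pointer jumps to `r` after each window, every trajectory point is visited a bounded number of times, which matches the linear scan claimed in the complexity analysis.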

3.3.2. Perturb Stay Points

This paper categorizes stay points into ordinary stay points and long-duration stay points based on the duration of stay. Long-duration stay points are those whose stay time exceeds a given threshold; all other stay points are ordinary. Given the different levels of significance associated with these two types of stay points, this paper proposes two distinct protection strategies.
Perturb Ordinary Stay Points
To preserve the dependency between a stay point and its preceding location, this paper introduces the vector indistinguishability mechanism, which is composed of the distance indistinguishability mechanism and the direction indistinguishability mechanism [21].
Specifically, vector indistinguishability is an extension of differential privacy. Conceptually, while $\varepsilon$-DP mathematically limits the probability of inferring the true data (focusing on privacy), vector indistinguishability further constrains the direction and magnitude of the perturbation. This constraint effectively ensures the spatial continuity between trajectory points (focusing on utility).
The relationship data between a location $p_i$ and its preceding location $p_{i-1}$ consists of the distance $M$ and direction $\alpha$ between the two locations. The formulas for calculating the distance $M$ and direction $\alpha$ are as follows:
$$M = \|u\| = \sqrt{(p_i.x - p_{i-1}.x)^2 + (p_i.y - p_{i-1}.y)^2}$$
$$\alpha = \arctan \frac{p_i.y - p_{i-1}.y}{p_i.x - p_{i-1}.x}$$
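As a small illustration of these two formulas, the sketch below computes the distance and direction for a pair of planar points. Note one deliberate substitution: a plain arctangent of the ratio is undefined when the two points share the same x coordinate and cannot tell opposite directions apart, so the sketch uses the two-argument `math.atan2` instead. That choice, and the function name `relation`, are assumptions of this example rather than something stated in the paper.

```python
import math


def relation(p_prev, p_cur):
    """Return (M, alpha): distance and direction from p_prev to p_cur.

    Points are (x, y) tuples in a planar coordinate system (assumed).
    """
    dx = p_cur[0] - p_prev[0]
    dy = p_cur[1] - p_prev[1]
    m = math.hypot(dx, dy)      # Euclidean norm of u = p_cur - p_prev
    alpha = math.atan2(dy, dx)  # quadrant-aware angle in (-pi, pi]
    return m, alpha
```

With `atan2`, a move due west yields an angle of π rather than 0, so the direction mechanism is centred on the actual heading.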
For distance privacy, this paper introduces a mechanism to sample a random distance $l$, with a probability that decays exponentially as $l$ moves away from the true distance $M$:
$$\Pr(l \mid M) \propto \exp\left(-\epsilon_m \, d_{distance}(l, M)\right)$$
where $l$ and $M$ are continuous. Generally, the range of $l$ is restricted to a finite interval determined by $M$. The range of $l$ can be taken as $[0, 2M]$, and the corresponding probability density function is:
$$K_M(l, M) = C_M \cdot \exp\left(-\epsilon_m |l - M|\right), \quad l \in [0, 2M]$$
$$C_M = \frac{1}{2} \cdot \frac{\epsilon_m}{1 - e^{-\epsilon_m \cdot M}}$$
This indicates that the true distance is protected within the range $[0, 2M]$, with the distinguishability boundary being $\epsilon_m \cdot M$, where $\epsilon_m$ is the privacy budget, and its unit is the reciprocal of the unit of $M$.
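The normalizing constant of the distance mechanism follows from requiring the density to integrate to 1 over the interval from 0 to 2M. The short derivation below is written out as a consistency check rather than taken from the paper; the substitution variable $s$ (the deviation from $M$) is introduced only for this illustration.

```latex
\int_{0}^{2M} e^{-\epsilon_m |l - M|}\, dl
  = 2 \int_{0}^{M} e^{-\epsilon_m s}\, ds
  = \frac{2\left(1 - e^{-\epsilon_m M}\right)}{\epsilon_m}
\quad\Longrightarrow\quad
C_M = \frac{1}{2} \cdot \frac{\epsilon_m}{1 - e^{-\epsilon_m M}} .
```

The same argument with half-width $\pi$ in place of $M$ yields the normalizing constant of the direction mechanism.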
For directional privacy, this paper introduces a mechanism to sample a random direction $\theta$, with the range of $\theta$ determined by $\alpha$. The range of $\theta$ is $(-\pi + \alpha, \pi + \alpha]$, and the corresponding probability density function is:
$$K_D(\theta, \alpha) = C_D \cdot \exp\left(-\epsilon_d |\theta - \alpha|\right), \quad \theta \in (-\pi + \alpha, \pi + \alpha]$$
$$C_D = \frac{1}{2} \cdot \frac{\epsilon_d}{1 - e^{-\epsilon_d \cdot \pi}}$$
This indicates that the true direction is protected within the range $(-\pi + \alpha, \pi + \alpha]$, with the distinguishability boundary being $\epsilon_d \cdot \pi$, where $\epsilon_d$ is the privacy budget, and its unit is $rad^{-1}$.
These two mechanisms satisfy differential privacy, and thus, according to the parallel composition property of differential privacy, they can be combined to achieve the vector indistinguishability mechanism. When the vector indistinguishability mechanism is applied to a vector, it samples a random distance $l$ from a variant of the Laplace mechanism centered at distance $M$ over the range $[0, 2M]$, and simultaneously samples a random direction $\theta$ from another variant of the Laplace mechanism centered at direction $\alpha$ over the range $(-\pi + \alpha, \pi + \alpha]$.
Considering the dependency between two consecutive locations, this paper proposes a location perturbation algorithm based on the vector indistinguishability mechanism, which generates a candidate set containing a specified number of points, as shown in Algorithm 2.
Algorithm 2: Location perturbation
Input: sp, p_pre, k, ε_OD, ε_OM
Output: PTset
    PTset = ∅
    u = sp − p_pre
    M = ‖u‖
    α = angle(u)
    for 1 to k do
        l = sample_from_Km(M, ε_OM)
        θ = sample_from_Kd(α, ε_OD)
        p = (l · cos θ, l · sin θ)
        z = sp + p
        PTset.append(z)
    end for
    return PTset
Algorithm 2 takes a stay point sp, the preceding point $p_{pre}$, repetition parameter $k$, and privacy budgets $\varepsilon_{OD}$ and $\varepsilon_{OM}$ as inputs, and outputs a set of candidate perturbed points PTset. First, it calculates the vector $u$ between the stay point sp and its preceding point $p_{pre}$, and computes the magnitude $M$ and direction $\alpha$ of this vector. Then, using $M$, $\alpha$, $\varepsilon_{OD}$, and $\varepsilon_{OM}$ as parameters, it obtains a random distance $l$ from the distance indistinguishability mechanism and a random direction $\theta$ from the direction indistinguishability mechanism. It calculates the position increment $p = (l \cdot \cos\theta, l \cdot \sin\theta)$ based on $l$ and $\theta$, and generates a candidate perturbed point $z = sp + p$, which is added to the candidate perturbed point set PTset. This process is repeated $k$ times to obtain a candidate perturbed point set of size $k$.
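The following sketch shows one way Algorithm 2 could be realized in Python. The paper does not specify how the two truncated Laplace variants are sampled, so the helper `sample_truncated_laplace` is an assumption of this example: it draws the deviation from the centre by inverting the CDF of a truncated exponential and then picks its sign uniformly, which reproduces a density proportional to exp(-eps * |x - centre|) on the bounded interval. The function names, the tuple representation of points, the use of `math.atan2` for the angle, and the optional `rng` argument are likewise hypothetical. Both budgets are assumed to be strictly positive.

```python
import math
import random


def sample_truncated_laplace(center, half_width, eps, rng):
    """Draw x with density proportional to exp(-eps * |x - center|)
    on the interval (center - half_width, center + half_width)."""
    v = rng.random()  # uniform in [0, 1)
    # Inverse CDF of an exponential truncated to [0, half_width).
    s = -math.log(1.0 - v * (1.0 - math.exp(-eps * half_width))) / eps
    return center + s if rng.random() < 0.5 else center - s


def perturb_location(sp, p_pre, k, eps_od, eps_om, rng=None):
    """Generate k candidate perturbed points for the stay point sp."""
    if rng is None:
        rng = random.Random()
    ux, uy = sp[0] - p_pre[0], sp[1] - p_pre[1]
    m = math.hypot(ux, uy)       # magnitude M of u = sp - p_pre
    alpha = math.atan2(uy, ux)   # direction alpha of u
    pt_set = []
    for _ in range(k):
        l = sample_truncated_laplace(m, m, eps_om, rng)                # K_M on [0, 2M]
        theta = sample_truncated_laplace(alpha, math.pi, eps_od, rng)  # K_D around alpha
        # The increment is added to sp, following Algorithm 2 as written.
        pt_set.append((sp[0] + l * math.cos(theta), sp[1] + l * math.sin(theta)))
    return pt_set
```

Passing a seeded `random.Random` instance makes the candidate set reproducible, which is convenient when comparing parameter settings.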
For ordinary stay points, a set of candidate perturbed points is obtained through Algorithm 2, and each perturbed point is assigned a semantic label through querying. Specifically, we utilize the Baidu Map API to perform a reverse lookup of semantic locations using latitude and longitude coordinates. The location semantic tree is constructed from the semantic labels of the stay point and the candidate perturbed points, according to the point of interest (POI) classification criteria for geographical information, as shown in Figure 2. For a specific POI, its category can be identified. Thus, a candidate point with a different semantic category is randomly selected as the replacement point. For example, if the location semantic of the stay point is Restaurant A, then Mall A or Hotel A could be used as the replacement point. However, it is preferable not to use Restaurant B as the replacement point, because semantic similarity might allow attackers to infer the purpose of the trip.
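The selection rule can be illustrated with a short sketch. It assumes that each candidate has already been given a category label (for instance by a reverse geocoding lookup followed by a walk up the location semantic tree); the function name, the `(point, category)` pair layout, and the category strings are hypothetical and only serve the example.

```python
import random


def choose_replacement(stay_category, candidates, rng=None):
    """Pick a random candidate whose category differs from the stay point's.

    candidates: list of (point, category) pairs (assumed layout).
    Returns None when every candidate shares the stay point's category,
    in which case the caller would need to generate new candidates.
    """
    if rng is None:
        rng = random.Random()
    different = [pt for pt, cat in candidates if cat != stay_category]
    if not different:
        return None
    return rng.choice(different)
```

In the restaurant example above, candidates labelled as a mall or a hotel remain eligible, while a second restaurant is filtered out before the random draw.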
Perturb Long-Duration Stay Points
Since long-duration stay points contain more sensitive information, we enhance their privacy by means of hot stay areas while preserving the dependency between the stay point and its preceding location. Hot stay areas are areas formed by users’ stay points, and selecting appropriate replacement points within these areas provides enhanced privacy. This paper proposes an algorithm for extracting hot stay areas, as illustrated in Algorithm 3.
Algorithm 3: Hot Stay Area Detection
Input: SPset, T_w, ρ, t_w
Output: Hot stay area set H
    H = ∅
    Filter SPset by the time window T_w to obtain S_t
    Perform spatial clustering on S_t using K-means clustering
    for each cluster C:
        Compute density d = |C| / area(C)
        Compute average stay time t_avg = Σ (t_end − t_start) / |C|, summed over the stay points in C
        if d ≥ ρ and t_avg ≥ t_w:
            append C to H
    return H
The algorithm takes the stay point set SPset, time window $T_w$, density threshold $\rho$, and duration threshold $t_w$ as input parameters. It first filters the stay point set using the time window $T_w$ to obtain all stay points within that period. Then, the K-means clustering method is applied to the filtered set of stay points, with the number of clusters $k$ determined by the elbow method. For each cluster, the stay point density and average stay time are calculated. If the density $d$ is at least $\rho$ and the average stay time is at least $t_w$, the cluster is identified as a hot stay area and added to the hot stay area set $H$. After all clusters have been processed, the algorithm returns the hot stay area set $H$.
Given that long-duration stay points involve higher privacy sensitivity, we restrict the candidate perturbation point set to hot stay areas to achieve stronger privacy protection. Therefore, this paper proposes a candidate perturbation point set generation algorithm suitable for long-duration stay points, as shown in Algorithm 4.
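The per-cluster test in Algorithm 3 can be sketched as below. The example starts from clusters that are assumed to have been produced already (time-window filtering and K-means are not reproduced here). The paper does not define how the cluster area is measured, so the sketch assumes the axis-aligned bounding box of the cluster, with a hypothetical `min_area` floor to avoid dividing by zero for degenerate clusters; the `Stay` container is also an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class Stay:
    x: float        # planar x coordinate (assumed: metres)
    y: float        # planar y coordinate
    t_start: float  # arrival time in seconds
    t_end: float    # departure time in seconds


def hot_stay_areas(clusters, rho, t_w, min_area=1.0):
    """Return the clusters that satisfy d >= rho and t_avg >= t_w."""
    hot = []
    for c in clusters:
        if not c:
            continue
        width = max(s.x for s in c) - min(s.x for s in c)
        height = max(s.y for s in c) - min(s.y for s in c)
        area = max(width * height, min_area)  # assumed: bounding-box area
        density = len(c) / area
        t_avg = sum(s.t_end - s.t_start for s in c) / len(c)
        if density >= rho and t_avg >= t_w:
            hot.append(c)
    return hot
```

A convex hull or a fixed grid cell would be equally plausible area definitions; the threshold for density would need to be tuned to whichever definition is used.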
Algorithm 4: Location perturbation for long-duration stay points
Input: sp, p_pre, k, ε_LM, ε_LD, H
Output: PTset
    PTset = ∅
    u = sp − p_pre
    M = ‖u‖
    α = angle(u)
    for 1 to k do
        l = sample_from_Km(M, ε_LM)
        θ = sample_from_Kd(α, ε_LD)
        p = (l · cos θ, l · sin θ)
        z = sp + p
        if z in H:
            PTset.append(z)
        else:
            k = k + 1    (one extra draw replaces the rejected point)
    end for
    return PTset
The algorithm takes a long-duration stay point sp, the preceding point $p_{pre}$, repetition parameter $k$, privacy budgets $\varepsilon_{LM}$ and $\varepsilon_{LD}$, and the hot stay area $H$ in which the long-duration stay point sp is located as inputs, and outputs a set of candidate perturbed points PTset. Algorithm 4 is similar to Algorithm 2, except that instead of directly adding every generated perturbation point to the candidate point set, it adds only those points that fall within the hot stay area. It also uses the separately allocated privacy budgets $\varepsilon_{LM}$ and $\varepsilon_{LD}$.
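Incrementing the loop bound on each miss amounts to drawing candidates until enough of them land inside the hot stay area. The sketch below expresses that as an explicit loop. It is an illustration under stated assumptions: `sample_candidate` stands for any zero-argument function that returns one perturbed point (for example one draw of the vector indistinguishability mechanism), `in_hot_area` is a hypothetical membership predicate, and the `max_attempts` cap is not part of the paper but is added so that the loop terminates even when the hot stay area is rarely hit.

```python
def candidates_in_hot_area(sample_candidate, in_hot_area, k, max_attempts=10_000):
    """Collect up to k sampled points that fall inside the hot stay area."""
    pt_set = []
    attempts = 0
    while len(pt_set) < k and attempts < max_attempts:
        z = sample_candidate()
        attempts += 1
        if in_hot_area(z):  # keep only points inside the hot stay area
            pt_set.append(z)
    return pt_set
```

If the cap is reached, the function returns fewer than `k` points, and the caller can decide whether to accept the smaller set or to relax the area constraint.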
To protect long-duration stay points, we first filter them out using a predetermined fixed threshold. Then, Algorithm 4, which is based on hot stay areas and a vector indistinguishability mechanism, is employed to generate a candidate perturbation point set. A location semantic label is queried and obtained for each candidate perturbation point. Subsequently, a location semantic tree, as illustrated in Figure 2, is constructed using the semantic information of the long-duration stay points and the candidate perturbation points. Finally, a candidate point with a semantic label different from that of the long-duration stay point is randomly selected from the semantic tree as the replacement point.

3.3.3. Reconstruct Trajectory

The proposed scheme in this paper perturbs stay points. Simply replacing an entire stay point with a single perturbed point would disrupt the integrity of the trajectory, thereby reducing the usability and reliability of the data. Therefore, it is necessary to generate a certain number of location points near the perturbed point to ensure the temporal continuity and integrity of the trajectory before and after processing.
When reconstructing the trajectory, the spatial range covered by the perturbed stay point should be similar or identical to the range before replacement, and the number of location points within that range should remain the same. The process of reconstructing the trajectory is shown in Figure 3. First, determine the size of the stay point range and the number of location points involved. Then, randomly generate the same number of location points within the range of the perturbed stay point.
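A minimal sketch of this reconstruction step is given below. The paper does not fix the shape of the range or the sampling distribution, so the example makes several assumptions: the range is modelled as a disc around the centroid of the original points whose radius is the largest distance from that centroid, new points are drawn uniformly over a disc of the same radius around the replacement point, and the original timestamps are reused to keep temporal continuity. The function name and tuple layout are hypothetical.

```python
import math
import random


def reconstruct_stay(original_pts, replacement, rng=None):
    """Generate as many points around `replacement` as the stay point had.

    original_pts: list of (x, y, t) tuples; replacement: (x, y).
    """
    if rng is None:
        rng = random.Random()
    n = len(original_pts)
    cx = sum(p[0] for p in original_pts) / n
    cy = sum(p[1] for p in original_pts) / n
    # Assumed range: disc around the centroid covering all original points.
    radius = max(math.hypot(p[0] - cx, p[1] - cy) for p in original_pts)
    new_pts = []
    for _, _, t in original_pts:
        r = radius * math.sqrt(rng.random())  # sqrt gives uniform density over the disc
        phi = rng.uniform(0.0, 2.0 * math.pi)
        new_pts.append((replacement[0] + r * math.cos(phi),
                        replacement[1] + r * math.sin(phi),
                        t))  # reuse the original timestamp
    return new_pts
```

The output has the same number of points and the same timestamps as the input, so it can be spliced back into the trajectory in place of the original stay point.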

3.4. Privacy Budget Allocation

To strike a balance between privacy quality and data utility, we propose an allocation strategy based on the sensitivity of stay points.
Assuming a total privacy budget of $\varepsilon_{total}$, we introduce a weight factor $\beta$, where $\beta$ represents the proportion of the budget allocated to long-duration stay points, with $0 < \beta < 1$. Then, $1 - \beta$ represents the proportion of the budget allocated to ordinary stay points. According to the principle of sequential composition, the budget for long-duration stay points $\varepsilon_L$ is equal to $\beta \cdot \varepsilon_{total}$, while the budget for ordinary stay points $\varepsilon_O$ is equal to $(1 - \beta) \cdot \varepsilon_{total}$.
Given that direction indistinguishability and distance indistinguishability are equally important, the budget can be evenly distributed. Thus, for ordinary stay points, the privacy budgets for direction indistinguishability and distance indistinguishability are allocated as $\varepsilon_{OD} = 0.5\,\varepsilon_O$ and $\varepsilon_{OM} = 0.5\,\varepsilon_O$, respectively. Similarly, for long-duration stay points, the corresponding budgets are allocated as $\varepsilon_{LD} = 0.5\,\varepsilon_L$ and $\varepsilon_{LM} = 0.5\,\varepsilon_L$.
If there are $n_1$ ordinary stay points and $n_2$ long-duration stay points, the privacy budget allocated to each ordinary stay point is $\varepsilon_{OE} = \varepsilon_O / n_1$, and the privacy budget allocated to each long-duration stay point is $\varepsilon_{LE} = \varepsilon_L / n_2$.
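The allocation rules can be collected into a small helper, shown here as a sketch. It simply evaluates the quantities as they are stated in the text; the function name and dictionary keys are hypothetical, and the text does not say how the per-point budgets and the direction/distance halves are combined, so the sketch reports them separately without combining them.

```python
def allocate_budget(eps_total, beta, n_ordinary, n_long):
    """Split a total privacy budget following the stated allocation rules."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    eps_l = beta * eps_total          # long-duration stay points
    eps_o = (1.0 - beta) * eps_total  # ordinary stay points
    return {
        "eps_L": eps_l,
        "eps_O": eps_o,
        "eps_LD": 0.5 * eps_l,  # direction share, long-duration
        "eps_LM": 0.5 * eps_l,  # distance share, long-duration
        "eps_OD": 0.5 * eps_o,  # direction share, ordinary
        "eps_OM": 0.5 * eps_o,  # distance share, ordinary
        # Per-point budgets; None when there are no points of that type.
        "eps_LE": eps_l / n_long if n_long else None,
        "eps_OE": eps_o / n_ordinary if n_ordinary else None,
    }
```

For example, a total budget of 2.0 with a weight factor of 0.25 assigns 0.5 to long-duration stay points and 1.5 to ordinary stay points before the further splits are applied.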
The entire scheme introduces differential privacy technology only in stay point perturbation, specifically adopting a perturbation algorithm based on vector indistinguishability. When injecting noise into the coordinate vectors of stay points in trajectory data, the algorithm strictly follows the indistinguishability constraints in the vector space, ensuring that the perturbation results of adjacent trajectories (with differences only in a single stay point) satisfy the definition of ε-differential privacy in terms of probability distribution. Furthermore, according to the sequential composition theorem of differential privacy [22], since there is only one application of the differential privacy mechanism in the scheme and no other parallel or serial noise operation superposition, the privacy budget overhead of the entire trajectory privacy protection scheme can be accurately controlled to a single ε, and the whole scheme satisfies the property of ε-differential privacy.

3.5. Complexity Analysis

This section analyzes the time complexity of the proposed trajectory privacy protection scheme.
Let n denote the length of the original trajectory (i.e., the number of GPS points), and let k represent the number of clusters generated by the K-means algorithm. The process of extracting stopping points involves a linear scan of the data, with a time complexity of O ( n ) .
For the detection of Hot Stay Area, we use the K-means clustering algorithm, whose complexity depends on the number of iterations I and the number of clusters k. Therefore, the complexity of this step is O ( I · k · n ) . In practical applications, due to k being much smaller than n and I converging rapidly, this step is approximately linear relative to O ( n ) .
In addition, the vector indistinguishability perturbation mechanism processes each stay point separately, computing a noise vector that satisfies the direction and distance constraints. This requires a linear traversal with cost $O(n)$.
Overall, the total time complexity of the proposed scheme is $O(I \cdot k \cdot n)$. Since both $k$ and $I$ can be treated as constants in practice, the algorithm effectively runs in linear time, $O(n)$. This linear complexity makes the method scalable and suitable for processing large-scale trajectory datasets in real-time location-based services (LBS).

4. Experimental Section

4.1. Experimental Environment

The experiments are conducted on a PC with the following specifications: the operating system is Windows 10, the processor (CPU) is an Intel(R) Core(TM) i5-9300H CPU @ 2.40 GHz, the memory (RAM) is 16 GB, and the software used is PyCharm Community Edition 2023.3.1 with Python 3.11.

4.2. Dataset

The dataset used in the experiments is the Geolife Trajectory Dataset [23]. This dataset consists of real trajectory data from 182 users collected from April 2007 to August 2012, including 17,621 trajectories with a total distance of approximately 1.2 million kilometers and a total duration of 50,176 h. It records a wide range of outdoor activities of users, including not only daily routines such as going home and to work but also recreational and sports activities like shopping, sightseeing, dining, hiking, and cycling. The trajectory density of the Geolife dataset realistically reflects the complexity of real-world scenarios. However, high sampling frequencies can lead to excessive computational costs in experiments. To balance data authenticity with experimental feasibility, the original dataset was downsampled, with the trajectory point collection interval uniformly adjusted to 1 min.
Although the dataset is over a decade old and human behavior patterns have indeed evolved during this period, the fundamental laws of human mobility—which exhibit statistical robustness across different eras—remain unchanged. Therefore, the dataset remains valid for this study.

4.3. Experimental Results and Analysis

4.3.1. Impact of Time and Distance Thresholds on the Number of Stay Points

The experiment employed grid search to evaluate the impact of various parameter combinations on the results. The time threshold was set at 5 min, 10 min, 60 min, 120 min, and 240 min, while the distance threshold was set at 50 m, 100 m, and 200 m. This resulted in 15 cross-experimental conditions to explore the effects of time and distance thresholds on the number of stay points.
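The 15 conditions can be enumerated as in the sketch below. The stay point counter is passed in as a hypothetical callable `count_stay_points(trajectory, dist_m, min_seconds)`; that signature and the name `grid_search` are assumptions made for illustration.

```python
from itertools import product

TIME_THRESHOLDS_MIN = [5, 10, 60, 120, 240]
DISTANCE_THRESHOLDS_M = [50, 100, 200]


def grid_search(trajectory, count_stay_points):
    """Count stay points for every (time, distance) threshold combination."""
    results = {}
    for t_min, d_m in product(TIME_THRESHOLDS_MIN, DISTANCE_THRESHOLDS_M):
        # The hypothetical counter expects the time threshold in seconds.
        results[(t_min, d_m)] = count_stay_points(trajectory, d_m, t_min * 60)
    return results
```

Each key of the returned dictionary is one experimental condition, so the results can be tabulated or plotted per user.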
The trajectories of two randomly selected users are analyzed, and the results are shown in Figure 4 and Figure 5. The results show that User 1 and User 2 have 54 and 32 stay points, respectively, with a duration of up to 240 min within a 50 m range. These long-duration stay points are highly sensitive and likely to include locations closely related to user privacy, such as home addresses or workplaces. The experimental results indicate that long-duration stay points need more meticulous protection to further ensure user privacy.

4.3.2. Privacy Quality

In the experiment, the average offset distance is used as an indicator of privacy quality. The average offset distance is the average of the Euclidean distances between all perturbed points and their original points, defined by the following formula:
$$AOD = \frac{1}{n}\sum_{j=1}^{n} Dis(p_j, p_j')$$
where $n$ is the number of trajectory points, $p_j$ represents the $j$-th original trajectory point, and $p_j'$ represents the corresponding perturbed trajectory point. A larger average offset distance indicates higher privacy quality.
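The average offset distance can be computed as in the following sketch. It assumes that points are given as planar coordinates (for example, projected meters), so that the Euclidean distance is meaningful. The name `average_offset_distance` is illustrative.

```python
import math


def average_offset_distance(original, perturbed):
    """Mean Euclidean distance between paired original and perturbed points."""
    if not original or len(original) != len(perturbed):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.dist(p, q) for p, q in zip(original, perturbed))
    return total / len(original)


original = [(0.0, 0.0), (0.0, 0.0)]
perturbed = [(3.0, 4.0), (6.0, 8.0)]
print(average_offset_distance(original, perturbed))  # offsets of 5 and 10 average to 7.5
```

For raw latitude and longitude pairs, a geodesic distance would be needed in place of `math.dist`.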
To verify the effectiveness of the proposed scheme, we conducted experimental comparisons with similar schemes based on differential privacy (Geoind [24], TLDP [25], and PTPP [26]). The experimental results are shown in Figure 6. As the privacy budget increases, the Average Offset Distance (AOD) decreases. This is because a higher privacy budget results in less noise being added, bringing the perturbed points closer to the original points, thereby reducing the AOD. From Figure 6, it can be seen that the proposed scheme outperforms the Geoind and PTPP schemes and is comparable to the TLDP scheme, especially performing better than TLDP at lower privacy budgets.

4.3.3. Data Utility

In the experiment, the Root Mean Square Error (RMSE) is used as an indicator of data utility. The formula for calculating RMSE is as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (p_j - p_j')^2}$$
A smaller RMSE indicates lower data loss and higher data utility. The experimental results, compared with similar schemes, are shown in Figure 7.
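A minimal sketch of the RMSE computation is given below. It assumes that the squared difference between an original point and its perturbed counterpart is the squared Euclidean distance between planar coordinates. The name `rmse` is illustrative.

```python
import math


def rmse(original, perturbed):
    """Root mean square of the Euclidean offsets between paired points."""
    if not original or len(original) != len(perturbed):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(original, perturbed))
    return math.sqrt(squared / len(original))


original = [(0.0, 0.0), (0.0, 0.0)]
perturbed = [(3.0, 4.0), (6.0, 8.0)]
print(round(rmse(original, perturbed), 4))  # sqrt((25 + 100) / 2)
```

Under this assumption, the RMSE is never smaller than the mean offset distance on the same data, because a quadratic mean is at least as large as the arithmetic mean and so weights large offsets more heavily.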
As seen from Figure 7, the RMSE decreases with an increase in the privacy budget. This is because a larger privacy budget results in less noise being added, leading to a smaller average error between the true location and the perturbed location. The experimental results indicate that the RMSE of the proposed scheme is significantly lower than that of other schemes, demonstrating better performance in terms of data utility.

4.3.4. Average Recognition Rate Experiment

The average recognition rate is an indicator used to quantitatively evaluate the attacker’s effectiveness and the defense capability of privacy protection mechanisms. The lower the value, the stronger the scheme’s resistance to semantic attacks. The calculation formula for the average recognition rate is as follows:
$$ARR = \frac{1}{N}\sum_{i=1}^{N} P_i$$
where $N$ represents the size of the set of sensitive semantic locations, and $P_i$ represents the probability that the attacker successfully identifies the $i$-th sensitive semantic location.
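As a small illustration, the average recognition rate is the arithmetic mean of the per-location identification probabilities. The sketch below assumes these probabilities have already been estimated by some attack model, which is outside its scope. The name `average_recognition_rate` is illustrative.

```python
def average_recognition_rate(probabilities):
    """Mean of the attacker's identification probabilities P_i."""
    if not probabilities:
        raise ValueError("at least one sensitive semantic location is required")
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("each P_i must be a probability in [0, 1]")
    return sum(probabilities) / len(probabilities)


print(round(average_recognition_rate([0.2, 0.4, 0.6]), 3))
```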
Figure 8 illustrates the changes in the average recognition rates of different methods (SSAR [27], PVSP [28] and SSLR [14]) as the number of semantically sensitive positions increases. It can be observed that as the number of semantically sensitive positions grows, the average recognition rate of all methods shows an upward trend. This is because more semantically sensitive positions provide attackers with additional contextual knowledge, thereby increasing the probability of identifying user trajectories. Although the average recognition rate of the proposed scheme in this paper also increases accordingly, it consistently remains lower than that of the other methods, demonstrating its superiority in semantic privacy protection.

4.3.5. Sensitivity Analysis of β

To evaluate the impact of parameter β on privacy quality and data utility, we conducted experiments under the setting of a total privacy budget ε_total of 1.0. The value range of β in the experiments was set to 0.1 to 0.9.
From Figure 9, it can be observed that as β increases, RMSE first decreases and then increases. This is primarily because ordinary stay points make up the overwhelming majority of stay points. When β is excessively large, the budget allocated to ordinary stay points decreases sharply, leading to a significant increase in noise. Due to the large number of ordinary points, this increase in local error is amplified, thereby raising the overall average RMSE.
In contrast, the AOD value decreases steadily as β increases, indicating a gradual increase in privacy risk. This is because a larger β allocates more budget to long-duration stay points, so less noise is added at these sensitive locations and the perturbed points remain close to the true ones, making them more vulnerable to inference attacks. Considering both utility and privacy, Figure 9 suggests that a β value of around 0.4 or 0.5 offers a better balance.

4.3.6. Ablation Analysis

To verify the effectiveness of each core component in the proposed framework, we conducted an ablation study where we evaluated four configurations—Configuration A (Baseline, implementing only the vector indistinguishability mechanism), Configuration B (extending A by incorporating the semantic tree), Configuration C (extending B by adding the hot stay region detection), and Configuration D (Full Model, integrating all components)—across two dimensions (privacy quality and data utility), with the results presented in Figure 10 and Figure 11 below.
As observed, under varying privacy budgets, the performance strictly follows the order: Configuration D outperforms C, which outperforms B, which outperforms A. This consistent improvement confirms that each component plays a critical role and that no module is redundant.

5. Limitations and Future Work

While this paper mitigates certain existing challenges, it is not without limitations.
Data utility in this paper is evaluated only through geometric metrics such as RMSE, which quantify spatial distortion but may not directly reflect the performance of downstream tasks. In addition, the assignment of POI semantic labels does not account for the potential impact of semantic noise, label misclassification, and similar factors on the experimental results, which may affect their reliability. Finally, the trajectory reconstruction module carries the risk of introducing spatial artifacts. Eliminating such artifacts requires additional constraints, such as integrating user movement pattern features and map-matching with road networks to ensure the physical plausibility of trajectories. This optimization involves complex system design and goes beyond the core research scope of this paper, which focuses on constructing privacy protection mechanisms.
Future research will focus on the following aspects: how to link data utility evaluation with downstream tasks; quantifying the impact of noise generated by POI assignment; optimizing the trajectory reconstruction algorithm to eliminate the risk of spatial artifacts. Furthermore, for the protection of long-duration stay points, multi-POI replacement may be considered to enhance privacy and security.

6. Conclusions

Existing trajectory privacy protection schemes generally fail to consider the dependency between a stay point and its preceding location, overlook the relationship between the semantic information of location and privacy, and also suffer from issues such as over-protection. These problems not only reduce the effectiveness of trajectory privacy protection but also diminish the usability of trajectory data. To address these issues, this paper proposes a trajectory privacy protection scheme based on the replacement of stay points. Experimental results on real datasets show that, compared with similar schemes, the proposed scheme achieves higher data utility while ensuring privacy. The scheme can be applied to offline data publication or location-sharing scenarios that allow temporary storage. It fully explores the relationship between stay points and privacy from multiple dimensions, making minimal changes to the original trajectory to achieve both privacy protection and high data usability.

Author Contributions

Conceptualization, W.W. and D.L.; methodology, W.W. and D.L.; software, D.L.; validation, W.W. and D.L.; investigation, W.W. and D.L.; data curation, D.L.; writing—original draft preparation, D.L.; writing—review and editing, W.W.; visualization, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data Availability Statement

The research data utilized in this study are derived from the public Geolife GPS trajectory dataset. The dataset is publicly accessible at: https://www.microsoft.com/en-us/research/publication/geolife-gps-trajectory-dataset-user-guide/ (accessed on 10 January 2025).

Conflicts of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

References

  1. Jin, F.; Hua, W.; Francia, M.; Chao, P.; Orlowska, M.E.; Zhou, X. A Survey and Experimental Study on Privacy-Preserving Trajectory Data Publishing. IEEE Trans. Knowl. Data Eng. 2023, 35, 5577–5596.
  2. Cheng, W.; Wen, R.; Huang, H.; Miao, W.; Wang, C. OPTDP: Towards optimal personalized trajectory differential privacy for trajectory data publishing. Neurocomputing 2022, 472, 201–211.
  3. Schlegel, R.; Chow, C.-Y.; Huang, Q.; Wong, D.S. User-Defined Privacy Grid System for Continuous Location-Based Services. IEEE Trans. Mob. Comput. 2015, 14, 2158–2172.
  4. Wang, Y.; Li, M.; Luo, S.; Xin, Y.; Zhu, H.; Chen, Y.; Yang, G.; Yang, Y. LRM: A Location Recombination Mechanism for Achieving Trajectory k-Anonymity Privacy Protection. IEEE Access 2019, 7, 182886–182905.
  5. Xu, Y.X.; Xu, Y.Y.; Xu, Z.Q. Trajectory Protection with Individual Semantic Utility under Local Differential Privacy. IEEE Internet Things J. 2025, 1.
  6. Sweeney, L. k-Anonymity: A Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2002, 10, 557–570.
  7. Bayardo, R.J.; Agrawal, R. Data privacy through optimal k-anonymization. In Proceedings of the 21st International Conference on Data Engineering (ICDE’05), Tokyo, Japan, 5–8 April 2005; pp. 217–228.
  8. Gangarde, R.; Sharma, A.; Pawar, A.; Joshi, R.; Gonge, S. Privacy Preservation in Online Social Networks Using Multiple-Graph-Properties-Based Clustering to Ensure k-Anonymity, l-Diversity, and t-Closeness. Electronics 2021, 10, 2877.
  9. Kifer, D. Attacks on privacy and de Finetti’s theorem. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, Providence, RI, USA, 29 June–2 July 2009; Association for Computing Machinery: New York, NY, USA, 2009.
  10. Dwork, C. Differential privacy. In Proceedings of the International Colloquium on Automata, Languages, and Programming, Venice, Italy, 10–14 July 2006; pp. 1–12.
  11. Andrés, M.E.; Bordenabe, N.E.; Chatzikokolakis, K.; Palamidessi, C. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, Berlin, Germany, 4–8 November 2013; Association for Computing Machinery: New York, NY, USA, 2013.
  12. Gui, R.; Gui, X.; Zhang, X. A trajectory privacy protection method based on the replacement of points of interest in hotspot regions. Comput. Secur. 2024, 150, 104279.
  13. Xing, L.; Li, B.; Liu, L.; Huang, Y.; Wu, H.; Ma, H.; Zhang, X. Trajectory privacy protection method based on sensitive semantic location replacement. Comput. Netw. 2024, 250, 110562.
  14. Min, M.; Wang, W.; Xiao, L.; Xiao, Y.; Han, Z. Reinforcement learning-based sensitive semantic location privacy protection for VANETs. China Commun. 2021, 18, 244–260.
  15. He, D.; Zhang, J.; Xu, L.; Liu, Y.; Ye, X. DRL-UPPS: User Trajectory Privacy Protection Strategy Based on Deep Reinforcement Learning in Mobile Crowdsensing. IEEE Trans. Comput. Soc. Syst. 2025, 12, 4241–4253.
  16. Dai, Y.; Shao, J.; Wei, C.; Zhang, D.; Shen, H.T. Personalized semantic trajectory privacy preservation through trajectory reconstruction. World Wide Web 2018, 21, 875–914.
  17. Zhang, W.; Xie, Z.; Maradapu, A.V.S.; Zia, Q.; He, Z.; Yin, G. A Local Differential Privacy Trajectory Protection Method Based on Temporal and Spatial Restrictions for Staying Detection. Tsinghua Sci. Technol. 2024, 29, 617–633.
  18. Zheng, Z.; Li, Z.; Li, J.; Jiang, H.; Li, T.; Guo, B. Utility-aware and Privacy-preserving Trajectory Synthesis Model that Resists Social Relationship Privacy Attacks. ACM Trans. Intell. Syst. Technol. 2022, 13, 1–28.
  19. Liu, P.; Wu, D.; Shen, Z.; Wang, H.; Liu, K. Personalized trajectory privacy data publishing scheme based on differential privacy. Internet Things 2024, 25, 101074.
  20. Wu, W.; Gong, J. Improved on Qiu’s schemes to resist long-term observation attacks with semantic attributes of location. In Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Tianjin, China, 8–10 May 2024; pp. 2388–2393.
  21. Zhao, Y.; Chen, J. Vector-Indistinguishability: Location Dependency Based Privacy Protection for Successive Location Data. IEEE Trans. Comput. 2024, 73, 970–979.
  22. McSherry, F.D. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, Providence, RI, USA, 29 June–2 July 2009; Association for Computing Machinery: New York, NY, USA, 2009; pp. 19–30.
  23. Zheng, Y.; Zhang, L.; Xie, X.; Ma, W.-Y. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain, 20–24 April 2009; Association for Computing Machinery: New York, NY, USA, 2009.
  24. Zheng, Y.; Xie, X.; Ma, W.-Y. GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory. IEEE Data Eng. Bull. 2010, 33, 32–39. Available online: https://api.semanticscholar.org/CorpusID:3219429 (accessed on 10 January 2025).
  25. Li, J.; Chen, G. A personalized trajectory privacy protection method. Comput. Secur. 2021, 108, 102323.
  26. Zhao, X.; Pi, D.; Chen, J. Novel trajectory privacy-preserving method based on clustering using differential privacy. Expert Syst. Appl. 2020, 149, 113241.
  27. Ji, Y.; Gui, X.; Dai, H.; An, J.; Zhu, H.; Peng, Z.; Lin, X. Trajectory privacy protection based on Sensitive Stay Area replacement in publishing. Math. Probl. Eng. 2022, 2022, 5114584.
  28. Ye, A.; Zhang, Q.; Diao, Y.; Zhang, J.; Deng, H.; Cheng, B. A Semantic-Based Approach for Privacy-Preserving in Trajectory Publishing. IEEE Access 2020, 8, 184965–184975.
Figure 1. Scheme flowchart.
Figure 2. Location Semantic Tree.
Figure 3. Schematic Diagram of Trajectory Reconstruction.
Figure 4. Trajectory Data of User 1.
Figure 5. Trajectory Data of User 2.
Figure 6. AOD Performance under Different Schemes.
Figure 7. RMSE Performance under Different Schemes.
Figure 8. Impact of the number of semantically sensitive positions.
Figure 9. The impact of parameter β on privacy quality and data utility.
Figure 10. Performance of different configurations on AOD.
Figure 11. Performance of different configurations on RMSE.
Table 1. Key features comparison: ours vs. SOTA methods.
| Method | Privacy Mechanism | Spatial Dependency | Location Semantics | Over-Protection | Complexity |
| --- | --- | --- | --- | --- | --- |
| Gui et al. [12] | Differential Privacy | Partial | Yes | Mild | $O(N^2)$ |
| Xing et al. [13] | Differential Privacy | Partial | Yes | Mild | $O(N)$ |
| Zhang et al. [17] | Local Differential Privacy | No | No | Mild | $O(N)$ |
| Ours | Vector indistinguishability | Full | Yes | Negligible | $O(N)$ |
Table 2. Definition and meaning of the symbols used in this work.
| Symbol | Meaning |
| --- | --- |
| TR | user’s trajectory sequence |
| SPset | set of stay points |
| $\varepsilon_{OD}$ | privacy budget for the directional indistinguishability of ordinary stay points |
| $\varepsilon_{OM}$ | privacy budget for the distance indistinguishability of ordinary stay points |
| PTset | set of candidate perturbed points |
| $\varepsilon_{LD}$ | privacy budget for the directional indistinguishability of long-duration stay points |
| $\varepsilon_{LM}$ | privacy budget for the distance indistinguishability of long-duration stay points |
| $\varepsilon_{total}$ | total privacy budget |
| $\varepsilon_L$ | privacy budget for long-duration stay points |
| $\varepsilon_O$ | budget for ordinary stay points |
| $\varepsilon_{OE}$ | privacy budget allocated to each ordinary stay point |
| $\varepsilon_{LE}$ | privacy budget allocated to each long-duration stay point |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
