Article

Differential Privacy Preservation for Continuous Release of Real-Time Location Data

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
*
Author to whom correspondence should be addressed.
Entropy 2024, 26(2), 138; https://doi.org/10.3390/e26020138
Submission received: 8 January 2024 / Revised: 31 January 2024 / Accepted: 1 February 2024 / Published: 3 February 2024

Abstract

Continuous real-time location data is very important in the big data era, but the privacy issues involved are also a significant concern. It is necessary not only to protect location privacy at each release moment, but also to consider the impact of data correlation. The Correlated Laplace Mechanism (CLM) is a sophisticated method for implementing differential privacy on correlated time series. This paper aims to solve the key problems of applying CLM to continuous location release. Based on the finding that the location increment is approximately stationary in many scenarios, a location correlation estimation method based on the location increment is proposed to solve the problem of correlation estimation for nonstationary location data. In addition, an adaptive adjustment model for the CLM filter based on parameter quantization (QCLM), together with an effective implementation named QCLM-Lowpass that exploits the lowpass spectral characteristics of location data series, is proposed to solve the problem of output deviations caused by the undesired transient response of the CLM filter in time-varying environments. Extensive simulations and real data experiments validate the effectiveness of the proposed approach and show that the privacy scheme based on QCLM-Lowpass offers a better balance between the ability to resist correlation-based attacks and data availability.

1. Introduction

With the prevalence of intelligent terminal devices equipped with high-precision positioning capabilities, people can access location-based services (LBSs) whenever and wherever they want. As a result, large amounts of individuals’ location data are collected, stored, and analyzed, which has become an important resource for business analysis and academic research. However, location data is inherently sensitive and can be linked to data from other sources since space brings particular constraints [1,2], which enables an attacker to infer individual private information such as home address, company, lifestyle habits, etc. Privacy concerns hinder users’ willingness to share location data. Therefore, location privacy protection is one of the important issues in the big data era.
A variety of location-privacy-preserving mechanisms (LPPMs) have been proposed in the literature, and some works [2,3,4] provide a systematic review of them. Among the existing privacy definitions, differential privacy [5] based on the idea of indistinguishability in cryptography provides a rigorous mathematical proof of the privacy strength, which guarantees that the actual privacy strength is not affected by the attacker’s background knowledge, and thus is widely used in location-based services [6,7,8]. Early works on differential privacy assumed the presence of trusted third-party administrators, but as application environments have become more complex, many untrustworthy servers can also easily access large amounts of users’ data. To solve this problem, local differential privacy (LDP) [9] has been proposed. It perturbs the user’s data locally before sending the data to any entity. Only the data owner can fully access the original data, which provides stronger privacy protection. In the field of location privacy, geo-indistinguishability [10] extended the idea of indistinguishability to continuous geographic space, enabling local privacy protection for individual locations in snapshot publishing.
However, the privacy problem is more serious in scenarios of continuous location data publishing [2]. The continuously observed location data is usually correlated, which may make it difficult for differential privacy mechanisms to achieve the expected privacy-preserving effect [11,12]. Some existing works for correlated location data publishing mainly take advantage of a model to describe the correlation between locations, and thereby adapt the location perturbation approach. Jiang et al. [13] considered time-space constraints between locations according to a basic kinematics model, and used an exponential mechanism to generate azimuths and distances which are more consistent with the motion pattern, thus avoiding irrationally perturbed locations. Chatzikokolakis et al. [14] used a prediction function to characterize the correlation between locations, and privately determined whether to allocate a privacy budget to generate noise for the current location based on the error of the prediction. Instead, Al-Dhubhani et al. [15] adaptively adjusted the privacy budget at each moment based on the level of linear prediction error. In addition, Xiao et al. [16] constructed the “ δ -location set” based on a Markov model to account for the temporal correlations in location data and proposed a planar isotropic mechanism (PIM) for location perturbation. Based on this work, Xiong et al. [17] improved data usability by applying a generalized randomized response mechanism (GRR) to the “ δ -location set” for privacy protection. Unfortunately, these works lacked an exhaustive explanation for why data correlation would lead to a reduction in the privacy strength of differential privacy mechanisms.
Wang and Xu [18] argued that the primary cause of the reduction in privacy strength is the discrepancy in data correlation between the noise and original series. An attacker can exploit this discrepancy to launch a correlation-distinguishability attack (CDA) [12], such as Wiener filtering, to filter out some of the perturbation noise and thus remove some of the privacy effect. To remedy this problem, the work [18] proposed the notion of series-indistinguishability to guarantee that the correlation between the noise and original series is indistinguishable, and designed the Correlated Laplace Mechanism (CLM) to generate a correlated Laplace noise series that satisfies series-indistinguishability, which provided an effective differential privacy-preserving scheme for correlated time-series data publication. Naturally, it is desirable to apply CLM to protect privacy in continuous location data release.
However, there are some challenges in the implementation of CLM, which mainly include the following two aspects:
  • How to accurately estimate the data correlation. A prerequisite for the effective application of CLM is that the autocorrelation function of the data series is known or can be accurately estimated. However, the time series consisting of continuously observed location data is usually non-stationary, and its autocorrelation function changes over time, which makes it difficult to obtain accurate estimates based on time averaging.
  • How to make CLM dynamically track the time-varying data correlation. CLM based on the filter approach is performed by setting the filter parameters so that the steady-state response satisfies the given autocorrelation function. To track the time-varying data correlation, the CLM filter should be dynamically adjusted, which may cause the undesired transient response to be non-negligible, resulting in a deviation of the actual output from the target.
The above challenges make CLM unsuitable for continuous real-time location data release. This paper presents practical solutions to these challenges. First, the stationarity of the location data series is analyzed, and results imply that the location increment is approximately stationary in many scenarios, from which a location data correlation estimation method using the location increment is derived. Second, an adaptive adjustment model of the CLM filter based on parameter quantization, called Quantized Correlated Laplace Mechanism (QCLM), is proposed, and based on the lowpass spectral characteristics of the location data series, an effective implementation named QCLM-Lowpass is designed. Our main contributions are summarized as follows:
  • A location data correlation estimation method based on the location increment is proposed, in which the problem of correlation estimation of a nonstationary location data series is converted into the problem of estimating real location increments. Thereby, the correlation of nonstationary location data can be accurately estimated in many practical scenarios.
  • An adaptive adjustment model of the CLM filter (QCLM) and its effective implementation (QCLM-Lowpass) are proposed. The CLM filter is adjusted only at the necessary moments to reduce the generation of undesired transient responses, thus producing an output that satisfies series-indistinguishability as much as possible in time-varying environments.
  • Extensive simulations and real data experiments validate the effectiveness of the proposed approaches, and the results show that the privacy scheme based on QCLM-Lowpass outperforms the other schemes compared in terms of data usability and the ability to resist the correlation-based attack.
The remainder of the paper is organized as follows: Section 2 introduces the continuous real-time location data release model and relevant privacy theories and gives the privacy goals in this paper. The Dynamic Correlated Laplace Mechanism (DCLM) and problem statement are presented in Section 3. The location data correlation estimation method and QCLM are proposed in Section 4. We report the experimental evaluation in Section 5 and give the conclusion in Section 6.

2. Preliminaries

In this section, the continuous real-time location data release model is introduced, and then several relevant privacy theories are reviewed, followed by the privacy-preserving goals in this paper.

2.1. Continuous Real-Time Location Data Release Model

Before introducing the continuous location data release model, the location data ecosystem is presented in Figure 1. There are three important roles: the data provider, the data collector, and the data user, each with different demands. Data users expect accurate location data for application analysis, while data providers are reluctant to provide raw data for privacy reasons. The role of privacy protection is to balance these requirements, protecting user privacy while ensuring the usability of published data. Therefore, privacy protection is vital to the healthy and sustainable development of the location data ecosystem.
All spatial data is created in a coordinate system. This paper involves two coordinate systems, denoted as GEO and XOY. In GEO, a location record is expressed as $loc(t) = \{ID, t, \langle lon(t), lat(t), alt(t) \rangle\}$, where $ID$ denotes the user identity, $t$ denotes the time stamp, and $lon(t)$, $lat(t)$, $alt(t)$ indicate longitude, latitude, and altitude, respectively. Given that the user usually moves within a limited area where altitude changes can be ignored, a Cartesian coordinate system, XOY, is introduced. Specifically, the coordinate $\langle lon(t_0), lat(t_0) \rangle$ at a certain moment $t_0$ is used as the origin, O, and XOY is constructed with east as the positive direction of the X-axis, north as the positive direction of the Y-axis, and the scale in meters. Thereby, a location at time $t$ can be expressed by the vector $l(t) = [x(t), y(t)]^T$, where the superscript $T$ denotes the matrix transpose and $x(t)$, $y(t)$ are the coordinates on the X and Y axes, respectively. An example of these two coordinate systems is shown in Figure 2.
In practice, many applications require the user to continuously share real-time location data. In a typical scenario, the user releases location data at a fixed time interval, $\Delta t$, starting from a certain moment, $t_0$. For convenience, the $i$-th release time is $t_i = t_0 + i \times \Delta t$, and the corresponding location data is denoted as $l(i) = [x(i), y(i)]^T$. In the local privacy model shown in Figure 1, the user does not directly release the original location $l(i)$ but releases a perturbed location, $\tilde{l}(i) = [\tilde{x}(i), \tilde{y}(i)]^T$, which is the output of the privacy mechanism $\mathcal{M}$:
$$\tilde{l}(i) = \mathcal{M}[l(i)] = l(i) + n(i) \tag{1}$$
where $n(i) = [n_X(i), n_Y(i)]^T$ is the noise vector and $n_X(i)$, $n_Y(i)$ are the noise in the X and Y directions, respectively.
Suppose that there have been $I$ releases so far. Then the data series formed by the original locations can be denoted as $L(I) = [X(I), Y(I)] = [l(0), \dots, l(I)]^T$, where the column vectors $X(I) = [x(0), \dots, x(I)]^T$ and $Y(I) = [y(0), \dots, y(I)]^T$ are the data series in the X and Y directions. Similarly, the noise series and the perturbed location series are denoted as $N(I) = [N_X(I), N_Y(I)] = [n(0), \dots, n(I)]^T$ and $\tilde{L}(I) = [\tilde{X}(I), \tilde{Y}(I)] = [\tilde{l}(0), \dots, \tilde{l}(I)]^T$, respectively. The major notations in this paper are summarized in Table 1.
To protect privacy, it should be guaranteed that the attacker cannot accurately infer true location information from the published data. This includes the following requirements:
  • The attacker cannot accurately infer the original location $l(i)$ from $\tilde{l}(i)$ at each moment.
  • The attacker cannot accurately extrapolate the true location $l(i)$ from the historically published series $\tilde{L}(I)$.

2.2. Privacy Theories

For this privacy problem, we first review several important privacy theories.

2.2.1. Differential Privacy

Differential privacy [5] is based on the concept of indistinguishability in cryptology and is a state-of-the-art privacy preservation model. It ensures that the output of a private algorithm is not strongly dependent on any one record in the input dataset. Thus, differential privacy provides strict privacy protection, even if the attacker knows all information except the target record. Its formal definition is given below.
Definition 1
(differential privacy [5]). Considering two neighboring datasets, $D$ and $D'$, which have the same cardinality but differ in only one record, a random perturbation mechanism, $\mathcal{M}$, satisfies $(\varepsilon, \delta)$-differential privacy if for all possible outcomes $O \subseteq Range(\mathcal{M})$ and for any pair of $D$, $D'$:
$$\Pr[\mathcal{M}(D) \in O] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in O] + \delta \tag{2}$$
where $Range(\mathcal{M})$ denotes the value range of $\mathcal{M}$, and $\Pr(\cdot)$ denotes the probability. If $\delta = 0$, we say that $\mathcal{M}$ is $\varepsilon$-differentially private. The parameter $\varepsilon$ is considered as the privacy budget, which indicates the privacy strength. A smaller $\varepsilon$ corresponds to stronger privacy.
A popular ε -differentially private algorithm for numeric queries is the Laplace mechanism. The definition of the Laplace mechanism is given below.
Definition 2
(Laplace mechanism [19]). Let $Lap(\lambda)$ denote the Laplace distribution with mean 0 and scale $\lambda$. For any numeric-valued function $f: D \to \mathbb{R}^k$, the Laplace mechanism $\mathcal{M}_L$ is defined as follows:
$$\mathcal{M}_L(D, f(\cdot), \varepsilon) = f(D) + \langle N_1, \dots, N_k \rangle \tag{3}$$
where $\langle N_1, \dots, N_k \rangle$ are i.i.d. random variables drawn from $Lap(\Delta f / \varepsilon)$. The sensitivity $\Delta f$ is the maximum effect a single record has on the result of $f(\cdot)$:
$$\Delta f = \max_{D, D'} \| f(D) - f(D') \|_1 \tag{4}$$
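As a minimal sketch of Definition 2, the following Python snippet adds i.i.d. Laplace noise with scale $\Delta f / \varepsilon$ to each coordinate of a query answer. The function name `laplace_mechanism` and its argument names are illustrative assumptions, not identifiers from the paper, and the sensitivity must be supplied by the caller.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Perturb a numeric query answer with i.i.d. Lap(sensitivity / epsilon) noise.

    `true_answer` plays the role of f(D); `sensitivity` is the L1 sensitivity
    of f and is assumed to be known by the caller.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    rng = np.random.default_rng() if rng is None else rng
    answer = np.asarray(true_answer, dtype=float)
    scale = sensitivity / epsilon  # a smaller epsilon gives a larger noise scale
    return answer + rng.laplace(loc=0.0, scale=scale, size=answer.shape)
```

For instance, `laplace_mechanism([10.0, 3.0], sensitivity=1.0, epsilon=0.5)` perturbs a two-dimensional answer with noise of scale 2.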

2.2.2. Geo-Indistinguishability

The idea of indistinguishability is the basis of differential privacy. Geo-indistinguishability [10] extends it to continuous geographic space and can be formally defined as follows.
Definition 3
(geo-indistinguishability [10]). A mechanism $\mathcal{M}$ satisfies $\varepsilon$-geo-indistinguishability if for any two locations $l$, $l'$ in the protection region and for any output location $\tilde{l} \in Range(\mathcal{M})$:
$$\Pr[\mathcal{M}(l) = \tilde{l}] \le e^{\varepsilon d_L(l, l')} \Pr[\mathcal{M}(l') = \tilde{l}] \tag{5}$$
where $d_L(\cdot, \cdot)$ denotes the distance between two locations, such as the Euclidean distance.
Geo-indistinguishability ensures that it is difficult for an attacker to distinguish the true location from its surroundings according to the perturbed location, and thus provides privacy protection for the single location. Its privacy strength is controlled by the parameter ε .

2.2.3. Series-Indistinguishability

Continuously observed location data will constitute a correlated time series, which is vulnerable to correlation-based attacks. To solve this problem, series-indistinguishability [18] has been proposed to complement differential privacy preservation on correlated time-series data. Its formal definition is given as follows.
Definition 4
(series-indistinguishability [18]). Let $X$, $\tilde{X}$ denote the original and perturbed data series. If their autocorrelation functions, $R_X(\tau)$ and $R_{\tilde{X}}(\tau)$, satisfy the following:
$$R_{\tilde{X}}(\tau) / R_{\tilde{X}}(0) = R_X(\tau) / R_X(0) \tag{6}$$
then $X$, $\tilde{X}$ are series-indistinguishable to an adversary, where $\tau$ denotes the lag of the autocorrelation function.
In practice, however, it is difficult to achieve absolute series-indistinguishability due to various factors such as estimation errors. A more robust extension with noise tolerant interval [ υ , υ ] was presented, which requires the autocorrelation functions to satisfy the following condition:
υ R X ˜ ( τ ) R X ˜ ( 0 ) / R X ( τ ) R X ( 0 ) υ
Series-indistinguishability makes it difficult for an attacker to filter out some of the perturbation noise by exploiting the data correlation, so that the effective privacy protection of the correlated time series can be achieved without increasing the noise intensity.
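The condition in Definition 4 can be checked empirically by comparing normalized autocorrelation estimates. The sketch below is an illustration under stated assumptions: the helper names, the AR(1) test series, and the use of the largest absolute difference as a gap measure are all hypothetical choices and are not the tolerance condition of [18].

```python
import numpy as np

def normalized_autocorr(series, max_lag):
    """Return R(tau) / R(0) for tau = 0..max_lag, with R estimated by time averaging."""
    x = np.asarray(series, dtype=float)
    n = x.size
    r = np.array([np.dot(x[lag:], x[:n - lag]) / n for lag in range(max_lag + 1)])
    return r / r[0]

def max_correlation_gap(original, perturbed, max_lag):
    """Largest absolute difference between the two normalized autocorrelations."""
    gap = normalized_autocorr(original, max_lag) - normalized_autocorr(perturbed, max_lag)
    return float(np.max(np.abs(gap)))

def ar1(n, rho, rng):
    """Zero-mean, unit-variance AR(1) series (an assumed stand-in for correlated data)."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        x[i] = rho * x[i - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    return x

rng = np.random.default_rng(0)
original = ar1(20000, 0.9, rng)
white_noise = rng.standard_normal(20000)   # uncorrelated perturbation
matched_noise = ar1(20000, 0.9, rng)       # perturbation with the same correlation
gap_white = max_correlation_gap(original, original + white_noise, max_lag=5)
gap_matched = max_correlation_gap(original, original + matched_noise, max_lag=5)
```

In this toy setting `gap_white` should come out clearly larger than `gap_matched`, which illustrates why uncorrelated noise leaves a correlation discrepancy that an attacker could exploit.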

2.2.4. Correlated Laplace Mechanism

To achieve series-indistinguishability, the privacy mechanism M needs to generate a noise series whose autocorrelation function is consistent with that of the original data series. For this purpose, CLM [18] was proposed to generate a correlated Laplace noise series with the given autocorrelation function. It includes the following two steps:
  • Generate four Gaussian distributed random numbers independently, $g_m(i) \sim N(0, \sigma_G^2)$, $m = 1, 2, 3, 4$, where $i$ denotes the timestamp. Thereby, four Gaussian noise series are generated over time, denoted as $G_m$, $m = 1, 2, 3, 4$. Each of them has the same autocorrelation function $R_G(\tau)$, but they are independent of each other.
  • Generate the Laplace distributed random number $n(i) \sim Lap(2\sigma_G^2)$, which can be calculated as follows:
    $$n(i) = [g_1(i)]^2 + [g_2(i)]^2 - [g_3(i)]^2 - [g_4(i)]^2 \tag{8}$$
    Thereby, the correlated Laplace noise series can be generated over time, where the autocorrelation functions of the Laplace and Gaussian noise series, $R_N(\tau)$ and $R_G(\tau)$, satisfy the following equation:
    $$R_N(\tau) = 8 [R_G(\tau)]^2 \tag{9}$$
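The two steps above can be sketched numerically. The snippet below assumes, purely for illustration, that the four Gaussian series are unit-variance AR(1) processes with $R_G(\tau) = \rho^{|\tau|}$; the function names are hypothetical. Under this assumption the combined series should have variance close to 8 and lag-1 autocorrelation close to $8\rho^2$.

```python
import numpy as np

def correlated_gaussian(n, rho, rng, n_series=4):
    """Generate independent unit-variance AR(1) Gaussian series, one per row."""
    g = np.empty((n_series, n))
    g[:, 0] = rng.standard_normal(n_series)
    innovations = np.sqrt(1.0 - rho**2) * rng.standard_normal((n_series, n))
    for i in range(1, n):
        g[:, i] = rho * g[:, i - 1] + innovations[:, i]
    return g

def correlated_laplace(n, rho, rng):
    """Combine four correlated Gaussian series into one correlated Laplace series."""
    g1, g2, g3, g4 = correlated_gaussian(n, rho, rng)
    # With sigma_G^2 = 1, each sample follows Lap(2).
    return g1**2 + g2**2 - g3**2 - g4**2
```

Because the Gaussian series here have unit variance, rescaling the output by $\lambda / 2$ would give noise with scale $\lambda$.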
Figure 3 illustrates the generation of the correlated Laplace noise in CLM at a certain moment. There are four independently operating filters. They have the same parameters but different inputs and outputs. The filter used to generate the correlated Gaussian noise series with the input of Gaussian white noise is known as the CLM filter. It can be characterized by the following system function:
$$H_{CLM}(z) = \frac{\sum_{q=0}^{Q} b_q z^{-q}}{\sum_{u=0}^{U} a_u z^{-u}} \tag{10}$$
where the complex variable $z = e^{\sigma + j\omega}$, and $Q$, $U$ denote the orders of the zeros and poles, respectively. Let $a_0 = b_0 = 1$ by default and let $A_{CLM} = [a_1, \dots, a_U]$, $B_{CLM} = [b_1, \dots, b_Q]$ denote the parameter vectors. In addition, there is a gain coefficient $\kappa_{CLM}$ in the calculation of the Laplace noise $n \sim Lap(\lambda)$, which can be calculated as follows:
$$\kappa_{CLM} = \pi \Big/ \int_{-\pi}^{\pi} \left\| H_{CLM}(z)\big|_{z = e^{j\omega}} \right\|_2^2 \, d\omega \tag{11}$$
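The gain coefficient can be evaluated numerically by sampling the frequency response on the unit circle. The following is a sketch under assumptions: the function name `clm_gain`, the grid size, and the rectangle-rule integration are illustrative choices, and the filter is assumed to be stable.

```python
import numpy as np

def clm_gain(b, a, n_grid=4096):
    """Approximate kappa = pi / integral_{-pi}^{pi} |H(e^{jw})|^2 dw.

    `b` = [b_1..b_Q] and `a` = [a_1..a_U]; b_0 = a_0 = 1 is prepended here.
    """
    num = np.concatenate(([1.0], np.asarray(b, dtype=float)))
    den = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    omega = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    z_inv = np.exp(-1j * omega)
    # polyval expects the highest power first, so reverse the coefficient order.
    h = np.polyval(num[::-1], z_inv) / np.polyval(den[::-1], z_inv)
    integral = np.sum(np.abs(h) ** 2) * (2.0 * np.pi / n_grid)
    return np.pi / integral
```

For a first-order all-pole filter with pole $\rho$ (that is, `a = [-rho]`), the integral equals $2\pi / (1 - \rho^2)$, so the gain is $(1 - \rho^2)/2$. This matches the requirement that the unscaled Laplace output with scale $2\sigma_G^2$ is normalized to unit scale before being multiplied by $\lambda$.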
To clearly describe the dynamic adjustment problem of the CLM filter, this paper considers CLM as a program object, and redescribes its implementation from a programming perspective. Table 2 lists the main attributes and methods of the CLM object.
At the beginning of the application, the method Initialization is called to assign initial values to the parameters and clear the state of the filters. The methods UpdateParam and UpdateGainCoeff are used to update the parameters when necessary. During each iteration, the method LaplaceGeneration is called to generate the Laplace noise, where the output of GaussianGenerator is used as the input of the CLM filter.
This paper focuses on the generation of correlated Laplace noise, and the specific steps of the method LaplaceGeneration are given in Algorithm 1.
Algorithm 1. CLM: LaplaceGeneration()
1: Generate four i.i.d. random numbers,
    $\{g_1, g_2, g_3, g_4\} = GaussianGenerator()$
    Set $G = [g_1, g_2, g_3, g_4]^T$
2: Calculate four correlated Gaussian noise values,
    $G' = -S' A^T + G + S B^T$
3: Update the input state matrix $S$,
    $s_{m,k+1} \leftarrow s_{m,k}$, $k = 1, \dots, Q-1$, $m = 1, 2, 3, 4$; $s_{m,1} \leftarrow g_m$, $m = 1, 2, 3, 4$
4: Update the output state matrix $S'$,
    $s'_{m,k+1} \leftarrow s'_{m,k}$, $k = 1, \dots, U-1$, $m = 1, 2, 3, 4$; $s'_{m,1} \leftarrow g'_m$, $m = 1, 2, 3, 4$
5: Calculate the noise $n \sim Lap(\lambda)$,
    $n = [(g'_1)^2 + (g'_2)^2 - (g'_3)^2 - (g'_4)^2] \, \kappa \lambda$
6: return $n$
In line 1, the method GaussianGenerator is called to generate i.i.d. random numbers $g_1, g_2, g_3, g_4 \sim N(0, 1)$, which are used as the inputs of the four filters. In lines 2–4, the outputs $g'_1, g'_2, g'_3, g'_4$ are calculated and the states of the filters are updated. In lines 5–6, the noise $n \sim Lap(\lambda)$ is generated and returned.
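A compact Python sketch of Algorithm 1 is given below. It is not the authors' implementation: the class layout, attribute names, and the choice to pass `kappa` and `lam` to the constructor are assumptions made for illustration. The state matrices hold the most recent inputs and outputs of the four filters, newest first.

```python
import numpy as np

class CLM:
    """Illustrative CLM object: four identical filters driven by independent inputs."""

    def __init__(self, b, a, kappa, lam, rng=None):
        self.b = np.asarray(b, dtype=float)      # [b_1..b_Q]
        self.a = np.asarray(a, dtype=float)      # [a_1..a_U]
        self.kappa = kappa                       # gain coefficient
        self.lam = lam                           # target Laplace scale
        self.rng = np.random.default_rng() if rng is None else rng
        self.s_in = np.zeros((4, self.b.size))   # input state matrix S
        self.s_out = np.zeros((4, self.a.size))  # output state matrix S'

    def laplace_generation(self):
        g = self.rng.standard_normal(4)                          # line 1
        g_out = -self.s_out @ self.a + g + self.s_in @ self.b    # line 2
        if self.b.size:                                          # line 3
            self.s_in = np.column_stack((g, self.s_in[:, :-1]))
        if self.a.size:                                          # line 4
            self.s_out = np.column_stack((g_out, self.s_out[:, :-1]))
        raw = g_out[0]**2 + g_out[1]**2 - g_out[2]**2 - g_out[3]**2
        return raw * self.kappa * self.lam                       # lines 5-6
```

For example, `CLM(b=[], a=[-0.8], kappa=(1 - 0.8**2) / 2, lam=3.0)` uses a first-order all-pole filter; after the initial transient its output should follow a Laplace distribution with scale 3.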

2.3. Privacy Protection Goals

Based on the above privacy theories, the privacy requirements presented in Section 2.1 can be expressed as follows,
  • At each release moment, if the privacy mechanism $\mathcal{M}$ satisfies $\varepsilon$-geo-indistinguishability as defined in Equation (12), then the attacker cannot easily and accurately infer the original location $l(i)$ from the published location $\tilde{l}(i)$:
    $$\frac{\Pr\{\mathcal{M}[l(i)] = \tilde{l}(i)\}}{\Pr\{\mathcal{M}[l'(i)] = \tilde{l}(i)\}} \le \exp\{\varepsilon \, d_E[l(i), l'(i)]\} \tag{12}$$
    where $l'(i) = [x'(i), y'(i)]^T$ is a location near $l(i)$ at Euclidean distance $d_E[l(i), l'(i)]$.
  • As for the original and perturbed location series, $L(I)$ and $\tilde{L}(I)$, if they are series-indistinguishable, then the attacker cannot easily use data correlation to accurately infer $l(i)$ from $\tilde{L}(I)$. This requires that their autocorrelation function matrices, $R_L(i, i-\tau)$ and $R_{\tilde{L}}(i, i-\tau)$, satisfy the following condition:
    $$R_L(i, i-\tau) \oslash R_L(i, i) = R_{\tilde{L}}(i, i-\tau) \oslash R_{\tilde{L}}(i, i) \tag{13}$$
    where $\oslash$ denotes the Hadamard division (element-wise division), and $R_L(i, i-\tau)$, $R_{\tilde{L}}(i, i-\tau)$ are defined as follows:
    $$R_L(i, i-\tau) = E[l(i) \, l^T(i-\tau)] = \begin{bmatrix} R_{XX}(i, i-\tau) & R_{XY}(i, i-\tau) \\ R_{YX}(i, i-\tau) & R_{YY}(i, i-\tau) \end{bmatrix} \tag{14}$$
    $$R_{\tilde{L}}(i, i-\tau) = E[\tilde{l}(i) \, \tilde{l}^T(i-\tau)] = \begin{bmatrix} R_{\tilde{X}\tilde{X}}(i, i-\tau) & R_{\tilde{X}\tilde{Y}}(i, i-\tau) \\ R_{\tilde{Y}\tilde{X}}(i, i-\tau) & R_{\tilde{Y}\tilde{Y}}(i, i-\tau) \end{bmatrix} \tag{15}$$
Naturally, it is considered that CLM can be applied to achieve the above privacy goals, but this will face some challenges in the practical implementation.

3. Problems of Dynamic Correlated Laplace Mechanism

For streaming data publishing, the Dynamic Correlated Laplace Mechanism is presented, and then its practical problems in continuous location data release are described.

3.1. Dynamic Correlated Laplace Mechanism

In the continuous location data release application, location data is dynamically generated and its correlation changes over time. To achieve series-indistinguishability, the CLM filter must be dynamically adjusted to track the time-varying data correlation. This scheme is called Dynamic Correlated Laplace Mechanism (DCLM).
Let us first consider a one-dimensional data series, taking $X(I)$ as an example. DCLM can be implemented with a sliding window model. At the moment $t_i$, the data in the window of size $M_L$, denoted as $X_W(i) = [x(i - M_L + 1), \dots, x(i)]^T$, can be used to estimate the autocorrelation function vector $R_X(i) = [R_{XX}(i, i), \dots, R_{XX}(i, i - \Upsilon)]^T$, where $R_{XX}(i, i-\tau) = E[x(i) x(i-\tau)]$, $\tau = 0, \dots, \Upsilon$, and $\Upsilon$ is the maximum lag. Then, the CLM filter is adjusted so that the autocorrelation function of the Laplace noise series matches $R_X(i)$.
Similarly, this paper considers DCLM as a program object in which CLM is a member. The attributes of DCLM include the CLM filter’s parameter vectors, $B$, $A$, and the adjustment step $\mu_{DCLM}$. In addition, there are two main methods in DCLM: one initializes the attributes as well as CLM, and the other, denoted as Iteration, dynamically adjusts the CLM filter to generate Laplace noise and is described in Algorithm 2.
Algorithm 2. DCLM: Iteration
Input: the autocorrelation function vector $R$
Output: the Laplace noise $n$
1: Calculate the CLM filter’s estimated parameter vectors $\hat{B}$, $\hat{A}$ according to $R$;
2: Update the CLM filter’s parameter vectors,
    $B = B + (\hat{B} - B) \, \mu_{DCLM}$, $A = A + (\hat{A} - A) \, \mu_{DCLM}$
3: Calculate the gain coefficient $\kappa$ according to $B$, $A$;
4: Update the parameters of the CLM filter,
    CLM.UpdateParam($B$, $A$), CLM.UpdateGainCoeff($\kappa$)
5: Generate the Laplace noise $n$ = CLM.LaplaceGeneration();
6: return $n$;
In line 1, the autocorrelation function vector is used to compute the estimated parameter vectors of the CLM filter; see [18] for details. Lines 2–3 update the parameter vectors and the gain coefficient. Lines 4–6 call the methods of the CLM object to adjust the CLM filter and to generate and return the Laplace noise.
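The parameter update in line 2 of Algorithm 2 is a first-order smoothing step toward the estimated parameters. The sketch below isolates that step; the function name `dclm_update` and the restriction of the step to $(0, 1]$ are assumptions for illustration, and the estimation of $\hat{B}$, $\hat{A}$ from $R$ (line 1) is not reproduced here.

```python
import numpy as np

def dclm_update(current, estimate, mu):
    """Move the current parameter vector a fraction `mu` toward the estimate."""
    if not 0.0 < mu <= 1.0:
        raise ValueError("mu is assumed to lie in (0, 1]")
    current = np.asarray(current, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return current + (estimate - current) * mu
```

With a fixed estimate, the remaining gap shrinks by a factor of $(1 - \mu)$ per iteration, so a smaller step changes the filter more gently but tracks a changing correlation more slowly.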
From the above steps, it can be seen that DCLM can be effectively applied when $X(I)$ is stationary or short-time stationary, in which case $R_X(i)$ can be accurately estimated and $\mu_{DCLM}$ can be kept small to ensure that the CLM filter’s output is as expected. However, there are some challenges in implementing DCLM in the continuous location data release application.

3.2. Problems of DCLM in Continuous Location Data Release

In fact, location data series usually do not satisfy the stationary conditions, which makes it difficult to accurately estimate the autocorrelation function. In addition, the dynamic adjustment of the CLM filter causes the unwanted transient response. This results in a deviation between the actual output and the desired output.

3.2.1. Correlation Estimation

As for the dynamically generated location data series $L(I)$, the data in the window of size $M_L$, denoted as $L_W(i) = [l(i - M_L + 1), \dots, l(i)]^T$, is used to calculate the estimate of the autocorrelation function matrix, $\hat{R}_L(i, i-\tau)$:
$$\hat{R}_L(i, i-\tau) = \frac{1}{M_L} \sum_{k=0}^{M_L - \tau - 1} l(i-k) \, l^T(i - \tau - k) \tag{16}$$
From the above equation, the estimation accuracy depends on the stationarity of $L(I)$. If the following stationarity conditions are satisfied, the window size $M_L$ can be made larger to ensure accurate estimation:
$$E[l(i)] = E[l(i-k)] \tag{17}$$
$$R_L(i, i-\tau) = R_L(i-k, i-k-\tau) \tag{18}$$
for any time shift $k$. However, this requires the user to stay at a fixed location, which excessively limits the application scenarios.
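The windowed estimator in Equation (16) can be written directly in NumPy. This is a minimal sketch: the function name `windowed_autocorr` and the convention that window rows are ordered from oldest to newest are assumptions made for the example.

```python
import numpy as np

def windowed_autocorr(window, tau):
    """Estimate the 2x2 autocorrelation matrix at lag `tau` from one window.

    `window` has shape (M_L, 2); row j holds l(i - M_L + 1 + j), so the last
    row is the current location l(i).
    """
    w = np.asarray(window, dtype=float)
    m_l = w.shape[0]
    if not 0 <= tau < m_l:
        raise ValueError("tau must satisfy 0 <= tau < M_L")
    recent = w[tau:]           # rows l(i - k), k = 0..M_L - tau - 1
    lagged = w[:m_l - tau]     # matching rows l(i - tau - k)
    # Sum of outer products l(i - k) l(i - tau - k)^T, divided by M_L.
    return recent.T @ lagged / m_l
```

For the window `[[1, 2], [3, 4], [5, 6]]` and `tau=1`, the pairs are (l = [3, 4], lagged [1, 2]) and (l = [5, 6], lagged [3, 4]), which gives `[[18, 26], [22, 32]] / 3`.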
A real location data series is shown in Figure 4a, and Figure 4b presents the data series $X(I)$ in the X direction. As the user moves, $X(I)$ shows a corresponding trend and thus no longer satisfies the stationarity conditions. Therefore, it is difficult to accurately estimate the autocorrelation function directly from the original location data series.

3.2.2. Transient Response in Dynamic Adjustment

Based on filter theory, the actual output of the CLM filter, $R_{CLM}(i)$, consists of two parts: the steady-state response, $R_S(i)$, and the transient response, $R_T(i)$:
$$R_{CLM}(i) = R_S(i) + R_T(i) \tag{19}$$
where $i$ is the timestamp. The former is determined by the input and the filter parameters $B_{CLM}$, $A_{CLM}$, while the latter is determined not only by these two factors but also by the initial state of the filter. If $B_{CLM}$, $A_{CLM}$ remain unchanged, then $R_T(i)$ decays over time and eventually only $R_S(i)$ remains. Therefore, to ensure the validity of the final result, $B_{CLM}$, $A_{CLM}$ are adjusted so that $R_S(i)$ satisfies the given correlation, and $R_T(i)$ is the undesired part.
However, when $B_{CLM}$, $A_{CLM}$ are changed, the previous steady state is destroyed, and a new undesired transient response is generated. This implies that a transition phase is required for the filter to return to the steady state. Here, a second-order all-pole filter is taken as an example to illustrate this transition phase. Its system function is as follows:
$$H_{AP}(z) = \frac{1}{(1 - \zeta z^{-1})^2} \tag{20}$$
where $\zeta$ denotes the second-order pole, and Gaussian white noise $G \sim N(0, 1)$ is used as the input. Figure 5a shows the change of the filter’s normalized power spectrum in three adjustments, and Figure 5b presents the output variance over time, where the results are normalized according to the corresponding steady-state response.
As shown in Figure 5b, the filter’s adjustment produces the new transient response highlighted in red, which makes the actual results deviate from expectations. It takes some time for the filter to return to the steady state. Therefore, the undesired transient response needs to be suppressed in the implementation of DCLM.
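The transition phase can be reproduced with a small Monte Carlo sketch of the second-order all-pole filter $1/(1 - \zeta z^{-1})^2$. The pole values, stage length, and number of trials below are arbitrary illustrative choices and are not the settings used for Figure 5; the function names are hypothetical. The output variance across trials is normalized by the steady-state variance of the currently active pole.

```python
import numpy as np

def steady_state_variance(zeta):
    """Output variance of 1 / (1 - zeta z^-1)^2 driven by unit-variance white noise.

    The impulse response is h[n] = (n + 1) * zeta**n, so the variance is
    sum_n h[n]^2 = (1 + zeta^2) / (1 - zeta^2)^3.
    """
    return (1.0 + zeta**2) / (1.0 - zeta**2) ** 3

def normalized_variance_trace(zetas, steps_per_stage, n_trials=4000, seed=0):
    """Run many filter realizations, switching the pole between stages without resetting state."""
    rng = np.random.default_rng(seed)
    y_prev1 = np.zeros(n_trials)
    y_prev2 = np.zeros(n_trials)
    trace = []
    for zeta in zetas:
        target = steady_state_variance(zeta)
        for _ in range(steps_per_stage):
            x = rng.standard_normal(n_trials)
            # y[n] = x[n] + 2*zeta*y[n-1] - zeta^2*y[n-2]
            y = x + 2.0 * zeta * y_prev1 - zeta**2 * y_prev2
            y_prev2, y_prev1 = y_prev1, y
            trace.append(np.mean(y**2) / target)
    return np.array(trace)

trace = normalized_variance_trace([0.9, 0.5], steps_per_stage=300)
```

In this setup the trace settles near 1 at the end of each stage, while the samples right after the switch from 0.9 to 0.5 are far above 1 because the filter state still carries the larger variance of the previous configuration.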

4. Methodology

For the aforementioned practical problems, our solution idea includes the following two aspects:
  • Location data correlation estimation based on the location increment. The autocorrelation function of the location data is expressed in terms of true location increments. Thus, if the location increment series satisfies stationary conditions, the location data correlation can be calculated indirectly using the estimated location increments.
  • Quantized Correlated Laplace Mechanism (QCLM). On the one hand, the CLM filter should remain unchanged to suppress the transient response; on the other hand, it needs to be adjusted to track the time-varying data correlation. To balance these two requirements, this paper proposes an adaptive adjustment method based on parameter quantization. It adjusts the CLM filter only at the necessary moments, so that series-indistinguishability can be satisfied as much as possible.

4.1. Location Data Correlation Estimation Based on the Location Increment

In practice, it is desirable to find a more stationary intermediate variable to compute the autocorrelation function estimate for the non-stationary location data series. As illustrated in Figure 4c, the increments of $X(I)$ show approximate stationarity in the time segments highlighted in red. Our analysis reveals that location series with approximately stationary increments account for more than 27% of the real dataset (details are given in Section 5.1.3). Moreover, the location series and its increments can be mutually transformed given the starting point. Therefore, we employ location increments to compute the autocorrelation function estimate for the location data.
For the locations $l(i-1)$, $l(i)$, their increment is denoted as $v(i) = l(i) - l(i-1) = [v_X(i), v_Y(i)]^T$, and the increment series of $L(I)$ is denoted as $V(I) = [V_X(I), V_Y(I)] = [v(1), \dots, v(I)]^T$, where $V_X(I)$ and $V_Y(I)$ are the increments in the X and Y directions. Similarly, stationarity requires that $E[v(i)]$ is constant over time, which means that the user remains at rest or in uniform linear motion, covering a wider range of scenarios.
Here, we describe the location data correlation estimation method based on location increments. To eliminate the effect of the coordinate origin, the location data correlation is characterized by the autocorrelation function matrix $\tilde{R}_L(i, i-\tau)$ after removing the center of $L_W(i)$:
$$\tilde{R}_L(i, i-\tau) = E\{[l(i) - O(i)][l(i-\tau) - O(i)]^T\} \tag{21}$$
where the center $O(i)$ is defined as follows:
$$O(i) = \frac{1}{M_L} E\left[\sum_{k=0}^{M_L-1} l(i-k)\right] \tag{22}$$
In fact, $l(i)$ is composed of two parts: the true value $l_{tr}(i) = E[l(i)]$ and the error $l_{er}(i)$, where $tr$, $er$ denote the truth and error, respectively. In this paper, the error series $L_{er}(I) = [l_{er}(0), \ldots, l_{er}(I)]^T$ is considered to be zero-mean white noise, so Equation (21) can be further expressed as follows:
$$\tilde{R}_L(i, i-\tau) = [l_{tr}(i) - O_{tr}(i)][l_{tr}(i-\tau) - O_{tr}(i)]^T \tag{23}$$
where $O_{tr}(i) = \sum_{k=0}^{M_L-1} l_{tr}(i-k) / M_L$. Furthermore, if $l(i-M_L)$ is considered as the observation origin, then $l_{tr}(i-k)$ can be expressed as follows:
$$l_{tr}(i-k) = l_{tr}(i-M_L) + \sum_{m=k}^{M_L-1} v_{tr}(i-m) \tag{24}$$
where the true increment is $v_{tr}(i) = l_{tr}(i) - l_{tr}(i-1)$. By substituting Equation (24) into Equation (23), the following equation can be obtained:
$$\tilde{R}_L(i, i-\tau) = \left[\sum_{m=0}^{M_L-1} v_{tr}(i-m) - O_V(i)\right]\left[\sum_{m=\tau}^{M_L-1} v_{tr}(i-m) - O_V(i)\right]^T \tag{25}$$
where $O_V(i) = \sum_{m=0}^{M_L-1} (m+1)\, v_{tr}(i-m) / M_L$.
By the above steps, the autocorrelation function of the location data is expressed in terms of true location increments. When $V(I)$ satisfies the stationary conditions, the true location increments $v_{tr}(i)$ can be accurately estimated, and thus the location correlation $\tilde{R}_L(i, i-\tau)$ can be calculated according to Equation (25).
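As a quick sanity check of the substitution above, the following minimal Python sketch computes the centered product once directly from the locations (the form of Equation (23)) and once from the increments only (the form of Equation (25)). The function names and the sample path are hypothetical, and the locations are treated as error-free true values for simplicity.

```python
def outer(a, b):
    """Outer product a * b^T of two 2-D vectors, as a 2x2 nested list."""
    return [[a[0] * b[0], a[0] * b[1]], [a[1] * b[0], a[1] * b[1]]]


def corr_from_locations(path, tau):
    """Location form: path = [l(i-M_L), ..., l(i)]; the window is the last M_L points."""
    m_l = len(path) - 1
    window = path[1:]
    cx = sum(p[0] for p in window) / m_l
    cy = sum(p[1] for p in window) / m_l
    a, b = path[-1], path[-1 - tau]
    return outer((a[0] - cx, a[1] - cy), (b[0] - cx, b[1] - cy))


def corr_from_increments(v, tau):
    """Increment form: v = [v(i-M_L+1), ..., v(i)], so v[-1-m] is v(i-m)."""
    m_l = len(v)
    # Center relative to the observation origin l(i-M_L).
    o_v = [sum((m + 1) * v[-1 - m][d] for m in range(m_l)) / m_l for d in (0, 1)]
    s0 = [sum(v[-1 - m][d] for m in range(0, m_l)) for d in (0, 1)]
    st = [sum(v[-1 - m][d] for m in range(tau, m_l)) for d in (0, 1)]
    return outer((s0[0] - o_v[0], s0[1] - o_v[1]), (st[0] - o_v[0], st[1] - o_v[1]))


# Hypothetical path with a non-zero origin; both forms should agree.
path = [(10.0, -3.0), (11.0, -2.5), (12.5, -2.0), (13.0, -1.0), (15.0, -0.5)]
incs = [(path[j + 1][0] - path[j][0], path[j + 1][1] - path[j][1]) for j in range(4)]
direct = corr_from_locations(path, 2)
via_increments = corr_from_increments(incs, 2)
```

Because the increment form never uses the absolute coordinates, shifting the whole path by a constant leaves the result unchanged, which is the point of removing the window center.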

4.2. Quantized Correlated Laplace Mechanism

As for the dynamic adjustment of the CLM filter, an adaptive adjustment based on parameter quantization is proposed to achieve a balance between the suppression of transient response and the dynamic tracking of the time-varying correlations.

4.2.1. Adaptive Adjustment Based on Parameter Quantization

To suppress the transient response, the CLM filter should be kept unchanged so that it can reach the steady state. A simple strategy is to control the time interval for parameter adjustment. For example, consider the filter with the system function given in Equation (20). As shown by the results highlighted in blue in Figure 6, the filter is adjusted at a fixed, preset time interval. However, this method faces the following two problems:
  • It is difficult to determine the appropriate adjustment time. The CLM filter needs to be adjusted in time when the correlation of the output noise series does not match that of the original data. However, the time-varying data correlation is unpredictable, which makes it difficult to set the adjustment time interval in advance.
  • It is difficult to compute the appropriate parameters of the CLM filter. To reduce the effects of estimation errors or transient changes of data, it is necessary to determine the final adjustment based on the correlation estimate over a period of time. However, the computation of the filter parameters is non-linear, which makes it difficult to solve for the optimized results.
To this end, an adaptive adjustment strategy is desired. The basic idea is to constantly perceive the difference between the output of CLM and the target data series in terms of data correlation, and the CLM filter is only adjusted when the difference exceeds a certain threshold. As shown by the results highlighted in red in Figure 6, it adjusts the filter only when necessary, ensuring that the filter remains as unchanged as possible.
In this paper, we consider an adaptive adjustment scheme of the CLM filter based on parameter quantization, called Quantized Correlated Laplace Mechanism (QCLM). Let $H_{CLM} = [B_{CLM}, A_{CLM}]$ denote the CLM filter's parameter vector, and let $\mathcal{H}$ denote the space including all possible values of $H_{CLM}$. In QCLM, $\mathcal{H}$ is divided into $Q_{Num}$ mutually exclusive subspaces $\mathcal{H}_1, \ldots, \mathcal{H}_{Q_{Num}}$, which are mapped to different vectors $H_1, \ldots, H_{Q_{Num}}$, respectively. The adjustment is then realized by the following steps:
  • Calculate the estimated parameter vector $\hat{H}_{CLM}$ according to the autocorrelation function vector $R_X(i)$;
  • Identify the corresponding subspace $\hat{H}_{CLM} \in \mathcal{H}_q$, $q = 1, \ldots, Q_{Num}$, and set $H_{CLM} = H_q$.
The above parameter quantization inevitably introduces errors into the series-indistinguishability, which affects the actual privacy-preserving performance. To ensure the effectiveness of QCLM, the following requirements should be satisfied:
  • For any subspace $\mathcal{H}_q$, $q = 1, \ldots, Q_{Num}$, it should be guaranteed that the actual privacy strength does not change significantly when replacing $H_{CLM} \in \mathcal{H}_q$ with $H_q$;
  • It is not trivial to determine Q N u m . The larger the value of Q N u m , the smaller the quantization error, but it causes the CLM filter to change more frequently. Conversely, a smaller Q N u m introduces a larger quantization error, but it allows the CLM filter to remain unchanged for a longer period of time.
Obviously, it is very difficult to quantize $H_{CLM}$ without any constraints. To implement QCLM, a basic idea is to map $H_{CLM}$ to a feature parameter, so that the parameter quantization can be achieved by dividing the definition domain of this feature parameter.

4.2.2. QCLM Based on Lowpass Characteristic

Given that the autocorrelation function and the power spectrum form a Fourier pair, the power spectrum characteristics of the data series in the X and Y directions were analyzed. The results indicated that the power spectrum energy is mainly concentrated in the low frequencies (details are given in Section 5.1.4). Therefore, it is considered that series-indistinguishability can be approximated by constructing noise series with the same lowpass characteristics as the data series, and this method is called CLM-Lowpass.
In the case of the series $X(I)$, let $S_X(\omega)$ denote the power spectrum at a certain moment, where $\omega \in [-\pi, \pi]$ is the normalized angular frequency. Its lowpass cutoff frequency $\omega_X$ can be calculated by the following equation:
$$\arg\min_{C_{ILF},\ \omega_X \in [0, \pi]} \int_{-\pi}^{\pi} \left[C_{ILF}\, S_{ILF}(\omega \mid \omega_X) - S_X(\omega)\right]^2 d\omega \tag{26}$$
where $C_{ILF}$ denotes the amplitude factor and $S_{ILF}(\omega \mid \omega_X)$ denotes the power spectrum of an ideal lowpass filter with cutoff frequency at $\omega_X$, which is defined as follows:
$$S_{ILF}(\omega \mid \omega_X) = \begin{cases} 1, & |\omega| \le \omega_X \\ 0, & \omega_X < |\omega| \end{cases} \tag{27}$$
To approximate series-indistinguishability, the CLM filter is constructed to generate the Laplace noise series $N_X(I)$, whose power spectrum cutoff frequency, $\omega_{N_X}$, is also at $\omega_X$. The detailed calculation of the CLM filter's parameters is discussed in Section 4.3.2.
As an example, the normalized power spectra of the data and noise series, $S_X(\omega)$ and $S_{N_X}(\omega)$, are shown in Figure 7a. Approximate series-indistinguishability does not require $S_{N_X}(\omega)$ to exactly match $S_X(\omega)$; it only requires that they have the same lowpass characteristics. Extensive experiments verified that, compared to the original CLM, the changes in actual privacy strength induced by CLM-Lowpass are acceptable (details are given in Section 5.3.1).
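The cutoff fit against an ideal lowpass spectrum can be illustrated with a small discretized search. The sketch below is only an approximation under stated assumptions: the spectrum is sampled on a uniform grid over $[0, \pi]$ (an even spectrum is assumed), the integral is replaced by a sum, and the function name `fit_lowpass_cutoff` is hypothetical. For a fixed passband, the least-squares amplitude is simply the mean of the spectrum over that passband, so only the cutoff needs to be searched.

```python
import math


def fit_lowpass_cutoff(freqs, spectrum):
    """Grid search for the cutoff and amplitude of the best-fitting ideal lowpass.

    freqs: ascending uniform grid on [0, pi]; spectrum: samples of S_X on that grid.
    Returns (cutoff, amplitude).
    """
    best = None
    for j in range(1, len(freqs) + 1):          # passband = first j grid points
        amp = sum(spectrum[:j]) / j             # least-squares amplitude for this passband
        err = sum((amp - s) ** 2 for s in spectrum[:j])
        err += sum(s ** 2 for s in spectrum[j:])  # stopband target is zero
        if best is None or err < best[0]:
            best = (err, freqs[j - 1], amp)
    return best[1], best[2]


# Synthetic check: an exactly ideal lowpass spectrum with amplitude 2 and cutoff 0.3*pi.
grid = [k * math.pi / 100 for k in range(101)]
ideal = [2.0 if f <= 0.3 * math.pi + 1e-12 else 0.0 for f in grid]
cutoff, amplitude = fit_lowpass_cutoff(grid, ideal)
```

On real spectra the error surface is not this clean, so the grid resolution directly limits the accuracy of the estimated cutoff.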
In CLM-Lowpass, $H_{CLM}$ can be determined by the cutoff frequency of the noise power spectrum, $\omega_N$, so that the quantization of $H_{CLM}$ can be achieved by dividing $\omega_N$. Specifically, a basic quantization model is illustrated in Figure 7b: the normalized angular frequency interval $[0, \pi]$ is divided into $Q_{Num}$ mutually exclusive subintervals $\{\Omega_1, \Omega_2, \ldots, \Omega_{Q_{Num}}\}$, which are respectively mapped to $0 < \omega_1 < \cdots < \omega_{Q_{Num}} < \pi$. It can be defined as follows:
$$Quantize(\omega_N) = \begin{cases} \omega_1, & \omega_N \in \Omega_1 \\ \ \vdots & \\ \omega_{Q_{Num}}, & \omega_N \in \Omega_{Q_{Num}} \end{cases} \tag{28}$$
In this way, the adaptive adjustment of the CLM filter can be implemented through the following steps:
  • Estimate the power spectrum S X ( ω ) from the autocorrelation function vector R X ( i ) and calculate its lowpass cutoff frequency ω X .
  • Identify the corresponding subinterval $\omega_X \in \Omega_q$, $q = 1, \ldots, Q_{Num}$, and set $\omega_N = \omega_q$, from which $H_{CLM}$ can be determined.
Thereby, QCLM is successfully implemented based on the lowpass characteristics of the data power spectrum, and this specific scheme is called QCLM-Lowpass. Next, we analyze the feasibility of applying QCLM-Lowpass for privacy preservation in continuous location data release.

4.2.3. Feasibility Analysis of QCLM-Lowpass

As for the two-dimensional location data series $L(I) = [X(I), Y(I)]$, the privacy mechanism $\mathcal{M}$ in this paper is implemented by applying QCLM-Lowpass separately in the X and Y directions. Specifically, it contains two independently operating QCLM-Lowpass components, denoted as $\mathcal{M}_X$, $\mathcal{M}_Y$, and the privacy process is as follows:
  • At the release moment $t_i$, the Laplace noise $n_X(i), n_Y(i) \sim Lap(\lambda)$ are generated independently by $\mathcal{M}_X$, $\mathcal{M}_Y$, and then the perturbed location is obtained as $\tilde{l}(i) = l(i) + [n_X(i), n_Y(i)]$.
  • During the continuous location data release, the CLM filters in $\mathcal{M}_X$, $\mathcal{M}_Y$ are separately adjusted by QCLM-Lowpass, so that the autocorrelation functions of $N_X(I)$, $X(I)$ and of $N_Y(I)$, $Y(I)$ satisfy the following condition:
    $$\begin{aligned} R_{N_X}(i, i-\tau) / R_{N_X}(i, i) &= R_{XX}(i, i-\tau) / R_{XX}(i, i) \\ R_{N_Y}(i, i-\tau) / R_{N_Y}(i, i) &= R_{YY}(i, i-\tau) / R_{YY}(i, i) \end{aligned} \tag{29}$$
    Here, we analyze whether M can satisfy the privacy goals presented in Section 2.3.
a. 
The requirement of geo-indistinguishability.
Theorem  1.
The privacy scheme $\mathcal{M}$ satisfies $\sqrt{2}/\lambda$-geo-indistinguishability at each moment.
Proof of Theorem 1. 
In this scheme, the left side of Equation (12) can be expressed as follows:
$$\begin{aligned} \frac{\Pr\{\mathcal{M}[l(i)] = \tilde{l}(i)\}}{\Pr\{\mathcal{M}[l'(i)] = \tilde{l}(i)\}} &= \frac{Lap[\tilde{x}(i) - x(i) \mid \lambda]\; Lap[\tilde{y}(i) - y(i) \mid \lambda]\, dx\, dy}{Lap[\tilde{x}(i) - x'(i) \mid \lambda]\; Lap[\tilde{y}(i) - y'(i) \mid \lambda]\, dx\, dy} \\ &= \exp\left\{-\frac{1}{\lambda}\left[\,|\tilde{x}(i) - x(i)| - |\tilde{x}(i) - x'(i)| + |\tilde{y}(i) - y(i)| - |\tilde{y}(i) - y'(i)|\,\right]\right\} \\ &\le \exp\left\{\frac{1}{\lambda}\left[\,|x(i) - x'(i)| + |y(i) - y'(i)|\,\right]\right\} \\ &\le \exp\left\{\frac{\sqrt{2}}{\lambda}\sqrt{[x(i) - x'(i)]^2 + [y(i) - y'(i)]^2}\right\} \end{aligned}$$
Hence, the following result can be obtained:
$$\frac{\Pr\{\mathcal{M}[l(i)] = \tilde{l}(i)\}}{\Pr\{\mathcal{M}[l'(i)] = \tilde{l}(i)\}} \le \exp\left\{\frac{\sqrt{2}}{\lambda}\, d_E[l(i), l'(i)]\right\}$$
 □
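The bound in Theorem 1 can also be spot-checked numerically. The sketch below assumes the standard Laplace density $Lap(x \mid \lambda) = e^{-|x|/\lambda} / (2\lambda)$ applied independently per coordinate; the function names, sampling ranges, and seed are arbitrary choices for illustration. Random testing of this kind only fails to find a counterexample, so it complements the proof rather than replacing it.

```python
import math
import random


def lap_pdf(x, lam):
    """Density of a zero-mean Laplace distribution with scale lam."""
    return math.exp(-abs(x) / lam) / (2 * lam)


def output_density(out, loc, lam):
    """Density of releasing `out` when the true location is `loc` (independent X/Y noise)."""
    return lap_pdf(out[0] - loc[0], lam) * lap_pdf(out[1] - loc[1], lam)


def check_geo_bound(lam, trials=2000, seed=0):
    """Return True if the density ratio never exceeds exp(sqrt(2)/lam * distance)."""
    rng = random.Random(seed)
    for _ in range(trials):
        l1 = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        l2 = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        out = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        ratio = output_density(out, l1, lam) / output_density(out, l2, lam)
        dist = math.hypot(l1[0] - l2[0], l1[1] - l2[1])
        if ratio > math.exp(math.sqrt(2) / lam * dist) * (1 + 1e-9):
            return False
    return True
```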
b. 
The requirement of series-indistinguishability.
Theorem 2.
The privacy scheme M achieves series-indistinguishability if the correlation between X ( I ) and Y ( I ) can be ignored.
Proof of Theorem 2. 
In this situation, the series $X(I)$, $Y(I)$, $N_X(I)$, $N_Y(I)$ are independent of each other, so the following cross-correlation functions can be obtained:
$$\begin{aligned} R_{XY}(i, i-\tau) &= R_{YX}(i, i-\tau) = 0 \\ R_{\tilde{X}\tilde{Y}}(i, i-\tau) &= R_{\tilde{Y}\tilde{X}}(i, i-\tau) = 0 \end{aligned}$$
According to Equation (29), the normalized autocorrelation functions of $\tilde{X}(I)$, $\tilde{Y}(I)$ satisfy the following:
$$\begin{aligned} \frac{R_{\tilde{X}\tilde{X}}(i, i-\tau)}{R_{\tilde{X}\tilde{X}}(i, i)} &= \frac{R_{XX}(i, i-\tau) + R_{N_X}(i, i-\tau)}{R_{XX}(i, i) + R_{N_X}(i, i)} = \frac{R_{XX}(i, i-\tau)}{R_{XX}(i, i)} \\ \frac{R_{\tilde{Y}\tilde{Y}}(i, i-\tau)}{R_{\tilde{Y}\tilde{Y}}(i, i)} &= \frac{R_{YY}(i, i-\tau) + R_{N_Y}(i, i-\tau)}{R_{YY}(i, i) + R_{N_Y}(i, i)} = \frac{R_{YY}(i, i-\tau)}{R_{YY}(i, i)} \end{aligned}$$
Thus, the autocorrelation function matrix of $\tilde{L}(I)$ satisfies the following:
$$\frac{R_{\tilde{L}}(i, i-\tau)}{R_{\tilde{L}}(i, i)} = \begin{bmatrix} R_{\tilde{X}\tilde{X}}(i, i-\tau) / R_{\tilde{X}\tilde{X}}(i, i) & 0 \\ 0 & R_{\tilde{Y}\tilde{Y}}(i, i-\tau) / R_{\tilde{Y}\tilde{Y}}(i, i) \end{bmatrix} = \begin{bmatrix} R_{XX}(i, i-\tau) / R_{XX}(i, i) & 0 \\ 0 & R_{YY}(i, i-\tau) / R_{YY}(i, i) \end{bmatrix} = \frac{R_L(i, i-\tau)}{R_L(i, i)}$$
Thus, $L(I)$, $\tilde{L}(I)$ are series-indistinguishable. □
c. 
The analysis of quantization errors.
This paper considers the more robust series-indistinguishability defined by Equation (7) and analyzes the effect of quantization errors introduced by QCLM-Lowpass on series-indistinguishability.
In some works [20,21], the location data series was described using a low-order autoregressive model. Consequently, this paper assumes that the noise series $N$ satisfies a second order autoregressive model, and its system function can be expressed as follows:
$$H_N(z) = \frac{1}{(1 - \xi_1 z^{-1})(1 - \xi_2 z^{-1})} \tag{30}$$
where $\xi_1$, $\xi_2$ are the poles. It is considered that the cutoff frequency of the noise power spectrum satisfies $\omega_N \in (0.05\pi, 0.8\pi)$, and $\xi_1$, $\xi_2$ can be calculated by the following function:
$$\arg\min_{|\xi_1|, |\xi_2| < 1,\ C_N} \int_{-\pi}^{\pi} \left[C_N \left\| H_N(z)\big|_{z = e^{j\omega}} \right\|_2^2 - S_{ILF}(\omega \mid \omega_N)\right]^2 d\omega \tag{31}$$
where $C_N$ denotes the amplitude coefficient. It is found that the results of Equation (31) are conjugate complex poles with the modulus and azimuth, $\gamma$, $\theta$, which can be approximated in terms of $\omega_N$:
$$\begin{aligned} \gamma &= \sqrt{\xi_1 \xi_2} \approx 0.0487\,\omega_N^2 - 0.3544\,\omega_N + 0.9802 \\ \theta &= \arccos\left(\frac{\xi_1 + \xi_2}{2\sqrt{\xi_1 \xi_2}}\right) \approx -0.1058\,\omega_N^2 + 0.7907\,\omega_N - 0.0222 \end{aligned} \tag{32}$$
In this case, the normalized autocorrelation function, $\bar{R}_N(\tau \mid \omega_N)$, can be expressed as follows:
$$\bar{R}_N(\tau \mid \omega_N) = \gamma^{\tau}\left[\frac{1 - \gamma^2}{1 + \gamma^2}\cot(\theta)\sin(\tau\theta) + \cos(\tau\theta)\right] \tag{33}$$
where $\tau$ denotes the lag of the autocorrelation function. As shown in Figure 8a, the larger the value of $\omega_N$, the faster $\bar{R}_N(\tau \mid \omega_N)$ decays to 0.
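The closed-form normalized autocorrelation of a second order autoregressive process with conjugate poles $\gamma e^{\pm j\theta}$ can be cross-checked against the standard Yule-Walker recursion. The sketch below takes $\gamma$ and $\theta$ directly as inputs rather than deriving them from $\omega_N$; the function names and the sample values are illustrative assumptions.

```python
import math


def rho_closed_form(tau, gamma, theta):
    """Normalized autocorrelation at lag tau for poles gamma * exp(+/- j*theta)."""
    f = (1 - gamma ** 2) / (1 + gamma ** 2)
    return gamma ** tau * (f * math.sin(tau * theta) / math.tan(theta) + math.cos(tau * theta))


def rho_recursive(max_lag, gamma, theta):
    """Same quantity from the Yule-Walker recursion of the AR(2) model."""
    phi1 = 2 * gamma * math.cos(theta)   # AR coefficients implied by the pole pair
    phi2 = -gamma ** 2
    rho = [1.0, phi1 / (1 - phi2)]
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])
    return rho[: max_lag + 1]


closed = [rho_closed_form(t, 0.8, 0.6) for t in range(11)]
recursive = rho_recursive(10, 0.8, 0.6)
```

A smaller pole modulus makes the $\gamma^{\tau}$ envelope shrink faster, which matches the observation that the autocorrelation decays more quickly for larger cutoff frequencies.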
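Hence, the difference in autocorrelation function caused by the quantization error $\Delta\omega$ can be expressed as follows:
$$\Delta\bar{R}_N(\tau \mid \omega_N, \Delta\omega) = \bar{R}_N(\tau \mid \omega_N + \Delta\omega) - \bar{R}_N(\tau \mid \omega_N) = \bar{R}'_N(\tau \mid \tilde{\omega}_N)\, \Delta\omega \tag{34}$$
where the mean value theorem is utilized, $|\tilde{\omega}_N - \omega_N| \le |\Delta\omega|$, and $\bar{R}'_N(\tau \mid \omega_N)$ is defined as follows:
$$\begin{aligned} \bar{R}'_N(\tau \mid \omega_N) = \frac{d\bar{R}_N(\tau \mid \omega_N)}{d\omega_N} &= \gamma^{\tau-1}\left[\tau\cos(\tau\theta) - \frac{\tau\gamma^4 + 4\gamma^2 - \tau}{(1 + \gamma^2)^2}\cot(\theta)\sin(\tau\theta)\right]\frac{d\gamma}{d\omega_N} \\ &\quad + \gamma^{\tau}\left[\frac{1 - \gamma^2}{1 + \gamma^2} \cdot \frac{\tau\sin(2\theta)\cos(\tau\theta) - 2\sin(\tau\theta)}{2\sin^2(\theta)} - \tau\sin(\tau\theta)\right]\frac{d\theta}{d\omega_N} \end{aligned} \tag{35}$$
By substituting Equation (32) into Equation (35), the approximation of $\bar{R}'_N(\tau \mid \omega_N)$ can be obtained as shown in Figure 8b. Thereby, the following conclusions can be drawn:
  • $\Delta\bar{R}_N(\tau \mid \omega_N, \Delta\omega)$ is bounded with respect to $\Delta\omega$.
  • $\Delta\bar{R}_N(\tau \mid \omega_N, \Delta\omega)$ is not linearly related to $\Delta\omega$ and shows a tendency to decay to 0 as $\omega_N$, $\tau$ increase.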
Based on the same analytical approach, the above conclusions can be generalized to higher-order models.
In summary, the impact of the quantization error in QCLM-Lowpass on series-indistinguishability is controllable. Based on this quantitative relationship, the deviation of series-indistinguishability can be kept within the expected range by setting the quantization scheme. In addition, the division of ω N should not be uniform. Intuitively, the granularity of the division is finer in the low-frequency part, while the high-frequency part can be relaxed.

4.3. Algorithmic Implementation

There are some problems in the application of QCLM-Lowpass that should be discussed, and then the algorithmic implementation is given.

4.3.1. Quasi-Stationary State Identification

The correlation estimation method proposed in Section 4.1 is applicable only if the location increments satisfy the stationary conditions. Therefore, it is necessary to identify the stationary state of the location increments.
In this paper, the stationary conditions are relaxed to allow the mean and autocorrelation function to vary within a reasonable range, called the quasi-stationary state, denoted as $QS$. Considering that $V(I)$ is stationary when the user is moving in uniform linear motion, the quasi-stationary state is identified based on the change of the location increments' modulus $|v(i)|$ and azimuth $\varphi(i)$:
$$|v(i)| = \sqrt{[v_X(i)]^2 + [v_Y(i)]^2}, \qquad \varphi(i) = \arctan[v_Y(i) / v_X(i)] \tag{36}$$
For the data in a window of size $M_V + 1$, $V_W(i) = [v(i - M_V), \ldots, v(i)]^T$, let $|v|_i^{\min}$, $|v|_i^{\max}$ denote the minimum and maximum modulus and $\Delta\varphi_i$ denote the maximum change in azimuth:
$$|v|_i^{\min} = \min_{k \in [0, M_V]} \left[\,|v(i-k)|\,\right] \tag{37}$$
$$|v|_i^{\max} = \max_{k \in [0, M_V]} \left[\,|v(i-k)|\,\right] \tag{38}$$
$$\Delta\varphi_i = \max_{k, m \in [0, M_V]} angle[\varphi(i-k), \varphi(i-m)] \tag{39}$$
where $angle(\theta_k, \theta_m)$ is the angle between two azimuths:
$$angle(\theta_k, \theta_m) = \begin{cases} |\theta_k - \theta_m|, & |\theta_k - \theta_m| \le \pi \\ 2\pi - |\theta_k - \theta_m|, & \text{otherwise} \end{cases} \tag{40}$$
In the quasi-stationary state, $|v|_i^{\min}$, $|v|_i^{\max}$, $\Delta\varphi_i$ should satisfy the following conditions:
$$\Delta\varphi_i \le \Phi_{QS} \tag{41}$$
$$|v|_i^{\max} - |v|_i^{\min} \le \eta_{QS} \sum_{m=1}^{M_V} |v(i-m)| \tag{42}$$
$$\left(|v|_i^{\max}\right)^2 - \left(|v|_i^{\min}\right)^2 \le \eta'_{QS} \sum_{m=1}^{M_V} \left[\,|v(i-m)|\,\right]^2 \tag{43}$$
where $\Phi_{QS}$, $\eta_{QS}$, $\eta'_{QS}$ are the thresholds for relative changes in azimuth, modulus, and squared modulus, respectively. Intuitively, the smaller the values of $\Phi_{QS}$, $\eta_{QS}$, $\eta'_{QS}$, the more stationary the series $V(I)$.
To reduce the effects of estimation errors or transient changes in the data, the actual state is determined based on the estimated results over a period of time. Let $\hat{S}(i)$ denote the estimated state from Equations (41)–(43) at the moment $t_i$, and $S(i)$ denote the actual state. As for the estimated results in the window of size $M_{\hat{S}}$, if any of the following conditions hold:
$$\forall k \in [0, M_{\hat{S}} - 1]: \hat{S}(i-k) = QS \tag{44}$$
$$S(i-1) = QS \ \text{ and } \ \exists k \in [0, M_{\hat{S}} - 1]: \hat{S}(i-k) = QS \tag{45}$$
then the actual state $S(i) = QS$.
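Two pieces of this identification procedure are easy to illustrate in code: the wrap-around angle between two azimuths, and the smoothing rule that turns per-moment estimates into the actual state. The minimal sketch below assumes the smoothing rule reads as "all recent estimates are quasi-stationary, or the previous actual state was quasi-stationary and at least one recent estimate still is"; the function names are hypothetical.

```python
import math


def angle(a, b):
    """Smallest angle between two azimuths given in radians."""
    d = abs(a - b)
    return d if d <= math.pi else 2 * math.pi - d


def azimuth_spread(azimuths):
    """Maximum pairwise angle between the azimuths in one window."""
    return max(angle(a, b) for a in azimuths for b in azimuths)


def smooth_state(prev_is_qs, recent_estimates):
    """Decide the actual state from the previous state and the recent estimated states.

    recent_estimates: booleans, True where the estimated state is quasi-stationary.
    """
    return all(recent_estimates) or (prev_is_qs and any(recent_estimates))


spread = azimuth_spread([3.0, -3.0, 3.1])   # azimuths on both sides of +/- pi
```

The second condition acts as hysteresis: once the series is considered quasi-stationary, a few isolated non-stationary estimates are not enough to leave that state.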
In addition, there are inevitable errors in V ( I ) , which affect the accuracy of the quasi-stationary state identification and the location correlation estimation. Therefore, the noise reduction is necessary to estimate the true location increments. However, how to achieve the adaptive filtering for V ( I ) is not the focus of this paper and is not discussed in detail here.

4.3.2. CLM Filter Design

The CLM filter is constructed as an all-pole filter, and the system function is as follows:
$$H_{CLM}(z) = \frac{1}{\sum_{m=0}^{O_{CLM}} a_m z^{-m}} = \frac{1}{\prod_{m=1}^{O_{CLM}} (1 - \xi_m z^{-1})} \tag{46}$$
where $O_{CLM}$ denotes the filter order, and the parameter vector is $A_{CLM} = [a_1, \ldots, a_{O_{CLM}}]$ with $a_0 = 1$. Let $\Xi_{CLM} = [\xi_1, \ldots, \xi_{O_{CLM}}]^T$ denote the pole vector. To ensure the stability of the filter, it should be satisfied that $|\xi_m| < 1$, $m = 1, \ldots, O_{CLM}$.
According to Equation (9) with the convolution theorem, it can be seen that the Laplace noise series' power spectrum $S_N(\omega)$ is the result of the convolution of the correlated Gaussian noise's power spectrum $S_G(\omega)$ with itself. In the interval $[0, 2\pi]$, it behaves as a circular convolution, which can be expressed as follows:
$$S_N(\omega) = \frac{4}{\pi} \int_0^{2\pi} S_G(\upsilon)\, S_G[(\omega - \upsilon) \bmod 2\pi]\, d\upsilon \tag{47}$$
and $S_G(\omega)$ can be calculated as follows:
$$S_G(\omega) = \sigma_G^2 \left\| H_{CLM}(z)\big|_{z = e^{j\omega}} \right\|_2^2 \tag{48}$$
Similar to Equation (26), the pole vector $\Xi_{CLM}$ can be computed by allowing $S_N(\omega)$ to approximate $S_{ILF}(\omega \mid \omega_N)$, which can be expressed as follows:
$$\arg\min_{C_N,\ \Xi_{CLM}} \int_{-\pi}^{\pi} \left[C_N\, S_N(\omega \mid \Xi_{CLM}) - S_{ILF}(\omega \mid \omega_N)\right]^2 d\omega \tag{49}$$
where $C_N$ denotes the amplitude coefficient, and the CLM filter is limited to a lowpass filter. In this way, the parameter vector $A_{CLM}$ can be determined by the cutoff frequency $\omega_N$.
In QCLM-Lowpass, the CLM filter is set with different levels by parameter quantization, and these levels, corresponding to $\omega_N = \omega_1, \omega_2, \ldots, \omega_{Q_{Num}}$, are respectively denoted as $L = 1, 2, \ldots, Q_{Num}$. Intuitively, the correlation of the Laplace noise series will decrease as $L$ increases. In practice, a matrix with size $Q_{Num} \times O_{CLM}$, denoted as $Param$, is used to store the parameters of the CLM filter in different levels, where each row corresponds to one level. The level $L$ can be used as the index to obtain the corresponding parameter vector $A_{CLM} = Param(L)$.

4.3.3. Quantization Scheme

Recalling the basic requirements presented in Section 4.2.1 and the conclusions on quantization error in Section 4.2.3, based on experiments with real data, it is empirically found that the following quantization scheme achieves a better privacy-preserving effect:
$$Quantize(\omega_N) = \begin{cases} 0.1\pi, & \omega_N \in [0, 0.1\pi) \\ 0.125\pi, & \omega_N \in [0.1\pi, 0.15\pi) \\ 0.175\pi, & \omega_N \in [0.15\pi, 0.2\pi) \\ 0.25\pi, & \omega_N \in [0.2\pi, 0.3\pi) \\ 0.35\pi, & \omega_N \in [0.3\pi, 0.4\pi) \\ 0.45\pi, & \omega_N \in [0.4\pi, \pi] \end{cases} \tag{50}$$
Note that the value of Q u a n t i z e ( ω N ) is limited between 0.1 π and 0.45 π . This is because when ω N is less than 0.1 π , the poles of the CLM filter are close to the unit circle, and it takes longer for the filter to return to the steady state after adjustment. This affects the real-time processing performance of the scheme and also tends to make the filter unstable. In addition, due to the limited filter order, there is a trailing phenomenon in the noise power spectrum. As ω N increases, there will be some high-frequency noise that can be easily filtered out. Therefore, considering the power spectrum characteristics of real data, ω N is limited to no more than 0.45 π .
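The six-level scheme above amounts to a simple interval lookup. The following sketch is one possible way to code it; the half-open boundary convention (each boundary belongs to the interval on its right) and the names `quantize`, `EDGES`, and `LEVEL_CUTOFFS` are assumptions made for illustration.

```python
import bisect
import math

# Interval boundaries and quantized cutoffs, both in units of pi.
EDGES = [0.1, 0.15, 0.2, 0.3, 0.4]
LEVEL_CUTOFFS = [0.1, 0.125, 0.175, 0.25, 0.35, 0.45]


def quantize(omega_n):
    """Map a cutoff frequency in [0, pi] to (level, quantized cutoff in radians)."""
    idx = bisect.bisect_right(EDGES, omega_n / math.pi)
    return idx + 1, LEVEL_CUTOFFS[idx] * math.pi


level, cutoff = quantize(0.27 * math.pi)   # falls in the fourth interval
```

Note that the intervals get wider toward high frequencies, in line with the earlier remark that the division should be finer in the low-frequency part.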

4.3.4. Level Identification

To satisfy the demands of real-time processing, it is necessary for the privacy mechanism to quickly determine the level $L$ of the CLM filter. Given that $R_N(\tau)$ and $S_N(\omega)$ are Fourier transform pairs, there is a certain correspondence between them in the shape of the curves. Intuitively, the steeper the curve of $R_N(\tau)$, the flatter the curve of $S_N(\omega)$. Therefore, a linear regression is performed on $R_N(\tau)$ and the opposite of its slope, denoted as $\chi$, is considered as the feature parameter for level identification:
$$\chi = \frac{6\Upsilon \sum_{\tau=0}^{\Upsilon} R_N(\tau) - 12 \sum_{\tau=0}^{\Upsilon} \tau R_N(\tau)}{\Upsilon(\Upsilon + 1)(\Upsilon + 2)\, R_N(0)} \tag{51}$$
where $\Upsilon$ denotes the maximum lag of $R_N(\tau)$. The larger the value of $\chi$, the steeper the curve of $R_N(\tau)$ and the lower the correlation of the noise series, requiring a higher level of the CLM filter.
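The closed-form expression for $\chi$ is the negated least-squares slope of the autocorrelation over lags $0, \ldots, \Upsilon$, normalized by the zero-lag value. The short sketch below checks that equivalence on a made-up autocorrelation vector; the function names and sample values are illustrative only.

```python
def feature_chi(r):
    """Feature parameter from r = [R_N(0), ..., R_N(Y)] using the closed-form expression."""
    big_y = len(r) - 1
    s0 = sum(r)
    s1 = sum(t * v for t, v in enumerate(r))
    return (6 * big_y * s0 - 12 * s1) / (big_y * (big_y + 1) * (big_y + 2) * r[0])


def negated_regression_slope(r):
    """Negated ordinary least-squares slope of r against the lag index."""
    n = len(r)
    t_bar = (n - 1) / 2
    r_bar = sum(r) / n
    num = sum((t - t_bar) * (v - r_bar) for t, v in enumerate(r))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return -num / den


sample = [2.0, 1.5, 0.9, 0.6]          # hypothetical autocorrelation, maximum lag 3
chi = feature_chi(sample)
reference = negated_regression_slope(sample) / sample[0]
```

For a perfectly linear decay such as `[1.0, 0.8, 0.6, 0.4]`, the feature equals the per-lag drop of 0.2.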
Based on extensive real data, the distribution of $\chi$ corresponding to the different quantized intervals in Equation (50) is analyzed, where $\Upsilon = 3$. The details are given in Section 5.3.3. The following level identification method $L = Identify(\chi)$ is empirically obtained:
$$Identify(\chi) = \begin{cases} 1, & \chi \in [0, 0.1055) \\ 2, & \chi \in [0.1055, 0.1855) \\ 3, & \chi \in [0.1855, 0.2235) \\ 4, & \chi \in [0.2235, 0.3595) \\ 5, & \chi \in [0.3595, 0.4705) \\ 6, & \chi \in [0.4705, 1) \end{cases} \tag{52}$$
Similar to the quasi-stationary state identification, the actual level of the CLM filter is determined based on the estimated results over a period of time. Let $\hat{L}(i)$ denote the estimated level from Equation (52) at the moment $t_i$, and let the estimated level change be denoted as $\psi(i) = \mathrm{sgn}[\hat{L}(i) - \hat{L}(i-1)]$, where $\mathrm{sgn}(\cdot)$ is the sign function. Then, the results in the window of size $M_\Psi + 1$ are used to determine the actual level $L(i)$. Specifically, the CLM filter is adjusted only if $\forall m \in [0, M_\Psi - 1]: \psi(i-m) = 0$:
$$L(i) = L(i-1) + \psi(i - M_\Psi) \tag{53}$$
In this way, the CLM filter can only be adjusted between adjacent levels to avoid drastic changes.
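A compact way to express the level identification and the delayed level update is sketched below. It assumes the $\chi$ intervals are contiguous and half-open, with 0.1055 as the boundary between levels 1 and 2, and it adds a clamp to the valid level range as a safeguard that is not part of the original description. The names `identify`, `update_level`, and `CHI_EDGES` are hypothetical.

```python
import bisect

# Assumed contiguous boundaries between the six levels.
CHI_EDGES = [0.1055, 0.1855, 0.2235, 0.3595, 0.4705]


def identify(chi):
    """Map the feature parameter to an estimated level in 1..6."""
    return bisect.bisect_right(CHI_EDGES, chi) + 1


def update_level(level, psi_window, q_num=6):
    """Apply the delayed level change.

    psi_window = [psi(i - M), ..., psi(i)]: the oldest change is applied only if
    every later change in the window is zero, i.e. the estimate has been stable since.
    """
    oldest, recent = psi_window[0], psi_window[1:]
    if all(p == 0 for p in recent):
        level += oldest
    return min(max(level, 1), q_num)   # clamp: an added safeguard, not in the source
```

For example, `update_level(3, [1, 0, 0, 0])` moves to level 4, while `update_level(3, [1, 0, -1, 0])` stays at level 3 because the estimate changed again inside the window.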

4.3.5. Gain Coefficients

As described in Section 3.2.2, due to the transient response, the actual output cannot be normalized based on the steady-state response during the transition phase. Therefore, it is necessary to set the gain coefficient vector $K = [\kappa_0, \ldots, \kappa_{M_T}]^T$ for the transition phase, where $M_T$ is the window size of the transition phase, and $\kappa_m$, $m = 0, \ldots, M_T$, denotes the gain coefficient at the $m$-th moment after adjustment. In particular, let $\kappa_{M_T} = \kappa_{CLM}$ as given in Equation (11).
Due to the parameter quantization, $K$ can be easily obtained. More specifically, by simulating the CLM filter switched between different levels, the Laplace scales of the actual output at different moments can be estimated, so that reasonable values of $M_T$ and $K$ can be determined.
A matrix with size $Q_{Num} \times Q_{Num} \times (M_T + 1)$, denoted as $GainCoeff$, is used to store the gain coefficients for different level shifts, and $(L_1, L_2)$ is used as the index to obtain the gain coefficient vector, $K = GainCoeff(L_1, L_2)$, which corresponds to the CLM filter switching from level $L_1$ to level $L_2$.

4.3.6. Implementation of QCLM-Lowpass

Similar to DCLM, QCLM-Lowpass is considered to be a program object in which CLM is a member. The main attributes and methods are given in Table 3.
At the beginning of the application, the method Initialization is used to set the attributes of QCLM-Lowpass and initialize CLM. By default, the CLM filter's initial level is set as $Q_{Num}$, and its parameter vector and gain coefficient are set accordingly. At each moment, the method Iteration is called to adjust the CLM filter and generate the Laplace noise, and its technical description is given in Algorithm 3.
Algorithm 3. QCLM-Lowpass: Iteration
Input: the autocorrelation function vector $R$
Output: the Laplace noise $n$
1  Calculate the feature parameter $\chi$ according to $R$;
2  Calculate the estimated level $\hat{L}_{CLM}$ according to $\chi$: $\hat{L}_{CLM} = Identify(\chi)$;
3  Update $\Psi$: $\psi_{m-1} \leftarrow \psi_m$ for $m = 2, \ldots, M_\Psi + 1$, then $\psi_m \leftarrow \mathrm{sgn}(\hat{L}_{CLM} - \hat{L})$ for $m = M_\Psi + 1$;
4  if $\iota \le M_T$ then
5    Keep the actual level unchanged: $L_{CLM} = L$;
6    Update the index: $\iota = \iota + 1$;
7    Set the gain coefficient: $\kappa = K(\iota)$;
8    Update the gain coefficient of CLM: $CLM.UpdateGainCoeff(\kappa)$;
9  elseif $\psi_1 \ne 0$ and $\sum_{m=2}^{M_\Psi + 1} |\psi_m| = 0$ then
10   Calculate the actual level: $L_{CLM} = L + \psi_1$;
11   Calculate the parameter vector: $A = Param(L_{CLM})$;
12   Calculate the gain coefficient vector: $K = GainCoeff(L, L_{CLM})$;
13   Set $\iota = 1$ and the gain coefficient $\kappa = K(\iota)$;
14   Update the parameters of CLM: $CLM.UpdateParam([\,], A)$, $CLM.UpdateGainCoeff(\kappa)$;
15 else
16   Keep the actual level unchanged: $L_{CLM} = L$;
17 end if
18 Update $\hat{L}$, $L$: $\hat{L} \leftarrow \hat{L}_{CLM}$, $L \leftarrow L_{CLM}$;
19 Generate the Laplace noise: $n = CLM.LaplaceGeneration()$;
20 return $n$
Lines 1–3 calculate the feature parameter $\chi$, from which the estimated level $\hat{L}_{CLM}$ is obtained, and then update the estimated level change record. In lines 4–8, when the CLM filter is in the transient phase, only the gain coefficient is updated without adjusting the parameter vector. When the CLM filter is in the steady state, the estimated level change record is used to determine whether a parameter adjustment is required, and the conditions are shown in line 9. If it is, then the actual level of the CLM filter is calculated, and its parameters and the gain coefficient are updated in lines 10–14; otherwise, the filter is left unchanged in lines 15–16. Finally, the recorded levels are updated in line 18, and CLM is called to generate and return the Laplace noise in lines 19–20.
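To make the control flow of Algorithm 3 concrete, here is a minimal Python sketch of the iteration step. It is not the authors' implementation: `StubCLM` is a hypothetical placeholder that only records parameter updates and emits i.i.d. Laplace noise (a real CLM would emit correlated noise), the parameter and gain tables are dummy values, the $\chi$ boundaries are assumed contiguous, and the clamp on the level is an added safeguard.

```python
import bisect
import random

CHI_EDGES = [0.1055, 0.1855, 0.2235, 0.3595, 0.4705]  # assumed contiguous boundaries


def feature_chi(r):
    """Negated regression slope of r = [R(0), ..., R(Y)], normalized by R(0)."""
    y = len(r) - 1
    s0, s1 = sum(r), sum(t * v for t, v in enumerate(r))
    return (6 * y * s0 - 12 * s1) / (y * (y + 1) * (y + 2) * r[0])


def identify(chi):
    return bisect.bisect_right(CHI_EDGES, chi) + 1


class StubCLM:
    """Hypothetical stand-in for the CLM member object."""

    def __init__(self, seed=0):
        self.rng, self.params, self.gain = random.Random(seed), None, 1.0

    def update_param(self, a):
        self.params = a

    def update_gain_coeff(self, kappa):
        self.gain = kappa

    def laplace_generation(self):
        # The difference of two unit exponentials is a unit-scale Laplace variable.
        return self.gain * (self.rng.expovariate(1.0) - self.rng.expovariate(1.0))


class QCLMLowpass:
    def __init__(self, clm, param, gain_coeff, m_psi, m_t, q_num=6):
        self.clm, self.param, self.gain_coeff = clm, param, gain_coeff
        self.m_t, self.q_num = m_t, q_num
        self.psi = [0] * (m_psi + 1)            # psi_1 ... psi_{M_psi + 1}
        self.level_hat = self.level = q_num     # start at the highest level
        self.k = [1.0] * (m_t + 1)
        self.iota = m_t + 1                     # 1-based index; > m_t means steady state
        clm.update_param(param[q_num])

    def iteration(self, r):
        level_hat = identify(feature_chi(r))                           # lines 1-2
        sgn = (level_hat > self.level_hat) - (level_hat < self.level_hat)
        self.psi = self.psi[1:] + [sgn]                                # line 3
        level = self.level                                             # lines 5 and 16
        if self.iota <= self.m_t:                                      # lines 4-8
            self.iota += 1
            self.clm.update_gain_coeff(self.k[self.iota - 1])
        elif self.psi[0] != 0 and not any(self.psi[1:]):               # line 9
            level = min(max(self.level + self.psi[0], 1), self.q_num)  # line 10 + clamp
            self.k = self.gain_coeff[(self.level, level)]              # line 12
            self.iota = 1                                              # line 13
            self.clm.update_param(self.param[level])                   # lines 11, 14
            self.clm.update_gain_coeff(self.k[0])
        self.level_hat, self.level = level_hat, level                  # line 18
        return self.clm.laplace_generation()                           # lines 19-20
```

With a window of $M_\Psi = 2$, a single drop in the estimated level is applied two moments later, and the following $M_T$ moments only step through the transition gains before the filter is eligible for another adjustment.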

5. Experimental Evaluation

In this study, we conducted experiments on real-life datasets to evaluate QCLM-Lowpass. First, the stationarity and the power spectrum characteristics of the actual location data were analyzed. Then, the effectiveness of QCLM-Lowpass was verified. Finally, its performance in terms of privacy protection and data availability was evaluated. All experiments were performed using MATLAB 2023a on a computer with an Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50 GHz and 16 GB of memory.

5.1. Experimental Datasets

5.1.1. Real Datasets

The following three real-life datasets were used in experiments:
  • GeoLife [22,23,24]. The dataset contains trajectory data from 182 volunteers between April 2007 and August 2012, containing 17,621 trajectories with a total distance span of 1,292,951 km.
  • T-Drive [25,26]. This dataset collects the trajectories of 10,357 taxis in Beijing between 2 February 2008 and 8 February 2008, and contains more than 1.5 × 10 7 location points with a total distance of 9 × 10 6 km.
  • OpenStreetMap (OSM) (https://www.openstreetmap.org/traces, accessed on 17 July 2022). It is a collaborative online mapping project that allows users to share their trajectories. We downloaded 406,399 trajectories with more than $1.7 \times 10^9$ locations between May 2016 and May 2022, including location data with high-frequency sampling (less than 1 s) and high accuracy (less than 1 m).

5.1.2. Data Preprocessing

Since an increased time interval $\Delta t$ leads to decreased data correlation and the appearance of spectral aliasing, this paper required that $1\,\mathrm{s} \le \Delta t \le 10\,\mathrm{s}$. The location series satisfying this condition were extracted from the real datasets with linear interpolation, which was constrained by two conditions: (1) no more than two consecutive points may be interpolated; (2) the total number of interpolated points cannot exceed 20% of the current data series length. After applying a length threshold of 200, a total of 177,089 location series were retained for analysis.
Figure 9 presents the distribution of sampling intervals and velocity in different datasets. In OpenStreetMap, approximately 92.3% of the data is collected with a sampling interval of 1 s, with 38.25% of these observations corresponding to low-velocity conditions (0–1.5 m/s). In the GeoLife dataset, the sampling intervals predominantly occurred at 1 s (40.55%), 2 s (31.89%), and 5 s (23.51%), covering a range of motion scenarios with varying velocities. Conversely, the T-Drive dataset exhibited sampling intervals chiefly at 5 s (62.30%) and 10 s (29.44%), which are typically associated with low-velocity conditions. These results demonstrate the diversity of the experimental data and reflect the applicability of the experimental conclusions in this paper.

5.1.3. Stationary Analysis

Following Section 4.3.1, the location series whose increments satisfy the quasi-stationary state conditions were selected, where the quasi-stationary state thresholds were set as $\Phi_{QS} = 5\pi/36$, $\eta_{QS} = \eta'_{QS} = 0.1$, and the window sizes for estimation and state identification were $M_V = 60/\Delta t$ and $M_{\hat{S}} = 30/\Delta t$.
Figure 10a presents the results in different sampling intervals. In the real datasets covered in this paper, the data series with quasi-stationary increments occupy 27.46%, where the results in the sampling intervals of 1 s, 2 s, and 5 s are higher than 25%. This indicates that the correlation estimation method proposed in this paper has a wider range of applicable scenarios.
Afterwards, segments with lengths less than 200 were filtered out, resulting in 134,691 location series for subsequent experiments.

5.1.4. Power Spectrum Characterization

Based on the correlation estimation method in Section 4.1, the power spectrum characteristics of real location data were analyzed using a third order AR model, where the estimation window size was set as $M_L = 60/\Delta t$. The power spectrum of each sliding window in the X and Y directions was estimated, its 6 dB, 12 dB, and 20 dB attenuation frequencies were recorded, and their cumulative distributions are shown in Figure 10b. The results reveal that the energy of the power spectrum is mainly concentrated in the low-frequency part, where the 20 dB attenuation frequency is mainly distributed below 0.04 Hz. To achieve series-indistinguishability, the noise power spectrum should also satisfy this lowpass characteristic, which is the basis of QCLM-Lowpass.

5.2. Experimental Configurations

5.2.1. Competitors

To evaluate the performance of QCLM-Lowpass, we compare it with several CLM filter adjustment schemes and a representative privacy scheme based on the Markov model:
  • IID. This is the classical Laplace mechanism that adds independently and identically distributed Laplace noise to the data series, which is used as the basic reference in this paper.
  • DCLM. Recall from Section 3.1 that this scheme calculates the CLM filter parameter directly based on the estimated data correlation, and thus dynamically adjusts the filter. In the experiments, the adjustment step is set as u D C L M = 0.01 .
  • NonQCLM. In contrast to QCLM-Lowpass, this scheme dynamically adjusts the noise power spectrum cutoff frequency ω N instead of quantizing it, with the adjustment step being set as μ ω = 0.01 in experiments.
  • Markov-GRR, which was proposed in [17]. The setup is described in Section 5.4.
In DCLM, NonQCLM, and QCLM-Lowpass, the CLM filter was implemented using a third order all-pole filter, with initial parameter values set according to the highest level in QCLM-Lowpass. The window size of correlation estimation was set as M L = 60 / Δ t . The window size of level identification and transition phase in QCLM-Lowpass were set as M Ψ = 30 / Δ t and M T = 30 respectively.

5.2.2. Evaluation Metrics

The following metrics were used to evaluate the effectiveness of the privacy scheme with respect to data availability and privacy protection.
The data availability was measured by the mean perturbation distance (MPD):
$$MPD = \frac{1}{|L|} \sum_{l \in L} d_E(l, \tilde{l}) \tag{54}$$
where $|L|$ denotes the number of locations, and $l$, $\tilde{l}$ denote the original and perturbed location, respectively. This metric is independent of the specific application and can therefore be considered a generalized availability evaluation metric [16]. Intuitively, the larger the MPD value, the more the data is disturbed, resulting in a greater loss of data availability.
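As a small illustration of the metric, the sketch below computes the MPD for planar coordinates given as `(x, y)` tuples; the function name and the sample points are hypothetical.

```python
import math


def mean_perturbation_distance(original, perturbed):
    """Average Euclidean distance between paired original and perturbed locations."""
    total = sum(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(original, perturbed))
    return total / len(original)


mpd = mean_perturbation_distance([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)])
```

Here the first point is moved by a distance of 5 and the second is unchanged, so the average is 2.5.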
To measure the privacy protection strength for a single location $l$, we calculated the $\varepsilon_l$-geo-indistinguishability achieved by the privacy scheme within the region $Area = \{ l' \mid d_E(l, l') \le r_{eff} \}$, i.e., for any $\tilde{l}, \tilde{l}' \in Area$, the privacy scheme $M$ satisfies the following inequality:
$$\frac{\Pr[M(l) = \tilde{l}]}{\Pr[M(l) = \tilde{l}']} \le \exp[\varepsilon_l \, d_E(\tilde{l}, \tilde{l}')]$$
where $r_{eff}$ denotes the radius of the focus area.
In addition, for the location dataset $LSet$, we characterized its overall privacy strength, $E_\phi$, $0 \le \phi \le 1$, using the privacy strength satisfied by most locations in it, that is, for $l \in LSet$:
$$\Pr(\varepsilon_l \le E_\phi) \ge 1 - \phi$$
Specifically, $E_0$ indicates the worst-case privacy loss. In this way, the impact of individual statistical results on the overall evaluation results can be reduced.
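One way to read this definition is as an empirical quantile. The sketch below assumes $E_\phi$ is taken as the smallest observed value that at least a fraction $1 - \phi$ of the per-location strengths $\varepsilon_l$ do not exceed; the function name is hypothetical.

```python
import math


def overall_privacy_strength(eps_values, phi):
    """Smallest observed value E with (fraction of eps_l <= E) >= 1 - phi."""
    if not eps_values or not 0.0 <= phi <= 1.0:
        raise ValueError("need a non-empty list and 0 <= phi <= 1")
    ordered = sorted(eps_values)
    # Number of locations that must satisfy eps_l <= E (small tolerance for rounding).
    needed = max(1, math.ceil((1.0 - phi) * len(ordered) - 1e-9))
    return ordered[needed - 1]
```

With $\phi = 0$ this returns the maximum of the $\varepsilon_l$ values, matching the worst-case reading of $E_0$.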
In the experiment, each privacy mechanism was repeated 500,000 times and the distribution of the noise at each moment was counted, from which the MPD and the privacy strength were calculated.

5.2.3. Filtering Attack

The filtering attack [12] was used to evaluate the actual effectiveness of the privacy scheme. The difference in privacy strength before and after the attack indicates the ability to resist the correlation-based attack.
Specifically, the 20 dB attenuation frequency of the data power spectrum was taken as the cutoff frequency, and a fourth-order Butterworth model was adopted to design the lowpass filter that implements the attack, where the minimum normalized cutoff frequency was set as $0.1\pi$. For time-varying data correlation, the lowpass cutoff frequency must be adjusted accordingly, and the adjustment step was set to 0.05 in the experiment.
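A fixed-cutoff version of such an attack filter can be sketched as follows. This assumes the fourth-order Butterworth lowpass is realized as a cascade of two second-order sections obtained by the bilinear transform with frequency prewarping; the time-varying cutoff adjustment is omitted, and all function names are hypothetical.

```python
import cmath
import math


def butterworth4_sections(cutoff):
    """Two biquads (b, a) whose cascade is a 4th-order Butterworth lowpass.

    cutoff is in rad/sample (0 < cutoff < pi); the -3 dB point lies at cutoff.
    """
    sections = []
    for theta in (math.pi / 8, 3 * math.pi / 8):   # Butterworth pole angles
        q = 1.0 / (2.0 * math.cos(theta))
        alpha = math.sin(cutoff) / (2.0 * q)
        c = math.cos(cutoff)
        a0 = 1.0 + alpha
        b = [(1 - c) / (2 * a0), (1 - c) / a0, (1 - c) / (2 * a0)]
        a = [1.0, -2 * c / a0, (1 - alpha) / a0]
        sections.append((b, a))
    return sections


def apply_filter(sections, series):
    """Run the released (perturbed) series through each biquad in turn."""
    y = list(series)
    for b, a in sections:
        out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
        for v in y:
            w = b[0] * v + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
            x2, x1 = x1, v
            y2, y1 = y1, w
            out.append(w)
        y = out
    return y


def gain(sections, omega):
    """Magnitude response of the cascade at omega (rad/sample)."""
    z = cmath.exp(-1j * omega)
    h = 1.0 + 0.0j
    for b, a in sections:
        h *= (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return abs(h)
```

Applying `apply_filter` to a perturbed coordinate series keeps the low-frequency band where the location data concentrates and suppresses noise energy outside it, which is why independent noise is vulnerable to this attack.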

5.3. Validity Analysis of QCLM-Lowpass

There are two simplified operations in QCLM-Lowpass: CLM-Lowpass and parameter quantization. First, the effectiveness of CLM-Lowpass was analyzed, and then the effect of parameter quantization on privacy strength was analyzed. Finally, the performance of QCLM-Lowpass was demonstrated in continuous location data release.

5.3.1. Effectiveness of CLM-Lowpass

To validate the CLM-Lowpass, extensive stationary one-dimensional data series with different power spectra were simulated based on the results in Section 5.1.4, and then the privacy strength of original CLM and CLM-Lowpass were compared under the filtering attack.
Figure 11 illustrates the results of one simulation experiment, with the power spectrum of the target data and noise series shown in Figure 11a, where the lowpass characteristic is marked by the dashed line.
In this experiment, the length of the simulated data series was 1000, the privacy budget was set as $\varepsilon = 0.2, 0.4, 0.6, 0.8$, the sensitivity was set as $\Delta f = 10$, and $\delta = 0.05$ was used in the calculation of the differential privacy strength $\varepsilon$ under the filtering attack. Figure 11b presents the results of the different privacy schemes. The results of the original CLM and CLM-Lowpass are very close, and the maximum relative percentage difference between them is 4.03%, which is below the 5% criterion adopted here, so the difference between the two methods is not considered significant. This conclusion was supported by all simulation experiments. Therefore, CLM-Lowpass is feasible in the scenarios covered in this paper.

5.3.2. Effectiveness of Parameter Quantization

To analyze the effect of parameter quantization on the privacy strength, we compared the privacy strength achieved by noise series with different lowpass cutoff frequencies under the filtering attack.
In the experiment, the lowpass cutoff frequency of the data power spectrum, $\omega_X$, was simulated to vary from $0.05\pi$ to $0.5\pi$ with an interval of $0.05\pi$. The corresponding lowpass filters were constructed for the filtering attack, and the noise series with the same set of cutoff frequencies were generated by CLM-Lowpass, while IID was considered as a reference, resulting in 110 different combinations. The length of the simulated data series was 1000, the scale of the Laplace distribution and the sensitivity were $\lambda = 20$ and $\Delta f = 10$, and $\delta = 0.05$ was used in the calculation of the differential privacy strength $\varepsilon$ under the filtering attack.
Here, the combination where the noise series has the same cutoff frequency as the data series is referred to as the series-indistinguishable scheme, denoted as $SI$. Under the same filtering attack, we computed the absolute value of the relative percentage difference, $|RPD|$, between each combination and $SI$ in terms of the privacy strength $\varepsilon$, which can be calculated as follows:
$$|RPD| = \frac{|\varepsilon - \varepsilon_{SI}|}{\varepsilon_{SI}} \times 100$$
where $\varepsilon_{SI}$ denotes the privacy strength in $SI$ after the filtering attack.
As shown in Figure 12, only the results in the same row can be compared, where $|RPD| = 0$ corresponds to $SI$ under the current filtering attack. Some combinations close to $SI$ have absolute percentage differences of less than 5%, which are acceptable in practical applications. This indicates that the actual privacy strength can be acceptably maintained despite variations in the noise power spectrum induced by the parameter quantization. These results also provide a basis for the parameter quantization scheme given in Equation (50).
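The bookkeeping of this experiment can be sketched in a few lines. The grid below reproduces the count of 110 combinations (10 attack cutoffs, each paired with 10 CLM-Lowpass cutoffs plus IID), and `abs_rpd` implements the $|RPD|$ formula. The helper `acceptable` and its 5% default are illustrative assumptions, and no experimental values from the paper are reproduced.

```python
import math

# Attack cutoffs in units of pi rad/sample: 0.05, 0.10, ..., 0.50.
attack_cutoffs = [0.05 * k for k in range(1, 11)]
# Each attack is paired with every CLM-Lowpass cutoff plus the IID reference.
noise_schemes = attack_cutoffs + ["IID"]
combinations = [(w_x, scheme) for w_x in attack_cutoffs for scheme in noise_schemes]


def abs_rpd(eps, eps_si):
    """Absolute relative percentage difference with respect to the SI scheme."""
    return abs(eps - eps_si) / eps_si * 100.0


def acceptable(row, si_key, tol=5.0):
    """Schemes in one row (same attack) whose |RPD| against SI is below tol percent."""
    eps_si = row[si_key]
    return [scheme for scheme, eps in row.items() if abs_rpd(eps, eps_si) < tol]
```

Because `acceptable` compares values only against the SI entry of the same row, it mirrors the restriction that results are comparable only under the same filtering attack.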

5.3.3. Feature Parameter in Level Identification

Recall from Section 4.3.4 that the feature parameter $\chi$ defined in Equation (51) is used to identify the CLM filter’s level in QCLM-Lowpass. Based on the real dataset, the distribution of $\chi$ corresponding to different levels was analyzed. The autocorrelation function and the power spectrum were estimated as in Section 5.1.4, while the maximum lag of the autocorrelation function was set as $\Upsilon = 3$.
As can be seen from Figure 13, there are obvious differences between the cumulative distributions of χ corresponding to different levels. This indicates that χ is effective for level identification. Taking the ratio of 0.95 as a criterion, the level identification method given in Equation (52) was determined.

5.3.4. The Effectiveness of QCLM-Lowpass for Continuous Location Data Publishing

To demonstrate the performance of QCLM-Lowpass, we simulated a continuous publishing scenario using a randomly selected location series with a sampling interval of 5 s. Figure 14a,b show the location series and the velocity–time curve in the X and Y directions, respectively. Figure 14c demonstrates the cutoff frequency of the noise power spectrum in the QCLM-Lowpass and NonQCLM in the X and Y directions. It can be seen that both methods can track the changes of the data power spectrum, but QCLM-Lowpass based on parameter quantization allows the CLM filter to remain in the steady state for a longer time.
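The difference between the two tracking strategies can be illustrated with a toy comparison. The sketch assumes that NonQCLM moves the cutoff toward its target by at most a fixed step per release and that the quantized scheme simply snaps the target to the nearest of a few levels; the level values used in the example are hypothetical, and the level identification window and transition phase of QCLM-Lowpass are not modeled.

```python
import math


def track_with_step(targets, start, step=0.01):
    """Follow the target cutoff, moving by at most `step` at each release."""
    cutoffs, current = [], start
    for target in targets:
        delta = target - current
        if abs(delta) > step:
            delta = math.copysign(step, delta)
        current += delta
        cutoffs.append(current)
    return cutoffs


def track_with_quantization(targets, levels):
    """Snap each target cutoff to the nearest quantization level."""
    return [min(levels, key=lambda level: abs(level - t)) for t in targets]


def count_adjustments(cutoffs, tol=1e-12):
    """Number of releases at which the filter parameter actually changes."""
    return sum(1 for p, q in zip(cutoffs, cutoffs[1:]) if abs(q - p) > tol)
```

For a target cutoff drifting as 0.10, 0.11, 0.12, 0.13, 0.12, 0.11, step tracking changes the filter at every release, whereas quantization to the hypothetical levels 0.1, 0.2, 0.3 leaves it unchanged, so no new transient response is triggered.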
With the Laplace distribution scale set as $\lambda = 20$ and the radius of the protected area set as $r_{eff} = 50$ m, we compared the degree of geo-indistinguishability $\varepsilon$ achieved by these mechanisms at each moment after the filtering attack. As shown in Figure 14d, the expected privacy strength is indicated by the dashed line $2/\lambda$, and the results of DCLM, NonQCLM, and QCLM-Lowpass are close to it, demonstrating their effectiveness against the filtering attack. However, QCLM-Lowpass’s results are more consistent with expectations, while the other two show fluctuations due to the undesired transient response. In addition, CLM-Lowpass in NonQCLM may exacerbate variations of the filter parameter, resulting in greater fluctuations in the privacy strength.
In summary, QCLM-Lowpass has good performance both in terms of the stability of privacy protection for location series and the capability of resisting the filtering attack, and thus it is feasible to apply in continuous location data release.

5.4. Performance Evaluation

The actual performance of the privacy scheme was evaluated on real datasets, considering two main aspects: the privacy performance and the data availability.
  • For privacy performance, this paper focuses on the ability to resist correlation-based attacks, which was evaluated by comparing the change in privacy strength before and after the filtering attack.
  • For data availability, this paper evaluated the ability to balance privacy protection and data availability by comparing the data availability at the same level of privacy strength under the filtering attack.
The CLM-based methods in this paper were compared with a method based on a Markov model, called Markov-GRR [17]. Referring to the setup of that work, all trajectories within the second ring of Beijing (116°35′05″ E–116°45′59″ E, 39°87′36″ N–39°95′71″ N) were extracted from the GeoLife dataset to train the transition probability matrix of the Markov model, i.e., the public matrix, and the region was divided into $200 \times 200$ grids, where each grid was approximately $44 \times 44\ \mathrm{m}^2$.
From the datasets with $\Delta t = 1$ s and $\Delta t = 5$ s in Section 5.1.2, we randomly selected 10 location series within the second ring of Beijing as the experimental dataset. Considering that the noise cannot be too large in practical applications, the scale of the Laplace distribution was set as $\lambda = 20, 30, 40, 50, 60$, the parameter of the generalized randomized response in Markov-GRR was set as $\varepsilon_{GRR} = 1.5, 2, 2.5, 3, 3.5, 4$ with $\delta_{GRR} = 0.01$ for the “$\delta$-location set”, and the radius of the focused area was $r_{eff} = 50$ m. The privacy strengths of the location dataset before and after the filtering attack, $E_{0.05}$ and $E'_{0.05}$, were compared.

5.4.1. Privacy Evaluation

Figure 15 shows the privacy strength achieved by different privacy schemes before and after the filtering attack, and the black dashed line $E'_{0.05} = E_{0.05}$ indicates the case where the privacy strengths do not change. As can be seen from Figure 15a,c, Markov-GRR is located above the curve of $E'_{0.05} = E_{0.05}$, which implies that the actual privacy strengths are significantly lower than the expected results. This is because the scheme independently selects a perturbed location from the “$\delta$-location set” at each moment, which is still an independent perturbation method, and thus struggles to resist correlation-based attacks.
In addition, as shown in Figure 15b,d, the results of DCLM, NonQCLM, and QCLM-Lowpass are close to the curve of E 0.05 = E 0.05 , which illustrates their capabilities against the filtering attack. Table 4 gives the relative changes in privacy strength, where QCLM-Lowpass exhibits lower changes in most cases, reflecting its better performance in privacy preservation.

5.4.2. Usability Evaluation

First, the MPD defined in Equation (54) was used to evaluate the generalized data availability. In Figure 16, as the actual privacy strength decreases, i.e., $E'_{0.05}$ becomes larger, the MPD decreases correspondingly, resulting in better data availability. The performance of Markov-GRR is affected by the grid granularity; Figure 16a,c present the results under the gridding scheme in this paper. As shown in Figure 16b,d, QCLM-Lowpass is essentially located below the rest of the schemes, which implies that QCLM-Lowpass induces less data availability loss at the same level of privacy strength. In contrast, DCLM and NonQCLM induce unwanted transient responses during filter adjustment, which leads to increased loss of data availability.
Furthermore, we clustered all location series within the second ring of Beijing using DBSCAN [27], and evaluated the data availability achieved by the privacy scheme using homogeneity, completeness, and V-measure [28], as shown in Figure 17. In Markov-GRR, the perturbed locations are scattered over the grid, which leads to an increase in the cluster number and worse performance in terms of completeness. In the CLM-based schemes, the results of QCLM-Lowpass are located above other schemes at the same privacy strength, indicating that it can achieve better data availability in clustering application.
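The three clustering scores can be computed from two label assignments using the conditional-entropy definitions of [28]. In the sketch below, `classes` is assumed to hold the reference labels (for example, clusters found on the original locations) and `clusters` the labels found on the perturbed locations; this pairing and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a, b):
    """Conditional entropy H(a | b) in nats."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum(c / n * math.log(c / b_counts[bk]) for (_, bk), c in joint.items())


def clustering_scores(classes, clusters):
    """Return (homogeneity, completeness, V-measure) for two label lists."""
    h_c, h_k = _entropy(classes), _entropy(clusters)
    hom = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c
    com = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
    v = 0.0 if hom + com == 0 else 2.0 * hom * com / (hom + com)
    return hom, com, v
```

Splitting each reference class across several clusters, as in `clustering_scores([0, 0, 1, 1], [0, 1, 2, 3])`, leaves homogeneity at 1 but lowers completeness to 0.5, which is the pattern described for Markov-GRR when perturbed locations scatter over the grid.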
To summarize the above experimental results, DCLM, NonQCLM, and QCLM-Lowpass can all effectively resist the correlation-based attack, but QCLM-Lowpass achieves a better balance between the actual privacy protection performance and the data availability.

6. Discussion and Conclusions

In this paper, the Correlated Laplace Mechanism (CLM) was applied to ensure differential privacy protection for the continuous release of real-time location data. To solve the problem of nonstationary location data correlation estimation, this paper analyzed the stationarity of real location data series and found that more than 27% of the real location series have approximately stationary increments. Therefore, a location data correlation estimation method based on the location increment was proposed, which provides support for CLM to track the time-varying data correlation. In addition, by exploiting the lowpass spectral characteristic of the location data series, CLM-Lowpass was proposed, in which series-indistinguishability is approximated by constructing a noise series with the same lowpass characteristics. This method simplifies the calculation process of CLM filters. Finally, an adaptive adjustment scheme for CLM filters based on parameter quantization, QCLM-Lowpass, was proposed to suppress the unwanted transient response when tracking time-varying data correlation. Extensive experiments show that the privacy scheme based on QCLM-Lowpass can provide a better balance between the ability to resist correlation-based attacks and data usability.
Although QCLM-Lowpass is effective, there are still some aspects to be improved in the future. First, there are opportunities to further optimize the parameter settings, such as quasi-stationary thresholds, quantization scheme, etc. Second, this paper ignores the inter-correlation between the data in the X and Y directions, which can be exploited by attackers to launch attacks on privacy preservation, and needs to be investigated in subsequent work on privacy preservation of 2D location data series.
Future work would consider more personalized privacy preservation, i.e., allowing the user to specify the degree of series-indistinguishability between the original and noise series, thus enabling a trade-off between location privacy and motion state privacy.

Author Contributions

Conceptualization, L.M. and Z.X.; methodology, L.M. and Z.X.; software, L.M.; validation, L.M.; writing—original draft preparation, L.M.; writing—review and editing, Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 41971407).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Three publicly available datasets were analyzed in this study. These datasets can be found here: [https://www.microsoft.com/en-us/research/publication/geolife-gps-trajectory-dataset-user-guide, https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample, and https://www.openstreetmap.org/traces] (accessed on 17 July 2022).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, S.; Sinnott, R.O. Protecting Personal Trajectories of Social Media Users through Differential Privacy. Comput. Secur. 2017, 67, 142–163.
  2. Katsomallos, M.; Tzompanaki, K.; Kotzinos, D. Privacy, Space and Time: A Survey on Privacy-Preserving Continuous Data Publishing. J. Spat. Inf. Sci. 2019, 19, 57–103.
  3. Jiang, H.; Li, J.; Zhao, P.; Zeng, F.; Xiao, Z.; Iyengar, A. Location Privacy-Preserving Mechanisms in Location-Based Services: A Comprehensive Survey. ACM Comput. Surv. 2022, 54, 1–36.
  4. Chatzikokolakis, K.; ElSalamouny, E.; Palamidessi, C.; Anna, P. Methods for Location Privacy: A Comparative Overview. Found. Trends Priv. Secur. 2017, 1, 199–257.
  5. Dwork, C. Differential Privacy. In Automata, Languages and Programming; Bugliesi, M., Preneel, B., Sassone, V., Wegener, I., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4052, pp. 1–12. ISBN 978-3-540-35907-4.
  6. Zhao, X.; Pi, D.; Chen, J. Novel Trajectory Privacy-Preserving Method Based on Clustering Using Differential Privacy. Expert Syst. Appl. 2020, 149, 113241.
  7. Kim, J.W.; Edemacu, K.; Kim, J.S.; Chung, Y.D.; Jang, B. A Survey of Differential Privacy-Based Techniques and Their Applicability to Location-Based Services. Comput. Secur. 2021, 111, 102464.
  8. Ma, T.; Song, F. A Trajectory Privacy Protection Method Based on Random Sampling Differential Privacy. ISPRS Int. J. Geo-Inf. 2021, 10, 454.
  9. Kasiviswanathan, S.P.; Lee, H.K.; Nissim, K.; Raskhodnikova, S.; Smith, A. What Can We Learn Privately? SIAM J. Comput. 2011, 40, 793–826.
  10. Andrés, M.E.; Bordenabe, N.E.; Chatzikokolakis, K.; Palamidessi, C. Geo-Indistinguishability: Differential Privacy for Location-Based Systems. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, Berlin, Germany, 4–8 November 2013; Association for Computing Machinery: New York, NY, USA, 2013; pp. 901–914.
  11. Cao, Y.; Yoshikawa, M.; Xiao, Y.; Xiong, L. Quantifying Differential Privacy in Continuous Data Release Under Temporal Correlations. IEEE Trans. Knowl. Data Eng. 2019, 31, 1281–1295.
  12. Wang, H.; Xu, Z.; Jia, S.; Xia, Y.; Zhang, X. Why Current Differential Privacy Schemes Are Inapplicable for Correlated Data Publishing? World Wide Web 2021, 24, 1–23.
  13. Jiang, K.; Shao, D.; Bressan, S.; Kister, T.; Tan, K.-L. Publishing Trajectories with Differential Privacy Guarantees. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management, Baltimore, MD, USA, 29–31 July 2013; Association for Computing Machinery: New York, NY, USA, 2013; pp. 1–12.
  14. Chatzikokolakis, K.; Palamidessi, C.; Stronati, M. A Predictive Differentially-Private Mechanism for Mobility Traces. In Privacy Enhancing Technologies; De Cristofaro, E., Murdoch, S.J., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2014; Volume 8555, pp. 21–41. ISBN 978-3-319-08505-0.
  15. Al-Dhubhani, R.; Cazalas, J.M. An Adaptive Geo-Indistinguishability Mechanism for Continuous LBS Queries. Wirel. Netw. 2018, 24, 3221–3239.
  16. Xiao, Y.; Xiong, L. Protecting Locations with Differential Privacy under Temporal Correlations. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security—CCS ’15, Denver, CO, USA, 12–16 October 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 1298–1309.
  17. Xiong, X.; Liu, S.; Li, D.; Wang, J.; Niu, X. Locally Differentially Private Continuous Location Sharing with Randomized Response. Int. J. Distrib. Sens. Netw. 2019, 15.
  18. Wang, H.; Xu, Z. CTS-DP: Publishing Correlated Time-Series Data via Differential Privacy. Knowl. Based Syst. 2017, 122, 167–179.
  19. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography; Halevi, S., Rabin, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 265–284.
  20. Elnagar, A.; Gupta, K. Motion Prediction of Moving Objects Based on Autoregressive Model. IEEE Trans. Syst. Man Cybern. Part Syst. Hum. 1998, 28, 803–810.
  21. Zaidi, Z.R.; Mark, B.L. Mobility Tracking Based on Autoregressive Models. IEEE Trans. Mob. Comput. 2011, 10, 32–43.
  22. Zheng, Y.; Zhang, L.; Xie, X.; Ma, W.-Y. Mining Interesting Locations and Travel Sequences from GPS Trajectories. In Proceedings of the 18th international conference on World Wide Web—WWW ’09, Madrid, Spain, 20–24 April 2009; Association for Computing Machinery: New York, NY, USA, 2009; pp. 791–800.
  23. Zheng, Y.; Li, Q.; Chen, Y.; Xie, X.; Ma, W.-Y. Understanding Mobility Based on GPS Data. In Proceedings of the 10th International Conference on Ubiquitous Computing, Seoul, Republic of Korea, 21–24 September 2008; Association for Computing Machinery: New York, NY, USA, 2008; pp. 312–321.
  24. Zheng, Y.; Xie, X.; Ma, W.-Y. GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory. IEEE Data Eng. Bull. 2010, 33, 32–39.
  25. Yuan, J.; Zheng, Y.; Xie, X.; Sun, G. Driving with Knowledge from the Physical World. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining—KDD ’11, San Diego, CA, USA, 21–24 August 2011; Association for Computing Machinery: New York, NY, USA, 2011; pp. 316–324.
  26. Yuan, J.; Zheng, Y.; Zhang, C.; Xie, W.; Xie, X.; Sun, G.; Huang, Y. T-Drive: Driving Directions Based on Taxi Trajectories. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems—GIS ’10, San Jose, CA, USA, 2–5 November 2010; Association for Computing Machinery: New York, NY, USA, 2010; pp. 99–108.
  27. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; AAAI Press: Portland, OR, USA, 1996; Volume 96, pp. 226–231.
  28. Rosenberg, A.; Hirschberg, J. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 28–30 June 2007; Eisner, J., Ed.; Association for Computational Linguistics: Prague, Czech Republic, 2007; pp. 410–420.
Figure 1. Location data ecosystem based on local privacy protection.
Figure 2. Two coordinate systems.
Figure 3. The process of CLM.
Figure 4. Real location series. (a) Real location series. (b) Location data series on X direction. (c) Location increment series on X direction.
Figure 5. Transition phase after adjustment. (a) Changes in the normalized power spectrum of the filter during the adjustment process. (b) The variance of the actual output at each moment normalized by the steady-state response.
Figure 6. Different parameter adjustment schemes.
Figure 7. CLM-Lowpass. (a) Lowpass characteristics. (b) Quantization of noise power spectrum cutoff frequency.
Figure 8. The results of the autocorrelation function in AR model. (a) R ¯ N ( τ | ω N ) . (b) R ¯ N ( τ | ω N ) .
Figure 9. The distribution of sampling intervals and velocity.
Figure 10. Statistic results. (a) Distribution of the data series with quasi-stationary increments. (b) Cumulative distribution of attenuation frequencies.
Figure 11. Comparison results between the original CLM and CLM-Lowpass. (a) The noise power spectrum. (b) The actual privacy strength under the filtering attack.
Figure 12. |RPD| (%).
Figure 13. Cumulative distribution of the feature parameter χ .
Figure 14. Comparison results in continuous location release. (a) Location series. (b) Velocity on the X, Y directions. (c) The cutoff frequency of noise power spectrum over time. (d) The privacy strength at each moment after the filtering attack.
Figure 15. Actual privacy strength under the filtering attack. (a) Compared with Markov-GRR under Δ t = 1 . (b) Compared without Markov-GRR under Δ t = 1 . (c) Compared with Markov-GRR under Δ t = 5 . (d) Compared without Markov-GRR under Δ t = 5 .
Figure 16. Mean perturbation distance and actual privacy strength. (a) Compared with Markov-GRR under Δ t = 1 . (b) Compared without Markov-GRR under Δ t = 1 . (c) Compared with Markov-GRR under Δ t = 5 . (d) Compared without Markov-GRR under Δ t = 5 .
Figure 17. Evaluation of clustering results. (a) Homogeneity. (b) Completeness. (c) V-measure.
Table 1. The definition of major notations.
| Variable | Definition |
| --- | --- |
| $l(t) = [x(t), y(t)]^T$ | a location data in XOY at time $t$ |
| $\Delta t$ | the time interval for data release |
| $t_i$ | the time of the $i$-th data release |
| $l(i) = [x(i), y(i)]^T$ | the simplified representation of $l(t_i)$ in time discretization |
| $L(I) = [X(I), Y(I)]$ | the location data series up to the $I$-th data release |
| $R_L(i, i-\tau)$ | the autocorrelation function matrix of $L(I)$, $R_L(i, i-\tau) = E[l(i)\, l^T(i-\tau)]$ |
| $n(i) = [n_X(i), n_Y(i)]^T$ | the perturbation noise added to $l(i)$ at time $t_i$ |
| $N(I) = [N_X(I), N_Y(I)]$ | the noise series up to the $I$-th data release |
| $\tilde{l}(i) = [\tilde{x}(i), \tilde{y}(i)]^T$ | the perturbed location data at time $t_i$ |
| $\tilde{L}(I) = [\tilde{X}(I), \tilde{Y}(I)]$ | the perturbed location data series up to the $I$-th data release |
| $R_{\tilde{L}}(i, i-\tau)$ | the autocorrelation function matrix of $\tilde{L}(I)$, $R_{\tilde{L}}(i, i-\tau) = E[\tilde{l}(i)\, \tilde{l}^T(i-\tau)]$ |
| $v(i) = [v_X(i), v_Y(i)]^T$ | the location increment at time $t_i$, $v(i) = l(i) - l(i-1)$ |
| $\lvert v(i) \rvert$, $\varphi(i)$ | the modulus $\lvert v(i) \rvert$ and azimuth $\varphi(i)$ of $v(i)$ |
| $V(I)$ | the location increment series up to the $I$-th data release |
Table 2. Main attributes and methods of the CLM object.
Attributes
| Attribute | Type | Description |
| --- | --- | --- |
| $Q$, $U$ | integer | the orders of the zeros and poles of the CLM filter |
| $B = [b_1, \ldots, b_Q]$ | $1 \times Q$ vector | $b_q$, $q = 1, \ldots, Q$, are the coefficients of the numerator of $H_{CLM}(z)$, where $b_0 = 1$ |
| $A = [a_1, \ldots, a_U]$ | $1 \times U$ vector | $a_u$, $u = 1, \ldots, U$, are the coefficients of the denominator of $H_{CLM}(z)$, where $a_0 = 1$ |
| $\kappa$, $\lambda$ | real | $\kappa$: the gain coefficient; $\lambda$: the scale of the Laplace distribution |
| $S = [s_{m,k}]$ | $4 \times Q$ matrix | the input state of the four CLM filters; each row corresponds to one filter |
| $S' = [s'_{m,k}]$ | $4 \times U$ matrix | the output state of the four CLM filters; each row corresponds to one filter |
| $G = [g_1, g_2, g_3, g_4]^T$ | $4 \times 1$ vector | $g_m \sim N(0, 1)$, $m = 1, 2, 3, 4$, are the inputs of the four CLM filters |
| $G' = [g'_1, g'_2, g'_3, g'_4]^T$ | $4 \times 1$ vector | $g'_m$, $m = 1, 2, 3, 4$, are the outputs of the four CLM filters |
Methods
  • $Initialization(Q, U, B, A, \kappa, \lambda)$: Set $Q$, $U$, $B$, $A$, $\kappa$, $\lambda$; set $S$ to a $4 \times Q$ zero matrix and $S'$ to a $4 \times U$ zero matrix.
  • $\{g_1, g_2, g_3, g_4\} = GaussianGenerator()$: Generate four i.i.d. random numbers $g_1, g_2, g_3, g_4 \sim N(0, 1)$.
  • $UpdateParam(B, A)$: Update $B$, $A$.
  • $UpdateGainCoeff(\kappa)$: Update $\kappa$.
  • $n = LaplaceGeneration()$: Generate the Laplace-distributed noise $n$.
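To make the role of the four filters concrete, the following is a minimal sketch of a CLM-like generator restricted to an all-pole filter. It relies on the standard identity that $g_1 g_2 + g_3 g_4$ follows a Laplace distribution with unit scale when the four factors are independent standard Gaussians. How the paper combines the four filter outputs and applies $\kappa$ is not stated in this section, so the combination rule below, the class name `CLMSketch`, and its method names are assumptions.

```python
import random


class CLMSketch:
    """Four identical all-pole filters driven by independent Gaussian inputs."""

    def __init__(self, a, kappa, lam, seed=None):
        self.a = list(a)          # denominator coefficients a_1..a_U (a_0 = 1)
        self.kappa = kappa        # gain, assumed to be 1 / (variance of a filter output)
        self.lam = lam            # target scale of the Laplace noise
        self.state = [[0.0] * len(self.a) for _ in range(4)]  # past outputs per filter
        self.rng = random.Random(seed)

    def update_param(self, a):
        """Replace the filter coefficients; the stored outputs are kept as they are."""
        self.a = list(a)
        self.state = [(s + [0.0] * len(self.a))[:len(self.a)] for s in self.state]

    def laplace_generation(self):
        """Produce one correlated noise sample with Laplace marginal distribution."""
        outs = []
        for m in range(4):
            g = self.rng.gauss(0.0, 1.0)
            # All-pole recursion: y(i) = g(i) - sum_u a_u * y(i - u).
            y = g - sum(a_u * s for a_u, s in zip(self.a, self.state[m]))
            self.state[m] = ([y] + self.state[m])[:len(self.a)]
            outs.append(y)
        return self.lam * self.kappa * (outs[0] * outs[1] + outs[2] * outs[3])
```

With no poles (`a=[]`, `kappa=1.0`) the sketch reduces to the IID Laplace mechanism. With `a=[-0.5]` each filter output has variance $1/(1 - 0.5^2) = 4/3$, so `kappa=0.75` keeps the marginal scale at $\lambda$ while the samples become correlated.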
Table 3. Main attributes and methods of the QCLM-Lowpass object.
Attributes
| Attribute | Type | Description |
| --- | --- | --- |
| $Param$ | $QNum \times O_{CLM}$ matrix | the parameter vectors at different levels, defined in Section 4.3.2 |
| $GainCoeff$ | $QNum \times QNum \times (M_T + 1)$ matrix | the gain coefficients in different level shifts, defined in Section 4.3.5 |
| $K = [\kappa_m]$ | $1 \times (M_T + 1)$ vector | the gain coefficient vector |
| $\iota$ | integer | the index of the gain coefficient vector |
| $\Psi = [\psi_m]$ | $1 \times (M_\Psi + 1)$ vector | $\psi_m$, $m = 1, \ldots, M_\Psi + 1$, is the estimated level change, defined in Section 4.3.4 |
| $\hat{L}$, $L$ | integer | $\hat{L}$ and $L$ denote the estimated and actual level of the CLM filter |
Methods
  • $Initialization(Param, GainCoeff, \Psi W, \hat{L}, L, CLM)$: Set $Param$, $GainCoeff$; set $\Psi$ to a $1 \times (M_\Psi + 1)$ zero vector, and $\hat{L} = L = QNum$; set $K = GainCoeff(QNum, QNum)$, and the index $\iota = 1$; initialize the member $CLM$.
  • $\hat{L}_{CLM} = Identify(\chi)$: Determine the estimated level $\hat{L}_{CLM}$ according to the feature parameter $\chi$, defined in Section 4.3.4.
  • $n = Iteration(R)$: Adjust the CLM filter based on the autocorrelation function vector $R$, and generate the Laplace noise $n$.
Table 4. Relative percentage difference in privacy strength under the filtering attack (%).
| $\Delta t$ | Scheme | $\lambda = 20$ | $\lambda = 30$ | $\lambda = 40$ | $\lambda = 50$ | $\lambda = 60$ |
| --- | --- | --- | --- | --- | --- | --- |
| $\Delta t = 1$ | IID | 150.17 | 107.02 | 74.25 | 44.85 | 20.33 |
| | DCLM | 27.54 | 14.40 | 2.35 | −6.41 | −8.07 |
| | NonQCLM | 33.77 | 4.54 | −5.52 | −9.61 | −11.88 |
| | QCLM-Lowpass | 11.54 | 1.12 | −4.18 | −6.77 | −5.63 |
| $\Delta t = 5$ | IID | 52.47 | 30.37 | 8.14 | −5.10 | −9.52 |
| | DCLM | 7.80 | 0.38 | −5.07 | −6.30 | −5.65 |
| | NonQCLM | 4.04 | −2.64 | −6.79 | −7.70 | −6.78 |
| | QCLM-Lowpass | 1.67 | −1.69 | −2.42 | −3.13 | −2.58 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mao, L.; Xu, Z. Differential Privacy Preservation for Continuous Release of Real-Time Location Data. Entropy 2024, 26, 138. https://doi.org/10.3390/e26020138


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.
