Storage Efficient Trajectory Clustering and k-NN for Robust Privacy Preserving Spatio-Temporal Databases

Abstract: Storing massive volumes of spatio-temporal data has become a difficult task as GPS capabilities and wireless communication technologies have become prevalent in modern mobile devices. As a result, massive trajectory data are produced, incurring expensive costs for storage, transmission, and query processing. A number of algorithms for compressing trajectory data have been proposed to overcome these difficulties. These algorithms try to reduce the size of trajectory data while preserving the quality of the information. In this work, we focus on both the privacy preservation and the storage problem of spatio-temporal databases. To address both, we propose an efficient framework for trajectory representation, entitled DUST (DUal-based Spatio-temporal Trajectory), by which a raw trajectory is split into a number of linear sub-trajectories that are subjected to a dual transformation, which yields the representative of each linear component of the initial trajectory; thus, the compressed trajectory achieves a compression ratio equal to M : 1. To our knowledge, we are the first to study and address k-NN queries on nonlinear moving object trajectories that are represented in dual dimensional space. Additionally, the proposed approach is expected to reinforce the privacy protection of such data. Specifically, even if an intruder has access to the dual points of trajectory data and tries to reproduce the native points that fit a specific component of the initial trajectory, the identity of the mobile object will remain secure with high probability. In this way, the privacy of the k-anonymity method is reinforced. Through experiments on real spatial datasets, we evaluate the robustness of the new approach and compare it with the one studied in our previous work.


Introduction
The research area of moving object databases has become an emerging technological discipline and has consequently gained a lot of interest during the last decade due to the development of ubiquitous location-aware devices, such as PDAs (Personal Digital Assistants), mobile phones, GPS-enabled (Global Positioning System) mobile devices, and RFID (Radio Frequency Identification) or road-side sensors. The technological achievements and advances in sensing and communication/networking, along with the innovative technological design features (thin and light) of computing devices and the development of embedded systems, have enabled the recording of a large volume of spatio-temporal data. Mobile object trajectories are among the wide variety of spatio-temporal data that are especially important to scientists. They help scientists discover movement patterns (individual or group) and knowledge, a task that recent literature has established as trajectory or mobility mining [1]. Also, database technology is evolving to support the querying and representation of the trajectories of moving objects (e.g., humans, animals, vehicles, natural phenomena). Hence, the main parts of trajectory data-mining include pre-processing, data management, query processing, trajectory data-mining tasks, and privacy protection [2].
Real-life applications, such as the analysis of traffic congestion, intelligent transportation, animal immigration habits analysis, cellular communications, military applications, structural and environmental monitoring, disaster/rescue management, as well as remediation, Geographic Information Systems (GIS), Location-Based Services (LBS), and other domains have increased the interest in the area of trajectory data-mining and efficient management of spatio-temporal data.
It should be noted that the explosive growth of social media has produced large-scale mobility datasets whose publication puts people's personal lives at severe risk. Indeed, users have become used to sharing their most-visited or potentially sensitive locations, such as their home, workplace, and holiday locations, which are then easy to obtain through social media. The amount of spatio-temporal data has been growing exponentially; therefore, there is an urgent need to develop efficient methods for storing and managing this large amount of information. A plethora of studies have been conducted on handling mobile objects' trajectory data. More precisely, several of them attempt to reduce the storage size [3][4][5], while others investigate the privacy preservation of trajectory data [6,7]. Nowadays, not only are storage-efficient spatio-temporal transformation schemes needed, but also secure querying on large-scale spatio-temporal data [8]. An accurate capture of a moving object's trajectory usually needs a high sampling rate to collect its location data. Thus, massive trajectory data are generated, which are difficult to fit into memory when applying data-mining algorithms. A common idea is to compress the trajectory data to reduce the storage requirements while maintaining the utility of the trajectory. In the context of this work, we present the storage efficiency of dual methods and experiment on data from the SMaRT system, through which the data of moving object trajectories are generated and used as input to our methods in order to evaluate the security level they offer. As already stated, the privacy of the k-anonymity method recommended in [9] is reinforced. More specifically, we summarize the main contributions of our paper as follows:
1. We compare the proposed methods for addressing k-NN queries on moving objects' trajectory data, which are stored both in dual and native dimensional space. Our implementation shows that the Dual Transformation method constitutes a practical solution that can provide secure k-NN queries.
2. We conduct an extensive experimental evaluation that studies various scenarios that can affect the vulnerability of the k-NN queries, and proceed to a comparative analysis of the underlying methods. We demonstrate the efficiency of our solution using real data drawn from SMaRT.
3. We recall two protocols for Pseudonyms Recovery and Registration with the aim of reinforcing the individuals' privacy in the released data. An individual cannot be re-linked to specific users with a high degree of certainty, as described in Section 3.7.
The rest of this paper is organized as follows: In Section 2, previous related works are presented in relation to our approach. The following are described in Section 3: (a) the dual transformation methods used; (b) the problem definition; (c) the problem formulation; (d) the privacy-preserving analysis; (e) the experimental environment and source of datasets. Section 4 presents the graphical outcomes gathered from experiments, while Section 5 evaluates experimental results in relation to the pros and cons of the proposed methods. Finally, Section 6 records the conclusions in terms of the studied problem and future directions of this work.

Related Work
In this section, we review existing related works in the domain of secure querying on spatio-temporal databases. Our discussion includes privacy-preserving approaches for trajectory-based queries.
In recent years, trajectory databases have constituted an important research area that has received a lot of interest. Most researchers have focused on the querying of moving objects and their trajectories. The so-called trajectory-based queries are also gaining much interest. Queries based on trajectory data require knowledge of the whole, or at least a part, of the mobile objects' trajectory in order to be processed. Such queries may provide useful information about an object's average speed, travelled distance, and so forth. In [8], three common mechanisms in privacy-preserving trajectory publishing are described. Generalization and suppression are the most common ones used to implement k-anonymity. However, the main drawback of these mechanisms is that they suffer from a high possibility of information loss; thus, perturbation techniques based on randomization (e.g., adding noise) may be utilized as an alternative.
The problem of secure querying on spatio-temporal data in combination with k-anonymity has attracted much attention among researchers. The authors in [10] describe historical k-anonymity, based on each mobile user's trajectory data history, known as Personal History Locations (PHL). According to PHL anonymity, a user U is camouflaged by k − 1 users whose PHLs have a common part with the user's own, rendering the user indistinguishable among them. Privacy preservation is enforced by applying the generalization method. More specifically, by trying to preserve historical k-anonymity, the authors increased the uncertainty related to the user's real location data at the time of the query by modifying the spatio-temporal information of the query. In [11], by employing the $k^l$-anonymity privacy model, the authors ensure that an intruder who has knowledge of any sub-trajectory TS of size l of a user's trajectory $T_j$ can, based on TS, identify that trajectory among the k − 1 trajectories that protect it with probability at most $\frac{1}{k}$. In a more recent work [9], the authors investigated the privacy-preserving problem based on real spatio-temporal data. That paper employed the k-anonymity method and formed the anonymity set based on motion vectors with the aim of executing secure spatial k-NN queries. More specifically, it investigated the problem of k-anonymity from a dimensionality perspective and the impact of the used dimensions on the vulnerability of the suggested methods. The experiments presented the effectiveness of the proposed method, such as clustering under particular attribute combinations, and showed that it benefited from attribute suppression during the k-anonymity set computation. The authors in [12] suggested a novel spatio-temporal Mysql ReTrieval framework based on the MySQL and PostgreSQL database management systems. In the context of that work, the authors employed the Hough-X transformation so as to evaluate the efficiency of range queries on nonlinear two-dimensional trajectories of mobile objects. They demonstrated that the Hough-X dual approach, in combination with the range-tree variant, was quite efficient.
Generally, the trajectory of a mobile user is non-linear. However, it can be approximated by a discrete number of linear sub-trajectories with the use of a trajectory segmentation application. Each partition is represented by a line segment between two consecutive partition points, and is expected to provide an effective and efficient way to obtain insights into the motion characteristics and behavioral preferences of mobile objects. Our approach performs low-rate sampling and considers linear interpolation between successive sampled points, where each line segment represents the continuous movement of the object between sampled points. The duality transformation of line segments operates as a pre-processing step and aims at increasing the security level and reinforcing the privacy of k-NN queries, which is the main subject of this work. Also, we have at our disposal the linear components of the initial trajectory, as well as the stored first and last spatial point that represent each line, along with the dual representative, that is, the Hough-X (and/or Hough-Y) dual points. Lastly, this step will turn out to be useful from a storage perspective in Big Data applications, and will render the proposed methods a strong candidate for efficient querying on massive data, in combination with the appropriate indexing method.

Dual Transform for Moving Objects
In general, the geometric dual transform maps a hyper-plane h from $\mathbb{R}^m$ to a point in $\mathbb{R}^m$, and vice versa. In this section, we briefly present how the duality transformation operates in the one-dimensional case. A line from the plane (t, y) or (t, x) is mapped to a point on the dual plane (see Figure 1).
1. Hough-X: The equation $y(t) = ut + a$ is mapped to the point $(u, a)$, where the axes u and a represent the slope (that is, velocity) and the intercept of an object's trajectory, respectively. The resulting dual point $(u, a)$ is the so-called Hough-X transform.
2. Hough-Y: The equation $y(t) = ut + a$ is rewritten as $t = \frac{1}{u} y - \frac{a}{u}$, which gives a different dual representation, the so-called Hough-Y transform. The point in the dual plane is represented as $(b, c)$, where $b = -\frac{a}{u}$ (the intersection with the line $y = 0$) and $c = \frac{1}{u}$.
It is worth mentioning that the Hough-X transform cannot represent vertical lines, while horizontal lines cannot be represented using the Hough-Y transform. Nonetheless, both transforms are valid, since in our setting, velocity is bounded by $[u_{min}, u_{max}]$, and thus lines have a minimum and maximum slope.
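The two transforms can be illustrated with a minimal Python sketch. The function names `hough_x` and `hough_y` and the two-point input format are assumptions made for this example; they are not part of the SMaRT system or of the original formulation.

```python
def hough_x(t0, y0, t1, y1):
    """Map the line through (t0, y0) and (t1, y1) to its Hough-X dual point (u, a)."""
    if t1 == t0:
        raise ValueError("vertical line: Hough-X is undefined")
    u = (y1 - y0) / (t1 - t0)   # slope, i.e. velocity
    a = y0 - u * t0             # intercept at t = 0
    return u, a


def hough_y(u, a):
    """Map y(t) = u*t + a to its Hough-Y dual point (b, c) = (-a/u, 1/u)."""
    if u == 0:
        raise ValueError("horizontal line: Hough-Y is undefined")
    return -a / u, 1.0 / u


# Hypothetical example: the line y(t) = 2t + 1 sampled at t = 0 and t = 2.
u, a = hough_x(0.0, 1.0, 2.0, 5.0)
b, c = hough_y(u, a)
```

The two guards mirror the limitation stated above: Hough-X has no image for vertical lines and Hough-Y has none for horizontal lines.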

kNN Classification and Clustering in Dual Space
Here, we consider points in dual space P. Given two dual points $dp_1$ and $dp_2$, we define $dist(dp_1, dp_2)$ as the distance between $dp_1$ and $dp_2$ in P. In the context of this work, we utilize the Euclidean distance metric, which is defined as $dist(dp_1, dp_2) = \sqrt{\sum_{i=1}^{p} (dp_1[i] - dp_2[i])^2}$, where $dp_1[i]$, $dp_2[i]$ denote the values of $dp_1$, $dp_2$ along the i-th dimension and p is the number of dimensions of P. For example, in Hough-X space, the distance between the dual points $dp_1 = (u_1, a_1)$ and $dp_2 = (u_2, a_2)$ is $\sqrt{(u_1 - u_2)^2 + (a_1 - a_2)^2}$.

Definition 3. Clustering: Given a finite dataset of dual points $DP = \{dp_1, dp_2, \dots, dp_N\}$ in $\mathbb{R}^p$ and a number of clusters K, the clustering procedure produces K partitions of DP such that among all K partitions (clusters) where $|C_c|$ is the number of dual points in cluster $C_c$.
Note that the aforementioned dual methods act as a feature extraction technique. More specifically, they extract the dual point of each of the x, y coordinates of a mobile user's trajectory. The k nearest neighbors algorithm is then applied to the dual-point features and returns the k dual points whose distance from the query dual point is smaller than that of the remaining training dual points. Considering the Hough-X transformation of attribute x or y, the search area is a circle centered at the query point with a radius such that k nearest neighbors exist. If we assume Hough-X of the (x, y) attributes, the k nearest neighbor search area is four-dimensional, $(u_x, a_x, u_y, a_y)$, with complex hypercube geometry.
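As a rough illustration of the k-NN step on dual-point features, the following Python sketch performs a brute-force search with the Euclidean distance. The function name `knn_dual`, the dictionary layout, and the sample values are assumptions for this example; ties are broken by object ID, which the text does not specify.

```python
import math


def knn_dual(query, dual_points, k):
    """Return the IDs of the k dual points closest to `query` (Euclidean distance).

    dual_points maps an object ID to a tuple such as (u, a) or (u_x, a_x, u_y, a_y).
    """
    ranked = sorted(
        dual_points,
        key=lambda oid: (math.dist(query, dual_points[oid]), oid),
    )
    return ranked[:k]


# Hypothetical Hough-X dual points (u, a) of four objects.
points = {
    "A": (1.0, 0.0),
    "B": (1.1, 0.2),
    "C": (3.0, 5.0),
    "D": (0.9, -0.1),
}
neighbours = knn_dual((1.0, 0.0), points, k=2)
```

The same function works unchanged for four-dimensional dual points, since `math.dist` accepts points of any equal dimension.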

Problem Definition
Here, we consider a database that records the location information of mobile objects in two-dimensional space on a finite area. Also, we assume that objects move with small velocities that lie in the range $[u_{min}, u_{max}]$, starting from a specific location at a specific time-stamp, and that they move along a non-linear trajectory. In order to be able to store and handle queries in an efficient way, a mobile object's trajectory is approximated with a series of linear ones, as depicted in Figure 2.

Figure 2. A raw trajectory approximation with a discrete number of R linear sub-trajectories (figure labels: Start Point, End Point). In the dual dimensional space, each one is represented as a dual point; for example, the linear sub-trajectory $[l(t_0), l(t_1)]$ is represented as a dual point $dp_1$, and the linear sub-trajectory $[l(t_1), l(t_2)]$ is represented as a dual point $dp_2$.

Definition 4. A linear trajectory is a straight line that an object follows, starting from a location $l(t_0) = [x_0, y_0]$ at time $t_0$. Then, its location for $t > t_0$ will be $l(t) = [x(t), y(t)] = [x_0 + u_x (t - t_0),\ y_0 + u_y (t - t_0)]$, where $u = (u_x, u_y)$ is the object's velocity in each plane [12].
Definition 5. A trajectory partition or sub-trajectory segment is a line segment $L_i L_j$, where, for i < j, both points belong to the same trajectory and are connected in order to form a partition denoted by $TS_i$ [13].
The authors in [14] claim that the compression ratio constitutes a common metric for evaluating the effectiveness of compression algorithms that can accurately reflect the change of a trajectory's data size. It is influenced by the original signal's data-sampling rate, as well as the quantization accuracy.

Problem Formulation
In the context of this study, the problem of privacy preservation when dealing with spatio-temporal databases goes one step further, and is related to the work in [9]. The spatio-temporal data are the location data of a number of mobile users along with the time-stamp of each position, as shown in Table 1. Through the SMaRT system, we have at our disposal offline trajectory data that give us information about Hough-X, as well as Hough-Y, of the spatial data (x, y). Hence, for each database record per time-stamp, that is, for each mobile user trajectory point, we can consider the values of four attributes (x, y, θ, u) (as in Table 1) along with the values of an additional eight attributes $(U_x, a_x, U_y, a_y, b_x, w_x, b_y, w_y)$ (as in Table 2). We have chosen to anonymize dual point attributes by employing the k-NN method, which enables us to form the k-anonymity set of each mobile object per time-stamp, as depicted in Table 3. The data anonymization is handled both as a clustering and a no-clustering problem. In both approaches, the anonymity set is formed by the IDs of the k nearest neighbors. For each mobile user i and per time-stamp l, we compute the IDs of its k nearest neighbors and keep them in a vector of the form $knns_{il} = [id_{il1}, id_{il2}, \dots, id_{ilk}]$ for l = 1, 2, . . ., L. In Table 3, an example of such sets for N mobile users' dual points is presented. For each user, we measure the number of k nearest neighbor dual points that remain the same from one time-stamp to the next. By employing the dual transformation methods described in Section 3.1, the k-anonymity set of mobile users is formulated based on their dual points. Hence, an alternative definition of k-anonymity is as follows:
Definition 10 (kDUST-anonymity). A transformed database record is k-anonymous with respect to Hough-X dual points, that is, the velocity and intersection attributes $(U_x, a_x)$ or $(U_y, a_y)$, if at least k − 1 distinct records at the same time-stamp τ have the same dual point attributes, so that no record is distinguishable from its k − 1 neighboring records.
Remark 1. As we already mentioned in [9], k-anonymization intuitively hides each individual among k − 1 others. This means that linking cannot be performed with confidence greater than $\frac{1}{k}$. Nevertheless, k-anonymity may not protect users against the unveiling of dual point attributes.
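The per-time-stamp anonymity sets and the count of neighbors that stay the same between consecutive time-stamps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the snapshot layout (one dictionary of user ID to dual point per time-stamp), and the sample values are hypothetical and do not come from Tables 1 to 3.

```python
import math


def anonymity_set(user, dual_points, k):
    """IDs of the k nearest neighbours of `user` among the other users' dual points."""
    others = [oid for oid in dual_points if oid != user]
    others.sort(key=lambda oid: (math.dist(dual_points[user], dual_points[oid]), oid))
    return others[:k]


def unchanged_neighbours(snapshots, user, k):
    """For consecutive time-stamps, count how many of the k neighbours stay the same."""
    sets = [set(anonymity_set(user, snap, k)) for snap in snapshots]
    return [len(prev & cur) for prev, cur in zip(sets, sets[1:])]


# Hypothetical Hough-X dual points (U, a) of four users at two time-stamps.
snapshots = [
    {1: (1.0, 0.0), 2: (1.1, 0.1), 3: (0.9, 0.2), 4: (5.0, 5.0)},
    {1: (1.0, 0.0), 2: (1.1, 0.1), 3: (4.0, 4.0), 4: (1.2, 0.0)},
]
stable = unchanged_neighbours(snapshots, user=1, k=2)
```

In this toy input, user 1 is protected by users {2, 3} at the first time-stamp and by {2, 4} at the second, so one of the two neighbors is preserved.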

System Model
Here, we consider a spatio-temporal database with N records, that is, N moving objects in the xy plane. Each record $(x_i^j, y_i^j)$ represents the spatial coordinates of the mobile user j at time-stamp $t_i^j$, or point i of its trajectory j [15]. From the location coordinates (x, y), we can extract the corresponding dual points by employing the methods described in Section 3.1. Suppose a trajectory database $T = \{T_1, \dots, T_N\}$ of trajectories of equal length L, in which each trajectory is represented via a sequence of L triples, that is, $T_j = \{(x_i^j, y_i^j, t_i^j)\}_{i=1}^{L}$. For each point i in trajectory j, we define in four-dimensional space a vector $DP_i^j = (U_x^{ij}, a_x^{ij}, U_y^{ij}, a_y^{ij})$, which denotes the dual points array. Hence, we can redefine and store the trajectory j as $T_j = \{DP_1^j, \dots, DP_L^j\}$.
The privacy preservation of k-NN queries in trajectory databases is addressed with the use of two different methods. The first one is entitled dual-based k-NN (DukNN), which applies k-NN directly to dual points, while the second one is called dual-based clustering k-NN (DuCLkNN). The main difference between these two methods lies in the fact that the latter is applied to clustered dual point data. The operations involved in addressing a k-NN query are thoroughly described in Algorithms 1 and 2, respectively.
When employing Algorithm 1 in order to run a k-NN query, we must focus on a specific time period during which we have at our disposal the dual points of all users' locations. Given that each user stays in the same sub-trajectory during the study period, privacy is preserved in that segment, since the k nearest neighbors remain unchanged. On the other hand, when employing Algorithm 2, the clustering step comes first; we can again claim that the composition of the clusters remains the same, since the clustering method is applied in dual space and mobile users have the same dual point. As a result, the k nearest neighbors inside the cluster will remain the same. Hence, without loss of generality, in both cases, privacy is piecewise preserved, except for the points of discontinuity (known as characteristic points), where the motion characteristics may change.
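The difference between the two methods can be sketched in Python. This is a simplified reading of the descriptions above, not a transcription of Algorithms 1 and 2: the function names `duknn` and `duclknn` are illustrative, and the cluster labels are assumed to be given by some earlier clustering step (for example K-means) rather than computed here.

```python
import math


def duknn(query_id, dual_points, k):
    """k-NN applied directly to all dual points (the query object itself is excluded)."""
    q = dual_points[query_id]
    candidates = [oid for oid in dual_points if oid != query_id]
    candidates.sort(key=lambda oid: (math.dist(q, dual_points[oid]), oid))
    return candidates[:k]


def duclknn(query_id, dual_points, labels, k):
    """k-NN restricted to the dual points that share the query object's cluster."""
    same_cluster = {
        oid: dp for oid, dp in dual_points.items()
        if labels[oid] == labels[query_id]
    }
    return duknn(query_id, same_cluster, k)


# Hypothetical dual points and a hypothetical cluster assignment.
dual_points = {1: (1.0, 0.0), 2: (1.1, 0.0), 3: (1.3, 0.0), 4: (1.6, 0.0)}
labels = {1: 0, 2: 1, 3: 0, 4: 0}
direct = duknn(1, dual_points, k=2)
clustered = duclknn(1, dual_points, labels, k=2)
```

Here the direct search returns users 2 and 3, whereas the clustered search skips user 2 (a different cluster) and returns users 3 and 4. As long as the dual points and cluster labels stay fixed within a linear segment, both results stay fixed as well.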

Vulnerability and Storage Efficiency
In this paper, we assume that mobile users move on a real map with small velocities; thus, we use the Hough-X transform, by which an object's motion is mapped to the (U, a) dual point. To answer a k-NN query, the following steps are performed:
1. Decompose the k-NN query into 1D queries for the (t, x) and (t, y) projections.
2. For each projection, get the dual k-NN query by using a Hough-X transform.
3. Return the anonymity set, which contains the trajectory IDs that satisfy the dual k-NN query in each projection.
In the following, the analysis is focused on the robustness estimation of the proposed approach based on Hough-X. Specifically, the ensuing steps are followed:
1. Split the initial trajectory into a number of linear sub-trajectories, each of which consists of the same number M of spatial points.
2. Apply Hough-X to each part.
Suppose that M is the number of points of the 1D trajectory, which a dual point represents, and D is the number of dual points, which describe the 1D trajectory projection (t, x) or (t, y) in dual space.
Therefore, the whole trajectory has a length equal to DM spatial points, for which M > D should hold. In the following, we camouflage a mobile user who follows a linear trajectory x(t) or y(t), or equivalently its corresponding dual point, with the k nearest neighboring dual points, which are very likely to remain the same at the next time-stamp. While users move along the linear sub-trajectory that relates to the same dual point, the k-NN set will remain intact. Therefore, for as long as this happens, we can claim that k-anonymity holds. Indeed, the privacy preservation is reinforced by a factor M, which brings the so-called vulnerability level to $\frac{1}{kM}$. We recall the spatial data security metric that we have already defined in [9] for quantifying the robustness of our methods. Again, the vulnerability remains equal to $\frac{1}{k}$ in dual point space. Nonetheless, the vulnerability in the initial dataset is measured as follows. Since the points inside a sub-trajectory are protected by the same dual points, their vulnerability is considerably reduced to $\frac{1}{Mk}$; this entails that an intruder can distinguish the identity of a mobile user with a probability equal to $\frac{1}{Mk}$. The same holds for all sub-trajectories. Hence, the vulnerability in each projection is defined as $V_x = V_y = \frac{1}{Mk}$, where $V_x$ and $V_y$ are the vulnerability measures based on Hough-X in projections (t, x) and (t, y), respectively.
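As a worked illustration of the per-projection vulnerability, consider the hypothetical values k = 5 and M = 20 (chosen only for this example, not taken from the experiments):

```latex
\[
\frac{1}{k} = \frac{1}{5} = 0.2
\quad \text{(dual point space)},
\qquad
\frac{1}{Mk} = \frac{1}{20 \cdot 5} = 0.01
\quad \text{(spatial points of one sub-trajectory)} .
\]
```

Under these assumed values, representing M = 20 spatial points by one dual point lowers the per-projection vulnerability by a factor of 20.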
Next, the vulnerability in each projection is combined, and the total vulnerability is written as in the following equation: where ( 2 M ) represents all combinations of M points that correspond to 2 dual points of the initial trajectory.
Several trajectory compression approaches have been proposed that aim at reducing the trajectory's size. An initial distinction classifies the compression methods as either offline (after trajectory generation) or online (instantly as objects move). Data compression decreases the size of the data in order to limit memory space and improve the efficiency of storage, processing, and/or transmission without loss of information. Various trajectory compression algorithms exist in the literature that try to balance the tradeoff between accuracy and storage size. We refer to some major ones, namely, distance-based, velocity-based, semantic, similarity-based, and priority queue [4]. The proposed Hough-X based approach achieves trajectory compression suitable for either a single trajectory or a set of multiple trajectories. Without loss of information, Hough-X maps each linear sub-trajectory spatial point to its representative dual point.
Compression can be achieved by applying a dimensionality transformation that increases the storage efficiency of the data. Suppose we reduce three-dimensional data (x, y, t) to the Hough-X space of (t, x), that is, $(U_x, a_x)$. Storage space is saved because the number of available dual points D is less than the number of points M in the corresponding linear sub-trajectory; hence, the whole trajectory achieves $CR = \frac{MD}{D}$, or CR = M : 1, where, for example, M spatial points correspond to one dual point, as shown in Figure 1. This conserves space and achieves more compression, as depicted in Figure 3, and thus it is expected to have a greater impact on large-scale spatio-temporal databases.
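The M : 1 ratio can be illustrated with a short Python sketch for one projection. It assumes, for simplicity, that the trajectory splits into non-overlapping blocks of exactly M points and that each block is exactly linear, so its first and last points determine the dual point; the function name `compress_projection` and the sample data are hypothetical.

```python
def compress_projection(times, values, m):
    """Replace every block of m points by one Hough-X dual point (u, a).

    Assumes each block is exactly linear, so the first and last point of the
    block determine its slope and intercept.
    """
    if len(times) != len(values) or len(times) % m != 0 or m < 2:
        raise ValueError("need equal-length inputs split evenly into blocks of m >= 2")
    dual_points = []
    for start in range(0, len(times), m):
        t0, t1 = times[start], times[start + m - 1]
        v0, v1 = values[start], values[start + m - 1]
        u = (v1 - v0) / (t1 - t0)          # slope (velocity) of the block
        dual_points.append((u, v0 - u * t0))  # intercept at t = 0
    return dual_points


# Hypothetical (t, x) projection made of two linear pieces with M = 4 points each.
times = [0, 1, 2, 3, 4, 5, 6, 7]
xs = [0.0, 2.0, 4.0, 6.0, 10.0, 9.0, 8.0, 7.0]
duals = compress_projection(times, xs, m=4)
ratio = len(times) / len(duals)   # DM points stored as D dual points
```

With D = 2 dual points standing in for DM = 8 spatial points, the ratio is M = 4, in line with CR = M : 1.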
Potentially, by employing a dual method based on Hough-X, we could generate a trajectory codebook by applying the Hough-X transformation to all linear sub-trajectories of a given set of trajectories in a map region. In the training step, dual points that stem from the same linear part are similar and must be grouped into the same cluster; also, each cluster is assigned a single representative vector, called a dual code-vector. Hence, each trajectory inside the codebook is represented by its dual points. At this point, we should note that the Hough method acts as a clustering one. K-means is a popular method for both clustering and codebook design. In the coding step, each input dual points vector is compressed to the nearest dual code-vector, referenced by a simple index. The index of the matched code-vector in the codebook is then transmitted to the decoder over a channel and is used by the decoder in order to retrieve the similar trajectory dual points from an identical codebook. The key point is that only the index of the dual code-vector is stored and transmitted, rather than the entire code-vector.
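The coding and decoding steps amount to a nearest-code-vector lookup, which can be sketched in Python as below. The codebook is assumed to have been built beforehand (for example with K-means); the function names and the code-vector values are hypothetical.

```python
import math


def encode(dual_point, codebook):
    """Return the index of the code-vector nearest to `dual_point`."""
    return min(range(len(codebook)), key=lambda i: math.dist(dual_point, codebook[i]))


def decode(index, codebook):
    """Look up the code-vector for a received index in an identical codebook."""
    return codebook[index]


# Hypothetical dual code-vectors (u, a) shared by the encoder and the decoder.
codebook = [(2.0, 0.0), (-1.0, 14.0), (0.5, 3.0)]
index = encode((1.9, 0.2), codebook)   # only this integer is stored or transmitted
restored = decode(index, codebook)
```

The decoder recovers the representative code-vector rather than the exact input dual point, so this coding step is an approximation whose error depends on how well the codebook covers the data.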
As a result, the recommended schema is space-compressed because of the duality, and also more robust in comparison with the suggested methods in previous works [9,12].

Privacy Preservation Analysis
Privacy relates to individual data protection and the human right to determine which information about oneself is to be hidden. Privacy-preserving data management includes k-anonymity, a noted method for data anonymization before publication, which has also been studied in the context of trajectory data. The authors in [16] claim that, given a set of trajectories, the objective of data publication is to transform them into some k-anonymized form in order to avoid publishing the original data, which would put at risk the privacy of the individuals related to the data. In addition, they mention that an intruder who knows a sub-trajectory of the original trajectory of an individual may utilize it with the aim of extracting the whole trajectory of that person based on the published data. Finally, they recognize an upper bound for the re-identification probability of the whole trajectory within the released data, namely 1/k, where the parameter k reflects the expected level of privacy.
Our solution transforms each original spatial point into a dual point using a bijective mapping, such as Hough. This technique allows for a k-NN search directly on the transformed points, thus providing stronger location privacy. Assuming an insecure Transformed Database Management System (TDBMS), possibly located at a third party (e.g., a service provider in the cloud), an attacker can observe its environment. In particular, the attacker has access to the transformed database, to the queries upon the transformed data, as well as to the results. Also, we suppose that the attacker is aware of the dual transformation scheme and aims to retrieve the original database by executing Hough-X and/or Hough-Y algorithms with respect to the size of the database. Nonetheless, in this paper, we aim to prevent an attacker from obtaining the original database, as the attacker may possess extra knowledge about this original database. To better evaluate the power of the transformation scheme, we classify the attacks into different levels based on the possessed knowledge.
1. Level 1: The attacker only observes the transformed database.
2. Level 2: In addition to the transformed database, the attacker is familiar with a set of plain tuples of the original database, but does not know the corresponding encoded values of those tuples in the transformed database.
3. Level 3: In addition to the transformed database, the attacker observes a set of tuples in the original database and knows the corresponding encoded values of those tuples.
A few cryptography-based approaches, such as homomorphic encryption (HE), verifiable computation (VC), and secure multi-party computation (MPC), have been designed in order to provide secure big-data processing in the Cloud [17]. In addition, other approaches, such as Asymmetric Key Cryptography and trusted Public Key Infrastructure, have been developed over the years in order to support privacy preservation in the spatio-temporal domain. The basic idea behind these techniques is to encrypt the identity of the user prior to sending it to the service provider. In this way, the service provider does not have any knowledge about the real identity of the individual who initiated the k-NN query. To prevent an external adversary from linking queries to the same mobile object, its pseudonym has to be secure. For this reason, we are concerned with pseudonym recovery and registration protocols consisting of three entities, namely, Users (U), Identity Provider (IP), and Service Providers (SP). Recall that they are based on Brands' credentials and have been suggested by Brands in the context of "The New System" with the aim of making the communication more reliable and secure. We believe that the adoption of these protocols will reinforce the identity privacy of mobile objects and of spatio-temporal databases at large. For the sake of completeness, the main steps of these protocols, along with the privacy preservation properties they offer, are presented.
The mobile user U performs the following protocol in order to retrieve a set of pseudonyms with the identity provider (IP): Initially, user U chooses random values $r_{(1,1)}, r_{(1,2)}, \dots, r_{(1,m)}, e \in \mathbb{Z}_q$, where e is known only to user U, then computes the quantity $g_{m+1}^{e} \in \mathbb{Z}_q$ and finally sends it to IP ($g_1, g_2, \dots, g_{m+1} \in G_q$). Secondly, IP recovers the quantity t. Thirdly, user U creates the $r_i$'s according to the equation $r_i = r_{(1,i)} + r_{(2,i)}$ for i = 1, 2, . . ., m and computes the quantity $t = g_1^{r_1} g_2^{r_2} \cdots g_m^{r_m}$. Hence, the corresponding user creates m pseudonyms $(P_i, sign(P_i))$ and values $s_i \in \mathbb{Z}_q$, such that $P_i = (t f_0)^{s_i}$ for i = 1, 2, . . ., m.
A mobile user U registers a pseudonym $(P_i, sign(P_i))$ with a service provider $SP_i$ by presenting the pseudonym $(P_i, sign(P_i))$ and uncovering the value $r_i$ encoded in $P_i$. The user performs the following proof of knowledge with the service provider $SP_i$, provided that $P_i \neq 1$, so that the tuple $(P_i, sign(P_i))$ is a valid one.
Then, the service provider $SP_i$ stores the tuple $(P_i, r_i)$ and associates it with either a new or an existing user account. Through this protocol, the user demonstrates ownership of the pseudonyms and proves that the disclosed value $r_i$ is actually the value encoded in $P_i$.
The privacy preservation lies in the following facts:
1. The service provider cannot find out any additional information about the quantities encoded in $P_i$, except for the disclosed value $r_i$.
2. The random set $(r_1, r_2, \dots, r_m, e)$ is created so that nobody (user, identity provider) can control its end value.

Vulnerability Evaluation in Hybrid Space
In this subsection, we consider the case in which clustering takes place in spatial coordinate space while the k-NN query is in Hough space. In this case, the dataset concerns the compressed version of mobile users' trajectories as derived from SMaRT. Figure 7a,b depicts information about the initial trajectory's length and the selected points, as well as the compression ratio per trajectory ID. Note that the compression ratio in the system is computed as $\left(1 - \frac{SelectedPoints}{InitialPoints}\right) \times 100\%$, where the selected points are 100. From the dataset, we exclude 13 trajectories whose length is much less than 100, such as the average length of compressed trajectories.
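As a worked example of this formula, assume a hypothetical trajectory with 400 initial points (a value chosen only for illustration), of which 100 are selected:

```latex
\[
\left(1 - \frac{\mathit{SelectedPoints}}{\mathit{InitialPoints}}\right) \times 100\%
= \left(1 - \frac{100}{400}\right) \times 100\%
= 75\% .
\]
```

With the number of selected points fixed at 100, the percentage grows with the initial length, and a trajectory with fewer than 100 initial points would give a negative value.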
A further observation is shown in Figure 7c,d, where the vulnerability in hybrid space behaves and performs similarly to the vulnerability in spatial coordinate space. This relates to the fact that Hough-X constitutes a linear transformation of the spatial coordinates. Again, applying the suppression method, as shown in Table 6, further lowers the vulnerability of the k-NN query with the clustering method. This work investigated the impact of Hough-X, which has already been applied to range queries, on the robustness of the methods proposed in [9] for addressing secure k-NN queries. Varying the number of clusters K, which must be known in advance, affects only the method that uses clustering; there, vulnerability becomes slightly worse as the number of clusters increases. Indeed, when features are added, cluster density decreases and the data become sparser, so the clustering task becomes more difficult. For this reason, reducing a higher-dimensional space to a lower-dimensional one in order to avoid the curse of dimensionality is a common and important step in machine learning.
An important property of Hough is its robustness to low-quality or uncertain data (due either to non-uniform sampling or to noise) [21]. Therefore, even if a trajectory is represented by different sample points in 2D Euclidean space, it may have the same dual points in Hough space. Under this condition, Hough space reflects mobility patterns better than the original spatial trajectory data (x, y), leading to more homogeneous clusters and improving k-NN performance. As the experimental results in Figure 6a,b verify, these properties have a positive impact on vulnerability, which is pair-wise preserved, showing that the clusters capture within-cluster space-time similarity. The authors in [22] provide an efficient scheme for representative clustering on uncertain data. Finally, under feature suppression, the method with clustering demonstrates higher robustness, that is, lower vulnerability, which is the main issue in k-anonymity and thus in privacy preservation. This case shows the superiority of the method with clustering in terms of vulnerability. Indeed, when the mobile users are protected by k nearest neighbors computed on lower-dimensional data than the data used in clustering, it is more difficult for an attacker who has access to history data to link the public information of the k − 1 nearest neighbors (that is, unlinkability).

Conclusions
In conclusion, we carried out research on privacy preservation based on real spatio-temporal data, through which we demonstrated the impact of parameters k and K on the vulnerability of the proposed methods. We observed that increasing k benefits both methods, verifying that the security of a mobile user is more robust when the user is protected by a high number of nearest neighbors. This paper proposes the application of k-NN queries based on dual points of the Hough-X projection, with the aim of reinforcing the anonymity of k-NN queries and decreasing storage requirements. More specifically, we investigated the problem from the perspective of dual point attributes. The experimental results indicate that, although the outcomes of the Hough-X based vulnerability are not optimal in comparison with spatial coordinate space, the difference between them is less than 5%, which still makes Hough-X an appropriate choice for storage-efficient privacy preservation.
The SMaRT framework approximates users' non-linear trajectories with linear ones from time-stamp to time-stamp, and the current results are concerned with the low data-sampling rate.
A challenging and open issue is experimentation on the impact of the data sampling rate (low and high) on the described procedures and transformations. Also, we plan to extend and/or enhance the proposed methods to be applicable to 3D (x, y, z) trajectories to represent real situations such as tracing the GPS trajectories of observed birds with devices or drones. In such a case, the dual methods can be applied to the z projection in the same way as to the x and y ones. Additionally, we intend to evaluate the efficiency and scalability of the suggested approaches on big spatio-temporal databases in a distributed environment, that is, in the cloud, and compare their performance with appropriate indexing methods. Our aim is to make SMaRT suitable for supporting k-NN queries based on the proposed methods.
Ultimately, it will be useful to evaluate the time of some transactions (e.g., a roll-back in which an end user who has lost their way decides to return along the same path to a certain point or to look for another route), that is, how long the end user will wait to receive the answer from the Database Management System (DBMS) with the aforementioned procedures implemented, compared to those used nowadays.

Figure 1. An overview of trajectory segmentation and Hough-X transformation for a linear trajectory segment (TS), which consists of M points. The dual points of the M points in TS are the same, that is, $a_1 = \ldots = a_M$ and $u_1 = \ldots = u_M$, where (a) shows the y(t) line and (b) shows the Hough-X points.
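The property stated in the caption, that all M points of a linear segment share one dual point, can be checked with a small sketch. It assumes the usual Hough-X convention in which the line y(t) = u·t + a maps to the dual point (u, a); the helper name `hough_x_dual` and the sample segment are hypothetical, and the dual point is estimated here from consecutive sample pairs, which is an assumption rather than the procedure used in the paper.

```python
def hough_x_dual(p, q):
    """Dual point (u, a) of the line y(t) = u * t + a through samples p and q.

    Each sample is a (t, y) pair; u is the velocity and a the intercept.
    """
    (t1, y1), (t2, y2) = p, q
    if t1 == t2:
        raise ValueError("samples must have distinct time-stamps")
    u = (y2 - y1) / (t2 - t1)
    a = y1 - u * t1
    return (u, a)


def segment_duals(segment):
    """Dual points computed from each pair of consecutive samples."""
    return [hough_x_dual(segment[i], segment[i + 1])
            for i in range(len(segment) - 1)]


# Hypothetical linear segment y(t) = 2t + 3, sampled uniformly ...
uniform = [(float(t), 2.0 * t + 3.0) for t in range(5)]
# ... and the same motion sampled at irregular time-stamps.
irregular = [(0.0, 3.0), (1.5, 6.0), (4.0, 11.0)]

print(segment_duals(uniform))
print(segment_duals(irregular))
```

Both samplings collapse to the single dual point (2.0, 3.0), so one pair (u, a) can stand in for all M points of the segment, which is where the M : 1 compression of a linear component comes from.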

Definition 1. DukNN: Given a dual point dp, a data-set of dual points Y and an integer k, the k nearest neighbors of dp from Y, denoted as DukNN(dp, Y), is a set of k points from Y such that ∀l ∈ DukNN(dp, Y) and ∀q ∈ {Y − DukNN(dp, Y)}, dist(l, dp) < dist(q, dp).
Definition 2. DukNN Classification: Given a dual point dp, a training dual points data-set Y, and a set of classes $Cl_Y$ where the dual points of Y belong, the classification process produces a pair (dp, $cl_{dp}$), where $cl_{dp}$ is the majority class to which dp belongs.
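The two definitions above can be sketched with a brute-force search. This is an illustration only: it assumes Euclidean distance for dist, breaks distance ties by index order (the definition itself uses a strict inequality), and the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def duknn(dp, Y, k):
    """Indexes of the k dual points in Y closest to dp (Euclidean distance)."""
    if not 1 <= k <= len(Y):
        raise ValueError("k must be between 1 and len(Y)")
    order = sorted(range(len(Y)), key=lambda i: math.dist(Y[i], dp))
    return order[:k]


def duknn_classify(dp, Y, labels, k):
    """Return the pair (dp, majority class among the k nearest neighbors)."""
    votes = Counter(labels[i] for i in duknn(dp, Y, k))
    return dp, votes.most_common(1)[0][0]


# Hypothetical dual points (u, a) with their class labels.
Y = [(1.0, 0.0), (1.1, 0.2), (5.0, 4.0), (5.2, 4.1), (0.9, 0.1)]
labels = ["A", "A", "B", "B", "A"]
dp = (1.0, 0.1)

print(duknn(dp, Y, k=3))
print(duknn_classify(dp, Y, labels, k=3))
```

A linear scan costs O(|Y|) distance evaluations per query; an index over the dual space would be the natural replacement for large data-sets, but that is outside the scope of this sketch.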

Definition 6. Characteristic points are the points where the trajectory changes rapidly.
Definition 7. The dual points array constitutes a set containing points of a trajectory that are represented in the dual space.
Definition 8. A compressed trajectory path is a subset of the trajectory's points that indicate a significant change in the motion characteristics, that is, the speed or direction of a moving object.
Definition 9. Given a trajectory T of size |T| and a compressed trajectory $T_c$ of T with size $|T_c|$, the Compression Ratio (CR) is $\frac{|T|}{|T_c|}$.

Algorithm 1 DukNN
1: input The number of k nearest neighbors
2: input The number of mobile users N
3: input The dual points array of N users in L time-stamps
4: output k nearest neighbors indexes of N users in L time-stamps
5: for i = 1 to L do
6:     for j = 1 to N do
7:         Apply k-NN for the dual points of all users in order to identify the set of k-NN indexes $I_i^j$ of user j in time-stamp i

Algorithm 2 DukNN with clustering
1: input The number of k nearest neighbors
2: input The number of mobile users N
3: input The dual points array of N users in L time-stamps
4: output k-NN indexes of N users in L time-stamps
5: Apply K-Means of dual points $(U_x, a_x)$ of N users for the L time-stamps
6: for i = 1 to L do
7:     for j = 1 to N do
8:         Apply k-NN method between the dual point of user j and the dual point of users inside the cluster $C_i^j$ of user j in time-stamp i and find the set of k-NN indexes $I_i^j$
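The two loops above can be sketched as a single routine. This is a hedged illustration, not the authors' implementation: cluster labels are assumed to have been computed beforehand (for example by K-Means) and are passed in as plain integers, a user is excluded from its own neighbor set, and a user whose cluster holds fewer than k other members simply receives fewer neighbors. The names `duknn_all`, `duals`, and `clusters` are hypothetical.

```python
import math


def knn_indexes(j, points, candidates, k):
    """Indexes of the (at most) k candidates closest to user j, excluding j."""
    others = [c for c in candidates if c != j]
    others.sort(key=lambda c: math.dist(points[c], points[j]))
    return others[:k]


def duknn_all(duals, k, clusters=None):
    """k-NN index sets for every time-stamp i and user j.

    duals[i][j]    -- dual point (u, a) of user j at time-stamp i
    clusters[i][j] -- cluster label of user j at time-stamp i (assumed to be
                      precomputed, e.g. by K-Means); None means all users
                      are candidates.
    """
    result = []
    for i, points in enumerate(duals):            # loop over the L time-stamps
        row = []
        for j in range(len(points)):              # loop over the N users
            if clusters is None:
                candidates = range(len(points))
            else:
                candidates = [c for c in range(len(points))
                              if clusters[i][c] == clusters[i][j]]
            row.append(knn_indexes(j, points, candidates, k))
        result.append(row)
    return result


# Hypothetical data: one time-stamp, four users, two clusters.
duals = [[(1.0, 0.0), (1.2, 0.0), (1.3, 0.0), (5.0, 4.0)]]
clusters = [[0, 0, 1, 1]]

print(duknn_all(duals, k=1))                      # search among all users
print(duknn_all(duals, k=1, clusters=clusters))   # search inside own cluster
```

In this toy input, user 1's nearest neighbor changes from user 2 to user 0 once the search is restricted to its own cluster, which shows how the cluster assignment constrains the anonymity set.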

Figure 6. Clustering with $(U_x, a_x)$ and k-NN with $(U_x, a_x)$: (a) Mobile User 10 and (b) Mobile User 100 for N = 995 trajectories, L = 50 time-stamps, (c) Vulnerability measure in dual Hough-X and native dimensional space of (x, y).

Table 1. An overview of an original spatio-temporal database.

Table 2. An overview of the transformed spatio-temporal database.