Mining Individual Similarity by Assessing Interactions with Personally Significant Places from GPS Trajectories

Human mobility is closely associated with places. Due to advancements in GPS devices and related sensor technologies, an unprecedented amount of tracking data has been generated in recent years, thus providing a new way to investigate the interactions between individuals and places, which are vital for depicting individuals’ characteristics. In this paper, we propose a framework for mining individual similarity based on long-term trajectory data. In contrast to most existing studies, which have focused on the sequential properties of individuals’ visits to public places, this paper emphasizes the essential role of the spatio-temporal interactions between individuals and their personally significant places. Specifically, rather than merely using public geographic databases, which include only public places and lack personal meanings, we attempt to interpret the semantics of places that are significant to individuals from the perspectives of personal behavior. Next, we propose a new individual similarity measurement that incorporates both the spatio-temporal and semantic properties of individuals’ visits to significant places. By experimenting on real-world GPS datasets, we demonstrate that our approach is more capable of distinguishing individuals and characterizing individual features than the previous methods. Additionally, we show that our approach can be used to effectively measure individual similarity and to aggregate individuals into meaningful subgroups.


Introduction
Advancements in GPS devices and related sensor technologies have resulted in the generation of an unprecedented amount of tracking data, which has enabled investigations of human mobility across a wide range of disciplines, such as urban planning, traffic management, tourism, location-based services, and public health. Movement tracks are usually recorded as trajectories, which are temporal sequences of spatio-temporal points, such as (x, y, t). Among the large number of studies on trajectory data, researchers have shown particular interest in the trajectories of moving individuals because of the latent social and commercial benefits of these trajectories. In particular, mining the similarity between individuals based on trajectory data, which plays an important role in inter-trajectory studies [1], is a major focus due to its potential for use in characterizing individual movement features [2][3][4], inferring personal preferences [5,6], and predicting individuals' future positions [7].
In this paper, individual similarity refers to the commonality that two individuals share in their interactions with places. Most existing works on measuring individual similarity have been proposed in the context of location-based social network services (LBSNSs), in which users' common interests are the main focus. Thus, previous methods of similarity measurement have been developed based on the sequential properties of visits to public places. For example, Zheng et al. [5] proposed [...] step of the framework. Experiments using a real-world individual trajectory dataset are presented in Section 4. Finally, in Section 5, we draw conclusions, discuss limitations, and suggest future work.

Related Work
The framework proposed in this paper is closely related to previous studies of place semantics extraction and user similarity measurements.

Significant Place Semantics Extraction
Since Spaccapietra et al. first introduced the concept of semantic trajectory in [14], many researchers have attempted to enrich raw trajectory data with relevant semantic information. In the cited work, the authors proposed a well-known semantic trajectory model, known as stops and moves. In this model, stops are places where moving objects stay for a certain amount of time, and moves are the movements between any two stops. Motivated by this model, many studies on semantic trajectories have used stop detection as part of their semantic enrichment processes (e.g., [14][15][16]). Moreover, greater emphasis is placed on stops than on moves, because stops can be further clustered into visited places. Frequently visited places are termed "significant places" for the individuals, and they are believed to generally bear rich semantics, which is crucial for a better understanding of moving objects [17][18][19][20]. Methods of extracting the semantics of significant places can be divided into two main types: location-based and time-based methods.
Pioneering location-based methods were proposed by Alvares et al. [19] and Bogorny et al. [20], who integrated background geographical information into trajectories and then extracted the semantics of a potential stop when that stop intersected with a given geographical object for some minimum amount of time. Their preprocessing step required all geographic places relevant to the application of interest to be defined a priori. More general location-based methods have attempted to infer significant place semantics by associating points of interest (POIs) with stops based on spatial proximity. Xiao et al. [21] transformed individuals' location histories from the geographic space into a semantic space using a POI database. Considering the fact that the POI nearest to a stop may not be the place that was actually visited, they constructed a feature vector for each stay region that reflects the uncertainty of the possible POI categories assigned to that region. In [22], Spinsanti et al. proposed a more sophisticated approach in which a probability-ranked list of possible POIs was generated for each visited place. After classifying the POIs into different categories, they computed the most likely POIs for each significant place by incorporating additional domain information about the POIs, such as their opening times. Next, they summed all of the probabilities for POIs belonging to the same category and used the aggregate probabilities to assign possible POI categories to each significant place to obtain the semantics. In general, the location-based studies discussed above failed to identify personally significant place semantics because they extracted places and their semantics solely from public geographic databases.
In contrast to location-based methods, which extract place semantics by comparing the spatial positions of visited places to those of predefined POIs, time-based methods extract the semantics of significant visited places based on the temporal signatures of those visits. Using the behavioristic assumption that "what you are can be determined by when you are", Ye et al. [23] identified the semantics of various places by observing when large numbers of users interacted with those places. However, they attempted to assign category tags to untagged common public places, and thus were unable to address the problem of identifying place semantics with personal meanings. Shen et al. [24] applied ST-DBSCAN to detect spatio-temporal regions of interest (ST-ROIs) and then differentiated between individuals' behaviors in the same ST-ROIs by assessing the differences between the individuals' visit times. Because this framework was proposed to identify activity groups, the authors divided the ST-ROIs into generic region types and grouped individuals by comparing the time that was allocated to different generic ST-ROI types instead of to personally significant places. To address the personal semantics gap, Andrienko et al. [25] used a procedure to find individual POIs and created "temporal signatures" characterizing the temporal distribution of a person's presence at each POI. Their experiments demonstrated that the semantics of only a small proportion of significant places (e.g., homes and workplaces) can be derived from temporal and statistical information. Thus, their results suggested that the personal meanings of individual POIs could not be inferred solely from their temporal signatures.

User Similarity Measurement
Many existing methods of user similarity measurement have been proposed to provide recommendation services. Li et al. [26] proposed a framework for modeling users' location histories and mining user similarity. They combined all users' trajectory data and hierarchically clustered those data into geographic regions, which were then used to build individual hierarchical graphs. When measuring the similarity between users, they incorporated both the sequence of visited regions and the geographic granularity at which similar sequences were found. Zheng et al. [5] extended the work of Li et al. [26] by using a new sequence matching strategy and considering the popularity of the visited locations. Specifically, the enhanced framework considered three aspects of users' location histories: the sequence of movements, the hierarchical properties of the geographic space, and the popularity of the visited places. The users' location histories were represented by hierarchical graphs (HGs), and similar sequences shared between two users in each layer of the hierarchy were further matched and used to calculate the similarity between the users. Although these approaches measure user similarity in geographic space, they do not consider geographic properties, such as the distances between locations. Moreover, they do not take the semantics of locations into account. The idea that semantic meaning should be considered when measuring user similarity was presented in the work of Ying et al. [9], who proposed the Maximal Semantic Trajectory Pattern Similarity (MSTP-Similarity) to measure the similarity between two maximal semantic trajectory patterns based on the longest common sequence (LCS) of these two patterns. Next, they extended the MSTP-Similarity to explore the similarity between two users, measuring user similarity based on a weighted average obtained by incorporating all possible MSTP-Similarities between the patterns from the two pattern sets. However, Chen et al. [27] found that the weighted average of pattern similarities proposed in [9] is unsuitable for measuring user similarity and cannot guarantee the maximum similarity between two identical users. Rather than considering all of the maximal patterns of the other user, as in the MSTP, they proposed an MTP-Similarity measurement that considers only the most similar pattern of each maximal sequence pattern. Both the MSTP and MTP calculate the similarity between maximal patterns in the same way, based on the lengths of the LCSs; they then use different strategies to integrate the similarity values between maximal trajectory patterns.
We suggest that the existing approaches for mining individual similarity have the following drawbacks: (1) While emphasizing the sequential properties of individuals' movements, most existing methods of similarity measurement have not considered the spatial and temporal aspects of those movements; thus, they cannot be used to assess the distinctive characteristics of individuals for many GIScience applications. (2) Although previous studies have incorporated semantic information in individual similarity measurements, many of them have focused on public places, while neglecting the essential role of personally significant places in characterizing individuals. In addition, most works have ignored the distinct semantics of different individuals' visits to the same places.

Overview
Individual human mobility is closely associated with places. Each individual moves from place to place to perform various activities, driven by either daily routines or interests [28,29]. Studies have shown that individuals have a remarkable propensity to return to places that they frequently visit [30,31]. Hence, how individuals spatially and temporally interact with their frequently visited places (i.e., personally significant places) shows promise for revealing individuals' characteristics [22,23]. By investigating the spatio-temporal distributions of individuals' visits to their personally significant places, we can mine similar individuals and aggregate them into meaningful subgroups. For example, consider the distinctive characteristics of young versus elderly people: from a spatial perspective, elderly people's significant places tend to be distributed within a smaller area, whereas young people's significant places may be far apart, and long-distance commuting to them is common. From a temporal perspective, young people may visit their various significant places for long intervals of time during the day; they also frequently visit public places of interest at night. By contrast, elderly people's nighttime visits are generally limited to their homes.
In addition to the spatial and temporal properties of individual movements, we believe that semantic information must be considered. In contrast to some existing methods (for example, [32]), in which semantics is treated as an independent dimension, similar to space and time, we consider semantics as a precondition. Specifically, we consider the spatial and temporal properties of two individuals' visits to their significant places to be comparable only when they are identical from the semantic perspective. In this way, we can mine individual similarity by separately measuring individuals' movements that are related to different semantics, and we can apply this approach in various fields by synthesizing individual similarity in the relevant semantics. For example, we can identify families and colleagues who share high spatio-temporal similarities in their homes and workplaces, respectively. Additionally, we can identify friends or potential friends from their high degree of similarity in visits to entertainment venues, even if they do not share other regular visits.
Motivated by these goals, we propose a framework for mining individual similarity that consists of two major phases (see Figure 1). In phase 1, we first detect stay points using raw GPS trajectories; we then identify each individual's significant places by separately clustering their stay points using a density-based clustering algorithm. Based on the temporal signatures of their visits, personal places of interest (homes and workplaces, in this paper) are identified. The remaining significant places are assigned to public places of interest, such as shopping malls and hotels, using additional geographical contextual information (i.e., a POI dataset). In phase 2, we mine individual similarity using a new measurement, the Individual Similarity Measurement considering interactions with Personally Significant Places (ISM-PSP). The ISM-PSP computes the spatial and temporal similarity of two individuals' visits when their personal semantics match. Based on the similarity scores for visits to personally significant places, which are grouped by semantics, the ISM-PSP measures the similarity between individuals by computing the weighted sum of their similarity scores over the different semantics.
Below, we clarify some basic concepts and notations that are used in this paper prior to detailing the methods used in each step of our framework.

Definition 1. (GPS point and GPS trajectory)
A GPS point is a triple of the form $p = (latitude, longitude, t)$ that represents a latitude-longitude location and a timestamp. A GPS trajectory is a sequence of triples $T = \langle p_1, p_2, \ldots, p_n \rangle$, where $p_i$ is a GPS point and $p_1.t < p_2.t < \ldots < p_n.t$.

Definition 2. (Stay point)
A stay point represents a geographic region in which an individual stays longer than a given time threshold $\theta_{time}$ within a distance threshold $\theta_{distance}$. A stay point is denoted by a quadruple of the form $s = (latitude, longitude, t_{arrive}, t_{leave})$, which represents the latitude-longitude location of s, the individual's arrival time at s, and the individual's departure time from s.

Definition 3. (Individual significant place)
An individual significant place is a collection of stay points denoted by $SP^i_k = \{s^i_{k1}, s^i_{k2}, \ldots, s^i_{kn}\}$, where $s^i_{kj}$ is the jth stay point corresponding to the kth significant place $SP^i_k$ of a specific individual i. The coordinates of $SP^i_k$ are represented by the average latitude and longitude of the constituent points. An individual significant place $SP^i_k$ represents a region that is frequently visited by i; this fact implies that the place possesses some personal meaning for i. Given the diversity in possible personal meanings, the set of individual i's significant places $SP^i = \{SP^i_1, SP^i_2, \ldots, SP^i_K\}$ is divided into personal places of interest, $PeSP^i$, and public places of interest, $PuSP^i$.

Definition 4. (Personal place of interest and public place of interest)
A personal place of interest $PeSP^i_k$ is a place that is frequently visited by an individual i due to its special personal meaning for i. A typical example of a $PeSP^i_k$ is an individual i's home. By contrast, a public place of interest $PuSP^i_k$ is a place that is of interest to the individual i and has a personal meaning for i that is identical to its functionality for the general public. Typical examples of $PuSP^i_k$ are places where the individual i goes during his or her leisure time, such as a restaurant or a shopping mall.
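Definitions 1 to 3 map naturally onto small record types. The sketch below is only an illustration: the class and field names are assumptions chosen to mirror the notation, and the `semantics` field is a hypothetical slot for the label that the later steps assign.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GPSPoint:
    """Definition 1: a latitude-longitude location with a timestamp."""
    latitude: float
    longitude: float
    t: datetime


@dataclass(frozen=True)
class StayPoint:
    """Definition 2: a location with arrival and departure times."""
    latitude: float
    longitude: float
    t_arrive: datetime
    t_leave: datetime


@dataclass
class SignificantPlace:
    """Definition 3: a collection of stay points of one individual."""
    stay_points: List[StayPoint]
    semantics: Optional[str] = None  # hypothetical label slot, e.g. "Home"

    def center(self) -> Tuple[float, float]:
        """Average latitude and longitude of the constituent stay points."""
        n = len(self.stay_points)
        lat = sum(s.latitude for s in self.stay_points) / n
        lon = sum(s.longitude for s in self.stay_points) / n
        return lat, lon
```

A trajectory is then simply a time-ordered list of `GPSPoint` objects, and the set of an individual's significant places is a list of `SignificantPlace` objects.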

Extracting the Semantics of Personally Significant Places
In this section, we describe the extraction of the semantics of individual significant places from the perspective of personal behavior. The details of the process are given in Algorithm 1.

    Foreach i ∈ I do
        s^i = ∅;  // stay points of i
        Foreach T^i ∈ TH^i do
            s^i.Add(StayPointDetection(T^i, θ_time, θ_distance));
        SP^i = OPTICS(s^i, r, MinPts);  // obtain significant places from stay points
        PeSP^i = MatchPe(SP^i, STI, ε);  // identify semantics of personal places of interest
        SP^i = SP^i − PeSP^i;
        PuSP^i = MatchPu(SP^i, POI);  // identify semantics of public places of interest
        SP_ps^i = PeSP^i ∪ PuSP^i;
        SP_ps.Add(SP_ps^i);
    Return SP_ps;

Identification of Individual Significant Places
As illustrated in Figure 2, a four-layered model is applied to identify the significant places with personal meaning for an individual. The lowest layer of the model consists of the raw historical GPS trajectory data of individual i, which are semantically poor. In the second layer, stay points are detected from every GPS trajectory in i's historical trajectory data. In the third layer, these stay points are clustered in order to identify the individual i's significant places. These clusters bear rich semantic information and are used to further extract personal places of interest and public places of interest. Finally, the extracted significant places with personal meaning constitute the top layer of the model.
Stay point detection is a fundamental problem in trajectory studies and has been addressed by numerous researchers. Common solutions to the problem include (1) density-based methods [33,34] that are derived from the well-known density-based clustering algorithm DBSCAN, which incorporates physical parameters of trajectories, such as speed, acceleration and changes in direction; (2) spatio-temporal constraint-based methods [26,35], in which a stay point is detected when a sub-trajectory remains within a spatial region for longer than a certain time threshold and within a certain distance threshold; and (3) index-based methods [36], in which customized indices are used to measure the status of each trajectory point. In this paper, we use the intuitive concept of stay points as a starting point and then apply the most popular approach, the spatio-temporal constraint-based method.
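The spatio-temporal constraint-based idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure of [26,35]: the function names, the anchor-point strategy, the use of the haversine distance in metres, and timestamps given as seconds are all choices made here for the example.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude-longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_stay_points(trajectory, theta_time, theta_distance):
    """Detect stay points in a trajectory.

    trajectory: list of (latitude, longitude, t) sorted by t, with t in seconds.
    theta_time: time threshold in seconds; theta_distance: threshold in metres.
    Returns a list of (latitude, longitude, t_arrive, t_leave) quadruples.
    """
    stay_points = []
    i, n = 0, len(trajectory)
    while i < n:
        # Extend the window while points remain within theta_distance of anchor i.
        j = i + 1
        while j < n and haversine_m(trajectory[i][0], trajectory[i][1],
                                    trajectory[j][0], trajectory[j][1]) <= theta_distance:
            j += 1
        window = trajectory[i:j]
        # Definition 2: the individual must stay longer than theta_time.
        if window[-1][2] - window[0][2] > theta_time:
            lat = sum(p[0] for p in window) / len(window)
            lon = sum(p[1] for p in window) / len(window)
            stay_points.append((lat, lon, window[0][2], window[-1][2]))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stay_points
```

For example, four fixes at the same location spanning 30 minutes, followed by fixes several kilometres away, yield a single stay point when `theta_time` is 20 minutes and `theta_distance` is 200 m.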
After detecting stay points from raw GPS trajectories, in the third layer, we cluster these points separately for each individual to find the individual significant places. Specifically, for each individual, we cluster all of the stay points detected from that individual's trajectories by applying the density-based clustering algorithm OPTICS; this procedure identifies places that are frequently visited by that individual. The OPTICS algorithm was selected from among the several available clustering methods because it is rather insensitive to the input parameters; thus, a broad range of parameter settings can produce results of similar quality [37].

Semantic Interpretation of Individual Significant Places
At the identified significant places, individuals frequently participate in various activities. Hence, individual significant places generally bear rich personal meanings. In this step, the semantics of such places are extracted for each individual from that individual's perspective. The results of this step constitute the fourth layer of our model, as shown in Figure 2.
Most previous studies [9][10][11][12][13] have interpreted the semantics of individual significant places by means of reverse geocoding (i.e., comparing the locations of significant places to those of predefined POIs). As mentioned earlier, the two major drawbacks of these approaches are that (1) they consider only public places, and (2) they do not consider the personal meanings of those places. Inspired by the idea that semantic information about public places can be derived from mobility data at the collective level [23,38], our solution is based on the assumption that at the individual level, the semantics of personally significant places can be derived from a person's long-term trajectory data. To avoid the problems that are associated with existing studies, the inherent subjectiveness of individuals is considered. Therefore, individual significant places are divided into two types: personal places of interest and public places of interest. The semantics of each type are interpreted separately.
Personal places of interest, such as homes and workplaces, normally exhibit high levels of visit frequency and temporal regularity. Generally speaking, individuals spend the most time at their homes in the evening and at their workplaces during the daytime. Accordingly, we estimate the semantics of personal places of interest as follows. We define a set of standard time intervals $\{STI_{ps}\}$, in which $STI_{ps} = [t_{arrive}, t_{leave}]$ is the typical temporal signature of a visit based on its personal semantics ps. For example, $STI_{home}$ = [00:00, 07:00] ∪ [19:00, 24:00], and $STI_{work}$ = [08:00, 17:00] on workdays. Given the set of an individual i's significant places $SP^i = \{SP^i_1, SP^i_2, \ldots, SP^i_K\}$, we calculate the matching score of each place for the personal semantics ps. For each ps, the significant place with the highest matching score, calculated as shown below, is assigned the corresponding personal meaning.
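To make the matching step concrete, the sketch below scores each significant place against the standard time intervals and assigns each personal meaning to the best-scoring place. The score used here, the total number of visit hours that fall inside $STI_{ps}$, is an assumption made for illustration and is not the paper's matching-score formula; the function names are hypothetical, the workday restriction and the threshold ε of Algorithm 1 are ignored, and visits are assumed to be split at midnight and expressed in hours of the day.

```python
def overlap_hours(start, end, intervals):
    """Hours of the visit [start, end] that fall inside any (lo, hi) interval."""
    return sum(max(0.0, min(end, hi) - max(start, lo)) for lo, hi in intervals)


def matching_score(visits, sti):
    """Assumed score: total visit time (in hours) spent inside the STI.

    visits: list of (arrive_hour, leave_hour) pairs within a single day.
    sti: list of (lo_hour, hi_hour) intervals for one personal semantics.
    """
    return sum(overlap_hours(arrive, leave, sti) for arrive, leave in visits)


def assign_personal_semantics(places, sti_by_ps):
    """Assign each personal semantics to the place with the highest score.

    places: dict mapping a place id to its list of visits.
    sti_by_ps: dict mapping a semantics label to its standard time intervals.
    """
    return {
        ps: max(places, key=lambda k: matching_score(places[k], sti))
        for ps, sti in sti_by_ps.items()
    }


# Standard time intervals taken from the examples in the text.
STI = {
    "Home": [(0.0, 7.0), (19.0, 24.0)],
    "Work": [(8.0, 17.0)],
}
```

With places `A` (visited 00:00 to 07:30 and 19:30 to 24:00), `B` (visited 08:30 to 17:30), and `C` (visited 20:00 to 21:00), this sketch labels `A` as Home and `B` as Work. A production version would also need a minimum score so that an individual without a recognizable home or workplace is not forced into a label.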
Most individuals have only one personal place of interest with respect to a given personal meaning: the one that shows the greatest similarity to the corresponding temporal signature. However, it is unlikely that a similar one-to-one mapping can be established between temporal signatures and the visits to public places of interest. For instance, at night, people may frequently go shopping, or to a gym, a park, or a bar. Such visits have indistinguishable temporal signatures, even though their semantics are disparate. Thus, geographic contextual information is used instead of temporal signatures to interpret the semantics of public places of interest.
Specifically, for a set of an individual i's significant places $SP^i = \{SP^i_1, SP^i_2, \ldots, SP^i_K\}$, the identified personal places of interest $PeSP^i$ are first filtered out, and the remaining places are then regarded as public places of interest $PuSP^i$. Next, the semantics of the public places of interest are extracted by associating each of these places with a spatial context (e.g., a POI). Given the set of remaining public places of interest $PuSP^i$, for each $PuSP^i_k$, we compute the distance r between the center coordinates of $PuSP^i_k$ and the farthest stay point in $PuSP^i_k$, and we construct a searching circle c of radius r (see Figure 3). Next, c is used to associate POIs with $PuSP^i_k$ and to interpret the corresponding semantics. If at least one POI is contained in c, then we annotate $PuSP^i_k$ with the category having the greatest number of POIs in c; otherwise, we find the nearest POI and annotate $PuSP^i_k$ with its category. After extracting the semantics of $PuSP^i_k$, the temporal signatures of visits to $PuSP^i_k$ can be used for verification.
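The searching-circle rule can be sketched in a few lines. The example assumes planar (projected) coordinates so that Euclidean distance is meaningful, a non-empty POI list, and hypothetical function names; ties between categories are broken by the order in which the POIs are listed.

```python
import math
from collections import Counter


def planar_distance(p, q):
    """Euclidean distance between two (x, y) points in projected coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def annotate_public_place(stay_xy, pois):
    """Return the POI category assigned to one public place of interest.

    stay_xy: list of (x, y) stay-point coordinates of the place.
    pois: non-empty list of (x, y, category) tuples.
    """
    # Center of the place: average coordinates of its stay points.
    center = (
        sum(x for x, _ in stay_xy) / len(stay_xy),
        sum(y for _, y in stay_xy) / len(stay_xy),
    )
    # Radius r: distance from the center to the farthest stay point.
    r = max(planar_distance(center, s) for s in stay_xy)
    # Categories of all POIs that fall inside the searching circle c.
    inside = [cat for x, y, cat in pois if planar_distance(center, (x, y)) <= r]
    if inside:
        return Counter(inside).most_common(1)[0][0]
    # No POI inside c: fall back to the category of the nearest POI.
    nearest = min(pois, key=lambda p: planar_distance(center, (p[0], p[1])))
    return nearest[2]
```

With latitude-longitude input, the planar distance would be replaced by a geodesic distance, and a spatial index would be preferable to the linear scan for large POI datasets.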
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

Mining Individual Similarity
Intuitively, the more commonality that two moving objects share, the more similar they are [39]. As a result, a universal measurement suitable for more general GIScience applications should be proposed. Here, we present such a new measurement, the Individual Similarity Measurement considering interactions with Personally Significant Places (ISM-PSP). The basic assumptions of the ISM-PSP are that (1) the spatial and temporal interactions of individuals with their significant places can be used to mine the similarity among individuals, which can help to characterize individuals' features, and (2) the significant places of two individuals can be compared only when they are semantically identical from each individual's own perspective. The details of the individual similarity measurement process are shown in Algorithm 2.
[...] n}; these sets are extracted from a's significant places $SP^a$ and from b's significant places $SP^b$, respectively. We group all of a's significant places into $n_a$ groups, each with identical personal semantics among all the members of that group: $SP^a_{ps} = \{SP^a_k \mid SP_k \text{ belongs to } a \text{ and } SP_k.semantics = ps\}$. Next, we similarly group b's significant places into $n_b$ groups: $SP^b_{ps} = \{SP^b_k \mid SP_k \text{ belongs to } b \text{ and } SP_k.semantics = ps\}$.
Example 1: Grouping the significant places of two individuals, a and b (Table 1). Five significant places $\{SP^a_1, SP^a_2, SP^a_3, SP^a_4, SP^a_5\}$ are identified from a's historical trajectories. In accordance with the extracted semantics $PS^a$ = {Home, Workplace, Restaurant, Bookstore}, a's significant places are grouped into Home = {$SP^a$ [...]
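The grouping step amounts to bucketing an individual's significant places by their semantic label. The sketch below shows this with a dictionary; the function name is hypothetical, and the assignment of the five places to labels is invented for illustration because Table 1 is not reproduced here.

```python
from collections import defaultdict


def group_by_semantics(places):
    """Group significant places by their personal semantics.

    places: list of (place_id, semantics) pairs for one individual.
    Returns a dict mapping each semantics label to the list of place ids.
    """
    groups = defaultdict(list)
    for place_id, ps in places:
        groups[ps].append(place_id)
    return dict(groups)


# Hypothetical labels for a's five significant places (not taken from Table 1).
places_a = [
    ("SP_a_1", "Home"),
    ("SP_a_2", "Workplace"),
    ("SP_a_3", "Restaurant"),
    ("SP_a_4", "Restaurant"),
    ("SP_a_5", "Bookstore"),
]
groups_a = group_by_semantics(places_a)
```

The keys of `groups_a` then play the role of $PS^a$, and only groups whose key also appears for individual b can contribute a spatio-temporal comparison.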

Measuring Individual Similarity
After the significant places have been grouped based on the diverse personal place semantics of each individual, the ISM-PSP measures the similarity between individuals a and b as a weighted sum of per-semantics similarity scores. Here, $w_{ps}$ is the weight of personal semantics ps and is determined by the level of importance of ps to the corresponding application, and $Sim_{ps}(a, b)$ is the similarity score of a and b for their specific personal semantics ps, which is calculated from the two groups $SP^a_{ps}$ and $SP^b_{ps}$. In this way, we convert the individual similarity measurement into a sum over the set of spatial and temporal similarities between two sets.

In the ISM-PSP approach, the similarity between two individuals is determined by measuring their spatio-temporal similarity for every type of personal semantics $ps \in PS^a \cup PS^b$. For a given $ps \in PS^a \cup PS^b$, we compute the spatial and temporal similarity between $SP^a_{ps}$ and $SP^b_{ps}$ using an indicator function $I(SP^a_k, SP^b_k)$, which marks a pair of places as matching when their spatial and temporal distances fall within the thresholds $distThre_{spatial}$ and $distThre_{temporal}$, respectively.

Regarding spatial distance, as described in Definition 3, a significant place $SP_k$ consists of a collection of stay points. The coordinates of $SP_k$ are represented by the average latitude and longitude of the constituent points. The Euclidean distance between the coordinates of $SP^a_k$ and $SP^b_k$ is used as the spatial distance measurement.
Regarding temporal distance, we measure the temporal difference between a's visits to $SP^a_k$ and b's visits to $SP^b_k$ by dividing the day into 24 hourly bins and constructing an hourly distribution of each individual's visits to his or her significant places. Given the probability distributions $pd_1(t)$ and $pd_2(t)$ of two individuals' visits to their respective significant places, when the two places are semantically identical, their temporal distance is measured by the Kullback-Leibler divergence of $pd_1$ and $pd_2$, as follows:

$$D_{KL}(pd_1(t) \,\|\, pd_2(t)) = \sum_{t} pd_1(t) \log \frac{pd_1(t)}{pd_2(t)}$$

where the sum runs over the 24 hourly bins. To address the case in which $D_{KL}(pd_1(t) \,\|\, pd_2(t))$ becomes infinite when $pd_1(t) \neq 0$ but $pd_2(t) = 0$, a small constant C is introduced, and the Kullback-Leibler divergence is computed using a smoothing method.
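The temporal distance can be sketched in Python as follows. The text does not specify the smoothing method, so this sketch assumes additive smoothing: the constant C is added to every hourly bin and both distributions are renormalized. The function names and the default value of `c` are illustrative assumptions, not part of the original method.

```python
import math


def hourly_distribution(visit_hours):
    """Build a 24-bin probability distribution from visit hours (0-23)."""
    if not visit_hours:
        raise ValueError("at least one visit is required")
    counts = [0] * 24
    for hour in visit_hours:
        counts[hour % 24] += 1
    total = sum(counts)
    return [count / total for count in counts]


def smoothed_kl(pd1, pd2, c=1e-6):
    """Kullback-Leibler divergence D_KL(pd1 || pd2) with additive smoothing.

    Assumption: c is added to every bin and both distributions are
    renormalized, so no bin of pd2 is exactly zero.
    """
    p = [x + c for x in pd1]
    q = [x + c for x in pd2]
    sum_p, sum_q = sum(p), sum(q)
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical visit hours: one person visits in the morning, one at night.
morning = hourly_distribution([8, 9, 9, 10])
night = hourly_distribution([21, 22, 22, 23])
distance = smoothed_kl(morning, night)
```

Because the Kullback-Leibler divergence is asymmetric, the order of the two arguments matters; the sketch follows the order used in the equation above.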

Experiments
In this section, several experiments with the real-world Geolife dataset are performed to evaluate our proposed framework. The datasets and their preparation are described in Section 4.1. Section 4.2 corresponds to phase 1 of our proposed framework and presents the results of semantics extraction of personally significant places. In Section 4.3, to assess the performance of the proposed ISM-PSP in phase 2 of our proposed framework, we perform comparative experiments with two previous approaches and a modified version of the ISM-PSP. To illustrate the possible applications of our framework, we also use the proposed method to generate individual groups in Section 4.4.

GPS Trajectory Dataset
The GPS trajectory dataset was collected in the Geolife project by 182 users over a period of more than five years (from April 2007 to August 2012) [35]. This dataset covers widely distributed areas, including over 30 cities in China and several cities that are located in the United States of America (USA) and Europe. In our experiments, we used only those trajectories from Beijing, which constitute the majority of the Geolife dataset.
The Geolife dataset contains records for a broad range of individuals' outdoor movements, covering daily routines such as going home and going to work as well as leisure activities, such as dining and shopping.In other words, this dataset includes visits to both personal and public places of interest.

POI Dataset
To interpret the semantics of public places of interest in Beijing, we obtained a dataset of public POIs in Beijing from Dianping. As shown in Table 2, this Beijing POI dataset includes 181,924 POIs, corresponding to eight major types of individual daily activities.

Semantics Extraction of Significant Places
This experiment was designed to interpret the semantics of individual significant places extracted from the Geolife historical trajectory data.
First, we used each individual's GPS trajectory histories to detect stay points by applying the spatio-temporal constraint-based method with the distance threshold set to 30 m and the time threshold set to 30 min. In total, 19,374 stay points were detected from the raw GPS trajectories of 182 individuals in this stage. Next, the OPTICS algorithm was applied to each individual's collected stay point data to identify that individual's significant places. During this process, we set the reachability-distance to 100 m and the value of MinPts to 10; consequently, in our experiment, a place was considered to be significant to an individual only when it was visited more than 10 times by that individual. A total of 154 significant places were discovered, and 63 of the 182 individuals had at least one significant place. Among these 63 individuals, the median number of significant places possessed by each individual was 2, and the maximum number was 10 (corresponding to individual #4). Since it can be inferred that the spatio-temporal interactions between individuals and places are generally similar in the Geolife dataset, we set the parameters relatively strictly to improve the accuracy of significant place identification. As a result, there were many individuals for whom no significant places were captured.
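As an illustration of the stay point detection step, the following Python sketch implements a common variant of spatio-temporal constraint-based detection with the thresholds reported above (30 m and 30 min). It is a sketch under stated assumptions rather than the exact procedure used in the experiment: the haversine distance, the function names and the tuple layout of the input points are all assumptions made for illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 coordinates."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def detect_stay_points(points, dist_thresh_m=30.0, time_thresh_s=30 * 60):
    """Detect stay points in a time-ordered list of (lat, lon, t_seconds).

    A stay point is emitted when consecutive points remain within
    dist_thresh_m of an anchor point for at least time_thresh_s.
    Returns a list of (mean_lat, mean_lon, arrival_t, departure_t).
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the window while points stay close to the anchor point i.
        while j < n and haversine_m(points[i][0], points[i][1],
                                    points[j][0], points[j][1]) <= dist_thresh_m:
            j += 1
        if points[j - 1][2] - points[i][2] >= time_thresh_s:
            window = points[i:j]
            mean_lat = sum(p[0] for p in window) / len(window)
            mean_lon = sum(p[1] for p in window) / len(window)
            stays.append((mean_lat, mean_lon, points[i][2], points[j - 1][2]))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stays


# Hypothetical trace: 40 minutes at one location, then a jump of about 1 km.
trace = [
    (39.9750, 116.3300, 0),
    (39.9750, 116.3300, 600),
    (39.9750, 116.3301, 1200),
    (39.9750, 116.3300, 2400),
    (39.9850, 116.3300, 3000),
]
stay_points = detect_stay_points(trace)
```

The resulting stay points would then be clustered per individual (with OPTICS in the experiment) to obtain significant places.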
Then, we interpreted the semantics of the identified significant places. The semantics of two types of personal places of interest were extracted in this experiment: homes and workplaces. We set $STI_{home}$ = [00:00, 07:00] ∪ [19:00, 24:00] and $STI_{work}$ = [08:00, 17:00] on workdays. For each identified significant place, the home matching score was calculated first. If the highest home matching score for an individual exceeded the threshold value ε (ε was set to 0.3 in our experiment), the significant place corresponding to that matching score was interpreted as that individual's home. After filtering out the significant places identified as individuals' homes, we computed the workplace matching scores for the remaining significant places. Similarly, when the highest workplace matching score for an individual exceeded ε (again set to 0.3), the significant place corresponding to that matching score was interpreted as that individual's workplace. Figure 4 shows the results of semantics extraction. Regarding the personal places of interest, the identified homes were all located in the northern part of Beijing, whereas the identified workplaces were mostly located in the northwestern part of the city. This is because most of the individuals who participated in the Geolife program came from academic institutions in northwestern Beijing, such as Tsinghua University or Microsoft Research Asia. In total, nine homes and 52 workplaces were discovered. More personal places of interest may have been identified as workplaces than as homes because the participating individuals tended to record their trajectories more often during the daytime than at night. Because we considered a place to be an individual's home only when his or her stay duration in that place sufficiently overlapped with $STI_{home}$, sparse records may have caused failures in home identification.

After the personal places of interest were excluded, the remaining significant places were compared to the Beijing POI dataset using the method that is described in Section 3.2.2. As shown in Table 3, 50 public places of interest were identified in total, the majority of which were restaurants, education facilities and shopping locations. These public places of interest are distributed as shown in Figure 4. We found that (1) many public places of interest were in close proximity to individuals' personal places of interest, implying that most individuals' leisure activities occurred near their homes and workplaces, and (2) other public places of interest were generally located in commercial centers, such as the eastern part of Beijing, which contains several major super-malls that are capable of satisfying individuals' higher entertainment needs.

Comparative Analysis of Individual Similarity Measurements
To evaluate the performance of the proposed framework for measuring individual similarity and to demonstrate its ability to discern individuals, a comparative analysis was performed between our method and the existing MSTP [9] and MTP [27] methods. In this experiment, each individual's trajectory data were divided into two parts, which were then treated as trajectories from two different people. Specifically, for each individual i's trajectory data, trajectories from odd weeks were assigned to $u_i$, and trajectories from even weeks were assigned to $v_i$. In this way, we generated two datasets, called Dataset #1 and Dataset #2. To implement the MSTP and MTP methods, the Beijing POI dataset was used to transform raw GPS trajectories into semantic trajectories. In addition, to verify the necessity of considering personal meaning when extracting the semantics of significant places, phase 1 of our proposed framework was modified to consider only public places.

For each individual $u_i$ in Dataset #1, four different individual similarity measurements were applied to find $u_i$'s neighbors in Dataset #2, and the neighbors were sorted in descending order by their similarity scores. Here, the ground truth concerning the nearest neighbors was known, because for every individual, $u_i$ and $v_i$ were artificially generated from i. Thus, ideally, $v_i$ should appear in the first position of the neighbors list for $u_i$.
To evaluate the performance of the individual similarity measurements, we defined a metric called the average rank, as follows. Given an individual $u_i$, suppose that the obtained similarity score between $u_i$ and its neighbor $nb_l$ is denoted by $sim(u_i, nb_l)$ and that the ordered neighbors list is denoted by $N_{u_i}$, where $N_{u_i} = \{(nb_1, sim(u_i, nb_1)), (nb_2, sim(u_i, nb_2)), \ldots, (nb_m, sim(u_i, nb_m))\}$. Then, the average rank of the neighbor $nb_l$, denoted by $R(nb_l)$, is defined as the average of $nb_l$'s best and worst possible ranks in the ordered neighbors list. The best possible rank is one plus the number of neighbors with a strictly higher score, and the worst possible rank is the number of neighbors whose score is at least that of $nb_l$:

$$R(nb_l) = \frac{1}{2}\Big(\big|\{nb_j : sim(u_i, nb_j) > sim(u_i, nb_l)\}\big| + 1 + \big|\{nb_j : sim(u_i, nb_j) \geq sim(u_i, nb_l)\}\big|\Big)$$

Example 2: Calculating the average rank. For the individual $u_{000}$, three different similarity measurements are applied to calculate similarity scores and identify the neighbors. The results of the three similarity measurements are {($v_{001}$, 1), ($v_{000}$, 0.8), ($v_{002}$, 0.3)} (case #1), {($v_{000}$, 0.9), ($v_{001}$, 0.9), ($v_{002}$, 0.5)} (case #2) and {($v_{000}$, 0.9), ($v_{001}$, 0.5), ($v_{002}$, 0.5)} (case #3). Next, we calculate the average rank of $v_{000}$ (which is the true nearest neighbor of $u_{000}$) for the three different cases: $R(v_{000}) = 2$ in case #1, $R(v_{000}) = (1 + 2)/2 = 1.5$ in case #2, and $R(v_{000}) = 1$ in case #3.

Using the average rank, we can evaluate the results of the similarity measurements. We choose to use the average rank instead of the absolute rank because this metric represents the ability of the similarity measurements to find identical individuals, as well as to discern different individuals. In Example 2, although $v_{000}$ was found to be the most similar to $u_{000}$ in both case #2 and case #3, $R(v_{000})$ in case #3 is smaller because $v_{000}$ was the only nearest neighbor obtained, whereas in case #2, $v_{001}$ was also considered to be the most similar to $u_{000}$. For each individual $u_i$ in Dataset #1, we calculated the average rank of its corresponding ground-truth identical individual $v_i$. A smaller $R(v_i)$ value indicates a better similarity result. In the ideal case, $R(v_i)$ should be equal to 1, meaning that $v_i$ was identified as the only nearest neighbor of $u_i$. Therefore, we defined the average rank score to evaluate the performance of an individual similarity measurement by incorporating its ability to find the true identical individual $v_i$ for each $u_i$. A better individual similarity measurement will have a higher average rank score. If, for every $u_i$, a similarity measurement could always achieve $R(v_i) = 1$, its average rank score would be equal to 1.
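The average rank can be computed directly from its verbal definition. The sketch below is a minimal Python illustration; the function name `average_rank` and the use of plain string identifiers for individuals are assumptions made for the example. It is applied to the three cases of Example 2.

```python
def average_rank(neighbors, target):
    """Average of the best and worst possible ranks of `target`.

    neighbors: list of (neighbor_id, similarity_score) pairs.
    Ties are resolved by averaging: the best rank assumes the target is
    listed first among equal scores, the worst rank assumes it is last.
    """
    scores = dict(neighbors)
    target_score = scores[target]
    strictly_higher = sum(1 for s in scores.values() if s > target_score)
    at_least_equal = sum(1 for s in scores.values() if s >= target_score)
    best_rank = strictly_higher + 1
    worst_rank = at_least_equal
    return (best_rank + worst_rank) / 2


case_1 = [("v001", 1.0), ("v000", 0.8), ("v002", 0.3)]
case_2 = [("v000", 0.9), ("v001", 0.9), ("v002", 0.5)]
case_3 = [("v000", 0.9), ("v001", 0.5), ("v002", 0.5)]

ranks = [average_rank(case, "v000") for case in (case_1, case_2, case_3)]
# ranks == [2.0, 1.5, 1.0]
```

The tie in case #2 is what separates it from case #3: both place $v_{000}$ at the top, but only case #3 does so unambiguously.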
Figure 5 shows the total number of $v_i$ whose average rank is less than or equal to various R values using the MSTP approach, the MTP approach, the ISM-PSP approach without personal semantics (ISM-PSP without ps) and the proposed ISM-PSP approach (which considers the perspective of personal behavior). As mentioned earlier, a better similarity measurement should identify the true nearest neighbors at smaller R values more often. The results show that the ISM-PSP method yields the highest number of $v_i$ with R = 1. Thus, the ISM-PSP results are the closest to the ground truth, indicating that the ISM-PSP is more capable of finding identical individuals than the other measurements are. In addition, both the ISM-PSP and the ISM-PSP without ps achieve better performance than the MSTP and the MTP approaches at smaller R values, because using the spatio-temporal properties of visits to significant places is more accurate than using the sequential properties. We note that in this case, the MSTP approach performs better than the MTP approach because when comparing two individuals $u_i$ and $v_i$, for each LCS of $u_i$, the MSTP compares all of the LCSs of $v_i$, while the MTP compares only the nearest LCS of $v_i$. Therefore, when the LCSs are generally short, as in our dataset, the MTP approach fails to obtain good results because of the lack of information.

As R increases, the ISM-PSP still achieves the best results. However, for large R values, a large number of $v_i$ whose average rank is less than or equal to R does not necessarily guarantee good performance. An unsatisfactory similarity measurement could also place many $v_i$ within a large R by increasing the $sim(u_i, nb_l)$ values of more candidates $nb_l$. In other words, such a method may appear to achieve good performance in finding identical individuals, but it improves its probability of including $u_i$'s identical individual $v_i$ in the resulting neighbor list by including as many neighbors as possible. In this way, as R increases, more identical individuals $v_i$ are identified.
Therefore, we further compared the four individual similarity measurements based on their ability to identify $v_i$ in the first position (by absolute position rank), which reflects the probability with which they identify $v_i$ as the most similar to $u_i$; we also assessed the average rank score, which represents the ability to discern other different individuals. As shown in Figure 6, both the ISM-PSP and the MTP approaches have a high probability of successfully identifying $v_i$ in the first position. However, the average rank score of the MTP is much lower than that of the ISM-PSP, which means that for those $v_i$ in the first position obtained by the MTP, $R(v_i)$ is generally large. This finding indicates that the MTP achieves its high probability of finding identical individuals at the cost of generating many false nearest neighbors. We also compared the ISM-PSP results with and without ps, as shown in Figures 5 and 6. These figures show that the ISM-PSP approach is more capable of discerning individuals when place semantics are considered with personal meaning than when only the semantics of public places without personal meaning are interpreted. This result suggests that it is insufficient to assess the distinctive characteristics of individuals based solely on their visits to public places, as most previous studies have done. Thus, personally significant places are helpful in characterizing individuals and should not be neglected.

Grouping Individuals
Our proposed framework allows us to investigate individual similarity using any single or combined semantics. Specifically, by setting different personal semantics ps in the ISM-PSP, one can compute the individual similarities in terms of different semantic aspects. Here, only personal places of interest were used in the example application.
In this experiment, we first mined individual similarities based solely on their visits interpreted as "going to work" (ps = {Workplace}). We set $distThre_{spatial}$ = 300 m and $distThre_{temporal}$ = 1, and we assigned two individuals to the same group only when their similarity was equal to 1. Figure 7 shows that four groups were identified, all of which correspond to research institutes and universities in northwestern Beijing. The largest group (Group #1) is Microsoft Research Asia, located on Zhichun Road in Haidian District; Group #2 is near the School of Software of Tsinghua University; Group #3 is in the Chinese Academy of Sciences; and Group #4 is the Beijing University of Aeronautics and Astronautics. The results reflect the fact that most participants in the Geolife project worked at the research institutes and universities listed above [40]. Our method successfully identified their workplaces and aggregated individuals into different groups based on the spatio-temporal similarity of their visits to their workplaces.
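The grouping rule described above (two individuals share a group only when their similarity equals 1) can be sketched in Python as follows. The text does not state how pairwise matches are merged into groups, so this sketch assumes that groups are the connected components of the graph whose edges are pairs with similarity 1, computed with a small union-find structure. The function name, the identifiers and the similarity values are hypothetical.

```python
def group_individuals(ids, similarity, threshold=1.0):
    """Merge individuals into groups using pairwise similarity scores.

    ids: iterable of individual identifiers.
    similarity: dict mapping (id_a, id_b) pairs to a score in [0, 1].
    Assumption: pairs scoring at least `threshold` are linked, and groups
    are the connected components of the resulting graph.
    Returns a sorted list of groups with two or more members.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical workplace similarity scores between five individuals.
ids = ["u1", "u2", "u3", "u4", "u5"]
similarity = {("u1", "u2"): 1.0, ("u2", "u3"): 1.0, ("u4", "u5"): 0.6}
groups = group_individuals(ids, similarity)
```

Under this assumption, `u1` and `u3` end up in the same group through `u2` even though no direct score between them is given; a stricter rule requiring every pair in a group to score 1 would be an equally plausible reading.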

After identifying individuals who exhibited high similarity in their visits to their workplaces, we also took the individuals' homes into consideration to discover groups whose visits to both their homes and workplaces were similar (ps = {Home, Workplace}). As shown in Figure 8, only two groups were identified under this constraint. Group #1 included individuals who worked at Microsoft Research Asia and lived at Tsinghua University, and Group #2 included individuals whose homes and workplaces were both at Tsinghua University. We inferred that Group #1 could include students from Tsinghua University who were interns at Microsoft Research Asia, whereas Group #2 consisted of students or staff who both lived and studied or worked at Tsinghua University. We separately calculated the temporal distributions of home and workplace visits for the individuals in Groups #1 and #2 (Figure 9). Substantial differences were found between the temporal signals of the home and workplace visits in both groups. When compared with the individuals in Group #2, the individuals in Group #1 were less likely to be found at home during the daytime, which is consistent with the daily routines of an internship.

Conclusions
Individuals have a remarkable propensity to return to their frequently visited places. Hence, the interactions between individuals and these places are likely to represent individuals' characteristics. To facilitate the capture of these characteristics of individuals and the mining of their similarity, this study investigated how individuals spatially and temporally interact with their personally significant places. A framework was presented for mining individual similarity based on visits to personally significant places extracted from long-term trajectory data. Our framework includes two major phases: extracting the semantics of personally significant places and mining individual similarity. In contrast to many previous studies, we extracted place semantics with personal meaning, and our semantic extraction process considered individuals' visits to both personal and public places of interest. We also proposed a new individual similarity measurement, the ISM-PSP, which incorporates both the spatio-temporal and semantic properties of individuals' visits to significant places. Experiments using a real-world GPS dataset suggest that (1) when compared with the existing approaches, the proposed ISM-PSP is more capable of finding identical individuals, while maintaining low numbers of false identifications; (2) more accurate identification of individuals can be achieved by considering the spatio-temporal properties of visits to significant places than by considering the sequential properties; and (3) personal places of interest play a vital role in characterizing individuals, which indicates that the semantics of visits to significant places with personal meaning are important for assessing individual similarity. Therefore, we conclude that it is insufficient to measure individual similarity by only analyzing the sequential properties of visits to public places, as done in previous works.
Our study has several limitations. First, when extracting personal places of interest, we only identified the most common types (homes and workplaces) for illustration. However, personal places of interest actually include a much wider range of places. According to Definition 4, a personal place of interest to an individual could be any place that carries a special personal meaning that is distinct from its functionality for the general public. Different types of personal places of interest should be identified for specific applications. Second, during the semantic interpretation process, we inferred the semantics of personal places of interest by comparing the temporal distribution of a person's presence at a place against certain predefined typical temporal signatures. However, there are many people who deviate from a standard work schedule, and others may work at home. In these cases, our method presented in Section 3.2.2 could fail to extract the accurate semantics. An alternative method could be to identify homes as places where individuals spend most of their time at night and workplaces as places where individuals spend most of their time during the daytime on workdays. Third, our similarity measurement results could be significantly affected by the accuracy of semantic interpretation. This is because we designed the ISM-PSP based on the assumption that the spatio-temporal patterns of two visits are comparable only when they are driven by the same reason. In other words, we do not compare one individual's working behavior with another's dining behavior, although they might appear at the same restaurant. Therefore, errors in place semantics extraction could lead to poor results in measuring individual similarity. Fourth, although the proposed framework enables us to generate meaningful subgroups in any single or combined semantics by setting different ps in the ISM-PSP, in this paper, only personal places of interest were used to demonstrate the possible application of the proposed framework. The sparse records in our dataset restricted the types of significant places we were able to discover, thus restraining the types of groups that we could identify.
Future work will focus on improving the semantics enrichment process applied in the proposed framework. Other publicly available data in social networks (e.g., georeferenced posts on Twitter) can be used to explore place semantics [25,41-44]. Through the synthesis of individual similarities related to appropriate semantics, our similarity measurement could be applied in other fields using different datasets. For example, our approach could reveal meaningful relationships by identifying individuals who work together and who also share a high spatio-temporal similarity in their visits to certain restaurants or bars.

Figure 1. The proposed framework for mining individual similarity.

Algorithm 1. PersonalSemanticExtraction(TH, STI, POI).
Input:
TH: The set of individuals' trajectories, $TH = \{TH_i \mid 1 \leq i \leq |I|\}$
STI: The set of standard time intervals, $STI = \{STI_{ps}\}$
POI: The set of points of interest
Output:
$SP_{ps}$: The set of individuals' significant places with personal meaning, $SP_{ps} = \{SP^{ps}_i \mid 1 \leq i \leq |I|\}$

Figure 2. Identifying individual significant places with personal meaning using a four-layered model.
The identified personal places of interest $PeSP_i$ are first filtered out, and the remaining places are then regarded as public places of interest $PuSP_i$. Next, the semantics of the public places of interest are extracted by associating each of these places with a spatial context (e.g., a POI). Given the set of remaining public places of interest $PuSP_i$, for each $PuSP^k_i$, we compute the distance r between the center coordinates of $PuSP^k_i$ and the farthest stay point in $PuSP^k_i$, and we construct a searching circle c of radius r (see Figure 3). Next, c is used to associate POIs with $PuSP^k_i$ and to interpret the corresponding semantics. If at least one POI is contained in c, then we annotate $PuSP^k_i$ with the category having the greatest number of POIs in c; otherwise, we find the nearest POI and annotate $PuSP^k_i$ with its category. After extracting the semantics of $PuSP^k_i$, the temporal signatures of visits to $PuSP^k_i$ can be used for verification.
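The annotation rule for public places of interest can be sketched in Python as follows. For simplicity, the sketch assumes that stay points and POIs are given in a projected planar coordinate system (in meters), so that Euclidean distances apply directly; the function name, the tuple layouts and the sample coordinates are hypothetical.

```python
import math
from collections import Counter


def annotate_public_place(stay_points, pois):
    """Annotate a public place of interest with a POI category.

    stay_points: list of (x, y) planar coordinates of the place's stay points.
    pois: non-empty list of (x, y, category) tuples.
    The searching circle is centered on the mean coordinates and reaches
    the farthest stay point. The majority category inside the circle wins;
    if the circle contains no POI, the nearest POI's category is used.
    """
    center_x = sum(x for x, _ in stay_points) / len(stay_points)
    center_y = sum(y for _, y in stay_points) / len(stay_points)
    radius = max(math.hypot(x - center_x, y - center_y) for x, y in stay_points)

    inside = [category for x, y, category in pois
              if math.hypot(x - center_x, y - center_y) <= radius]
    if inside:
        return Counter(inside).most_common(1)[0][0]

    nearest = min(pois, key=lambda p: math.hypot(p[0] - center_x, p[1] - center_y))
    return nearest[2]


# Hypothetical place with four stay points and a few surrounding POIs.
stay_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
nearby_pois = [
    (5.0, 6.0, "Restaurant"),
    (6.0, 5.0, "Restaurant"),
    (4.0, 4.0, "Shopping"),
    (100.0, 100.0, "Education"),
]
label = annotate_public_place(stay_points, nearby_pois)
```

When two categories are equally frequent inside the circle, this sketch returns whichever was encountered first; the text does not specify a tie-breaking rule.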

Figure 3. An example of extracting the semantics of public places of interest.

Table 1. An example of grouping the significant places of two individuals, a and b.

Figure 4. Semantic interpretation of significant places.

Figure 5. Numbers of $v_i$ identified for which the average rank is less than or equal to R.

Figure 6. Ratio of identifying $v_i$ in the first position and average rank scores using different methods.

Figure 7. Aggregating individuals based on the similarity of their visits to their workplaces.

Figure 8. Grouping individuals based on the similarity of their visits to both their homes and workplaces.

Figure 9. Temporal signatures of the visits of individuals in Groups #1 and #2 to their homes and workplaces.

ISPRS Int. J. Geo-Inf. 2018, 7

Definition 4. (Personal place of interest and public place of interest) A personal

Table 3. Public places of interest identified.