An Improved DBSCAN Algorithm to Detect Stops in Individual Trajectories



Introduction
In recent years, miniaturized GPS (global positioning system) devices have become widely used in daily life, and large amounts of trajectory data can now be recorded easily. For instance, people's daily activity trajectories can be recorded by in-car GPS equipment and GPS-enabled mobile phones. A typical trajectory of a person's daily life is illustrated in Figure 1. Useful information can be extracted from such trajectories and used to benefit daily life. As a result, many location-based services, such as position-based recommender systems and destination prediction systems, are receiving increasing attention from both users and developers. The primary concern of location-based applications is understanding the semantic meaning of a trajectory, rather than treating it merely as a collection of recorded points. Reference [1] proposed a conceptual model that represents trajectories with semantic annotations, allowing semantic information, such as moves and stops, to be assigned to specific parts of a trajectory. Stops are the trajectory segments corresponding to a person's stay at certain locations. Moves are the trajectory segments created by the motion of a target between stop locations.
Stop locations in a trajectory are an indispensable part of various applications, such as purpose prediction services, navigation services, and generic or personalized recommendations. In this paper, the problem of extracting stop locations from trajectories is called stop detection. Many models in the literature divide a trajectory into stop parts and move parts. Research on stop detection falls into two categories: static methods and dynamic methods. Static techniques [2,3] define important positions in advance, whereas dynamic approaches require no prior knowledge of stops. Recently, several papers have studied the dynamic approach by considering different aspects of mobility, such as velocity. Typically, the dynamic approach adopts general clustering algorithms, which cluster a person's stop locations by assigning different constraints to different features.
Most existing clustering methods used for stop detection have drawbacks. First, the values of commonly used features, such as speed, fluctuate strongly in real trajectories; we provide a qualitative analysis of the speed feature in Section 3. Second, as a consequence, these algorithms usually require manually set parameters for each feature, which is difficult for users because of these fluctuations. Finally, most clustering-based algorithms measure density as the number of GPS points within a given distance. As a result, these methods ignore sequential features, and their results are strongly affected by the distance parameters. The problem becomes worse when multiple features are considered together, since users must specify a parameter for each feature.
In this paper, taking these issues into consideration, we construct a new, comprehensive, hybrid feature-based density measurement method. We define a new concept, move ability, and apply the data field theory proposed in Reference [4] to measure the density around a GPS point; move ability is incorporated by giving the density a move ability-dependent weight. The density threshold is determined automatically when identifying core points. We then use this density measurement method to improve the original DBSCAN (density-based spatial clustering of applications with noise) algorithm.
The rest of this paper is organized as follows. Section 2 reviews common stop detection algorithms. Section 3 gives definitions of basic concepts, such as GPS trajectories and stops. Section 4 describes our improved DBSCAN algorithm in detail, and Section 5 validates its feasibility and efficiency on real datasets by comparing it with four other algorithms. Section 6 concludes the paper.

Related Works
In this section, we survey the clustering algorithms described or analyzed in the literature. Various methods can be used to extract stop locations from GPS trajectories. In general, stop detection approaches fall into two categories: static methods and dynamic methods. In static techniques [2,3], important positions, such as gas stations, are defined in advance. When extracting stops, if a target enters a predefined region and stays longer than a duration threshold, that region is regarded as a stop location in the trajectory. The main drawback of static algorithms is that users need to specify their places of interest in advance. As a result, interesting and personalized stop locations that users did not provide beforehand will not be found.
In dynamic approaches, no prior knowledge of stops is given, so personalized stop locations can be discovered. Multiple studies have examined the dynamic approach by considering different aspects of mobility characteristics [5][6][7][8][9][10]. Considering only spatial characteristics, several classical clustering algorithms have been used to extract stops from trajectories. Reference [11] proposes a predictive model based on automatically detected stop positions and adopts a variation of the traditional K-Means method to detect stop locations. Its main issue is the selection of K and of the initial cluster centers, which directly affects the final results. The DBSCAN [12] algorithm is used in Reference [13] to extract significant locations. In Reference [14], a modified DBSCAN algorithm, DJ-Cluster (density and join-based clustering algorithm), is proposed to detect personally meaningful places. These density-based clustering algorithms overcome many limitations of the K-Means approach [15]; however, they consider only spatial dimensions and ignore temporal sequential features.
In contrast to the algorithms above, many studies take both spatial and temporal characteristics into account. Various derivatives of DBSCAN that consider temporal sequential characteristics have been adopted to extract stop positions [5,6,14,16,17]. In Reference [5], an improved DBSCAN algorithm with gap treatment was proposed to detect stop episodes in a trajectory. The CB-SMoT (clustering-based stops and moves of trajectories) algorithm was proposed in Reference [6] to extract known and unknown stops. CB-SMoT is a density-based clustering algorithm that considers both speed, a temporal feature, and spatial features: clusters are generated from trajectory sample points whose speed is below a velocity threshold. In addition, one of the major parameters in Reference [6], Eps (the distance threshold within which points are regarded as neighbors), is obtained using a quantile function. As Reference [16] points out, this quantile function does not always yield an appropriate value for Eps, which makes the parameter difficult to set. The method in Reference [16] improves CB-SMoT with an alternative way of calculating Eps, but the parameter remains hard to determine because it depends on users to distinguish low-speed parts from high-speed parts. Other clustering approaches assign different thresholds to different characteristics [18][19][20][21]. In particular, the TDBC algorithm [21], a spatio-temporal clustering method for extracting stop points from an individual trajectory, introduces information from satellites. A time-based clustering algorithm was proposed in Reference [18], which requires both a clustering distance threshold and a time threshold.
The methods mentioned above perform well in some situations; however, they also have drawbacks. Most of them require appropriate threshold values for each parameter. When calculating the density of GPS points, most clustering-based algorithms count the GPS points within a given distance without considering their sequential characteristics. In this paper, the density of a GPS point is calculated from its adjacent points along the trajectory rather than from all spatially nearby points. First, we define the new concept of the move ability feature; to the best of the authors' knowledge, this is the first time such a feature has been proposed for stop detection. Then, by combining the data field theory proposed in Reference [4] with move ability, we construct a new, comprehensive, hybrid feature-based density measurement method, in which the density threshold is determined automatically when identifying core points. Finally, we use this density measurement method to improve the original DBSCAN algorithm.

Basic Concepts
In this section, we give the definitions of GPS trajectory, stop, and move, based on the general definitions in Reference [1]. These definitions are used in the rest of this paper and are tailored to the application studied here; for example, altitude is not considered, since it varies little within urban regions.
Definition 1. GPS Trajectory: A GPS trajectory is a list of GPS data points $\{p_0 = (x_0, y_0, t_0), p_1 = (x_1, y_1, t_1), \ldots, p_n = (x_n, y_n, t_n)\}$, where $x_i$, $y_i$, and $t_i$ are the longitude, latitude, and timestamp of $p_i$, respectively, and $t_i < t_{i+1}$ for all $i \in [0, n-1]$.
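As an illustration of Definition 1, the following Python sketch represents a trajectory as a list of (longitude, latitude, timestamp) records and checks the strict time ordering. The names `GPSPoint` and `is_valid_trajectory` are hypothetical, the coordinates are made up, and altitude is left out, as in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GPSPoint:
    """One sample p_i = (x_i, y_i, t_i) of a GPS trajectory."""
    x: float  # longitude in degrees
    y: float  # latitude in degrees
    t: float  # timestamp in seconds (the epoch is arbitrary)


def is_valid_trajectory(points: List[GPSPoint]) -> bool:
    """Return True if the timestamps are strictly increasing (t_i < t_{i+1} for every i)."""
    return all(a.t < b.t for a, b in zip(points, points[1:]))


trajectory = [
    GPSPoint(x=116.3184, y=39.9847, t=0.0),
    GPSPoint(x=116.3186, y=39.9848, t=5.0),
    GPSPoint(x=116.3190, y=39.9850, t=10.0),
]
print(is_valid_trajectory(trajectory))
```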
Stops are the significant places in a GPS trajectory where a target has spent a minimum amount of time; they typically have a higher density of GPS points. A move is the part of a trajectory between stops and has a lower density of GPS points. In Reference [1], Spaccapietra defined some of their characteristics.

Definition 2.
Stop: A stop is a part of a trajectory with the following features: (i) the user has explicitly defined this part of the trajectory to represent a stop; (ii) its temporal extent is a non-empty time interval; (iii) the traveling object does not move, as far as the application view of this trajectory is concerned; and (iv) all stops in the same trajectory are temporally disjoint, i.e., the temporal extents of any two stops do not overlap.

Definition 3.
Move: A move is a part of a trajectory such that: (i) it is delimited by two extremities that represent either two consecutive stops, $t_{begin}$ and the first stop, the last stop and $t_{end}$, or $[t_{begin}, t_{end}]$ (when the trajectory has no stops); (ii) its temporal extent $[t_{begin}, t_{end}]$ is a non-empty time interval; and (iii) the spatial range of the trajectory for the $[t_{begin}, t_{end}]$ interval is a spatio-temporal line (not a point) defined by the trajectory, where $t_{begin}$ corresponds to the initial point of the trajectory and $t_{end}$ to the final one.
Definition 4. Distance: The distance between two points $p_n$ and $p_m$ is their great-circle distance:
$$Dist(p_n, p_m) = R \cdot \arccos\left(\sin(lat_n)\sin(lat_m) + \cos(lat_n)\cos(lat_m)\cos(lgt_n - lgt_m)\right) \qquad (1)$$
where $R$ is the radius of the Earth ($R = 6371$ km), $lat_n$ and $lat_m$ are the latitudes of $p_n$ and $p_m$, and $lgt_n$ and $lgt_m$ are their longitudes.
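Definition 4 is a great-circle distance on a sphere of radius R = 6371 km. A minimal sketch, assuming coordinates in decimal degrees, evaluates it with the haversine formula, one standard way of computing great-circle distance that stays numerically stable for the very short gaps between consecutive GPS fixes. `dist_km` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # R in Definition 4


def dist_km(lat_n: float, lgt_n: float, lat_m: float, lgt_m: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees (haversine form)."""
    phi_n, phi_m = math.radians(lat_n), math.radians(lat_m)
    d_phi = phi_m - phi_n
    d_lambda = math.radians(lgt_m - lgt_n)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi_n) * math.cos(phi_m) * math.sin(d_lambda / 2) ** 2
    # min() guards against a value slightly above 1 caused by rounding
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# 0.01 degree of latitude along a meridian is roughly 1.112 km
print(round(dist_km(39.98, 116.30, 39.99, 116.30), 3))
```

The arccos and haversine forms give the same distance in exact arithmetic; the haversine form is preferred here only because it loses less precision when the two points are a few metres apart.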
Definition 5. Trajectory curve distance: The curve distance of a sub-trajectory segment $traj_{nm}$, composed of the sequence of points $\{p_n, p_{n+1}, \ldots, p_m\}$, is
$$TrajCurveDist(traj_{nm}) = \sum_{i=n}^{m-1} Dist(p_i, p_{i+1}) \qquad (2)$$
Definition 6. Trajectory direct distance: The direct distance of a sub-trajectory segment $traj_{nm} = \{p_n, p_{n+1}, \ldots, p_m\}$ equals the distance between its first and last points:
$$TrajDirectDist(traj_{nm}) = Dist(p_n, p_m) \qquad (3)$$
In general, when a target stays in a stop region, the trajectory direct distance is far less than the trajectory curve distance. Conversely, when the target moves between stop regions, the direct distance is close to the curve distance. Based on this observation, we propose the new concept of move ability.
Definition 7. Move ability: The move ability of a sub-trajectory segment $traj_{nm} = \{p_n, p_{n+1}, \ldots, p_m\}$ is
$$MoveAbility(traj_{nm}) = \frac{TrajDirectDist(traj_{nm})}{TrajCurveDist(traj_{nm})} \qquad (4)$$
Since the direct distance can never exceed the curve distance, move ability lies in $[0, 1]$.
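Definitions 5-7 translate directly into code. The sketch below uses planar Euclidean distance, as the Figure 2 illustration does, but accepts any distance function (a great-circle distance could be passed in instead). The three six-point segments are invented shapes in the spirit of Figure 2, not its actual coordinates, and returning 0 for a zero-length path is our own assumption.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
DistFn = Callable[[Point, Point], float]


def euclidean(p: Point, q: Point) -> float:
    """Planar distance, as used in the Figure 2 illustration."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def traj_curve_dist(seg: Sequence[Point], dist: DistFn = euclidean) -> float:
    """Path length: distances summed over consecutive point pairs."""
    return sum(dist(a, b) for a, b in zip(seg, seg[1:]))


def traj_direct_dist(seg: Sequence[Point], dist: DistFn = euclidean) -> float:
    """Distance between the first and the last point of the segment."""
    return dist(seg[0], seg[-1])


def move_ability(seg: Sequence[Point], dist: DistFn = euclidean) -> float:
    """Direct distance divided by curve distance, a value in [0, 1]."""
    curve = traj_curve_dist(seg, dist)
    if curve == 0.0:
        return 0.0  # assumption: a segment that never moves is treated as a pure stop
    return traj_direct_dist(seg, dist) / curve


line = [(float(k), 0.0) for k in range(6)]                          # linear motion
curved = [(0, 0), (1, 0.5), (2, 0.8), (3, 0.8), (4, 0.5), (5, 0)]   # curved road
wander = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0.5), (0.5, 0.5)]     # activity at a stop
for name, seg in [("line", line), ("curved", curved), ("wander", wander)]:
    print(name, round(move_ability(seg), 3))
```

With these shapes the straight segment scores exactly 1.0, the curved road about 0.94, and the wandering segment about 0.18, matching the ordering described for Figure 2.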
Figure 2 illustrates the concept of move ability. It shows three sub-trajectories, each containing six points whose coordinates represent longitude and latitude. For simplicity, the Euclidean distance is used in this illustration to calculate move ability. The three sub-trajectories correspond to different real situations: Figure 2a represents activity at a stop, Figure 2b represents movement along a curved road, and Figure 2c represents linear motion.
The move ability values of the sub-trajectories in Figure 2 are consistent with the reasoning above. Furthermore, we find that move ability is better suited than velocity to distinguishing move and stop episodes. We made a qualitative comparison between the two features on a real track. The velocity curve after Gaussian smoothing is shown in Figure 3a. It shows that the speed of a moving object can vary dramatically, and there are many short low-speed segments within high-speed parts, which may be caused by brief decelerations. By comparison, the move ability curve, smoothed with the same Gaussian kernel and shown in Figure 3b, is more stable and more discriminative. In particular, a low move ability is obtained only when the target keeps moving around within a certain region, which is likely to be a stop region. Moreover, even a low-speed sub-trajectory may have a high move ability, for example when a target moves slowly in an approximately straight line; this helps remove some false stops, such as short-duration traffic jams.
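The claim that slow but straight motion keeps a high move ability can be checked on synthetic data. This sketch computes a per-point move ability over a sliding window along the trajectory and smooths it with a Gaussian kernel, in the spirit of Figure 3b. The window size, kernel width, planar coordinates, and both tracks are illustrative assumptions, not the paper's settings.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar coordinates in metres (assumption for this sketch)


def move_ability(seg: Sequence[Point]) -> float:
    """Direct distance / curve distance with planar distances; 0.0 for a segment that never moves."""
    curve = sum(math.dist(a, b) for a, b in zip(seg, seg[1:]))
    return math.dist(seg[0], seg[-1]) / curve if curve > 0 else 0.0


def move_ability_series(pts: Sequence[Point], half_window: int) -> List[float]:
    """Move ability of the window of up to half_window points on each side of every point."""
    series = []
    for i in range(len(pts)):
        lo, hi = max(0, i - half_window), min(len(pts), i + half_window + 1)
        series.append(move_ability(pts[lo:hi]))
    return series


def gaussian_smooth(values: Sequence[float], sigma: float) -> List[float]:
    """Smooth a 1-D series with a Gaussian kernel truncated at 3 sigma, renormalised at the edges."""
    radius = max(1, int(3 * sigma))
    smoothed = []
    for i in range(len(values)):
        acc = weight_sum = 0.0
        for j in range(max(0, i - radius), min(len(values), i + radius + 1)):
            w = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            acc += w * values[j]
            weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed


# Hypothetical tracks sampled once per second: a slow straight walk (0.5 m/s)
# and faster wandering around a 5 m circle (about 3.8 m per step)
slow_line = [(0.5 * k, 0.0) for k in range(20)]
wandering = [(5 * math.cos(k * math.pi / 4), 5 * math.sin(k * math.pi / 4)) for k in range(20)]

ma_line = gaussian_smooth(move_ability_series(slow_line, 5), sigma=1.0)
ma_wander = gaussian_smooth(move_ability_series(wandering, 5), sigma=1.0)
print(min(ma_line), max(ma_wander))
```

The slow straight walk keeps a smoothed move ability of 1, while the faster wandering track stays below 0.5, so a speed threshold and a move ability threshold would classify these two tracks in opposite ways.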

Methodology
The basis of our approach is that the stop parts of a trajectory should have lower move ability and a higher density of GPS points; it is therefore meaningful to estimate density in a way that takes move ability into account. In particular, no special treatment is applied to long-duration gaps in a trajectory. Such gaps may have many causes, such as a GPS logger running out of power, so it is inappropriate to assume that the target stayed in the same place during a gap. In our method, trajectory point density is the primary consideration, and short temporal gaps in trajectories have little effect on the results.
Following this reasoning, we propose a comprehensive, hybrid feature-based density measurement method, and we improve the original DBSCAN algorithm by using it.

Density Function
DBSCAN [12] is a classic density-based clustering method in which the density of a point is measured by the number of points within a certain distance of it. In our work, taking move ability into consideration, we measure density by introducing the data field proposed by Li et al. [4]. According to data field theory, each trajectory point receives an interactive impact from the other points. Without loss of generality, we estimate the impact between points with a Gaussian function, which has favorable mathematical properties:
$$\varphi(p_i) = \sum_{j=1}^{n} e^{-\left(\frac{d_{ij}}{\sigma_1}\right)^2} \qquad (5)$$
where $p_i$ ($i = 1, \ldots, n$) is a trajectory point, $d_{ij}$ is the distance between $p_i$ and $p_j$, and $\sigma_1$ is the standard deviation.
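The text models the impact between points as a Gaussian function of their distance with standard deviation σ1. A minimal sketch, assuming the impact of p_j on p_i is exp(-(d_ij/σ1)^2) and that the potential of p_i sums this over all n points, including p_i itself; the planar coordinates and the function name `data_field_potential` are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar coordinates in metres (assumption for this sketch)


def data_field_potential(points: Sequence[Point], sigma1: float) -> List[float]:
    """Sum of Gaussian impacts exp(-(d_ij / sigma1)^2) received by every point p_i.

    The sum runs over all j, including j == i, so an isolated point has potential 1.
    This brute-force version is O(n^2) and only meant to illustrate the formula.
    """
    return [
        sum(math.exp(-((math.dist(p, q) / sigma1) ** 2)) for q in points)
        for p in points
    ]


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (100.0, 100.0)]
print([round(v, 3) for v in data_field_potential(pts, sigma1=2.0)])
```

The three nearby points reinforce each other (potentials above 2), while the isolated fourth point receives only its own contribution of 1.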
We use the sum of the impacts from a set of points before and after $p_i$ as an alternative measure of density; the number of these adjacent points is given by the input parameter Nap. In addition, we incorporate move ability by multiplying by a move ability-related weight function. The final density of a point is calculated as follows:
$$\phi(p_i) = e^{-\left(\frac{MA_i}{\sigma_{MA}}\right)^2} \sum_{p_j \in adj(i)} e^{-\left(\frac{d_{ij}}{\sigma_1}\right)^2} \qquad (6)$$
where $MA_i$ is the move ability of the sub-trajectory formed by these adjacent points, $\sigma_{MA}$ is the standard deviation of the weight function, $adj(i)$ is the set of adjacent points before and after $p_i$, and the other parameters are the same as in Equation (5).
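The density combines a data-field sum over the adjacent points with a move ability weight. A minimal sketch, assuming that adj(i) holds p_i plus up to Nap points on each side of it along the trajectory (the text does not say whether Nap counts one side or both), that the weight is exp(-(MA_i/σ_MA)^2), and that coordinates are planar; `hybrid_density` and both tracks are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar coordinates in metres (assumption for this sketch)


def move_ability(seg: Sequence[Point]) -> float:
    """Direct distance / curve distance; 0.0 for a segment that never moves."""
    curve = sum(math.dist(a, b) for a, b in zip(seg, seg[1:]))
    return math.dist(seg[0], seg[-1]) / curve if curve > 0 else 0.0


def hybrid_density(points: Sequence[Point], nap: int, sigma1: float, sigma_ma: float) -> List[float]:
    """Move ability-weighted data-field density of every trajectory point.

    Assumptions: the window is p_i plus up to `nap` points on each side along
    the trajectory, and the weight exp(-(MA / sigma_ma)^2) stays close to 1 for
    stop-like (low move ability) windows.
    """
    n = len(points)
    density = []
    for i in range(n):
        window = points[max(0, i - nap): min(n, i + nap + 1)]
        weight = math.exp(-((move_ability(window) / sigma_ma) ** 2))
        impact = sum(math.exp(-((math.dist(points[i], q) / sigma1) ** 2)) for q in window)
        density.append(weight * impact)
    return density


# Hypothetical track: 10 fixes wandering within 0.3 m of one spot, then 10 fixes 20 m apart on a straight road
stop_part = [(0.3 * math.cos(k), 0.3 * math.sin(k)) for k in range(10)]
move_part = [(20.0 * k, 0.0) for k in range(1, 11)]
dens = hybrid_density(stop_part + move_part, nap=3, sigma1=5.0, sigma_ma=0.5)
print([round(d, 2) for d in dens])
```

For these settings the points inside the wandering cluster get a density of roughly 7, while the points on the straight road get about 0.02, because their window has move ability 1 and therefore weight exp(-4).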

Improved DBSCAN
In our method, most DBSCAN concepts are the same as in the original definition [12]. However, instead of using a minimum number of points to define a core point, we define it as follows.
Definition 8. Core point: A trajectory point $p_i = (x_i, y_i, t_i)$ is a core point with respect to MinDensity if $\phi(p_i) > MinDensity$, where MinDensity is the density threshold.
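Definition 8 replaces the MinPts test of DBSCAN with a density threshold. The text does not spell out the neighbourhood used for cluster expansion, so the sketch below assumes it is the same window of up to `nap` points on each side along the trajectory that is used for density, and otherwise follows the usual DBSCAN expansion: core points spread the cluster, and non-core points join only as border points. `improved_dbscan` is a hypothetical name, and the density values in the example are made up.

```python
from collections import deque
from typing import List, Sequence


def improved_dbscan(density: Sequence[float], min_density: float, nap: int) -> List[int]:
    """Label every trajectory point with a cluster id (0, 1, ...) or -1 for noise.

    Core points follow Definition 8 (density > min_density). The neighbourhood of
    point i is assumed to be the points at most `nap` positions away along the trajectory.
    """
    n = len(density)
    is_core = [d > min_density for d in density]
    labels = [-1] * n
    cluster = -1
    for seed in range(n):
        if not is_core[seed] or labels[seed] != -1:
            continue
        cluster += 1  # start a new cluster (a candidate stop) from an unvisited core point
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(max(0, i - nap), min(n, i + nap + 1)):
                if labels[j] == -1:
                    labels[j] = cluster  # border or core point reached from a core point
                    if is_core[j]:
                        queue.append(j)  # only core points expand the cluster further
    return labels


print(improved_dbscan([0, 0, 5, 5, 5, 0, 0, 0, 0, 0, 6, 6, 0], min_density=1.0, nap=1))
```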
To find an appropriate value for MinDensity, we follow the quantitative method of Yuan [22] and take the 'elbow point' as the threshold. The elbow point is the point of maximum curvature and usually indicates the cut-off between two states. We use KD curvature [23], a technique for estimating the curvature of discrete curves, to determine the elbow point. For three consecutive points $\{p_{i-1}, p_i, p_{i+1}\}$, the corresponding KD curvature is calculated from the angle $\theta_i = \angle(\overrightarrow{p_{i-1}p_i}, \overrightarrow{p_{i+1}p_i})$ and the mean segment length $l = (|\overrightarrow{p_{i-1}p_i}| + |\overrightarrow{p_{i+1}p_i}|)/2$. In practice, however, the density sequence fluctuates frequently and is not smooth, which makes the elbow point difficult to locate. To obtain a reliable elbow point, we first smooth the density curve with a Gaussian kernel. Then, treating the horizontal and vertical coordinates as equally important, we normalize the density sequence. Finally, we discretize the density curve by sampling it at a fixed length interval, which makes the elbow point easier to compute. Taking one track in our real dataset as an example, the original (non-smoothed) density sequence is shown in Figure 4a and the smoothed sequence in Figure 4b; the figures show that the sequence does contain fluctuations and gaps. The normalized density sequence after sampling with a length interval $\Delta l = 0.1$ is illustrated in Figure 5a. In Figure 5b, the asterisk marked with a black arrow is the final elbow point of the normalized density sequence.
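The threshold search (normalize, resample at a fixed arc-length step, pick the point of maximum curvature) can be sketched as follows. The KD curvature formula of [23] is not reproduced in this section, so the sketch substitutes a common discrete estimate: the turning angle at a point divided by the mean length of its two adjacent chords. It assumes the density sequence has already been Gaussian-smoothed, uses Δl = 0.1 from the text, and the function names and example sequence are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalise(values: Sequence[float]) -> List[Point]:
    """Map (index, value) pairs onto the unit square so both axes weigh equally."""
    n, lo, hi = len(values), min(values), max(values)
    span = (hi - lo) or 1.0
    return [(k / (n - 1), (v - lo) / span) for k, v in enumerate(values)]


def resample(curve: Sequence[Point], dl: float) -> List[Point]:
    """Sample the polyline at equal arc-length steps of dl, starting at its first point."""
    out = [curve[0]]
    remaining = dl  # arc length from the start of the current segment to the next sample
    for a, b in zip(curve, curve[1:]):
        seg = math.dist(a, b)
        while remaining <= seg:
            t = remaining / seg
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
            remaining += dl
        remaining -= seg
    return out


def turning_curvature(q0: Point, q1: Point, q2: Point) -> float:
    """Turning angle at q1 divided by the mean length of its two adjacent chords."""
    v1 = (q1[0] - q0[0], q1[1] - q0[1])
    v2 = (q2[0] - q1[0], q2[1] - q1[1])
    angle = abs(math.atan2(v1[0] * v2[1] - v1[1] * v2[0], v1[0] * v2[0] + v1[1] * v2[1]))
    mean_len = (math.hypot(*v1) + math.hypot(*v2)) / 2
    return angle / mean_len if mean_len > 0 else 0.0


def elbow_threshold(density: Sequence[float], dl: float = 0.1) -> float:
    """Density value at the maximum-curvature point of the normalised, resampled curve.

    Needs at least three resampled points, i.e. a curve longer than 2 * dl.
    """
    samples = resample(normalise(density), dl)
    best = max(range(1, len(samples) - 1),
               key=lambda i: turning_curvature(samples[i - 1], samples[i], samples[i + 1]))
    lo, hi = min(density), max(density)
    return lo + samples[best][1] * (hi - lo)


# Hypothetical, already smoothed density sequence with a sharp knee near the value 10
seq = [100, 50, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(round(elbow_threshold(seq), 1))
```

Because the curve is resampled by arc length, the selected point lies close to, but not exactly on, the original knee; a smaller Δl trades more samples for a finer location.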
ISPRS Int. J. Geo-Inf. 2017, 6, 63

Finally, a merging step is included in our improved DBSCAN algorithm: two consecutive stops at the same geographic location separated by a short temporal interval are merged. This reflects real behavior, since people often move around within a region, which produces several small, spatiotemporally similar stops in the same place. In our experiments, two consecutive stops are merged if the distance between them is less than 200 m and the temporal interval between them is less than one hour.
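The merging rule can be sketched as a single pass over the stops in time order. The `Stop` record, measuring distance between stop centroids with a haversine great-circle distance, and taking a point-count-weighted centroid for a merged stop are assumptions of this sketch; only the 200 m and one-hour thresholds come from the text, and the example stops are made up.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Stop:
    lat: float      # centroid latitude in degrees
    lon: float      # centroid longitude in degrees
    t_start: float  # seconds
    t_end: float    # seconds
    n_points: int   # number of GPS points in the stop


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(min(1.0, a)))


def merge_stops(stops: List[Stop], max_dist_m: float = 200.0, max_gap_s: float = 3600.0) -> List[Stop]:
    """Merge temporally consecutive stops that are close in both space and time."""
    merged: List[Stop] = []
    for s in sorted(stops, key=lambda stop: stop.t_start):
        if merged:
            last = merged[-1]
            near = haversine_m(last.lat, last.lon, s.lat, s.lon) < max_dist_m
            soon = s.t_start - last.t_end < max_gap_s
            if near and soon:
                n = last.n_points + s.n_points
                merged[-1] = Stop(
                    lat=(last.lat * last.n_points + s.lat * s.n_points) / n,
                    lon=(last.lon * last.n_points + s.lon * s.n_points) / n,
                    t_start=last.t_start,
                    t_end=max(last.t_end, s.t_end),
                    n_points=n,
                )
                continue
        merged.append(s)
    return merged


# About 111 m and 30 min apart: merged. The third stop is about 1 km away: kept separate.
example = [Stop(39.9800, 116.3000, 0.0, 1800.0, 50),
           Stop(39.9810, 116.3000, 3600.0, 5400.0, 30),
           Stop(39.9900, 116.3000, 6000.0, 7000.0, 20)]
print(len(merge_stops(example)))
```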
Compared with the original DBSCAN algorithm, our method makes two main modifications for clustering single trajectories: (i) we propose a new density measure based on the new concept of move ability and on data field theory, and we measure local density from several points before and after $p_i$; (ii) by adopting the quantitative method of Yuan [22], our method automatically selects an appropriate density threshold.

Experimental Results
In this section, we validate our improved DBSCAN algorithm through experiments on real trajectory datasets. We conducted comparative experiments between our method and two classic density-based clustering algorithms. The following sections first describe the datasets and then present the experimental results.

5.1. Dataset Description
In this paper, we use the Geolife dataset and a dataset collected by our team to perform our experiments. The Geolife dataset was collected by Microsoft Research Asia during the Geolife project. It contains 17,621 trajectories from 182 users, recorded between April 2007 and August 2012. Each trajectory is a temporally ordered sequence of time-stamped points, and each point records geographic coordinates such as longitude, latitude, and altitude. More than 90% of the trajectories are densely sampled, with one point every 1–5 s or every 5–10 m.
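To check the sampling density described above, trajectories must first be read from the Geolife files. The sketch below assumes the `.plt` text layout described in the Geolife user guide:

- six header lines;
- then one comma-separated point per line: latitude, longitude, a field set to 0, altitude in feet, days since 1899-12-30, a date string, and a time string.

`GpsPoint`, `parse_plt`, and `sampling_intervals_s` are hypothetical names.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class GpsPoint(NamedTuple):
    lat: float       # latitude, decimal degrees
    lon: float       # longitude, decimal degrees
    alt_ft: float    # altitude in feet (-777 marks an invalid value)
    time: datetime   # built from the date and time string fields


def parse_plt(lines: Iterable[str]) -> List[GpsPoint]:
    """Parse the lines of one Geolife .plt file into time-stamped points."""
    points: List[GpsPoint] = []
    for i, line in enumerate(lines):
        if i < 6 or not line.strip():  # the first six lines are a header
            continue
        f = line.strip().split(",")
        # Fields: lat, lon, 0, altitude (ft), days since 1899-12-30, date, time
        ts = datetime.strptime(f"{f[5]} {f[6]}", "%Y-%m-%d %H:%M:%S")
        points.append(GpsPoint(float(f[0]), float(f[1]), float(f[3]), ts))
    return points


def sampling_intervals_s(points: List[GpsPoint]) -> List[float]:
    """Seconds between consecutive points, to check how densely a track is sampled."""
    return [(b.time - a.time).total_seconds() for a, b in zip(points, points[1:])]
```

For a file on disk, `with open(path) as fh: points = parse_plt(fh)` works because a text file iterates over its lines; `path` is a placeholder.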
For our experiments, we developed a software tool based on the Bing Maps APIs (Bing Maps is a web-mapping service provided by Microsoft). Using this tool, a group of research assistants visually inspected hundreds of trajectories and labeled regions with high point density and long duration as stops. Figure 6 shows a labeled real trajectory from our dataset, where $S_1$ and $S_2$ represent two stops. The Geolife dataset contains many short trajectory segments, so the trajectories selected for our experiments had to be long enough to ensure that they actually contain stops. Finally, to validate our algorithm, we used 100 labeled trajectories selected from the Geolife dataset, covering more than 50 users. Volunteers recorded these trajectories in daily life. They include different modes of transportation, such as walking and biking, and all of them are urban trajectories.
ISPRS Int. J. Geo-Inf. 2017, 6, 63

The second dataset in our experiments was collected by our team in daily life using mobile phones. It contains one month of our team members' trajectories, sampled at one point every 2 s, and all stops in these trajectories were recorded in detail. In our experiments, 14 trajectories from this dataset were selected to validate the effectiveness of our new algorithm.

5.2. Parameter Estimation
In our improved DBSCAN algorithm, three parameters in Equation (6) must be estimated to determine the density calculation model: Nap (the number of adjacent points), $\sigma_1$, and $\sigma_{MA}$. To find appropriate values for these parameters, we carried out simulation experiments for each parameter using real trajectory data from our dataset. The estimation results for these three parameters are shown in Figures 7–9.

In our improved DBSCAN algorithm, parameter $\sigma_{MA}$ determines the weight assigned to different move abilities. As previously described, when the target stays in a region, the resulting sub-trajectories have a smaller move ability. When the target moves from one region to another, the resulting sub-trajectories have a larger move ability. Accordingly, a point with a low move ability should receive a high weight, and a point with a high move ability a low weight. Figure 7a shows how the move ability weight varies with $\sigma_{MA}$. We found $\sigma_{MA} = 0.5$ to be a favorable setting: the resulting weights are spread over a wide range without being polarized into two extreme groups (two-stage differentiation).
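Equation (6), which defines the weighting, is not reproduced in this section. The sketch below therefore assumes two things:

- a Gaussian-shaped weight $w = \exp(-(MA/\sigma_{MA})^2)$;
- move ability $MA$ normalized to $[0, 1]$.

Under those assumptions it shows how $\sigma_{MA}$ controls the spread of the weights. `move_ability_weight` and `weight_range` are hypothetical names.

```python
import math


def move_ability_weight(ma: float, sigma_ma: float = 0.5) -> float:
    """Hypothetical Gaussian-shaped weight: 1.0 at ma = 0, decaying as ma grows.

    Assumes move ability `ma` is normalized to [0, 1]; not necessarily the
    form used in Equation (6).
    """
    return math.exp(-((ma / sigma_ma) ** 2))


def weight_range(sigma_ma: float, samples: int = 11) -> tuple:
    """(min, max) weight over evenly spaced move abilities in [0, 1]."""
    ws = [move_ability_weight(k / (samples - 1), sigma_ma) for k in range(samples)]
    return min(ws), max(ws)


for s in (0.1, 0.5, 5.0):
    lo, hi = weight_range(s)
    print(f"sigma_MA={s}: weights span [{lo:.3g}, {hi:.3g}]")
```

How the weights behave under this assumed form:

- A very small $\sigma_{MA}$ (0.1) pushes almost every point with non-trivial move ability to a weight near 0, so the weights polarize.
- A very large one (5.0) keeps all weights near 1, so move ability barely matters.
- An intermediate value keeps the weights spread out.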
Parameter $\sigma_1$ determines the interactive impact between trajectory points. In our algorithm, the density around a trajectory point is measured as the sum of the impacts from its adjacent points. Each point receives a stronger impact from closer points and a weaker impact from farther points. A smaller $\sigma_1$ further weakens the impact received from farther points. When $\sigma_1$ is too small, a trajectory point effectively receives impact only from its nearest points.
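In data-field theory, the impact of one point on another is commonly modeled with a Gaussian-like potential $\exp(-(d/\sigma)^2)$. The sketch below uses that form, as an assumption, to show how $\sigma_1$ controls the influence of near versus far points. Distances and $\sigma_1$ share the same (possibly normalized) units. `impact` and `potential` are hypothetical names.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def impact(distance: float, sigma_1: float) -> float:
    """Gaussian data-field impact: 1 at distance 0, decaying with distance.
    `distance` and `sigma_1` must be in the same units."""
    return math.exp(-((distance / sigma_1) ** 2))


def potential(center: Point, neighbors: Iterable[Point], sigma_1: float) -> float:
    """Sum of impacts exerted on `center` by each neighbor (planar coordinates)."""
    cx, cy = center
    return sum(impact(math.hypot(x - cx, y - cy), sigma_1) for x, y in neighbors)


# A near neighbor (0.1 away) and a far one (0.5 away) under two sigma_1 values.
for s in (0.3, 0.1):
    print(s, impact(0.1, s), impact(0.5, s))
```

Shrinking $\sigma_1$ from 0.3 to 0.1 drives the far neighbor's impact to effectively zero while the near neighbor still contributes. This is the "only closer points matter" regime described above.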
Parameter Nap specifies the number of adjacent points considered in the density calculation. When Nap is too small, the density is determined by only a few adjacent points, which lowers robustness to noise. Nap also affects the size of clusters. People keep moving on a small scale even while staying in a stop region; for example, students move from place to place while performing activities on a playground. Too small a value of Nap therefore reduces the size of the clusters and divides a large stop region into several smaller stops. Conversely, too large a value of Nap makes smaller stops difficult to detect, because small local changes in move ability are smoothed out.
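The role of Nap can be illustrated by restricting the density sum to a temporal window around each point. This is a hypothetical sketch, not Equation (6). It assumes:

- Nap is split evenly into Nap/2 points before and Nap/2 points after $p_i$;
- each neighbor's Gaussian impact is scaled by a precomputed move-ability weight.

`windowed_density` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def windowed_density(points: Sequence[Tuple[float, float]],
                     weights: Sequence[float],
                     nap: int, sigma_1: float) -> List[float]:
    """Density of each point from at most `nap` temporal neighbors.

    Assumption: nap // 2 neighbors before and nap // 2 after each point, each
    contributing weights[j] * exp(-(d_ij / sigma_1) ** 2).
    """
    half = nap // 2
    n = len(points)
    densities = []
    for i, (xi, yi) in enumerate(points):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        total = 0.0
        for j in range(lo, hi):
            if j == i:
                continue  # a point does not contribute to its own density
            d = math.hypot(points[j][0] - xi, points[j][1] - yi)
            total += weights[j] * math.exp(-((d / sigma_1) ** 2))
        densities.append(total)
    return densities


# A tight cluster (a stop) followed by evenly spaced linear motion.
track = [(0, 0), (0.01, 0), (0, 0.01), (0.01, 0.01), (0.005, 0.005),
         (1, 0), (2, 0), (3, 0), (4, 0)]
dens = windowed_density(track, [1.0] * len(track), nap=4, sigma_1=0.3)
```

Each point performs at most Nap distance computations, so with this windowing the total work grows as $O(n \cdot \mathrm{Nap})$ rather than $O(n^2)$.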
To find appropriate values for the three parameters in our algorithm, we selected five trajectory segments from the Geolife dataset and carried out simulation experiments for each parameter. By observing the total number of detected stops for different combinations of parameter values, we determined an appropriate range for each parameter. The merge step was removed from our algorithm in these simulations, because it could merge small stops that lie close to each other. Figure 7b shows the number of detected stops for different values of $\sigma_{MA}$, with Nap and $\sigma_1$ held fixed for each curve. The curves decrease sharply while $\sigma_{MA}$ is less than 0.5 and become stable once $\sigma_{MA}$ exceeds 0.5. Therefore, $\sigma_{MA}$ was set to no less than 0.5 in our experiments.
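The selection rule used here (sweep one parameter, then keep values from the point where the stop count stops changing) can be sketched as a simple plateau detector. This is a hypothetical helper: the relative tolerance `rel_tol` and the sweep values below are illustrative, not data from the paper.

```python
from typing import Optional, Sequence


def plateau_start(params: Sequence[float], counts: Sequence[int],
                  rel_tol: float = 0.1) -> Optional[float]:
    """Smallest swept parameter value from which the detected-stop count stays
    within rel_tol (relative) of its value there for the rest of the sweep.
    Returns None only for an empty sweep."""
    for k, ref in enumerate(counts):
        tol = rel_tol * max(ref, 1)
        if all(abs(c - ref) <= tol for c in counts[k:]):
            return params[k]
    return None


# Illustrative sweep of sigma_MA with Nap and sigma_1 held fixed.
sigma_ma_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
stop_counts = [40, 30, 22, 16, 12, 12, 11, 12]
print(plateau_start(sigma_ma_values, stop_counts))
```

The same helper applies unchanged to the Nap and $\sigma_1$ sweeps.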
Similarly, with $\sigma_{MA} = 0.5$, Figure 8a,b shows the number of detected stops for different values of Nap and $\sigma_1$, respectively. Figure 8a shows that the number of detected stops decreases sharply until Nap exceeds 50, after which the curves become stable. Figure 8b shows that the number of detected stops increases slowly as $\sigma_1$ increases.
Accordingly, Nap was set to a value greater than 50 in our experiments. Because $\sigma_1$ should not be too small, we set $\sigma_1 > 0.3$.
Using one trajectory segment selected from the Geolife dataset, we further estimated the values of Nap and $\sigma_1$ by observing the sse (sum of squared error). As described in Reference [24], sse evaluates how well the detected locations are partitioned into clusters; in general, the smaller the sse, the more compact the clusters. However, sse also becomes very small when the number of detected stops becomes very large. A small sse therefore does not by itself indicate a good clustering result.
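The standard sse definition sums the squared distances from each clustered point to its cluster centroid. The sketch below computes it on planar coordinates. It assumes points outside any stop carry the label -1 and are ignored; `sse` is a hypothetical helper name.

```python
from typing import Dict, List, Sequence, Tuple


def sse(points: Sequence[Tuple[float, float]], labels: Sequence[int]) -> float:
    """Sum of squared distances from each clustered point to its cluster centroid.
    Points labeled -1 (not in any stop) are ignored."""
    clusters: Dict[int, List[Tuple[float, float]]] = {}
    for p, lab in zip(points, labels):
        if lab >= 0:
            clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total


pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
print(sse(pts, [0, 0, 1, 1]))  # two stops
print(sse(pts, [0, 1, 2, 3]))  # every point its own "stop"
```

The second call shows the caveat above. Splitting every point into its own stop drives sse to zero, even though that is clearly a worse segmentation.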
To estimate Nap and $\sigma_1$, $\sigma_{MA}$ was fixed at 0.5. Figure 9a shows how sse varies with Nap: sse is small when Nap < 50 and becomes large when Nap > 50. Given the caveat above, and consistent with Figure 8a, the small sse for Nap < 50 reflects the larger number of stops detected in that range rather than better clusters. Therefore, Nap was set greater than 50 in our experiments. Figure 9b shows that sse increases as $\sigma_1$ increases, which is consistent with our reasoning that $\sigma_1$ should not be set too small.

5.3. Evaluation of Effectiveness
To verify the feasibility of our algorithm, we compared our method with four other stop-detection algorithms on both the public dataset and our own collected dataset: the CB-SMoT algorithm [6], the DBSCAN algorithm, the DJ-Cluster algorithm [14], and time-based clustering [18]. We validate our algorithm using the same experimental method described in Reference [25], which also used the Geolife dataset. Precision and recall are computed as follows:

$$\text{Precision} = \frac{\text{number of correct stops found}}{\text{number of stops found}} \qquad (8)$$

$$\text{Recall} = \frac{\text{number of correct stops found}}{\text{number of correct stops}} \qquad (9)$$

In addition, as described in Reference [25], the weighted harmonic mean of precision and recall, $F_{\text{measure}}$, is computed as follows:

$$F_{\text{measure}} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (10)$$
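Given the three counts in Equations (8)–(10), the metrics follow directly. How a detected stop is judged "correct" follows Reference [25] and is not specified here, so the sketch takes the counts as inputs. `precision_recall_f` is a hypothetical name, and the counts in the example are illustrative.

```python
def precision_recall_f(n_correct_found: int, n_found: int, n_true: int):
    """Precision, recall, and F-measure from stop counts.

    n_correct_found: detected stops that match a labeled stop
    n_found:         all detected stops
    n_true:          all labeled (correct) stops
    """
    precision = n_correct_found / n_found if n_found else 0.0
    recall = n_correct_found / n_true if n_true else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Illustrative: 10 stops detected, 8 of them correct, 16 labeled stops in total.
p, r, f = precision_recall_f(8, 10, 16)
```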

Both our improved DBSCAN method and the CB-SMoT algorithm are derived from the DBSCAN algorithm. The main difference among these derivative methods is how a core point is defined, which leads to different complexities. The traditional DBSCAN method computes the distance between all pairs of trajectory points to select the core points, so its complexity is $O(n^2)$. In our method and in the CB-SMoT algorithm, by contrast, only the distances to a subset of the trajectory points are needed to decide whether a point is a core point, so the complexity is $O(n)$. More precisely, the running time of these algorithms is proportional to the number of distance calculations between pairs of points. In our method, this number depends on parameter Nap, the number of adjacent points. Figure 10b shows the effect of Nap on computational efficiency: a larger Nap leads to a longer running time, which is consistent with the analysis above.
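The complexity argument can be made concrete by counting distance evaluations. The sketch compares two cases:

- a naive DBSCAN that, without a spatial index, checks every point against every other point;
- a windowed scheme that checks each point against at most Nap temporal neighbors.

For the windowed case it assumes Nap/2 neighbors on each side. `distance_evaluations` is a hypothetical name.

```python
from typing import Optional


def distance_evaluations(n: int, nap: Optional[int] = None) -> int:
    """Number of pairwise distance computations needed to classify core points.

    nap=None: every point against every other point (naive DBSCAN, O(n^2)).
    nap=k:    each point against at most k temporal neighbors, k // 2 on each
              side (an assumption), which is O(n * k).
    """
    if nap is None:
        return n * (n - 1)
    half = nap // 2
    # Points near either end of the trajectory have fewer neighbors on one side.
    return sum(min(i, half) + min(n - 1 - i, half) for i in range(n))


for n in (1_000, 10_000):
    print(n, distance_evaluations(n), distance_evaluations(n, nap=50))
```

Growing $n$ tenfold multiplies the naive count by about 100 but the windowed count by only about 10. Growing Nap raises the windowed count linearly, matching the trend in Figure 10b.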

Conclusions
In this paper, we first proposed the new concept of move ability. By introducing the theory of the data field to calculate the density of trajectory points, we proposed a new hybrid, feature-based density measurement method that improves the original DBSCAN method. Our improved DBSCAN method takes move ability into account by giving points with lower move ability larger weights, and it determines the density threshold automatically. In the experiments, we compared our method with four other clustering algorithms: CB-SMoT, DBSCAN, DJ-Cluster, and time-based clustering. The experimental results show that our method outperforms these traditional methods, demonstrating its feasibility.

Figure 1. An example of a trajectory.

Figure 2. Examples of move ability. (a) The activity at a stop; (b) movement on curved roads; (c) linear motion.

Figure 3. The velocity curve and move ability curve of a real-world track. (a) Smoothed velocity curve; (b) smoothed move ability curve.

Figure 5. Sampled density sequence and 'elbow point'. (a) Density sequence after sampling; (b) the final 'elbow point'.

Figure 6. An example of a labeled trajectory.

Figure 7. The estimation simulations of parameter $\sigma_{MA}$. (a) The variation of move ability weight with $\sigma_{MA}$; (b) the number of stops found for different values of $\sigma_{MA}$.

Figure 8. The number of detected stops with different values of parameters Nap and $\sigma_1$. (a) The number of stops found for different Nap; (b) the number of stops found for different $\sigma_1$.

Figure 9. The sse (sum of squared error) with different values of parameters Nap and $\sigma_1$. (a) The sse for different Nap; (b) the sse for different $\sigma_1$.

Figure 10. The efficiency evaluation. (a) Efficiency comparison; (b) effect of parameter Nap on the efficiency of our method.
