Location Extraction and Prediction Method Based on Floating Car Spatial-Temporal Trajectory

Abstract: Predicting the next important location by mining a user's historical spatial-temporal trajectory can support behavioral analysis and path planning. Since precisely extracting important locations is the premise of next-location prediction, an enhanced location extraction algorithm is proposed to meet the requirements of dynamic trajectories via dynamic parameter estimation. To estimate the parameters dynamically, the differences in floating car velocity in terms of spatial distribution and behavior in time distribution are considered in the location extraction algorithm. Then, an improved recurrent neural network (RNN) model is designed to mine the variation law of floating car trajectories and improve the accuracy of important location prediction under different conditions. Different from traditional prediction models, which consider only the constraint of distance, the proposed prediction model considers the attention mechanism and semantic information simultaneously. Finally, floating car trajectories from Beijing are selected for our experiments, and the results show that the proposed location extraction algorithm can meet the requirements of a dynamic environment and that our model achieves high prediction accuracy.


Introduction
With the popularity of smartphones and other equipment that can be used to record locations via the Global Navigation Satellite System (GNSS) [1], capturing and recording trajectory data based on such equipment has become more convenient. Thus, much research on location-based services (LBS) has been proposed to improve service capabilities by mining the laws in recorded trajectory data, such as for targeted advertising, road assistance and navigation, personnel tracking, point-of-interest (POI) recommendations [2], and so on. Obviously, a complete LBS system can not only obtain a user's current location, but also predict the next possible location based on their historical trajectory information, so as to plan the navigation path in advance or analyze the user's behavior. Since recording and obtaining trajectory data have become easy and convenient, and predicting the next location based on historical behavior to find future destinations is a key factor for a complete LBS [2], it is important to accurately predict the next location.
In fact, the research indicates that people's mobility is highly dependent on their historical behaviors [3]. Location prediction can be divided into two types: personalized and popularized [4]. Personalized location prediction mainly looks at the historical trajectory of a single user, while popularized location prediction mines the travel habits of many users. Since personalized location prediction can be used to provide guidance for navigation and personality recommendations, it plays an important role in the research of location prediction, so this paper focuses on personalized prediction. It is clear that the trajectories of different users are basically different even if their visited locations might be similar. Since personalized location prediction is based on a user's own behaviors, building a separate prediction model for each user might be preferable [5]. In addition, the different characteristics of user behavior have different influences on prediction. For example, predicting dynamic behavior is more difficult than predicting fixed behavior, and predicting many possible destinations is also more difficult than predicting only a few possible destinations. Much different from pedestrian, bus, subway, train, and airplane travel, where the behavior features are mostly fixed or the destinations are mostly limited, not only are the behaviors of floating cars dynamic, but their destinations are also hugely variable. Thus, research based on trajectory data recorded in floating cars is more meaningful and more challenging. Moreover, a location prediction algorithm mainly predicts the user's next important location, such as a mall, school, important road intersection, residential area, or attraction.
In order to make meaningful predictions, discarding as much of the unimportant data as possible and keeping the most important data is a priority due to the massive and redundant location data recorded by floating cars. Thus, it is necessary to propose an important location extraction algorithm to filter and cluster the original trajectory data in order to reduce the number of trajectory points and obtain important locations as the input of the prediction model.
Based on the above analysis, an important location extraction algorithm and a precise location prediction model are the two aspects of location prediction research for LBS, the former for reducing the size of the dataset and data redundancy, and the latter for precise prediction of behaviors.
In order to obtain important locations for the prediction model, many traditional location extraction methods can be used, such as manual tagging algorithms [2,6] and clustering methods [7][8][9]. Among them, attention-based spatiotemporal gated recurrent unit (ATST-GRU) introduced a method to realize POI recommendations by users sharing locations and check-in information with others in location-based social networks (LBSNs) [2]. Tree-based hierarchical graph (TBHG) also introduced a method that marks important locations by matching the trajectory with the map [6]. It is clear that both of these methods require geographical information, which will lead to heavy workloads. Furthermore, it is difficult to obtain the check-in information of users because of personal privacy issues.
Hybrid methods and clustering algorithms are two typical models that can be used to extract locations and avoid personal privacy issues. A typical hybrid method simply annotates POIs with meaningful information from LBSNs [10]. Since an important location represents a geographic area where a user stayed for a certain interval of time, extracting the important location based on two scale parameters, time, and distance, instead of collecting check-in information can also avoid personal privacy issues. Therefore, many clustering algorithms have been proposed to extract locations by selecting appropriate distance and time thresholds [11], such as density-based spatial clustering of applications with noise (DBSCAN) [7][8][9], K-means [12], and ordering points to identify the clustering structure (OPTICS) [13] algorithms.
Although the above methods can be used to obtain satisfactory results in location extraction, artificially configuring the thresholds and parameters, which have a great influence on the clustering results, is their obvious disadvantage. Due to the dynamic behaviors of different floating cars, obtaining common parameters for different users is impossible. Thus, an adaptive algorithm that can meet the dynamic requirements of different users should be considered. In this paper, a dynamic DBSCAN (D-DBSCAN) algorithm based on the traditional DBSCAN algorithm is proposed to dynamically adjust the value of Eps (the neighborhood radius) in order to extract important locations via different parameters based on different situations. The D-DBSCAN algorithm considers the dynamic information of trajectories and can effectively filter strip clusters. This has been proved by location extraction experiments.
As the traditional domain for trajectory analysis, location prediction based on floating car trajectory data was a kind of trip-matching process in early approaches [14]. Sub-trajectory synthesis (SubSyn) [14] divided the trajectories into sub-trajectories and then fused the sub-trajectories of different users to match current trips. Different from the above trip-matching algorithm, various Markov chain models offered a kind of destination-matching method where possible next locations were collected in advance and the prediction result was the probability of different important locations. Among them, Ashbrook et al. proposed a prediction method using the Markov model, where each node was an important location clustered from Global Positioning System (GPS) data, and a transition between two nodes represented the probability of the user traveling between those two important locations [7]. On this basis, Simmons et al. [15] proposed a hidden Markov model (HMM) to predict destinations. Furthermore, Ashbrook et al. [16] built models by adding the concept of support points to the hidden Markov model to improve prediction performance. Although the experimental results showed that the strategy of support points can improve the performance of the algorithm, the model still cannot achieve high accuracy, because a different choice of strategy will invariably lead to different performance, and no appropriate method has been proposed to match the various requirements of different situations. It is possible that a passenger may go to a totally different place where the floating car has never been. Thus, neither trip-matching algorithms nor destination-matching methods can meet the requirements of predicting a new next location. Furthermore, the above approaches will be ineffective in the absence of prior knowledge.
With the development of machine learning, deep learning technology has been widely used for next-location prediction. First, a kind of Bayesian network model was designed, and descriptive information of the trajectory was added to the model as a feature to perform destination prediction [14]. In addition, a location-prediction algorithm mainly makes predictions based on contextual information of the trajectory and the current location. Since the historical trajectory information is typical time series data, the location-prediction algorithm is also a typical time series prediction process. Recently, recurrent neural networks (RNNs) [17] have been adopted in machine translation [18], target recognition [19], video behavior recognition [20], sentiment classification, and image caption generation [21] and show promising performance in processing time series prediction compared with traditional methods. Therefore, RNNs can be used to predict next important locations [22,23]. Liu and co-workers [22] and Al-Molegi and colleagues [23] focused on using a set of features to obtain a good prediction performance. In these models all historical trajectory points have the same importance whether they are located in an intersection or in the middle of a straight road. In fact, trajectory points located in intersections, which greatly affect the turning direction, may play a more important role in next-location prediction. Some attention mechanisms need to be further considered for prediction algorithms based on the traditional RNN model in order to track the dramatic changes of floating car trajectories.
To remedy the two problems, this paper proposes a D-DBSCAN location extraction algorithm to cluster important locations and an attention-RNN location prediction algorithm to predict the next location. The main contributions are summarized as follows: (1). A novel dynamic important location extraction algorithm based on DBSCAN is proposed to extract important locations via different parameters based on different situations to meet the requirements of dynamic situations. The algorithm can dynamically adjust the parameters by tracking different user behaviors and can effectively filter out strip shape clusters in order to avoid using such invalid data as input for the prediction algorithm. (2). We propose attention-RNN location prediction, which can assign a different level of attention to historical track points to grasp the spatial characteristics of trajectories and closely track the user trajectory. (3). A time-step window mechanism is added to the attention-RNN model to reduce the time consumption and computational complexity.
The rest of this paper is organized as follows. We briefly introduce some related work on important location extraction and prediction models in Section 2. The dynamic important location extraction algorithm (D-DBSCAN) and attention-RNN location prediction model are proposed in Section 3. Section 4 reports the experimental design, performance metrics, and extensive experimental results and discusses the merits and drawbacks of our results and the baseline methods (RNN [17], space time features based RNN (STF-RNN) [23], and spatial-temporal RNN (ST-RNN) [22]). Finally, the conclusions are drawn in the last section.

Related Works
The DBSCAN algorithm does not need a fixed number of clusters to be set in advance, and its clustering results are not constrained by cluster shape, which gives it advantages over other algorithms. However, the traditional algorithm still has some disadvantages, including that the parameters of the neighborhood radius Eps and the minimum number of points contained in the neighborhood (Minpts) must be set artificially, and the selection of parameters has a crucial effect on clustering results [8].
Unfortunately, it is difficult to select an appropriate value of Eps for the traditional DBSCAN algorithm, especially in massive nonuniform distribution conditions. For example, some important locations may be merged, or a single important clustering location may be split into many different clusters. In order to solve the parameter setting problem of the traditional DBSCAN model, Zhou et al. proposed a DBSCAN algorithm based on data partitioning [24]. The algorithm partitions the data according to density and establishes an R* tree in each region to obtain a K-dist map. According to the K-dist map, the appropriate values are selected in different partitions, and finally the clustering results of each region are combined to obtain the final results.
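The K-dist map used in [24] to select Eps can be sketched as follows; this is our own NumPy illustration of the heuristic (sorted distance to the k-th nearest neighbor, whose "knee" suggests an Eps value), not the code of the cited work:

```python
import numpy as np

def k_dist_curve(points, k=4):
    """Sorted distance from each point to its k-th nearest neighbour.

    A pronounced knee in this curve is a common heuristic for
    choosing the DBSCAN Eps parameter.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances (fine for a small demo; use a KD-tree for large data).
    diff = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the zero distance to the point itself,
    # so column k is the distance to the k-th nearest neighbour.
    kth = np.sort(dists, axis=1)[:, k]
    return np.sort(kth)[::-1]  # descending, as the K-dist map is usually plotted
```

Plotting this curve and reading the distance at the knee gives a per-partition Eps, which is what the partition-based approach does region by region.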
The partition-based DBSCAN algorithm can be used to obtain clustering locations that are not constrained by shapes, and small areas can also be preserved so as to avoid discarding some important location information. However, due to the massive dataset, the partition-based DBSCAN algorithm can obtain many strip clusters that are formed by points on the road. Most of these strip clusters are meaningless locations and should not be input into the prediction model. On the other hand, some clusters that are formed by congestion points and intersection points play an important role in the location prediction application and these clusters need to be retained.
Based on the above analysis, a dynamic DBSCAN (D-DBSCAN) algorithm is necessary to dynamically adjust the value of Eps and extract important locations via different parameters in different situations, such as different velocity and time information, rather than artificially setting the parameters. The details of the D-DBSCAN algorithm are presented in Section 3.
In order to deal with the problem of trajectory location prediction, Brébisson et al. [5] inputted trajectory points one by one into an RNN network and used the memory function of the hidden layer to achieve the prediction purpose. Liu et al. extended the RNN and proposed a novel method called spatial temporal RNN (ST-RNN) [22]. ST-RNN can model local temporal and spatial contexts in each layer with time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances [22]. Al-Molegi et al. [23] proposed a method to leverage RNN to model people's movement behaviors in order to predict their next location. Space and time are included in the network as features, where their internal representations are learned by the network itself rather than relying on a manmade representation.
In the above algorithms, all historical trajectory points have the same importance whether they are located in an intersection or in the middle of a straight road. In fact, trajectory points located in intersections, which greatly affect the turning direction, may play a more important role in next-location prediction. In order to fully consider the different influence weights of historical trajectory data on predicted locations, an attention mechanism module is added in the traditional RNN network in this paper. At the same time, in order to solve the problem that the performance of the model will deteriorate rapidly as the length of the input sequence increases, a time-step window should be considered to reduce the time consumption and computational complexity and improve training efficiency. In addition, an attention mechanism and the trajectory semantic information must be fully considered to grasp the spatial characteristics of trajectories.

Dynamic Location Extraction Method
The D-DBSCAN algorithm integrates the track point velocity information dynamically, which allows the key parameters of the algorithm to be adjusted dynamically. The main principles of the D-DBSCAN algorithm are as follows: (1) it calculates the instantaneous velocity of each trajectory point according to the time sequence, (2) it divides the points into different regions based on their velocity, reducing the velocity difference within each region so that Eps can be set more appropriately, and (3) it uses the velocity of the track points to dynamically adjust the value of Eps.
Generally, denote P = {p_1(x_1, y_1), p_2(x_2, y_2), ..., p_n(x_n, y_n)} and T = {t_1, t_2, ..., t_n} as the trajectory data and the corresponding timestamp data of a certain user; then the velocity of each trajectory point V = {v_1, v_2, ..., v_n} can be computed based on P and T. By sorting V in descending order to obtain the sorted velocities V' = {v'_1, v'_2, ..., v'_n} and then subtracting adjacent elements of V' to get the velocity differences ∆V = {∆v_1, ∆v_2, ..., ∆v_(n−1)}, the maximum velocity v_max = v'_1 and the minimum velocity v_min = v'_n can be easily found from V'. After that, the N − 1 largest velocity differences ∆V' = {∆v'_1, ∆v'_2, ..., ∆v'_(N−1)} are selected from ∆V, where all elements of ∆V' exceed a minimum velocity difference threshold ω. It is clear that different values of ω may lead to different numbers of maximum velocity differences.
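The velocity computation and difference selection above can be sketched as follows; function and variable names are ours, and positions are assumed to be in a metric (x, y) coordinate system with timestamps in seconds:

```python
import math

def point_velocities(P, T):
    """Instantaneous velocity (segment distance / time) of each trajectory point.

    P holds (x, y) positions, T the matching timestamps in seconds; the
    first point reuses the first segment's velocity so len(V) == len(P).
    """
    v = [math.hypot(P[i][0] - P[i-1][0], P[i][1] - P[i-1][1]) / (T[i] - T[i-1])
         for i in range(1, len(P))]
    return [v[0]] + v

def large_velocity_diffs(V, omega):
    """Sort V in descending order and keep the adjacent differences that
    exceed the minimum velocity difference threshold omega."""
    Vs = sorted(V, reverse=True)
    dV = [Vs[i] - Vs[i+1] for i in range(len(Vs) - 1)]
    return Vs, [(i, d) for i, d in enumerate(dV) if d > omega]
```

With ω large, few gaps qualify and few regions are produced; with ω small, many gaps qualify, matching the observation that ω controls the number of maximum velocity differences.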
In order to divide the points into different regions, velocity partition thresholds are computed first, based on the velocity differences ∆V'. Each ∆v'_i is obtained by subtracting two adjacent sorted velocities v'_m and v'_(m+1); without loss of generality, the smaller of the two corresponding velocities (v'_(m+1)) of each maximum velocity difference ∆v'_i is taken as a velocity threshold, so the N − 1 maximum velocity differences yield the N − 1 velocity thresholds {vt_1, vt_2, ..., vt_(N−1)}. Sorting these in ascending order and adding vt_0 = v_min and vt_N = v_max gives the velocity partition thresholds VT = {vt_0, vt_1, vt_2, ..., vt_(N−1), vt_N}. Then, all trajectory points can be easily divided into N regions R = {R_1, R_2, ..., R_N} based on VT, and the mean velocity of each region V̄ = {v̄_1, v̄_2, ..., v̄_N} can be computed.
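The threshold construction and region assignment just described can be sketched as follows (our own illustration; V is the list of point velocities and the boundary convention, putting a point whose velocity equals a threshold into the slower region, is an assumption):

```python
import bisect

def velocity_partition(V, omega):
    """Partition point indices into regions of similar velocity.

    Thresholds are the smaller velocity of each adjacent sorted pair whose
    gap exceeds omega, as described in the text.
    """
    Vs = sorted(V, reverse=True)
    vts = sorted(Vs[i+1] for i in range(len(Vs) - 1) if Vs[i] - Vs[i+1] > omega)
    regions = [[] for _ in range(len(vts) + 1)]   # N = len(vts) + 1 regions
    for idx, v in enumerate(V):
        # A point with v <= threshold falls below the gap that produced it.
        regions[bisect.bisect_left(vts, v)].append(idx)
    return vts, regions

def region_mean_velocities(V, regions):
    """Mean velocity of each non-empty region (later used to set Eps)."""
    return [sum(V[i] for i in r) / len(r) for r in regions if r]
```

Regions are ordered from slowest to fastest, so the mean velocities come out in ascending order.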
In order to clearly demonstrate the above process of obtaining velocity partition thresholds, a simple example is shown in Table 1. It is also clear that a different number of maximum velocity differences may lead to a different number of regions.

Obviously, the velocities of trajectory points belonging to the same region are similar, so the same Eps parameter can be used within a region. Therefore, the Eps parameter of each region can be dynamically calculated by Equation (1), clustering is performed in each region, and finally the clustering results of all regions are combined to obtain the final result. The complete procedure is described as Algorithm 1, where ε is the influence factor, which can be simply estimated by calculating the K-dist map [24] (the appropriate value of ε is selected as 0.06 in our experiment). Moreover, although the velocity-based partition method can reduce the velocity difference of the trajectory points in the same region so as to set the Eps parameter more appropriately for each region, partitioning too many regions based on a too-small velocity difference threshold ω may split a single important clustering location into several fragmented parts, which will reduce the accuracy of important location clustering as well as increase computational complexity. Based on the results of an ablation experiment, ω = 3 m/s is adopted as the minimum velocity difference threshold in our experiment so as to obtain the optimal region number N via the velocity differences ∆V'.

Algorithm 1: D-DBSCAN algorithm for location extraction
Input: Given the trajectory data P, and the timestamp data T.
(1) Calculate V based on P and T, and sort V in descending order to get V .
(2) Obtain v max and v min from V and subtract V one by one to get ∆V.
(3) Get the velocity partition thresholds VT based on ∆V, ω, v max , and v min .
(4) Partition all trajectory points into N regions R based on VT.
(5) Compute the mean velocity of each region, calculate the Eps of each region by Equation (1), and perform DBSCAN clustering within each region.
(6) Combine the clustering results of all regions to obtain the final important locations.

As shown in Algorithm 1, a larger instantaneous velocity of the trajectory point indicates a smaller Eps based on Equation (1). Thus, the D-DBSCAN algorithm can realize dynamic clustering under different velocity conditions by adjusting Eps dynamically; key locations, such as intersection points, can be retained, strip clusters can be filtered out, and a better location extraction result can be obtained to provide a better guarantee for location prediction. Moreover, the algorithm takes full account of the dynamic characteristics of trajectories, so better location extraction results can be obtained.
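The per-region clustering and merging in Algorithm 1 can be sketched as follows. Equation (1) is not reproduced in the text, so the rule eps = ε / mean_velocity used below is only an illustrative ASSUMPTION consistent with "larger velocity implies smaller Eps"; the minimal DBSCAN is our own demonstration code:

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point (-1 = noise)."""
    n = len(points)
    labels = [-1] * n
    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue                      # already assigned to a cluster
        nb = neighbors(i)
        if len(nb) < min_pts:
            continue                      # not a core point
        labels[i] = cluster
        queue = deque(nb)
        while queue:                      # expand the cluster from core points
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster
                nb_j = neighbors(j)
                if len(nb_j) >= min_pts:
                    queue.extend(nb_j)
        cluster += 1
    return labels

def d_dbscan_cluster(regions, mean_vels, epsilon=0.06, min_pts=3):
    """Cluster each velocity region with its own Eps and merge the labels.

    eps = epsilon / mean_velocity is a hypothetical stand-in for
    Equation (1), which is not reproduced in the text.
    """
    merged, offset = [], 0
    for pts, v in zip(regions, mean_vels):
        eps = epsilon / max(v, 1e-9)      # faster region -> smaller Eps
        labels = dbscan(pts, eps, min_pts)
        merged.append([l + offset if l >= 0 else -1 for l in labels])
        offset += max(labels, default=-1) + 1
    return merged
```

Offsetting the labels per region keeps cluster identifiers unique when the regional results are combined in step (6).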

Traditional RNN Prediction Network
The architecture of an RNN is a recurrent structure, and traditional RNNs include the Elman network and the Jordan network. The Elman network feeds the hidden layer (h_t) back into the recurrent structure, while the Jordan network feeds the output of the network (o_t) back into the recurrent structure. Many network variants are derived from the Elman network, so in general the term RNN refers to the Elman network. The Elman network is adopted in this paper, and the hidden layer (h_t) is fed back into the recurrent structure. At each time t, the hidden layer (h_t) is computed from the current input and the hidden layer of the previous moment (h_(t−1)), and the new hidden layer (h_t) is then fed into the next hidden state. The formulation of the hidden layer in the RNN is h_t = g(W^(x) x_t + W^(h) h_(t−1)), where the activation function g(·) is a tanh function, W^(x) is the input layer weight, and W^(h) is the hidden layer weight. The final result o_t = σ(W^(o) h_t) of the network can be obtained by applying the appropriate activation function σ(·) and the output layer weight W^(o) to the hidden layer state h_t generated from the hidden layer.
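The Elman recurrence above can be sketched in a few lines of NumPy; this is a generic illustration (identity output activation, suitable for coordinate regression), not the paper's trained network:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, Wo):
    """One Elman step: h_t = tanh(Wx x_t + Wh h_{t-1}), o_t = Wo h_t."""
    h_t = np.tanh(Wx @ x_t + Wh @ h_prev)
    return h_t, Wo @ h_t

def run_rnn(X, Wx, Wh, Wo):
    """Feed a sequence through the recurrence, starting from a zero state."""
    h = np.zeros(Wh.shape[0])
    outputs = []
    for x_t in X:
        h, o = rnn_step(x_t, h, Wx, Wh, Wo)
        outputs.append(o)
    return np.stack(outputs)
```

Because h_t is fed back at every step, each output depends on the entire input prefix, which is what lets the network memorize trajectory history.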

Attention-RNN Location Prediction
Due to the unequal time interval from the predicted location and different spatial adjacencies, different historical trajectory data have unequal influence weights on the predicted location. For example, the next location is greatly affected by the trajectory point at that turning point when the direction of the track changes. The attention mechanism module is added to the traditional RNN network to fully reflect the different influence weights of the historical trajectory data on the predicted location. The architecture of the proposed location prediction network in this paper is represented in Figure 1.

As shown in Figure 1, a time-step window model is designed to divide the original trajectory sequence into several sub-tracks to reduce the amount of data and improve the training efficiency of the network, and the trajectory semantic information (SI) is extracted by the embedding layer. The attention coefficient of the proposed self-attention mechanism (SAM) can be calculated via azimuth information, which can also be obtained from the original trajectory sequence.
After that, a feature vector of the network input layer can be obtained by concatenating the sub-tracks with the trajectory description information, which goes through the embedding layer, and the attention coefficients are added into the RNN network; the attention mechanism is adopted to make the network better grasp the spatial characteristics of trajectories. Finally, the predicted location (latitude and longitude) is obtained by the activation function. The different parts of the attention-RNN location prediction method are described in detail as follows:

(a) Time-step window: In location prediction experiments, padding based on the length of the longest track is usually used to solve the problem of unequal track sequence lengths, and then track points are input into the RNN one by one. However, because some of the tracks are collected by the user for 24 hours or longer, padding all tracks will result in a sharp increase in data volume, and the influence of historical trajectory points on the predicted location gradually weakens over time. If the training is not controlled, it will not only waste resources, but will also likely be counterproductive to the prediction results. Inspired by the concept of masked convolution in the pixel convolutional neural network (PixelCNN) [25], a time-step window is added to divide the original trajectory into several sub-tracks so that all tracks are guaranteed to have the same number of timestamps as the window size and the scope of the historical information is controlled within a certain range. The experimental results show that adopting an appropriate time-step window size can improve not only the efficiency of network training but also the prediction accuracy.
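The time-step window can be sketched as a sliding window over the trajectory; this is a minimal illustration (the pairing of each window with the next point as the prediction target is our reading of the scheme):

```python
def time_step_windows(track, window):
    """Slide a fixed-size window over a trajectory: each sample pairs
    `window` consecutive points with the next point as prediction target,
    so every sub-track has exactly `window` timestamps."""
    return [(track[i:i + window], track[i + window])
            for i in range(len(track) - window)]
```

This bounds how far back the network looks, reflecting the observation that the influence of old trajectory points weakens over time.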
(b) Trajectory semantic information: Using only the latitude and longitude of GPS points for location prediction, the network learns poorly because the feature dimension is too small, and the accuracy of the predicted result is insufficient. In the location prediction experiment, not only the latitude and longitude coordinates of the trajectory data but also some description information should be utilized. Considering that there are certain differences in people's travel destinations between weekends and workdays, an embedding layer is added to the network to mine deeper semantic information (SI) from the data and to encode whether the trajectory occurs on a weekend or a workday. After the embedding layer, the time information is concatenated with the trajectory sequence and then input into the RNN.
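The lookup-and-concatenate step can be sketched as follows; the embedding table is randomly initialized here purely for illustration (in the real network its values are learned), and the 4-dimensional size is an arbitrary assumption:

```python
import numpy as np

# Illustrative embedding table: row 0 = workday, row 1 = weekend.
rng = np.random.default_rng(42)
day_embedding = rng.normal(size=(2, 4))

def input_feature(lat, lon, weekday):
    """Concatenate coordinates with the day-type embedding (Mon=0 ... Sun=6)."""
    day_type = 1 if weekday >= 5 else 0   # Saturday/Sunday -> weekend
    return np.concatenate([[lat, lon], day_embedding[day_type]])
```

Each trajectory point thus enters the RNN as a 6-dimensional vector instead of a bare coordinate pair, giving the network the weekend/workday signal.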
(c) Self-attention mechanism: In order to fully reflect the different influence weights of the historical trajectory data on the predicted location, this paper introduces a self-attention mechanism (SAM) to learn a set of weight information based on the azimuth changes produced by historical trajectories. In the process of training, each historical track point is assigned a different level of attention.
In the process of obtaining the attention coefficients, the concept of the coordinate azimuth is introduced [26]. The coordinate azimuths of the trajectory data P = (p_1, p_2, ..., p_n) are defined as A = (a_1, a_2, ..., a_n), where a_1 = 0 and, following the standard definition of azimuth, the azimuth a_i of trajectory point p_i is the angle formed by clockwise rotation from north to the line connecting track point p_i and its previous point p_(i−1). The angle of track direction change, β_i, is then obtained from consecutive azimuths. When β_i equals π, the track direction changes the most, and the degree of change is symmetrically distributed from π to 0 and from π to 2π. Therefore, the attention coefficient s_i is computed as a function of β_i with a scaling factor δ, and the network's powerful self-learning ability is used to adjust the attention coefficients and obtain S = (s_1, s_2, ..., s_n). Using the obtained attention coefficients and the attention layer weight W^(s), Equation (3) is modified to update the hidden layer.
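The azimuth and direction-change computation can be sketched as follows. The text states only that the attention weight peaks at β = π and falls off symmetrically toward 0 and 2π; the mapping s = δ(1 − cos β)/2 below is ONE function with that shape, assumed here for illustration (the paper's exact formula is not reproduced):

```python
import math

def azimuths(points):
    """Clockwise angle from north (the +y axis) to each segment
    p[i-1] -> p[i], in [0, 2*pi); the first azimuth is defined as 0."""
    a = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # atan2(dx, dy) is 0 for due north and pi/2 for due east.
        a.append(math.atan2(x1 - x0, y1 - y0) % (2 * math.pi))
    return a

def attention_coeffs(points, delta=1.0):
    """Illustrative attention weights from direction changes (ASSUMED form):
    s = delta * (1 - cos(beta)) / 2 peaks at beta = pi and is symmetric."""
    a = azimuths(points)
    s = [0.0]
    for a0, a1 in zip(a, a[1:]):
        beta = (a1 - a0) % (2 * math.pi)   # change of heading at this point
        s.append(delta * (1 - math.cos(beta)) / 2)
    return s
```

A straight segment (β = 0) receives weight 0, while a U-turn (β = π) receives the maximum weight δ, matching the intuition that turning points matter most for predicting the next location.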

Dataset
The GPS trajectory dataset used in this paper was collected by the Microsoft Research Asia Geolife project [27,28] from April 2007 to August 2012 and includes trajectory data of 182 users. The trajectories of 73 users are labeled with transportation mode, such as driving, taking a bus, riding a bike, or walking. The total distance and duration of the different transportation modes are listed in Table 2. The data collection duration varies among users, and the specific distribution is shown in Figure 2.

As shown in Figure 2, according to the collection duration, the experimental data can be classified into three categories: users with a long collection duration (more than 1 month), users with a medium collection duration (1 week to 1 month), and users with a short collection duration (less than 1 week). In order to verify the adaptability of the model under different trajectory lengths, 30% of the users in each category were randomly selected as experimental data. The experimental results of each category are listed separately.

Data Preprocessing
The quality of GPS data is affected by sensors, transmission paths, etc., and data leakage, data loss, and other noise pollution are prone to occur, which will adversely affect subsequent data analysis. Therefore, noise point filtering is required before trajectory prediction. Although the user trajectories in the Geolife dataset are widely distributed, most of them were collected in Beijing, China. Before the conventional noise processing, the latitude and longitude range (115.70°E-117.37°E, 39.40°N-41.03°N) was used to select the data inside the study area. Only floating car trajectory data were selected according to transportation mode, following the analysis in Section 1.
Several typical preprocessing methods for GPS data noise were selected: outlier filtering, smoothing, and data interpolation. The data sampling interval used in the experiment was small and the gross error in the data had the largest influence on the trajectory prediction. Zhang et al. [29] compared several filtering methods under several different types of noise. The results showed that median filtering works better when dealing with gross errors, so median filtering was used for data filtering.
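A minimal sketch of this preprocessing step, assuming trajectories are given as (longitude, latitude) tuples; the function names and the median-filter window size are illustrative choices, not the paper's implementation:

```python
from statistics import median

# Study-area bounding box from the text: 115.70-117.37 deg E, 39.40-41.03 deg N.
LON_MIN, LON_MAX = 115.70, 117.37
LAT_MIN, LAT_MAX = 39.40, 41.03

def clip_to_study_area(points):
    """Keep only (lon, lat) fixes that fall inside the Beijing study area."""
    return [(lon, lat) for lon, lat in points
            if LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX]

def median_filter(series, window=3):
    """Sliding-window median filter; effective against gross (spike) errors."""
    half = window // 2
    return [median(series[max(0, i - half):i + half + 1])
            for i in range(len(series))]
```

In practice the filter would be applied separately to the longitude and latitude series of each trajectory after clipping.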


Evaluation Metrics
In order to evaluate the validity of the proposed location extraction method, the silhouette coefficient (SC) was used [30]. The SC is a metric that does not need to know the true labeling of the dataset. The value of SC ranges from −1 to +1, and a high SC value indicates a good location extraction result.
The haversine distance between two points was used to measure the error between the predicted result ŷ (longitude and latitude of a predicted location) and the true value y (longitude and latitude of the true destination location). The haversine distance between point x(lon_x, lat_x) and point y(lon_y, lat_y) is:

d(x, y) = 2R · arcsin(√(f(x, y)))

where R is the radius of the Earth, and f(x, y) is defined as:

f(x, y) = sin²((lat_y − lat_x)/2) + cos(lat_x) · cos(lat_y) · sin²((lon_y − lon_x)/2)   (9)

In this paper, mean absolute error (MAE) and root mean square error (RMSE) are used to measure the experimental results of the prediction algorithm. MAE is the average haversine distance over the n predicted locations ŷ_i and true locations y_i:

MAE = (1/n) · Σ_{i=1}^{n} d(ŷ_i, y_i)

RMSE is calculated as:

RMSE = √((1/n) · Σ_{i=1}^{n} d(ŷ_i, y_i)²)

For the sake of simplicity, all experimental results in this paper are expressed as integers.
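The metrics above can be sketched as follows, assuming coordinates in degrees and a mean Earth radius of 6,371 km (the paper does not state which radius value it uses):

```python
from math import radians, sin, cos, asin, sqrt

R_EARTH_M = 6_371_000  # assumed mean Earth radius in meters

def haversine(x, y):
    """Haversine distance in meters between two (lon, lat) points, per Equation (9)."""
    lon_x, lat_x, lon_y, lat_y = map(radians, (*x, *y))
    f = (sin((lat_y - lat_x) / 2) ** 2
         + cos(lat_x) * cos(lat_y) * sin((lon_y - lon_x) / 2) ** 2)
    return 2 * R_EARTH_M * asin(sqrt(f))

def mae(pred, true):
    """Mean absolute error: average haversine distance over all samples."""
    return sum(haversine(p, t) for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root mean square error of the haversine distances."""
    return sqrt(sum(haversine(p, t) ** 2 for p, t in zip(pred, true)) / len(true))
```

As a sanity check, one degree of latitude corresponds to roughly 111 km under this radius.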

Location Extraction Experimental Results
In order to evaluate the precision of the proposed D-DBSCAN algorithm compared with the traditional DBSCAN, the experiment of important location extraction is demonstrated with a randomly selected user's trajectory, and the results of the different clustering algorithms are visualized in Figure 3.

As shown in Figure 3, the proposed D-DBSCAN method effectively filters out strip shape clusters while preserving some small clusters. Because the proposed method dynamically adjusts the clustering parameters, it can take a relatively small Eps to filter out strip shape clusters when the user is driving on the road at high velocity. As shown in Figure 3a,b, the D-DBSCAN algorithm obtains better performance in filtering strip shape clusters than DBSCAN, which reduces disturbance to the prediction model and helps to improve precision. Furthermore, the parameters can also be dynamically adjusted to a relatively large value when the velocity is low, in order to preserve important location points (e.g., points A, C and B, D in Figure 3b, where A and C are intersections and B and D are congestion points in the trajectory).

Accuracy comparisons can be found in Table 3, which provides the numbers of track points during clustering and of obtained locations for both algorithms. As the results in Table 3 show, from the evaluation metric perspective, D-DBSCAN has a higher SC (last row) than DBSCAN. From the perspective of the needs of the prediction model, as in the previous analysis, D-DBSCAN can effectively filter strip shape clusters, so a lot of unimportant data are discarded and only the most important clusters are preserved; thus, as shown in Table 3 (second row), fewer track points remain with the D-DBSCAN clustering algorithm. At the same time, erroneous merging of different important locations due to nonuniform data distribution is effectively avoided by the dynamic parameter strategy, so more important locations are obtained, and more accurately, by the D-DBSCAN algorithm than by DBSCAN, as can also be seen in Table 3 (third row).
It is obvious that discarding as many unimportant track points as possible can reduce calculation cost; moreover, preserving the important locations and dividing locations into as many different clusters as possible are two key aspects of providing sufficient input data for the prediction model and avoiding disturbance to the model. The results show that the proposed D-DBSCAN can achieve the best performance in both aspects.
Furthermore, a series of velocity difference thresholds ω were used in the experiment to check the influence of the minimum velocity difference threshold ω on the result of important location clustering; the comparison results are shown in Figure 4. According to the analysis above, partitioning trajectory points into different regions with an optimal minimum velocity difference threshold ω is the key both to obtaining velocity similarity within each region and to avoiding incorrect extraction of important positions, since it allows Eps to be set appropriately for each region. It can be seen from Figure 4 that the SC value gradually increases as the minimum velocity difference threshold increases from 1 m/s to 3 m/s, but decreases as the threshold increases further from 3 m/s to 10 m/s. Since the SC value indicates the accuracy of important location clustering, ω = 3 m/s is the optimal minimum velocity difference threshold in our experiments.
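The dynamic-parameter idea can be sketched as follows. This is an illustrative sketch, not the paper's exact D-DBSCAN: the region partitioning follows the ω threshold described in the text, while `eps_for_region` uses an assumed inverse-velocity heuristic whose `base_eps` and `v_ref` parameters are hypothetical:

```python
OMEGA = 3.0  # optimal minimum velocity difference threshold (m/s) from Figure 4

def partition_by_velocity(velocities, omega=OMEGA):
    """Split point indices into regions of similar velocity: a new region
    starts whenever consecutive velocities differ by more than omega."""
    regions, start = [], 0
    for i in range(1, len(velocities)):
        if abs(velocities[i] - velocities[i - 1]) > omega:
            regions.append((start, i))
            start = i
    regions.append((start, len(velocities)))
    return regions

def eps_for_region(velocities, base_eps=100.0, v_ref=10.0):
    """Assumed heuristic: high-velocity regions (driving) get a smaller Eps to
    filter strip-shaped road clusters; low-velocity regions keep a large Eps
    to preserve intersections and congestion points."""
    v_mean = sum(velocities) / len(velocities)
    return base_eps * v_ref / max(v_mean, v_ref)
```

Each region would then be clustered with its own Eps, which is how the strip-shaped clusters in Figure 3 are suppressed without losing the small low-velocity clusters.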

Location Prediction Experiment Results
Users have different behaviors and interests in different time periods and regions. According to the collection duration, the experimental data can be classified into three categories. In order to verify the prediction ability and adaptability of the proposed algorithm, 30% of the users in each category were randomly selected for the location prediction experiment, and the average accuracy of each category is listed. The size of the time-step window is 10 in our experiment. We compared our model with some baseline approaches: RNN [17], STF-RNN [23], and ST-RNN [22]. The results are shown in Table 4.
It can be seen from the results in Table 4 that the proposed method is superior to the others in trajectory prediction for users with historical trajectories of various lengths. In this paper, the attention mechanism and trajectory semantic information are added to the RNN model; therefore, the prediction accuracy shows an average increase of 87% compared to RNN. ST-RNN adopts more effective time-specific and distance-specific transition matrices to extract temporal and spatial information, which gives it higher prediction accuracy than STF-RNN. Since more semantic information is used in the proposed method and the attention mechanism fully reflects the different influence weights of the historical trajectory data on the predicted location, the prediction accuracy shows an average increase of 55% compared to STF-RNN and of 34% compared to ST-RNN.

Ablation Experiment Results
The size of the time-step window determines the size of the historical track reference range: the larger the window, the more historical track data are needed, the prediction accuracy is affected accordingly, and the model training time becomes longer. In order to prove that applying an appropriate time-step window can reduce prediction error, different time-step windows were used in the experiment; the comparison results are shown in Figure 5. It can be seen from Figure 5 that the error of the prediction network decreases sharply and the prediction accuracy increases when the time step is changed from 50 to 10. In addition, reducing the size of the time-step window reduces the amount of data during network training and hence the training cost. When the time step is changed from 10 to 5, although the time consumption is reduced, the prediction error increases and the accuracy of the prediction model drops sharply. The reason is that with a time step of 5, the network has a small historical track reference range and there is too little historical information to learn the user's travel habits. It can be concluded that a suitable time-step window size can not only improve accuracy but also reduce time consumption.
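The sliding time-step window can be sketched as follows; `make_windows` is a hypothetical helper that turns an extracted location sequence into (history, next location) training pairs with the step size of 10 used in the experiments:

```python
def make_windows(locations, step=10):
    """Build (history window, next location) training pairs from a location
    sequence; step=10 gave the best accuracy/cost trade-off in Figure 5."""
    return [(locations[i:i + step], locations[i + step])
            for i in range(len(locations) - step)]
```

A sequence of n locations yields n − step pairs, which is why a larger window both needs longer histories and increases the training data volume per sample.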
Furthermore, in order to observe the contribution of each part of the model, an ablation experiment was conducted, and the experimental results are shown in Table 5, where RNN is the base model and SI and SAM represent the semantic information strategy and the self-attention mechanism, respectively. It can be seen from the results in Table 5 that each part of the proposed model plays its due role in improving prediction accuracy for users with historical trajectories of various lengths. Since most users travel between their fixed residences and workplaces on workdays but are free to go anywhere they want on weekends, their behaviors are quite different between weekends and workdays. Thus, deeper semantic information from the data can better match the interest points of users and improve prediction performance. As the results in Table 5 show, the prediction accuracy for users of all collection durations is significantly improved after adding semantic information, and the average prediction accuracy is improved by 75%.
Table 5. Results of ablation experiment. SI, semantic information; SAM, self-attention mechanism.

Moreover, the directions of some key locations, such as the road intersections of the track, may play an important role in deciding whether users go to work or back home. Thus, paying close attention to changes of track direction is another key factor in following users' behaviors, so the additional SAM strategy can be used to further improve the accuracy of prediction. It can be seen from Table 5 that the average prediction accuracy is improved by 64% after applying the self-attention mechanism.
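A minimal sketch of the self-attention weighting idea, assuming scaled dot-product attention applied directly to raw input vectors (the actual model would use learned query/key/value projections, omitted here):

```python
from math import exp, sqrt

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(seq):
    """Scaled dot-product self-attention over a list of feature vectors.
    In this sketch, queries = keys = values = the raw inputs, so each output
    is a weighted mix of all positions, with weights reflecting similarity."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / sqrt(d) for k in seq]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, seq)) for j in range(d)])
    return out
```

This is how the mechanism assigns different influence weights to historical trajectory points, e.g. emphasizing direction changes at intersections.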

Conclusions
In this paper, a novel dynamic important location extraction algorithm based on DBSCAN is proposed to meet the requirements of dynamic situations. Visualizing the clustering results shows that a better location extraction result is obtained. In addition, in the prediction model, a time-step window is added on top of the RNN to shorten the training time and reduce the equipment requirements, and a self-attention mechanism and trajectory semantic information are added to the RNN model. Experiments show that using a suitable time-step window size can not only improve accuracy but also reduce time consumption, and that adding a self-attention mechanism and trajectory semantic information improves prediction accuracy. In the future, extracting richer semantic information from the trajectory data to improve the adaptive ability of the model and to realize the prediction of unknown important points is a possible research direction. Furthermore, location information belongs to a user's private information, so conducting collaborative research on location prediction and privacy protection simultaneously is necessary and meaningful.