1. Introduction
With the popularity of smartphones and other devices that can record locations via the Global Navigation Satellite System (GNSS) [1], capturing and recording trajectory data has become more convenient. Consequently, much research on location-based services (LBS) has sought to improve service capabilities by mining patterns from recorded trajectory data, for applications such as targeted advertising, road assistance and navigation, personnel tracking, and point-of-interest (POI) recommendation [2]. A complete LBS system should not only obtain a user’s current location but also predict the next possible location from the user’s historical trajectory information, so as to plan the navigation path in advance or analyze the user’s behavior. Since recording and obtaining trajectory data have become easy, and since predicting the next location from historical behavior is a key factor for a complete LBS [2], it is important to predict the next location accurately.
Research indicates that people’s mobility is highly dependent on their historical behaviors [3]. Location prediction can be divided into two types: personalized and popularized [4]. Personalized location prediction mainly looks at the historical trajectory of a single user, while popularized location prediction mines the travel habits of many users. Since personalized location prediction can provide guidance for navigation and personalized recommendations, it plays an important role in location prediction research and is the focus of this paper. The trajectories of different users generally differ even if their visited locations are similar. Since personalized location prediction is based on a user’s own behaviors, building a separate prediction model for each user might be preferable [5]. In addition, different characteristics of user behavior influence prediction differently. For example, predicting dynamic behavior is more difficult than predicting fixed behavior, and predicting among many possible destinations is more difficult than predicting among only a few. Unlike pedestrian, bus, subway, train, and airplane travel, where behavior features are mostly fixed or destinations are mostly limited, floating cars exhibit dynamic behaviors and hugely variable destinations. Thus, research based on trajectory data recorded by floating cars is more meaningful and more challenging.
Moreover, a location prediction algorithm mainly predicts the user’s next important location, such as a mall, school, important road intersection, residential area, or attraction. In order to make meaningful predictions, discarding as much of the unimportant data as possible and keeping the most important data is a priority due to the massive and redundant location data recorded by floating cars. Thus, it is necessary to propose an important location extraction algorithm to filter and cluster the original trajectory data in order to reduce the number of trajectory points and obtain important locations as the input of the prediction model.
Based on the above analysis, an important location extraction algorithm and a precise location prediction model are the two aspects of location prediction research for LBS, the former for reducing the size of the dataset and data redundancy, and the latter for precise prediction of behaviors.
In order to obtain important locations for the prediction model, many traditional location extraction methods can be used, such as manual tagging algorithms [2,6] and clustering methods [7,8,9]. Among them, the attention-based spatiotemporal gated recurrent unit (ATST-GRU) realizes POI recommendation from the locations and check-in information that users share with others in location-based social networks (LBSNs) [2]. The tree-based hierarchical graph (TBHG) marks important locations by matching the trajectory with the map [6]. Both of these methods require geographical information, which leads to heavy workloads. Furthermore, it is difficult to obtain users’ check-in information because of personal privacy issues.
Hybrid methods and clustering algorithms are two typical approaches that can be used to extract locations while avoiding personal privacy issues. A typical hybrid method simply annotates POIs with meaningful information from LBSNs [10]. Since an important location represents a geographic area where a user stayed for a certain interval of time, extracting important locations based on two scale parameters, time and distance, instead of collecting check-in information also avoids personal privacy issues. Therefore, many clustering algorithms have been proposed to extract locations by selecting appropriate distance and time thresholds [11], such as density-based spatial clustering of applications with noise (DBSCAN) [7,8,9], K-means [12], and ordering points to identify the clustering structure (OPTICS) [13].
Although the above methods can obtain satisfactory results in location extraction, their obvious disadvantage is that the thresholds and parameters, which have a great influence on the clustering results, must be configured manually. Because the behaviors of different floating cars are dynamic, obtaining common parameters for different users is impossible. Thus, an adaptive algorithm that can meet the dynamic requirements of different users should be considered. In this paper, a dynamic DBSCAN (D-DBSCAN) algorithm based on the traditional DBSCAN algorithm is proposed, which dynamically adjusts the value of Eps (the neighborhood radius) in order to extract important locations via different parameters in different situations. The D-DBSCAN algorithm considers the dynamic information of trajectories and can effectively filter out strip clusters, as confirmed by the location extraction experiments.
As the traditional domain for trajectory analysis, location prediction based on floating car trajectory data was treated as a trip-matching process in early approaches [14]. Sub-trajectory synthesis (SubSyn) [14] divided trajectories into sub-trajectories and then fused the sub-trajectories of different users to match current trips. In contrast to this trip-matching algorithm, various Markov chain models offered a destination-matching method in which possible next locations were collected in advance and the prediction result was the probability of each important location. Among them, Ashbrook et al. proposed a prediction method using the Markov model, where each node was an important location clustered from Global Positioning System (GPS) data and a transition between two nodes represented the probability of the user traveling between those two important locations [7]. On this basis, Simmons et al. [15] proposed a hidden Markov model (HMM) to predict destinations. Furthermore, Ashbrook et al. [16] built models by adding the concept of support points to the hidden Markov model to improve prediction performance. Although the experimental results showed that the support point strategy can improve the performance of the algorithm, the model still cannot achieve high accuracy, because different choices of strategy lead to different performance and no appropriate method has been proposed to match the various requirements of different situations. Moreover, a passenger may go to a totally different place where the floating car has never been. Thus, neither trip-matching algorithms nor destination-matching methods can meet the requirements of predicting a new next location. Furthermore, the above approaches are ineffective in the absence of prior knowledge.
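The destination-matching idea described above can be illustrated with a minimal first-order Markov sketch. It is not the cited authors' implementation: the function names `fit_transitions` and `predict_next` and the location labels are hypothetical, and transition probabilities are simply estimated from counts of consecutive visits.

```python
from collections import Counter, defaultdict


def fit_transitions(visits):
    """Estimate first-order transition probabilities between locations."""
    counts = defaultdict(Counter)
    for current, nxt in zip(visits, visits[1:]):
        counts[current][nxt] += 1
    return {
        loc: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for loc, c in counts.items()
    }


def predict_next(model, current):
    """Return the most probable next location, or None for an unseen location."""
    options = model.get(current)
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical sequence of important locations visited by one user.
visits = ["home", "school", "home", "mall", "home", "school"]
model = fit_transitions(visits)
likely = predict_next(model, "home")
unseen = predict_next(model, "airport")
```

The `None` returned for a location that never appeared in the history mirrors the limitation noted above: such a model has nothing to offer for places without prior observations.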
With the development of machine learning, deep learning technology has been widely used for next-location prediction. First, a Bayesian network model was designed, and descriptive information of the trajectory was added to the model as a feature to perform destination prediction [14]. In addition, a location-prediction algorithm mainly makes predictions based on contextual information of the trajectory and the current location. Since historical trajectory information is typical time series data, location prediction is also a typical time series prediction process. Recently, recurrent neural networks (RNNs) [17] have been adopted in machine translation [18], target recognition [19], video behavior recognition [20], sentiment classification, and image caption generation [21], and they show promising performance in time series prediction compared with traditional methods. Therefore, RNNs can be used to predict the next important locations [22,23]. Liu and co-workers [22] and Al-Molegi and colleagues [23] focused on using a set of features to obtain good prediction performance. In these models, all historical trajectory points have the same importance whether they are located at an intersection or in the middle of a straight road. In fact, trajectory points located at intersections, which greatly affect the turning direction, may play a more important role in next-location prediction. An attention mechanism therefore needs to be considered for prediction algorithms based on the traditional RNN model in order to track the dramatic changes of floating car trajectories.
To remedy the two problems, this paper proposes a D-DBSCAN location extraction algorithm to cluster important locations and an attention-RNN location prediction algorithm to predict the next location. The main contributions are summarized as follows:
- (1) A novel dynamic important location extraction algorithm based on DBSCAN is proposed to extract important locations via different parameters based on different situations to meet the requirements of dynamic situations. The algorithm can dynamically adjust the parameters by tracking different user behaviors and can effectively filter out strip-shaped clusters in order to avoid using such invalid data as input for the prediction algorithm.
- (2) We propose attention-RNN location prediction, which can assign a different level of attention to historical track points to grasp the spatial characteristics of trajectories and closely track the user trajectory.
- (3) A time-step window mechanism is added to the attention-RNN model to reduce the time consumption and computational complexity.
The rest of this paper is organized as follows. We briefly introduce related work on important location extraction and prediction models in Section 2. The dynamic important location extraction algorithm (D-DBSCAN) and the attention-RNN location prediction model are proposed in Section 3. Section 4 reports the experimental design, performance metrics, and extensive experimental results and discusses the merits and drawbacks of our results and the baseline methods (RNN [17], space time features-based RNN (STF-RNN) [23], and spatial-temporal RNN (ST-RNN) [22]). Finally, conclusions are drawn in the last section.
2. Related Works
The DBSCAN algorithm does not need a fixed number of clusters to be set in advance, and its clustering results are not constrained by the cluster shape, which gives it advantages over other algorithms. However, the traditional algorithm still has some disadvantages: the neighborhood radius (Eps) and the minimum number of points contained in the neighborhood (Minpts) must be set manually, and the selection of these parameters has a crucial effect on the clustering results [8]. Unfortunately, it is difficult to select an appropriate value of Eps for the traditional DBSCAN algorithm, especially for massive, nonuniformly distributed data. For example, some important locations may be merged, or a single important clustering location may be split into many different clusters. To solve the parameter setting problem of the traditional DBSCAN model, Zhou et al. proposed a DBSCAN algorithm based on data partitioning [24]. The algorithm partitions the data according to density and establishes an R* tree in each region to obtain a K-dist map. According to the K-dist map, appropriate parameter values are selected in the different partitions, and finally the clustering results of each region are combined to obtain the final results.
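To make the K-dist map concrete, the following sketch computes, for every point, the distance to its k-th nearest neighbor and sorts these distances in descending order; a sharp bend in that curve is commonly read as a candidate Eps. This is an illustration only: it uses a brute-force search instead of the R* tree mentioned above, assumes planar coordinates with Euclidean distance, and the name `k_dist` and the sample points are hypothetical.

```python
import math


def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)


# Four points forming a dense unit square plus one distant outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
curve = k_dist(points, k=2)
```

In this toy example the outlier dominates the head of the curve, while the four dense points share the same small value, which is the kind of separation a K-dist map is meant to expose.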
The partition-based DBSCAN algorithm can be used to obtain clustering locations that are not constrained by shapes, and small areas can also be preserved so as to avoid discarding some important location information. However, due to the massive dataset, the partition-based DBSCAN algorithm can obtain many strip clusters that are formed by points on the road. Most of these strip clusters are meaningless locations and should not be input into the prediction model. On the other hand, some clusters that are formed by congestion points and intersection points play an important role in the location prediction application and these clusters need to be retained.
Based on the above analysis, a dynamic DBSCAN (D-DBSCAN) algorithm is needed that dynamically adjusts the value of Eps and extracts important locations via different parameters in different situations, such as different velocity and time information, rather than relying on manually set parameters. The details of the D-DBSCAN algorithm are presented in Section 3.
To deal with the problem of trajectory location prediction, Brébisson et al. [5] fed trajectory points one by one into an RNN and used the memory function of the hidden layer to make the prediction. Liu et al. extended the RNN and proposed a novel method called spatial temporal RNN (ST-RNN) [22]. ST-RNN can model local temporal and spatial contexts in each layer with time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances [22]. Al-Molegi et al. [23] proposed a method that leverages an RNN to model people’s movement behaviors in order to predict their next location. Space and time are included in the network as features, and their internal representations are learned by the network itself rather than relying on a handcrafted representation.
In the above algorithms, all historical trajectory points have the same importance whether they are located in an intersection or in the middle of a straight road. In fact, trajectory points located in intersections, which greatly affect the turning direction, may play a more important role in next-location prediction. In order to fully consider the different influence weights of historical trajectory data on predicted locations, an attention mechanism module is added in the traditional RNN network in this paper. At the same time, in order to solve the problem that the performance of the model will deteriorate rapidly as the length of the input sequence increases, a time-step window should be considered to reduce the time consumption and computational complexity and improve training efficiency. In addition, an attention mechanism and the trajectory semantic information must be fully considered to grasp the spatial characteristics of trajectories.
3. Methodology
3.1. Dynamic Location Extraction Method
The D-DBSCAN algorithm integrates the velocity information of track points in order to adjust its key parameters dynamically. Its main steps are as follows: (1) it calculates the instantaneous velocity of each trajectory point according to the time sequence; (2) it divides the points into different regions based on their velocity, reducing the velocity difference within each region so that Eps can be set more appropriately; and (3) it uses the velocity of the track points to adjust the value of Eps dynamically.
Given the trajectory data of a certain user and the corresponding timestamp data, the velocity V of each trajectory point can be computed. The velocities are sorted in descending order, and adjacent sorted velocities are subtracted one by one to obtain the velocity differences; the maximum and minimum velocities can be read directly from the sorted sequence. The maximum velocity differences are then selected as those velocity differences that exceed a minimum velocity difference threshold. Different values of this threshold may therefore yield different numbers of maximum velocity differences.
To divide the points into different regions, velocity partition thresholds are first computed from the velocity differences. Each velocity difference is obtained by subtracting two adjacent sorted velocities and, without loss of generality, the smaller of the two velocities corresponding to each maximum velocity difference is taken as a velocity threshold. Sorting these velocity thresholds in ascending order, with the minimum and maximum velocities as the outer bounds, gives the velocity partition thresholds. All trajectory points can then be divided into N regions based on these thresholds, and the mean velocity of each region can be computed.
In order to clearly demonstrate the above process of obtaining the velocity partition thresholds, a simple example is shown in Table 1. A different number of maximum velocity differences likewise leads to a different number of regions.
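The partitioning procedure can also be sketched in a few lines of Python. This is a minimal reading of the description above rather than the authors' code: the function names, the `min_diff` parameter name, the convention that a point whose velocity equals a threshold falls in the lower region, and the sample velocities (which are not the values of Table 1) are all assumptions made for illustration.

```python
from bisect import bisect_left


def partition_thresholds(velocities, min_diff):
    """Velocity partition thresholds derived from large gaps in sorted velocities."""
    ordered = sorted(velocities, reverse=True)
    # For each gap above min_diff, keep the smaller of the two adjacent velocities.
    cuts = [lo for hi, lo in zip(ordered, ordered[1:]) if hi - lo > min_diff]
    # Assumption: the minimum and maximum velocities act as outer bounds.
    return [ordered[-1]] + sorted(cuts) + [ordered[0]]


def assign_regions(velocities, thresholds):
    """Index of the region each velocity falls into (a threshold value belongs below)."""
    cuts = thresholds[1:-1]
    return [bisect_left(cuts, v) for v in velocities]


def region_means(velocities, regions):
    """Mean velocity of every region."""
    sums = {}
    for v, r in zip(velocities, regions):
        total, n = sums.get(r, (0.0, 0))
        sums[r] = (total + v, n + 1)
    return {r: total / n for r, (total, n) in sorted(sums.items())}


# Hypothetical instantaneous velocities of six trajectory points.
velocities = [2, 30, 10, 1, 11, 3]
thresholds = partition_thresholds(velocities, min_diff=5)
regions = assign_regions(velocities, thresholds)
means = region_means(velocities, regions)
```

With `min_diff=5`, two gaps qualify (30 to 11 and 10 to 3), so the six points fall into three regions; a smaller `min_diff` would produce more regions, which is the sensitivity discussed in the text.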
The velocities of the trajectory points that belong to the same region are similar, so the same Eps parameter can be used for all of them. The Eps of each region is therefore calculated dynamically by Equation (1), and clustering is then performed in each region. Finally, the clustering results of all regions are combined to obtain the final results. The complete procedure is described in Algorithm 1. In Equation (1), ε is the influence factor, which can be estimated simply by calculating the K-dist map [24] (ε is set to 0.06 in our experiment). Moreover, although the velocity-based partition method reduces the velocity difference of the trajectory points within a region, so that the Eps parameter can be set more appropriately for each region, partitioning into too many regions with too small a minimum velocity difference threshold may split a single important clustering location into several fragments, which reduces the accuracy of important location clustering and increases computational complexity. The minimum velocity difference threshold used in our experiment was therefore chosen from the results of an ablation experiment, so as to obtain the optimal region number N from the velocity differences.
Algorithm 1: D-DBSCAN algorithm for location extraction
Input: The trajectory data and the timestamp data of a user.
(1) Calculate the velocity V of each trajectory point from the trajectory and timestamp data, and sort V in descending order.
(2) Obtain the maximum and minimum velocities from the sorted velocities, and subtract adjacent sorted velocities one by one to get the velocity differences.
(3) Get the velocity partition thresholds based on the maximum velocity, the minimum velocity, and the velocity differences.
(4) Partition all trajectory points into N regions based on the velocity partition thresholds: for each trajectory point, assign it to the region whose pair of adjacent thresholds brackets its velocity.
(5) Calculate the mean velocity of each region.
(6) Update Eps via Equation (1) for each region, and cluster by region.
(7) Combine the clustering results of each region and calculate the latitude and longitude of the center point of each cluster.
Output: Location extraction results (sequence of locations).
According to Equation (1), a larger instantaneous velocity of the trajectory points indicates a smaller Eps. The D-DBSCAN algorithm can thus realize dynamic clustering under different velocity conditions by adjusting Eps dynamically: key locations, such as intersection points, are retained, while strip clusters are filtered out. Because the algorithm takes full account of the dynamic characteristics of trajectories, it yields better location extraction results and hence a better basis for location prediction.
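The effect of a velocity-dependent Eps can be illustrated with a small pure-Python sketch of steps (6) and (7). Equation (1) is not reproduced in this text, so the rule `eps_for` below is a hypothetical stand-in that only preserves the stated property that a larger mean velocity gives a smaller Eps. The planar coordinates, the sample points, and all function names are likewise assumptions, and `dbscan` is a plain textbook DBSCAN rather than the authors' implementation.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


def cluster_by_region(points, regions, mean_velocity, eps_for, min_pts):
    """Run DBSCAN separately in each velocity region and return cluster centers."""
    centers = []
    for region in sorted(set(regions)):
        members = [p for p, r in zip(points, regions) if r == region]
        labels = dbscan(members, eps_for(mean_velocity[region]), min_pts)
        for c in sorted(set(labels) - {-1}):
            pts = [p for p, l in zip(members, labels) if l == c]
            centers.append((sum(x for x, _ in pts) / len(pts),
                            sum(y for _, y in pts) / len(pts)))
    return centers


def eps_for(mean_v):
    """Hypothetical stand-in for Equation (1): Eps shrinks as velocity grows."""
    return 3.0 / mean_v


# Four slow points around a stay location, four fast points strung along a road.
points = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (10, 0), (12, 0), (14, 0), (16, 0)]
regions = [0, 0, 0, 0, 1, 1, 1, 1]
mean_velocity = {0: 1.0, 1: 15.0}

fixed_labels = dbscan(points, eps=3.0, min_pts=3)
centers = cluster_by_region(points, regions, mean_velocity, eps_for, min_pts=3)
```

In this toy setting a single fixed Eps groups the road points into a strip cluster, whereas the per-region Eps keeps only the stay location. Real GPS data would need a geodesic distance instead of `math.dist`.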
3.2. Attention-RNN Location Prediction
3.2.1. Traditional RNN Prediction Network
The architecture of an RNN is a recurrent structure, and traditional RNNs include the Elman network and the Jordan network. The Elman network feeds the hidden layer back into the recurrent structure, while the Jordan network feeds the output of the network back into the recurrent structure. Many network variants are derived from the Elman network, so the term RNN generally refers to the Elman network. The Elman network is adopted in this paper. At each time step, the hidden layer is computed from the current input and the hidden layer of the previous moment, and the new hidden layer is then fed back into the next hidden state. In the formulation of the hidden layer, an activation function is applied to the sum of the input weighted by the input layer weight and the previous hidden layer weighted by the hidden layer weight. The final result of the network is obtained by applying an appropriate activation function and the output layer weight to the hidden layer state.
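For reference, the standard Elman recurrence that this description corresponds to can be written as follows. The symbols are assumptions chosen for this note, since the original notation is not available here: $x_t$ is the input at time $t$, $h_t$ the hidden state, $y_t$ the output, $W_x$, $W_h$, and $W_o$ the input, hidden, and output layer weights, and $f$ and $g$ the hidden and output activation functions. Bias terms are omitted for brevity.

```latex
\begin{align}
h_t &= f\left(W_x x_t + W_h h_{t-1}\right), \\
y_t &= g\left(W_o h_t\right).
\end{align}
```

A Jordan network would differ only in the fed-back term, replacing $h_{t-1}$ in the first equation with the previous output $y_{t-1}$.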
3.2.2. Attention-RNN Location Prediction
Because of unequal time intervals from the predicted location and different spatial adjacencies, different historical trajectory data have unequal influence weights on the predicted location. For example, when the direction of the track changes, the next location is greatly affected by the trajectory point at the turning point. The attention mechanism module is added to the traditional RNN network to fully reflect the different influence weights of the historical trajectory data on the predicted location. The architecture of the proposed location prediction network is shown in Figure 1.
As shown in Figure 1, a time-step window model is designed to divide the original trajectory sequence into several sub-tracks in order to reduce the amount of data and improve the training efficiency of the network, and the trajectory semantic information (SI) is extracted by the embedding layer. The attention coefficient of the proposed self-attention mechanism (SAM) is calculated from azimuth information, which is also obtained from the original trajectory sequence.
After that, the feature vector of the network input layer is obtained by concatenating the sub-tracks with the trajectory description information that has passed through the embedding layer, and the attention coefficients are added into the RNN network so that the network better grasps the spatial characteristics of trajectories. Finally, the predicted location (latitude and longitude) is obtained by the activation function. The different parts of the attention-RNN location prediction method are described in detail as follows:
(a) Time-step window: In location prediction experiments, padding to the length of the longest track is usually used to handle unequal track sequence lengths, after which track points are fed into the RNN one by one. However, because some tracks are collected for 24 hours or longer, padding all tracks results in a sharp increase in data volume, and the influence of historical trajectory points on the predicted location gradually weakens over time. If the scope of the training input is not controlled, resources are wasted and the prediction results may even suffer. Inspired by the concept of masked convolution in the pixel convolutional neural network (CNN) [25], a time-step window is added to divide the original trajectory into several sub-tracks, so that every sub-track has the same number of time steps as the window size and the scope of the historical information is kept within a certain range. The experimental results show that an appropriate time-step window size improves not only the efficiency of network training but also the prediction accuracy.
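A time-step window of this kind might be sketched as follows. The text does not state whether the sub-tracks overlap, so the stride of one point used here is an assumption, as are the function name `sliding_windows`, the pairing of each sub-track with the following point as the prediction target, and the sample coordinates.

```python
def sliding_windows(track, window):
    """Split a track into sub-tracks of `window` points, each paired with the next point."""
    return [
        (track[i:i + window], track[i + window])
        for i in range(len(track) - window)
    ]


# Hypothetical track of (latitude, longitude) points.
track = [
    (39.90, 116.40),
    (39.91, 116.41),
    (39.92, 116.42),
    (39.93, 116.43),
    (39.94, 116.44),
]
samples = sliding_windows(track, window=3)
```

Every sample has exactly `window` input points, so no padding is required, and a track shorter than `window + 1` points simply yields no samples.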
(b) Trajectory semantic information: If only the latitude and longitude of the GPS points are used for location prediction, the network learns poorly because the feature dimension is too small, and the accuracy of the predicted result is insufficient. In the location prediction experiment, therefore, not only the latitude and longitude coordinates of the trajectory data but also some description information is utilized. Considering that people’s travel destinations differ between weekends and workdays, an embedding layer is added to the network to mine deeper semantic information (SI) from the data and to determine whether the trajectory occurs on a weekend or a weekday. After the embedding layer, the time information is concatenated with the trajectory sequence and then input into the RNN.
(c) Self-attention mechanism: In order to fully reflect the different influence weights of the historical trajectory data on the predicted location, this paper introduces a self-attention mechanism (SAM) to learn a set of weight information based on the azimuth changes produced by historical trajectories. In the process of training, each historical track point is assigned a different level of attention.
To obtain the attention coefficient, the concept of coordinate azimuth is introduced [26]. By definition, the azimuth of a trajectory point is the angle formed by clockwise rotation from north to the line connecting that track point and its previous point, and the initial azimuth is set to 0. The angle change along the track is then given by the change in azimuth between consecutive trajectory points. When the angle change is 180°, the track direction changes the most, and the degree of change is distributed symmetrically from 0° to 180° and from 180° to 360°. The attention coefficients are derived from the angle change accordingly, using a scaling factor, and the network’s self-learning ability is used to adjust the attention coefficients. Using the attention coefficients obtained and the attention layer weight, Equation (3) is modified to update the hidden layer.
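The azimuth and angle-change quantities that feed the attention coefficient can be sketched as below. This is an illustration under stated assumptions: points are planar (x east, y north) rather than latitude and longitude, the first azimuth is taken as 0, and the angle change is folded into the range 0° to 180° to reflect the symmetry about 180°. The function names are hypothetical, and the mapping from angle change to attention coefficient is not shown because its formula is not available in this text.

```python
import math


def azimuth(prev, cur):
    """Clockwise angle from north, in degrees within [0, 360), of the segment prev -> cur."""
    dx = cur[0] - prev[0]  # eastward displacement
    dy = cur[1] - prev[1]  # northward displacement
    return math.degrees(math.atan2(dx, dy)) % 360.0


def azimuths(track):
    """Azimuth of every point relative to its predecessor; the first is set to 0."""
    return [0.0] + [azimuth(p, q) for p, q in zip(track, track[1:])]


def turn_angles(az):
    """Direction change between consecutive azimuths, folded into [0, 180]."""
    out = []
    for a, b in zip(az, az[1:]):
        d = abs(b - a) % 360.0
        out.append(min(d, 360.0 - d))
    return out


# Hypothetical track: head north, turn east, then turn south.
track = [(0, 0), (0, 1), (1, 1), (1, 0)]
az = azimuths(track)
turns = turn_angles(az)
```

For real GPS coordinates the planar `atan2` would need to be replaced by a geodesic bearing formula; the folding step stays the same.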