Sparse Trajectory Prediction Based on Multiple Entropy Measures

Trajectory prediction is an important problem with a large number of applications. A common approach to trajectory prediction is based on historical trajectories. However, existing techniques suffer from the “data sparsity problem”: the available historical trajectories are far from enough to cover all possible query trajectories. We propose the sparse trajectory prediction algorithm based on multiple entropy measures (STP-ME) to address the data sparsity problem. Firstly, the moving region is iteratively divided into a two-dimensional plane grid graph, and each trajectory is represented as a grid sequence with temporal information. Secondly, trajectory entropy is used to evaluate a trajectory’s regularity, the L-Z entropy estimator is implemented to calculate trajectory entropy, and a new trajectory space is generated through trajectory synthesis. We define location entropy and time entropy to measure the popularity of locations and timeslots, respectively. Finally, a second-order Markov model that contains a temporal dimension is adopted to perform sparse trajectory prediction. The experiments show that when the trip completed percentage increases towards 90%, the coverage of the baseline algorithm decreases to almost 25%, while the STP-ME algorithm copes with it as expected, with only an unnoticeable drop in coverage, and can constantly answer almost 100% of query trajectories. The STP-ME algorithm improves the prediction accuracy by as much as 8%, 3%, and 4% compared to the baseline algorithm, the second-order Markov model (2-MM), and the sub-trajectory synthesis (SubSyn) algorithm, respectively. At the same time, the prediction time of the STP-ME algorithm is negligible (10 µs), greatly outperforming the baseline algorithm (100 ms).


Introduction
As the use of the Global Positioning System (GPS) and smart mobile devices (SMD) becomes a part of our daily lives, we benefit increasingly from various types of location-based services (LBSs), such as route finding and location-based social networking. A number of new location-based applications require trajectory prediction, for example, to recommend sightseeing places and to send targeted advertisements based on destination. Trajectory prediction has become one of the focuses of research and applications within the area of LBSs. Numerous studies have demonstrated that human mobility is potentially highly predictable [1,2]. Lian et al. [3] put forward a collaborative exploration and periodically returning model (CEPR) that addresses a novel problem, exploration prediction (EP), which forecasts whether people will seek unvisited locations to visit. Yao et al. [4] proposed an algorithm to predict human mobility in tensors of high-dimensional location context data. Using the tensor decomposition method, Yao et al. extracted human mobility patterns with multiple expressions and then synthesized future mobility events based on these patterns. Alahi et al. [5] proposed a long short-term memory (LSTM) model which can learn general human movement and predict future trajectories. Qiao et al. [6] proposed a three-in-one Trajectory-Prediction (TP) model for road-constrained transportation networks called TraPlan. TraPlan contains three essential techniques: (1) a constrained network R-tree (CNR-tree), which is a two-tiered dynamic index structure of moving objects based on transportation networks; (2) a region-of-interest (RoI) discovery algorithm, which is employed to partition a large number of trajectory points into distinct clusters; and (3) a Trajectory-Prediction (TP) approach based on a frequent trajectory patterns (FTP) tree, called FTP-mining, which discovers FTPs to infer future locations of objects within RoIs.
The Markov chain (MC) model has been adopted by a number of works on predicting human mobility [7,8] to incorporate some amount of memory. Second-order MC has the best accuracies, up to 95%, for predicting human mobility, and higher-order MC (>2) is not necessarily more accurate, but is often less precise. Abdel-Fatao et al. [9] demonstrated that the temporal information of a trajectory provides more accurate results for predicting the destination of the trajectory. However, the above methods suffer from the "data sparsity problem": the huge trajectory space contains many irregular patterns, and only a small portion of query trajectories can be matched completely with the existing trajectories.
To address the data sparsity problem of trajectory prediction, Xue et al. [10,11] proposed a novel method based on the sub-trajectory synthesis (SubSyn) algorithm. The SubSyn algorithm first decomposes historical trajectories into sub-trajectories comprising two adjacent locations and builds a first-order Markov transition model, then connects the sub-trajectories into "synthesized" trajectories for destination prediction. However, this method has some drawbacks: (1) the trajectory space is so large that sub-trajectory synthesis takes a very long time; (2) the prediction accuracy may be reduced because abnormal trajectories in the trajectory space undermine the reliability of the "synthesized" trajectories; and (3) the temporal dimension and the popularity of locations are ignored. To overcome these drawbacks, this paper proposes a sparse trajectory prediction method based on entropy estimation and a second-order Markov model. Firstly, we conduct a spatial iterative grid partition of the moving region of trajectories, so that each trajectory can be represented as a sequence of grid cells with temporal information. Secondly, we use an L-Z entropy estimator [12,13] to evaluate trajectory regularity [2] and implement it to compute the L-Z entropy of a trajectory sequence. Thirdly, we conduct trajectory synthesis based on the trajectory L-Z entropy and put the synthesized trajectories into a new trajectory space. The trajectory synthesis not only resolves the sparsity problem of trajectory data, but also makes the new trajectory space smaller and more credible. Fourthly, we define location entropy and time entropy to measure the popularity of locations and times, respectively. Finally, we combine location entropy and time entropy with the second-order Markov model for destination prediction in the new trajectory space.
The remainder of this paper is organized as follows: in Section 2, we introduce the spatial iterative grid partition and the representation of trajectory sequences with time; in Section 3, we introduce the trajectory synthesis based on the L-Z entropy estimator; in Section 4, we define the location entropy and time entropy, and introduce the sparse trajectory prediction algorithm based on entropy estimation and the second-order Markov model; in Section 5, we present the experiments and results that demonstrate the effectiveness of the algorithm; and Section 6 concludes the paper.

Trajectory Sequence with Time Based on Spatial Iterative Grid Partition
A common approach to partitioning a moving region is uniform grid partitioning with a fixed cell size. However, when people browse maps on the Internet, the view of the map differs from scale to scale. A map area of the same size includes different numbers of geographical elements at different scales, which is determined by map accuracy. The fewer geographical elements a rectangular map region of a given size includes, the higher the geographical precision of the elements represented by the map. Similarly, for a sample space of a given size, a finer partition makes the trajectory sequence much closer to the original trajectory. A uniform grid partition with a fixed cell size can easily split related GPS points across different grid cells, assigning clusters of GPS points to the wrong cells. Figure 1 shows four GPS points that are spatially very close. However, they are divided into different grid cells because of the uniform grid partition. Thus, the connection between them is broken, which may greatly affect the results of trajectory mining.

Spatial Iterative Grid Partition
We propose a spatial iterative grid partition (SIGP) to solve the above problem. As illustrated in Figure 2, SIGP partitions moving regions with dense GPS point coverage into more grid cells, each with an iteratively smaller area. Each iteration doubles the precision of the grid. Because regions with dense GPS point coverage are partitioned repeatedly, the size of each grid cell reaches a suitable value and all grid cells include a roughly even number of geographical elements. A uniform partition divides the space into a two-dimensional grid in a single pass, whereas SIGP partitions the space multiple times: we repeat the partition process recursively until a desired grid granularity is reached. In both cases, a trajectory can be represented as a sequence of cells according to the sequence of GPS points in the trajectory. SIGP yields a more balanced number of points in each cell than a uniform grid, which leads to better prediction accuracy.
The spatial iterative grid partition algorithm is shown as follows:

Algorithm 1. Spatial Iterative Grid Partition
1. Divide the moving region into d × d grid cells {g_{0,i}} of the same size
2. for each grid g_{0,i} in {g_{0,i}} do
3.   num = count(g_{0,i}) // Count the points located in each grid cell
4.   if num ≥ n then // These grid cells need further dividing
5.     Iterate-Partition(G, g_{0,i}, n)
6.   else
7.     G.push(g_{0,i}) // Put into the result set G
8.   end
9. end
In Algorithm 1, each dimension of the moving region is divided into d fragments so that the moving region is divided into d × d grid cells of the same size. The width and height of each grid cell are W_{0,i} and H_{0,i}. Parameter n is the partitioning condition for every grid cell. If a grid cell contains at least n GPS points, it is divided into four grid cells again. Otherwise, the grid cell is considered to be "locally sparse". The parameter n reflects the locality of the moving region partitioned by the SIGP algorithm. For a grid cell that meets the partitioning condition, we use Algorithm 2, Iterate-Partition, to partition the grid cell. Iterate-Partition is recursive, which reflects the hierarchical partition characteristic of the SIGP algorithm.

Algorithm 2. Iterate-Partition
Input: G (iterative partition grid set); g_i (grid that needs to be divided); n (GPS point threshold of each grid)
Output: G (iterative partition grid set)
1. Divide g_i into four grid cells g_{i+1,j}
2. Count the points in g_{i+1,j} as count(g_{i+1,j})
3. …
7. return G
8. end
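The two algorithms above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Cell`, `sigp`, and `iterate_partition` are hypothetical, points are plain `(x, y)` tuples, cells are half-open rectangles, and the `max_depth` guard is an addition (not in the source) that guarantees termination when many points coincide.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


@dataclass(frozen=True)
class Cell:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        # Half-open on the upper edges so adjacent cells never share a point.
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1


def iterate_partition(cell: Cell, points: List[Point], n: int,
                      max_depth: int, out: List[Cell]) -> None:
    """Recursively split `cell` into four while it holds at least n points."""
    inside = [p for p in points if cell.contains(p)]
    if len(inside) < n or max_depth == 0:
        out.append(cell)  # locally sparse cell: keep it as a leaf
        return
    mx, my = (cell.x0 + cell.x1) / 2, (cell.y0 + cell.y1) / 2
    quadrants = (
        Cell(cell.x0, cell.y0, mx, my), Cell(mx, cell.y0, cell.x1, my),
        Cell(cell.x0, my, mx, cell.y1), Cell(mx, my, cell.x1, cell.y1),
    )
    for sub in quadrants:
        iterate_partition(sub, inside, n, max_depth - 1, out)


def sigp(points: List[Point], bounds: Tuple[float, float, float, float],
         d: int, n: int, max_depth: int = 8) -> List[Cell]:
    """Start from a d x d uniform grid, then refine dense cells."""
    x0, y0, x1, y1 = bounds
    w, h = (x1 - x0) / d, (y1 - y0) / d
    out: List[Cell] = []
    for i in range(d):
        for j in range(d):
            cell = Cell(x0 + i * w, y0 + j * h, x0 + (i + 1) * w, y0 + (j + 1) * h)
            iterate_partition(cell, points, n, max_depth, out)
    return out


pts = [(0.1, 0.1), (0.2, 0.2), (0.6, 0.6), (0.7, 0.7), (1.5, 1.5), (3.0, 3.0)]
cells = sigp(pts, (0.0, 0.0, 4.0, 4.0), d=2, n=3)
```

In this toy example the dense lower-left region is refined twice while the three sparse initial cells are kept whole, so every resulting cell holds fewer than n points.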

Trajectory Description Based on SIGP and Time
Nowadays, trajectory data are collected as GPS points with timestamps. These original GPS points cannot be used directly for trajectory prediction; they first need to be serialized.
Firstly, we decompose each day into non-overlapping timeslots. Each original trajectory can be represented by a sequence of n points, each with a timestamp. Formally:
$$tra = \{(t_1, lon_1, lat_1), (t_2, lon_2, lat_2), \ldots, (t_n, lon_n, lat_n)\} \quad (1)$$
where $t_k$, $lon_k$, $lat_k$ denote the kth GPS point's time, longitude, and latitude.
The map is constructed by SIGP as a two-dimensional grid graph consisting of the grid set G. All coordinate points are chronologically mapped to the grid graph so that a trajectory can be represented as a sequence of grid cells according to the sequence of locations of the trajectory. Formally:
$$tra = \{(t_1, g_1), (t_2, g_2), \ldots, (t_n, g_n)\} \quad (2)$$
where $g_k$ is the grid in the grid graph occupied by the trajectory sequence at timeslot $t_k$.
For consecutive entries with the same timeslot, $t_i = t_j$, and the same grid, $g_i = g_j$, the entries are combined into one grid cell. Similarly, we combine all neighboring identical grid cells of the trajectory sequence:
$$tra = \{(t_k, g_k)\}_{k=1}^{M}, \quad M \le n \quad (3)$$
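The serialization steps above can be sketched as follows. This is only an illustration: the one-hour timeslot length, the `cell_of` lookup (which would normally come from the SIGP grid), and the function names are assumptions, since the source does not state how timeslots are defined.

```python
from itertools import groupby
from typing import Callable, Hashable, Iterable, List, Tuple


def timeslot_of(seconds_since_midnight: float, slot_seconds: int = 3600) -> int:
    """Index of the non-overlapping timeslot containing the timestamp."""
    return int(seconds_since_midnight // slot_seconds)


def to_grid_sequence(
    points: Iterable[Tuple[float, float, float]],
    cell_of: Callable[[float, float], Hashable],
    slot_seconds: int = 3600,
) -> List[Tuple[int, Hashable]]:
    """Map chronological (t, lon, lat) points to (timeslot, cell) pairs and
    merge consecutive entries that share both the timeslot and the cell."""
    seq = [(timeslot_of(t, slot_seconds), cell_of(lon, lat)) for t, lon, lat in points]
    return [key for key, _ in groupby(seq)]


# Hypothetical cell lookup: unit-square cells indexed by integer coordinates.
unit_cell = lambda lon, lat: (int(lon), int(lat))
raw = [(0, 0.1, 0.1), (600, 0.5, 0.5), (1200, 1.2, 0.3), (4000, 1.4, 0.6), (4100, 1.5, 0.5)]
grid_seq = to_grid_sequence(raw, unit_cell)
```

Here five raw points collapse to three entries: the first two share a cell and a timeslot, and so do the last two.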

Trajectory Synthesis Based on L-Z Entropy Estimation
The main idea of trajectory synthesis based on L-Z entropy estimation is to use an L-Z entropy estimator to evaluate each trajectory's regularity by calculating the entropy value of its trajectory sequence. A new trajectory space with stronger regularity is then generated by performing trajectory synthesis based on the L-Z entropy.

Trajectory Entropy
Entropy can be used to quantify uncertainty, complexity, randomness, and regularity [14]. In recent decades, entropy has come to be applied very broadly [15]. We use trajectory entropy to evaluate a trajectory's regularity. We implement the L-Z entropy estimation on the basis of Lempel-Ziv complexity [12] and use it to compute the entropy of a trajectory sequence.
Trajectories are treated as time series data, and trajectory entropy is introduced as a measure of the regularity of sequential data in time series analysis. For a trajectory sequence $tra = \{(t_k, g_k)\}_{k=1}^{M}$, the L-Z entropy can be computed by Equation (4):
$$E(tra) = \left(\frac{1}{M}\sum_{k=1}^{M}\Lambda_k\right)^{-1}\ln M \quad (4)$$
where M is the number of grid cells of trajectory tra, and $\Lambda_k$ is defined as the length of the shortest sub-trajectory starting at position k that did not previously occur in the trajectory $\{(t_k, g_k)\}_{k=1}^{M}$. It has been proven that E(tra) converges to the actual entropy when M approaches infinity [2,16]. The smaller the entropy is, the stronger the trajectory's regularity is, and vice versa.
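A minimal sketch of an estimator of this form is given below. It is written under two assumptions that the text does not spell out: "previously" is taken to mean the strict prefix before position k, and when every substring starting at k has already occurred, $\Lambda_k$ is taken to be one more than the remaining length. The function names are illustrative.

```python
import math
from typing import Hashable, Sequence


def _occurs(prefix: Sequence[Hashable], pattern: Sequence[Hashable]) -> bool:
    """True if `pattern` appears as a contiguous run inside `prefix`."""
    m = len(pattern)
    return any(list(prefix[j:j + m]) == list(pattern)
               for j in range(len(prefix) - m + 1))


def lz_entropy(seq: Sequence[Hashable]) -> float:
    """Lempel-Ziv style entropy estimate: M * ln(M) / sum(Lambda_k)."""
    m_len = len(seq)
    if m_len < 2:
        return 0.0
    total = 0
    for k in range(m_len):
        length = 1
        # Grow the window until seq[k:k+length] is new with respect to seq[:k].
        while k + length <= m_len and _occurs(seq[:k], seq[k:k + length]):
            length += 1
        total += length
    return m_len * math.log(m_len) / total


irregular = lz_entropy(["a", "b", "c", "d"])  # every cell is new
regular = lz_entropy(["a", "a", "a", "a"])    # the same cell repeated
```

On these toy sequences the repeated sequence receives a lower estimate than the sequence of all-distinct cells, in line with the reading that lower entropy means stronger regularity. The quadratic substring search is adequate for short grid sequences only.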

Trajectory Synthesis Based on Entropy Estimation
The trajectory space inevitably contains some abnormal trajectories, which affect prediction accuracy. To enhance the regularity of the trajectory space, we perform trajectory synthesis based on trajectory entropy and put the synthesized trajectories into the trajectory space. Firstly, the map is constructed as a finer grid to create less overlap between the trajectories. For each trajectory $tra_i$, the entropy $e_i$ of $tra_i$ is computed. Thus, the trajectory space can be obtained with the trajectories sorted by entropy value, $\{(tra_i, e_i) \mid e_i \le e_{i+1}\}_{i=1}^{n}$. Then the m trajectories with the lowest entropy values (i.e., the highest regularity), where m is a trajectory selection parameter that we can set, are chosen as the new trajectory space. For every trajectory of the new trajectory space, if it has cross-nodes with other trajectories, it is divided into sub-trajectories at these cross-nodes. Then we compute the sub-trajectories' entropy by L-Z entropy estimation. The sub-trajectories are sorted by the sequence of nodes of the trajectory that is going to be synthesized. Where sub-trajectories of the trajectory being synthesized overlap, we keep those that have lower entropy. Finally, the remaining sub-trajectories with lower entropy are synthesized. The trajectory synthesis algorithm is shown as Algorithm 3.
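Two of the steps described above, selecting the m lowest-entropy trajectories and splitting a trajectory at its cross-nodes, can be sketched as follows. This is a partial illustration only; it does not reproduce Algorithm 3. The helper names are hypothetical, trajectories are lists of grid identifiers, and entropies are assumed to be precomputed.

```python
from typing import Hashable, List, Sequence


def select_low_entropy(trajectories: Sequence[List[Hashable]],
                       entropies: Sequence[float], m: int) -> List[List[Hashable]]:
    """Keep the m trajectories with the lowest entropy (highest regularity)."""
    order = sorted(range(len(trajectories)), key=lambda i: entropies[i])
    return [trajectories[i] for i in order[:m]]


def split_at_cross_nodes(traj: List[Hashable],
                         others: Sequence[List[Hashable]]) -> List[List[Hashable]]:
    """Cut `traj` into sub-trajectories at every cell it shares with `others`.
    A cross-node ends one sub-trajectory and starts the next one."""
    if len(traj) < 2:
        return []
    cross = set(traj) & {cell for other in others for cell in other}
    pieces, current = [], [traj[0]]
    for cell in traj[1:]:
        current.append(cell)
        if cell in cross:
            pieces.append(current)
            current = [cell]
    if len(current) > 1:
        pieces.append(current)
    return pieces


kept = select_low_entropy([["a", "b"], ["c", "d"], ["e", "f"]], [0.9, 0.2, 0.5], m=2)
parts = split_at_cross_nodes(["a", "b", "c", "d"], [["x", "b", "y"]])
```

Each resulting sub-trajectory could then be scored with an L-Z entropy estimate before overlapping pieces are compared.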

Sparse Trajectory Prediction Based on Multiple Entropy Measures
In the smaller and more credible trajectory space generated by trajectory synthesis based on entropy estimation (TS-EE), sparse trajectory prediction based on multiple entropy measures (STP-ME) combines location entropy and time entropy with the second-order Markov model to perform sparse trajectory prediction.

Location Entropy
The location entropy measures how popular a location is in terms of the people who visited it. In information-theoretic terms, it is the amount of information about the users' trajectories that visited location l. The first obvious observation is that the more visitors at l, the lower the entropy and the higher the prediction. However, the popularity of a location cannot always be described by just the number of visitors at the location, and this is where the entropy comes into play.
Location entropy measures the diversity of unique visitors of a location. A low value of the location entropy indicates a popular place with many visitors. Formally, the location entropy of location l can be computed as:
$$E(l) = -\sum_{u:\,|V_{l,u}| \neq 0} \frac{|V_{l,u}|}{|V_l|}\,\log\frac{|V_{l,u}|}{|V_l|} \quad (5)$$
where $V_{l,u} = \{\langle u, l, t\rangle \mid \forall t\}$ denotes the set of visits to location l by user u and $V_l = \{\langle u, l, t\rangle \mid \forall t, \forall u\}$ is the set of visits to location l by all users.

Time Entropy
Time entropy measures how popular a timeslot is in terms of how many locations people visited. In information-theoretic terms, it is the amount of information about the locations visited at timeslot t. The first obvious observation is that the more locations at timeslot t, the lower the entropy and the higher the prediction. However, the popularity of a timeslot cannot always be described by just the number of locations at the timeslot, and this is where the entropy comes into play. The advantage of using entropy is that it measures the timeslot popularity based on the number of locations over the users who visited them.
Time entropy measures the diversity of unique visitors of different timeslots. Formally, the time entropy of timeslot t can be computed as:
$$E(t) = -\sum_{u:\,|L_{t,u}| \neq 0} \frac{|L_{t,u}|}{|L_t|}\,\log\frac{|L_{t,u}|}{|L_t|} \quad (6)$$
where $L_{t,u} = \{\langle u, l, t\rangle \mid \forall l\}$ denotes the locations visited by user u at timeslot t and $L_t = \{\langle u, l, t\rangle \mid \forall l, \forall u\}$ denotes the locations visited by all users at timeslot t.
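Both measures are entropies over per-user shares of visit records, so they can be computed with one helper. The sketch below assumes visit records are `(user, location, timeslot)` tuples and uses the natural logarithm; the source does not fix the logarithm base, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

Visit = Tuple[Hashable, Hashable, Hashable]  # (user, location, timeslot)


def share_entropy(counts: Iterable[int]) -> float:
    """Shannon entropy of the shares c / sum(counts), skipping zero counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def location_entropy(visits: List[Visit], location: Hashable) -> float:
    """Entropy of the per-user share of visit records at `location`."""
    per_user = Counter(u for u, l, _ in visits if l == location)
    return share_entropy(per_user.values())


def time_entropy(visits: List[Visit], timeslot: Hashable) -> float:
    """Entropy of the per-user share of visit records during `timeslot`."""
    per_user = Counter(u for u, _, t in visits if t == timeslot)
    return share_entropy(per_user.values())


log = [("u1", "A", 1), ("u2", "A", 1), ("u1", "B", 2), ("u1", "B", 2)]
e_loc_a = location_entropy(log, "A")
e_loc_b = location_entropy(log, "B")
e_time_1 = time_entropy(log, 1)
```

In this toy log, location B and timeslot 2 are each associated with a single user, so their entropies are zero, while location A and timeslot 1 are split evenly between two users.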

Second-Order Markov Model for Trajectory Prediction
A number of studies [7,8] have established that the second-order Markov model (2-MM) has the best accuracies, up to 95%, for predicting human mobility, and that higher-order MM (>2) is not necessarily more accurate, but is often less precise. However, the 2-MM always utilizes historical geo-spatial trajectories to train a transition probability matrix, and in 2-MM (see Figure 3a) the probability of each destination is computed based only on the present and immediate past grids of interest that a user visited, without using temporal information.
Despite being quite successful in predicting human mobility, existing works share some major drawbacks. Firstly, the majority of the existing works are time-unaware in the sense that they neglect the temporal dimension of users' mobility (such as time of the day) in their models. Consequently, they can only tell where, but not when, a user is likely to visit a location. Neglecting the temporal dimension can have severe implications for applications that rely heavily on temporal information to function effectively. For example, in homeland security, temporal information is vital in predicting the anticipated movement of a suspect if a potential crime is to be averted. Secondly, no existing works have focused on the popularity of locations and timeslots, that is, on which locations users are interested in and in which timeslots users are active. Trajectory prediction accuracy would be improved by quantifying the popularity of different locations and timeslots; for example, people are most likely to go shopping or walking in the park after work.
We propose the second-order Markov model with temporal information (2-TMM, see Figure 3b) for trajectory prediction based on location entropy and time entropy. Specifically, using Bayes' rule, we find the stationary distribution of posterior probabilities of visiting locations during specified timeslots. We then build a second-order mobility transition matrix and combine location entropy and time entropy with a second-order Markov chain model to predict the most likely next location that the user will visit in the next timeslot, using the location entropy, the time entropy, the transition matrix, and the stationary posterior probability distributions.
Let $G = \{g_1, g_2, g_3, \ldots, g_n\}$ denote a finite set of grids partitioned by SIGP. Additionally, let $T = \{t_1, t_2, t_3, \ldots, t_m\}$ be a set of predefined timeslots in a day. Thus, $tra(u) = \{(t_1, g_1), (t_2, g_2), \ldots, (t_k, g_k)\}$ denotes a finite set of historical grids with temporal information visited by user u. Assuming Table 1 represents statistics of the historical visit behaviors of all users, Table 1a corresponds to trajectories' historical visits to grids without considering temporal information, and Table 1b corresponds to trajectories' historical visits to grids during specified timeslots, where $\mathrm{Frequency}(g_i) = \sum_{t_j \in T} \mathrm{frequency}(g_i, t_j)$.
Definition 1. Given a finite set of grids with time visited by trajectories, the visit probability, denoted by $\lambda(g_i, t_j)$, of a grid $g_i \in G$ is a numerical estimate of the likelihood that users will visit grid $g_i$ during $t_j \in T$.
We express the visit probability of grid $g_i$ in terms of two component probabilities, termed (i) grid feature-correlated visit probability (GVP), and (ii) temporal feature-correlated visit probability (TVP). The GVP of a grid $g_i$, denoted by $P(g_i)$, is a prior probability of a visit to $g_i$, expressed as the ratio of the number of times trajectories visited $g_i$ to the total number of visits to all grids in the trajectories' grid history.
Table 2a exemplifies GVP probabilities computed from Table 1a. The TVP of $g_i$ during $t_j$, denoted by $P(t_j \mid g_i)$, is the conditional probability that a visit occurred during $t_j$ given that $g_i$ is visited by trajectories. Table 2b shows TVP probabilities obtained from Table 1b. In line with Definition 1, we compute the visit probability of a grid by applying Bayes' rule to GVP and TVP. Accordingly, the visit probability of $g_i$ during timeslot $t_j$ is given by:
$$\lambda(g_i, t_j) = \frac{P(g_i)\,P(t_j \mid g_i)}{\sum_{g_k \in G} P(g_k)\,P(t_j \mid g_k)} \quad (7)$$
where $0 \le \lambda(g_i, t_j) \le 1$. $P(g_i)$ and $P(t_j \mid g_i)$ are defined in Definition 1.
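As a rough illustration of how GVP, TVP, and the Bayes-rule visit probability fit together, the sketch below computes them from a frequency table. The nested-dictionary layout `freq[grid][timeslot]`, the function name, and the toy counts are assumptions for illustration and are not the values of Tables 1 to 3. Every grid is assumed to have at least one recorded visit, and every timeslot at least one visit overall.

```python
from typing import Dict, Hashable, Tuple

Freq = Dict[Hashable, Dict[Hashable, int]]  # freq[grid][timeslot] = visit count


def visit_probabilities(freq: Freq) -> Dict[Tuple[Hashable, Hashable], float]:
    """Return lambda(g, t) = P(g) P(t|g) / sum_k P(g_k) P(t|g_k)."""
    grids = list(freq)
    slots = sorted({t for g in grids for t in freq[g]})
    total = sum(sum(freq[g].values()) for g in grids)
    # GVP: prior probability of visiting each grid.
    gvp = {g: sum(freq[g].values()) / total for g in grids}
    # TVP: probability of each timeslot given that the grid is visited.
    tvp = {g: {t: freq[g].get(t, 0) / sum(freq[g].values()) for t in slots}
           for g in grids}
    lam = {}
    for t in slots:
        norm = sum(gvp[g] * tvp[g][t] for g in grids)
        for g in grids:
            lam[(g, t)] = gvp[g] * tvp[g][t] / norm
    return lam


toy = {"g1": {"t1": 3, "t2": 1}, "g2": {"t1": 1, "t2": 3}}
lam = visit_probabilities(toy)
```

For each timeslot the resulting values sum to one over the grids, which matches the normalization in the denominator.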
Applying Equation (7) to Table 2 yields visit probabilities for grids visited during each timeslot in Table 3. Each column in Table 3 is a probability vector showing a distribution of $\lambda(g_i, t_j)$ for each $g_i \in G$ during $t_j$, where $\sum_{g_i \in G} \lambda(g_i, t_j) = 1$. In line with Definition 2, the probability that a trajectory's destination will be a grid $g_d$ during timeslot $t_{j+1}$ can be expressed as $P[(g_d, t_{j+1}) \mid (g_i, t_j), (g_{i-1}, t_{j-1})]$.
Definition 3. A transition probability $p^{j}_{hid}$ with respect to 2-TMM is the probability that a trajectory will move to a destination grid $g_d$ during timeslot $t_{j+1}$ given that the user has successively visited grids $g_h$ and $g_i$ during timeslots $t_{j-1}$ and $t_j$, respectively.
We denote a transition from grids $g_h$ and $g_i$ during timeslots $t_{j-1}$ and $t_j$, respectively, to a destination grid $g_d$ during timeslot $t_{j+1}$ by [g … . The transition probability is computed as: where $g_*$ is any grid at $t_{j+1}$. We predict the destination grid $g_{pre}$ of the most likely next grid and its probability by computing the right-hand side of Equation (9):
Let probability vectors $\lambda(t_j)$ and $\lambda(t_{j-1})$ represent distributions of visit probabilities of grids during timeslots $t_j$ and $t_{j-1}$, respectively. We represent the initial probability distribution of 2-TMM by the joint distribution of $\lambda(t_j)$ and $\lambda(t_{j-1})$, given by $\lambda(t_j t_{j-1}) = \lambda(t_j)\lambda(t_{j-1}) = [\lambda(g_1, t_j t_{j-1}), \lambda(g_2, t_j t_{j-1}), \lambda(g_3, t_j t_{j-1}), \ldots, \lambda(g_n, t_j t_{j-1})]$. Given the initial probability distribution, the matrix of transition probabilities, and the computed location entropy and time entropy, the predicted destination of a target query trajectory is calculated by using:
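As an illustration of the second-order transition step only, the following sketch estimates transition probabilities by counting consecutive triples of (timeslot, grid) states and returns the most likely next state. It is a plain count-based estimator written under that assumption; it does not include the initial distribution or the location-entropy and time-entropy terms of the full 2-TMM prediction, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Hashable, List, Optional, Sequence, Tuple

State = Tuple[Hashable, Hashable]  # (timeslot, grid)
Model = DefaultDict[Tuple[State, State], Counter]


def fit_second_order(trajectories: Sequence[List[State]]) -> Model:
    """Count how often each pair of consecutive states is followed by each state."""
    counts: Model = defaultdict(Counter)
    for traj in trajectories:
        for prev, cur, nxt in zip(traj, traj[1:], traj[2:]):
            counts[(prev, cur)][nxt] += 1
    return counts


def predict_next(counts: Model, prev: State, cur: State) -> Optional[Tuple[State, float]]:
    """Most frequent successor of (prev, cur) and its relative frequency,
    or None when the pair was never observed (no coverage)."""
    options = counts.get((prev, cur))
    if not options:
        return None
    nxt, c = options.most_common(1)[0]
    return nxt, c / sum(options.values())


history = [
    [(1, "a"), (2, "b"), (3, "c")],
    [(1, "a"), (2, "b"), (3, "c")],
    [(1, "a"), (2, "b"), (3, "d")],
]
model = fit_second_order(history)
best = predict_next(model, (1, "a"), (2, "b"))
```

Because the counts are built offline, answering a query reduces to a dictionary lookup, which is in line with the small online prediction times reported for the Markov-based methods.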

Experimental Evaluation and Analysis of the Results
In this section, we conduct an extensive experimental study to evaluate the performance of our STP-ME algorithm. All of the experiments were run on a commodity computer with an Intel Core i5 CPU (2.3 GHz) (Intel Corporation, Santa Clara, CA, USA) and 4 GB RAM. We use a real-world, large-scale taxi trajectory dataset from the T-drive project in our experiments [17]. It contains a total of 580,000 taxi trajectories in the city of Beijing, with 15 million GPS data points from 2 February 2008 to 8 February 2008. In the following experiments, we randomly select 80% of the trajectories in the dataset as a training dataset to infer the parameters and build the Markov model, and the remaining 20% of the trajectories are used to estimate the coverage, prediction time, prediction error, and prediction accuracy.

The Result of Trajectory L-Z Entropy
To evaluate trajectory regularity, we divide every day into twelve periods, and then compute the average trajectory L-Z entropy for each period on weekends and weekdays, respectively.
The results in Figure 4a clearly show that the trajectory L-Z entropies of the twelve periods conform to taxi travel patterns; for example, between 6:00 and 8:00 people go to work and taxis follow regular routes from home to the workplace, so the average L-Z entropy is the smallest. In Figure 4b, the standard deviation of the L-Z entropy is stable. Consequently, trajectory entropy can be used to evaluate trajectory regularity.

Comparison of Various Grid Partitioning Strategies
Until now we have assumed a spatial iterative grid to represent the moving region. In this section, we investigate another grid partitioning strategy, uniform grid partitioning. The moving region is constructed as a two-dimensional grid consisting of g × g square cells. The granularity of this representation is a cell, i.e., all the locations within a single cell are considered to be the same object. Each cell has a side length of 1 and adjacent cells are at a distance of 1. The whole grid is modelled as a graph where each cell corresponds to a node in the graph. A trajectory can be represented as a sequence of grids according to the sequence of locations of the trajectory.
The prediction accuracy of STP-ME based on spatial iterative grid partitioning and uniform grid partitioning is given in Figure 5a, where g is the grid granularity of the uniform grid partition and n is the grid partition parameter of SIGP. A suitable value of n needs to be decided for our training dataset. On one hand, a large value of n may give very low prediction accuracy because the area covered by each grid cell is too large. On the other hand, it leads to more matching query trajectories since more trajectories may fall into identical cells, hence increasing prediction accuracy. A small value of n has the advantage of the higher prediction accuracy that the small cell area brings, but the training data become even sparser because fewer locations lie in the same cell, making the task of destination prediction more difficult. Therefore, we need to find a balanced value of n which achieves the best prediction accuracy. The optimal grid partition parameter n for our training dataset is selected to be $10^3$ according to the global minimum point in Figure 5. Compared with the uniform grid, SIGP is able to achieve higher prediction accuracy as the grid granularity increases. This is because, in a city, regions with dense trajectory coverage (e.g., the Central Business District) are mapped to more cells, each with a smaller area. This improves the prediction accuracy of queries that involve these regions. SIGP gives the better result because it achieves the most even distribution of points, which shows that SIGP loses less information than the uniform grid. Figure 5b shows the standard deviation of prediction accuracy for the uniform grid partition and SIGP. The standard deviation of both grid partition methods is not only small, but also stable.

The Comparison of STP-ME Algorithm with Baseline, 2-MM, and SubSyn Algorithms
To evaluate the performance of our STP-ME, we compare the prediction accuracy, prediction time, and coverage of STP-ME with three approaches, namely, (i) the baseline algorithm adapted from [7], which uses trajectory matching of historical visits; (ii) destination prediction using a second-order Markov model (2-MM) [18], which builds a second-order Markov chain model to predict the next grid that a user is likely to visit (see Figure 3a); and (iii) destination prediction by sub-trajectory synthesis (SubSyn) proposed by Xue et al. [10,11]. The prediction accuracy is computed as the ratio between the number of correctly predicted trajectories and the total number of trajectories. Prediction time is the time used to predict the destination for one query trajectory online, and the coverage counts the number of query trajectories for which some destinations are provided. We use coverage to demonstrate the difference in robustness between the baseline algorithm, 2-MM, SubSyn, and our STP-ME.
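The accuracy and coverage measures defined above can be computed as in the sketch below. It assumes that a predictor returns `None` when it cannot provide any destination for a query, and it reports coverage as a fraction of all queries rather than a raw count; both conventions, and the function name, are assumptions for illustration.

```python
from typing import Dict, Hashable, Optional, Sequence


def evaluate(predictions: Sequence[Optional[Hashable]],
             truths: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy: correctly predicted queries / all queries.
    Coverage: queries with some predicted destination / all queries."""
    total = len(truths)
    answered = sum(1 for p in predictions if p is not None)
    correct = sum(1 for p, t in zip(predictions, truths) if p is not None and p == t)
    return {"accuracy": correct / total, "coverage": answered / total}


scores = evaluate(["a", None, "c", "d"], ["a", "b", "x", "d"])
```

With four toy queries, one unanswered and one answered incorrectly, the sketch yields an accuracy of 0.5 and a coverage of 0.75.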
Figures 6a and 7a show the trends in prediction time and prediction accuracy with respect to grid granularity. We compare the runtime performance of our STP-ME with that of the baseline algorithm, 2-MM, and SubSyn in terms of online query prediction time. Since the information is stored during the offline training stage, STP-ME requires little extra computation when answering a user's query (10 µs), whereas the baseline algorithm requires far more time (100 ms) per prediction. Our STP-ME, SubSyn, and 2-MM are consistently at least four orders of magnitude faster. The reason is that the baseline algorithm is forced to make a full sequential scan of the entire trajectory space to compute the posterior probability, whereas the other algorithms can extract most transition probability values directly from the visit probability distribution and a matrix of transition probabilities. It is worth mentioning that grid granularity has little influence on the prediction time of our STP-ME, SubSyn, and 2-MM. The prediction accuracy of our STP-ME, SubSyn, and 2-MM rises slightly as the grid granularity increases and reaches its peak at n = 10 3. The prediction accuracy of our STP-ME is about 8%, 3%, and 4% higher than that of the baseline algorithm, 2-MM, and SubSyn, respectively. For the baseline algorithm, the number of query trajectories which have sufficient destinations drops slightly as the grid granularity increases, because more trajectories in the training dataset may fall into different grids, so query trajectories are less likely to have a partial match in the trajectory space. Meanwhile, the prediction accuracy of STP-ME (48%), SubSyn (44%), and 2-MM (45%) remains stable and rises steadily. Figure 6b shows that the standard deviation of prediction time for our STP-ME, SubSyn, and 2-MM is almost 0.
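The difference in prediction time comes from how much work is done per query. The sketch below contrasts the two access patterns on toy data. It is a simplified illustration under stated assumptions: `scan_predict`, `build_table`, and `lookup_predict` are hypothetical names, majority voting stands in for the posterior computation, and the table keyed on the last two grids is only a stand-in for the stored visit and transition probabilities; none of this reproduces the actual baseline or STP-ME computations.

```python
from collections import Counter, defaultdict


def contains(trajectory, query):
    """True if `query` occurs as a contiguous sub-sequence of `trajectory`."""
    n, m = len(trajectory), len(query)
    return any(trajectory[i:i + m] == query for i in range(n - m + 1))


def scan_predict(history, query):
    """Matching-style prediction: scan every historical trajectory per query."""
    votes = Counter(t[-1] for t in history if contains(t, query))
    return votes.most_common(1)[0][0] if votes else None


def build_table(history):
    """Offline stage: map each pair of consecutive grids to its most
    frequent destination (assumed stand-in for stored probabilities)."""
    counts = defaultdict(Counter)
    for t in history:
        for a, b in zip(t, t[1:]):
            counts[(a, b)][t[-1]] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def lookup_predict(table, query):
    """Online stage: a single dictionary lookup on the last two grids."""
    if len(query) < 2:
        return None
    return table.get(tuple(query[-2:]))


if __name__ == "__main__":
    history = [["a", "b", "c", "d"], ["a", "b", "e"], ["x", "b", "c", "d"]]
    table = build_table(history)
    unseen = ["a", "x", "b", "c"]  # no historical trajectory contains it
    print(scan_predict(history, unseen))   # None
    print(lookup_predict(table, unseen))   # d
```

The scan costs time proportional to the size of the trajectory space for every query and returns nothing when no full match exists, while the lookup costs constant time after training and still answers from the most recent grids.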
In Figure 7b, the standard deviation of prediction accuracy obtained by our STP-ME is the smallest and remains stable, which indicates that its prediction accuracy is effective and stable. Apart from the large advantage of STP-ME in prediction time and prediction accuracy, its coverage and prediction error are comparable with those of the baseline algorithm. Figure 8 shows the coverage and prediction error versus the percentage of the trip completed. For the baseline algorithm, the number of query trajectories for which sufficient predicted destinations are provided decreases as the trip completed percentage increases, because longer query trajectories (i.e., higher trip completed percentage) are less likely to have a partial match in the training dataset. Specifically, when the trip completed percentage increases towards 90%, the coverage of the baseline algorithm decreases to almost 25%. Our STP-ME copes with this as expected, with only an unnoticeable drop in coverage, and can consistently answer almost 100% of query trajectories. This shows that the baseline algorithm cannot handle long trajectories, because the chances of finding a matching trajectory decrease as the length of a query trajectory grows. Beyond the coverage problem, the prediction error of the baseline algorithm also increases with the trip completed percentage, for a simple reason: when the baseline algorithm fails to find adequate predicted destinations, we use the current node in the query trajectory as the predicted destination. For STP-ME, getting closer to the true destination means that there are fewer potential destinations and, intuitively, the prediction error reduces. It is observed that STP-ME outperforms the baseline algorithm throughout the progress of a trip.

Conclusions
In this paper, we have proposed STP-ME to perform sparse trajectory prediction. STP-ME uses an L-Z entropy estimator to compute a trajectory's L-Z entropy and performs trajectory synthesis based on that entropy. Lastly, by combining location entropy and time entropy, STP-ME uses a second-order Markov model to predict the destination. Experiments based on real datasets have shown that the STP-ME algorithm can predict destinations for almost all query trajectories, so it has successfully addressed the data sparsity problem. Compared with the baseline algorithm and 2-MM, STP-ME has higher prediction accuracy. At the same time, STP-ME requires less time to predict and runs over four orders of magnitude faster than the baseline algorithm.

Figure 1. Related GPS points are divided into different grids.


Figure 4. Trajectory average L-Z entropy (a) and standard deviation of L-Z entropy (b) at different times on weekdays and weekends.


Figure 5. (a) Prediction accuracy of STP-ME based on spatial iterative grid partitioning and uniform grid partitioning; (b) standard deviation of prediction accuracy.


Figure 6. (a) Prediction time at different grid granularities for baseline, 2-MM, SubSyn, and STP-ME; (b) standard deviation of prediction time.

Figure 7. (a) Prediction accuracy at different grid granularities for baseline, 2-MM, SubSyn, and STP-ME; (b) standard deviation of prediction accuracy.

Figure 8. (a) Coverage and (b) prediction error versus the percentage of trip completed for baseline, 2-MM, SubSyn, and STP-ME.

Let $T = \{t_1, t_2, t_3, \ldots, t_m\}$ be a set of predefined timeslots in a day. Thus, $tra(u) = \{(t_1, g_1), (t_2, g_2), \ldots, (t_k, g_k)\}$ denotes a finite set of historical grids with temporal information visited by user $u$. Assuming Table 1 represents statistics of historical visit behaviors of all users, Table 1a corresponds to trajectories' historical visits to grids without considering temporal information, and Table 1b corresponds

Definition 1. Given a finite set of grids with time visited by trajectories, the visit probability of a grid $g_i$ is expressed in terms of two component probabilities, namely (i) the grid feature-correlated visit probability (GVP), and (ii) the temporal feature-correlated visit probability (TVP). GVP of a grid
(a) General Grid Visit.

Table 3. Visit probabilities.

A second-order Markov chain model is a discrete stochastic process with limited memory in which the probability of visiting a grid $g_w$ during timeslot $t_{j+1}$ depends only on the tags of the two grids visited during timeslots $t_j$ and $t_{j-1}$.
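The second-order dependence can be illustrated with a minimal counting sketch. It assumes each trajectory is a list of (timeslot, grid) pairs and estimates transition probabilities by plain maximum-likelihood counting; the names `fit_second_order` and `predict_next` are hypothetical, and the sketch deliberately omits the location entropy and time entropy that STP-ME combines with the model.

```python
from collections import Counter, defaultdict


def fit_second_order(sequences):
    """Estimate P(next state | two previous states) by relative frequency.

    Each sequence is a list of hashable states, here (timeslot, grid) pairs.
    Returns {(prev2, prev1): {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Slide a window of three consecutive states over the sequence.
        for prev2, prev1, nxt in zip(seq, seq[1:], seq[2:]):
            counts[(prev2, prev1)][nxt] += 1
    model = {}
    for context, ctr in counts.items():
        total = sum(ctr.values())
        model[context] = {state: c / total for state, c in ctr.items()}
    return model


def predict_next(model, prev2, prev1):
    """Return the most probable next state, or None for an unseen context."""
    dist = model.get((prev2, prev1))
    if not dist:
        return None
    return max(dist, key=dist.get)


if __name__ == "__main__":
    trajectories = [
        [(1, "g1"), (2, "g2"), (3, "g3")],
        [(1, "g1"), (2, "g2"), (3, "g3")],
        [(1, "g1"), (2, "g2"), (3, "g4")],
    ]
    model = fit_second_order(trajectories)
    print(predict_next(model, (1, "g1"), (2, "g2")))  # (3, 'g3')
```

Because the context is only the last two states, a long query trajectory is answered from its two most recent grids, so an exact match of the full query in the historical data is not needed.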