LSTM-Based Deep Learning Model for Predicting Individual Mobility Traces of Short-Term Foreign Tourists

Abstract: The increasing availability of trajectory recordings has led to the mining of a massive amount of historical track data, allowing for a better understanding of travel behaviors by revealing meaningful motion patterns. In the context of human mobility analysis, the problem of motion prediction assumes a central role and is beneficial for a wide range of applications, including for touristic purposes, such as personalized services or targeted recommendations, and sustainability studies related to crowd management and resource redistribution. This paper tackles a particular case of the trajectory prediction problem, focusing on large-scale mobility traces of short-term foreign tourists. These sparse trajectories, short and non-repetitive, lack spatial and temporal regularity, making prediction analysis based on individual historical motion data unreliable. To face this issue, we hereby propose a deep learning-based approach, taking into account the collective mobility of tourists over the territory. The underlying semantics of motion patterns are captured by means of a long short-term memory (LSTM) neural network model trained on pre-processed location sequences, aiming to predict the next visited place in the trajectory. We tested the methodology on a real-world big dataset, demonstrating its higher feasibility with respect to traditional approaches.


Introduction
Human mobility analysis has gained increasing popularity due to the recent growth in people's location information availability in the form of massive trajectory data sets. Motion behaviors can be passively collected by mobile phones in terms of cell tower connection or GPS signal, or even actively shared by users on social media platforms. These large volumes of geo-located data make it possible to reveal and integrate motion patterns in a wide variety of contexts [1,2], from recommendation systems [3,4] to mobility modeling applications for smart city and smart enterprise [5,6].
The rise of positioning technology and motion data availability has particularly boosted location prediction analysis, which has become a very active research area in the big picture of location-based services. Location prediction is interpreted as inferring the short-term future location of an individual, leveraging his/her current place, past motion activity, and possibly additional side information. Depending on the context, it may imply very different problems and approaches, comprising motion flow modeling [7–9], individual large-scale mobility analysis [10–12], and very fine resolution systems [13–15].
While the majority of works dealing with the prediction of individual mobility traces are set in contexts with a high level of spatial and temporal regularity (e.g., motion activity of users in everyday life), our paper contributes to extending trajectory prediction analysis in the opposite direction, where individual motion regularity is lacking due to the non-repetitiveness of single mobility traces.
Our focus and intended application is related to tourists' mobility within the growing field of smart tourism. Smart tourism integrates tourism resources with information technologies to design intelligent services that provide valuable outcomes to tourists and tourism-related industries. The development of smart tourism is particularly embodied in four main aspects, namely tourism experience, tourism management, tourism service, and tourism marketing [16–19]. The tracking and recording of the space-time paths of individual tourists is part of this big wave of tourism mining, not as an ultimate purpose, but as a means of providing valuable knowledge of tourists' mobility and travel behaviors. However, although spatial-temporal trajectory data have been widely utilized in studies of tourists' behavior, their use has been mainly limited to descriptive purposes at the level of clustering and pattern analysis [20–23]. Yet when anticipatory actions are required, predictive investigations become an essential tool.
Our case study targets short-term tourists in a foreign country. Foreign tourism is a major source of income for the tourism industry and it is an area of investigation for public and private organizations. Most destination strategies define measures specifically designed for foreign tourists, who have different behaviors and spending patterns compared to domestic users. For this reason, understanding how their tourism experience unfolds yields insights that can be leveraged to improve tourism policies and decision-making.
While in everyday life a person's mobility is described by a significant probability of returning to a limited number of highly frequented locations (e.g., home and workplace) [24–26], the natural characterization of foreign tourists' motion behavior is based on short and non-repetitive trajectories of users moving in areas they have never been to. The lack of individual historical location data causes methods relying on a set of individual pre-recorded motion trajectories to perform poorly when applied to traces covering areas that are unfamiliar to the user; a prediction algorithm solely based on a sequential approximation of a single probability distribution is not effective in this case. In addition, the focus on large-scale mobility often implies a very wide territory, introducing further problems such as trajectory data sparseness and a multitude of locations, involving the curse of dimensionality.
The proposed method aims to overcome these issues with the use of a deep learning-based approach that leverages the collective mobility of users over the territory. The method consists of a long short-term memory (LSTM) neural network trained on pre-processed location sequences to learn the underlying patterns of tourists' motion activity. Original traces are first transformed into discrete location sequences, and are subsequently fed into a deep neural network model composed of embedding and LSTM layers. The model captures motion patterns directly from mobility traces, without requiring any manual feature extraction. Each individual user's mobility prediction is therefore based on the collective analysis of tourists' behavior over the territory. For a wider application in various contexts, we do not resort to any additional information besides the users' motion traces, since useful secondary information is not available in many cases. In this way, the model can be applied to a variety of geo-located data types, as long as the recorded positional data generated by users can be properly organized into mobility traces in the form of sequences of locations.
Experiments on a real-world large-scale dataset demonstrate the higher feasibility of our forecasting method with respect to traditional approaches in this mobility regime, standing out as a potentially beneficial methodology for many real-life applications, including touristic services for personalized recommendations, targeted advertisement, and sustainability studies related to crowd management and resource redistribution. In general, this study contributes to the expansion of tourists' mobility analysis in the direction of actively integrating artificial intelligence into the tourism sector.

Related Work
The rise of motion data availability has boosted the interest in human mobility analysis, establishing various methods for trajectory data mining [27,28] to either describe the observable motion behavior [29] or to predict future activities [30].
Location prediction has a central role in human mobility analysis and is applied to numerous tasks such as crowd management, congestion prediction, transportation planning, and place recommender systems [31,32]. In the past few years, plenty of predictive models have been suggested, leveraging various methods including Markov models [33,34] and data mining approaches [35–37]. Previous research on location prediction can be roughly split into two broad groups: motion regularity-based methods and multiple mobility-based methods.
The first group is based on the regularity of an individual user's motion history. Since most people tend to follow regular motion patterns in daily life, often returning to the same few locations, their personal past mobility is a valuable factor to predict their future trajectories [24–26]. Therefore, the majority of works on predicting a person's next visited location rely on historical motion data collected from this person exclusively, evaluating the regularity patterns in human mobility by learning individual, frequent traveling routes [38,39]. In this sense, the most common approach is the use of Markov models, representing locations as states and movement between locations as transitions [11,12,40]. States are defined by partitioning space into grids or reference points, and transition probabilities are defined by counting each user's transitions, identifying the most likely next destinations for each current location. This type of model achieves good performance in the presence of long, pre-recorded motion trajectories of the particular user under study.
The second group comprises methodologies combining individual past locations with collective motion information from multiple users. A subgroup is represented by collaborative filtering to find similarities among users' preferences in frequently visited destinations [41]. This includes methods for classifying users' preferences into point of interest categories [42] and recommendation systems based on generic, top interesting places or personalized location matching [43]. Another subgroup focuses on geographical elements, predicting the next locations based on the definition of features for each place and the relationships between places. These methodologies do not model individual preferences or similar preferences among users, but make predictions by using geographical statistics [44,45]. A final subgroup includes motion pattern mining techniques and prediction algorithms combining individual current movements with historical collective data to find frequent patterns and co-occurrences of locations. The methods comprise ensemble probabilistic algorithms [46,47], feature-based machine learning methodologies [48,49], and deep learning models [50,51] to predict users' locations over time, based on individual and collective behaviors.
In general, when people rarely share their history of past visited places with other users, location prediction methods based on previously seen locations of an individual user are likely to be chosen over other methodologies. However, in the case of irregular individual motion patterns, users with a short data history, and non-repetitive mobility behaviors, prediction algorithms approximating single probability distributions are not reliable and multiple mobility-based methods may be preferred. Moreover, it is worth mentioning that a large number of methods enrich trajectories with further context data, such as prior knowledge of motion information (e.g., acceleration, orientation) [11], external data (e.g., weather, social media analysis) [52,53], or user-specific features (e.g., home and workplace, user-specific preferences) [44,54–57]. In these cases, the main disadvantage is of a practical nature, since secondary information is often insufficient or not available.
Over the last decades, academics and practitioners have increasingly approached the study of tourists' movements [20,58,59] and how to guide practical measures based on these findings [60–62]. Most studies focused on mapping and modeling movements between locations [21,63], as tourist destinations are involved in a complementary relationship [64,65]. These include travel itinerary models [66] and spatial pattern examination of travel flows [67,68], often leveraging a variety of measures within the study framework [21,69]. Only a few studies, however, exclusively involved international visitors [70,71]. While the interest in mining movement patterns of tourists has been prominent, and studies are developing fast for collectively estimating the overall number of visitors within single destinations [72], the explicit prediction of individual short-term tourists' mobility traces still requires further expansion, being mainly based on Markov approaches for modeling location transitions [47,58,59].
This paper therefore introduces a deep learning model to predict individual trajectories of short-term foreign tourists. Its characteristics comprise: leveraging the collective mobility of people to predict individual traces, falling in the category of multiple mobility-based algorithms; learning mobility patterns without any manual feature extraction or secondary context data by simply feeding the model with sequences of locations, from a purely data-driven perspective; and being explicitly designed to predict the next location of a user, specifically when only a very short data history is known about that user. The use of LSTM is tested in this particular mobility regime of short and non-repetitive traces to assess its feasibility when applied to large-scale movements of visitors in a foreign country.

Methodology
The proposed prediction method aims to model patterns hidden in the historical motion data of multiple people, in order to identify the most likely future movement of an individual user. Given a short mobility trace sampled at a given time step, the solution of our model consists of inferring the future visited location in the next time step. This section reports the details of the proposed methodology, from trajectory pre-processing to deep learning modeling.

Trajectory Pre-Processing
The first step of the path from original mobility traces to location prediction is characterized by trajectory discretization, a pre-processing phase transforming raw traces into the input for the neural network model.
An original mobility trace is described by a series of chronologically ordered track points $T = \{p_i \mid i = 1, 2, 3, \ldots, N\}$, generated by an individual user, whereby each point is defined by a coordinate pair enriched with a time stamp, $p_i = (lon_i, lat_i, t_i)$. The trajectory discretization task consists of aggregating continuous values of longitude and latitude into discrete locations and transforming the continuity of time into fixed time steps. This results in a pre-processed trajectory in the form of a sequence of locations $(LOC_1, LOC_2, \ldots, LOC_N)$, where, given a time step unit $t$, locations refer to times $(t, 2t, \ldots, Nt)$. Time information is therefore encoded in the position along the sequence, and the location associated with each time step is chosen as the one identified by the majority of track points recorded within that time period. The length of the time step is case specific, depending on the data source and the prediction problem: a short unit increases fragmentation in the presence of discontinuous traces and low time resolution data, whereas a long unit may compromise a proper trajectory representation, affecting prediction results. Moreover, spatial resolution also varies according to the data source, and locations may be further discretized (e.g., through clustering, reference point definition, and grid-based approaches) in relation to the time resolution and the specific purpose of different applications (e.g., prediction of motion traces over a whole country or mining city-level mobility). This is particularly suggested when trajectories are very sparse and there are many locations with only very few occurrences. In addition, because human mobility is not generally uniformly distributed over the territory, locations that are potentially inaccessible or irrelevant should be discarded; only those locations that are seen by a sufficient number of people should be considered, avoiding biased samples in the data and wasted computational effort. The result should consist of a set of fixed points (or areas) over the territory, each of them associated with a unique identifier. A pre-processed trajectory is made of a sequence of these discrete locations unfolding in fixed time steps.
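The time discretization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `discretize_trace`, the input format (track points already mapped to a discrete location identifier, with Unix time stamps), and the choice to mark empty time steps with `None` are all hypothetical and not taken from the original pipeline.

```python
from collections import Counter, defaultdict

def discretize_trace(points, step_seconds=3600):
    """Turn (location_id, unix_time) track points into one location per time step.

    Assumes each track point is already mapped to a discrete location id.
    Time steps without any recorded point are returned as None (an assumption).
    """
    if not points:
        return []
    buckets = defaultdict(list)
    for loc, t in points:
        buckets[int(t // step_seconds)].append(loc)
    first, last = min(buckets), max(buckets)
    sequence = []
    for step in range(first, last + 1):
        locs = buckets.get(step)
        if locs:
            # Majority vote among the track points recorded in this step.
            sequence.append(Counter(locs).most_common(1)[0][0])
        else:
            sequence.append(None)
    return sequence

# Toy trace: three points in the first hour, one in the second, one in the fourth.
trace = [("A", 10), ("A", 1200), ("B", 3000), ("B", 3700), ("C", 11000)]
hourly = discretize_trace(trace)
```

In this toy trace the third hour has no recording, which is the kind of gap that fragments a trace into shorter continuous sequences.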

Deep Learning Model for Trajectory Prediction
The collection of the pre-processed trajectories from multiple users, in the form of sequences of unique location identifiers, is used as input data to the deep neural network model. The model is made of three building blocks: an embedding layer, a block of one or more LSTM layers, and a softmax layer. Each location identifier is initially associated with a corresponding embedding vector, encoding input trajectories into sequences of embeddings that are subsequently fed to the LSTM block, made of stacked LSTM neural network layers. The final trajectory representation, the output vector of the last LSTM layer, becomes the input of a softmax layer that generates the probability distribution of the next predicted location in the trace. An overview of the whole model, with a block of two LSTM layers, is illustrated in Figure 1.

Embedding Layer
To limit the problems of the curse of dimensionality, trajectory sparseness, and computational inefficiency, we replace traditional representations such as one-hot by associating each discrete location with a low-dimensional dense vector (embedding). This is done by means of an embedding layer, transforming sequences of discrete location identifiers into sequences of dense vectors before they are fed to the LSTM block, as depicted in Figure 2. In particular, each location is initially defined by a random vector of a pre-defined size, whose values are updated during the training process; just like other model parameters, embeddings are tweaked, through backpropagation, on the basis of the prediction outcomes. Over training, they assume a meaningful mathematical representation as vectors of continuous values, whereby locations that are often co-occurring in the same traces share similar representations in this embedding space.
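As a rough illustration of the lookup performed by an embedding layer, the following Python sketch builds a table of small random vectors and maps a sequence of location identifiers to the corresponding rows. The function names, the initialization range, and the toy sizes are assumptions made for the example; the update of the vectors through backpropagation is not shown.

```python
import random

def init_embeddings(n_locations, dim, seed=0):
    """Create one small random vector per location id (0 .. n_locations - 1)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(n_locations)]

def embed_sequence(table, location_ids):
    """Replace each location id by its dense vector (a plain table lookup)."""
    return [table[i] for i in location_ids]

# Toy setting: 5 locations, 3-dimensional embeddings.
table = init_embeddings(n_locations=5, dim=3)
vectors = embed_sequence(table, [2, 0, 2, 4])
```

Compared with a one-hot encoding, whose length equals the number of locations, each location here is described by `dim` values only, and repeated visits to the same location reuse the same row of the table.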

LSTM Block
The next stage consists of the LSTM block. LSTM [73] is a complex recurrent neural network type, whose repeating module is composed of four different neural networks interacting with each other. The network processes an input sequence one element at a time, receiving, at each step, two sources of input data: the current vector of the data sequence concatenated with the output vector of the network module at the previous step. The information flows through the network modules, encoded in the cell state, and is modified by the four neural network structures until the end of the sequence is reached. The output at the last step is the final vector characterization of the sequence, which is subsequently used for the actual prediction task. If the LSTM block contains multiple LSTM layers, the final trajectory vector is represented as the output, at the last step, of the last layer. In general, the first LSTM layer is fed with the input sequence, the second layer is fed with the output of the first layer, and so on (Figure 3).
Equations (1)-(6) report the formulas describing the functioning of a repeating module of LSTM, given an input vector $x_t$: the forget gate (1) defines the information to be deleted from the cell state; the input gate (2) decides which values to update; the tanh network (3) determines a vector of new values to be added to the state; the new cell state (4) is obtained by filtering the old cell state through the forget gate, and by adding the combination outcome between the input gate and the tanh network; the output gate (5) defines which parts of the cell state to output; and the final LSTM output (6) results from the multiplication between the output gate and the tanh of the new cell state.
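In the standard LSTM formulation, which matches the description above step by step, the six equations can be written as follows. The symbols are assumptions of this sketch rather than notation confirmed by the text: $W$ and $b$ denote the weight matrices and bias vectors of the four networks, $\sigma$ is the logistic sigmoid, $[h_{t-1}, x_t]$ is the concatenation of the previous output with the current input, and $\odot$ is the element-wise product.

```latex
\begin{align}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}\\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}\\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{3}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}\\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{5}\\
h_t &= o_t \odot \tanh\left(C_t\right) \tag{6}
\end{align}
```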

Softmax Layer
The predicted next location is explicitly disclosed by means of a softmax layer on top of the LSTM block. The softmax layer is a simple, fully-connected neural network followed by a softmax activation function. It receives the final trajectory vector characterization as an input, and outputs the predicted probability distribution for the next potential location, as shown in Figure 4. Equation (7) reports the description of the softmax layer, where $h_{last}$ represents the output of the last LSTM layer at the last step and n_LOC is the total number of locations.
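Written out, a fully-connected layer followed by a softmax activation takes the standard form below, with $h_{last}$ and n_LOC (typeset here as $n_{LOC}$) as defined for Equation (7). The weight vectors $w_j$ and biases $b_j$ of the fully-connected layer are notation assumed for this sketch.

```latex
\begin{equation}
P\left(LOC_{t+1} = j \mid h_{last}\right)
  = \frac{\exp\left(w_j^{\top} h_{last} + b_j\right)}
         {\sum_{k=1}^{n_{LOC}} \exp\left(w_k^{\top} h_{last} + b_k\right)},
  \qquad j = 1, \ldots, n_{LOC} \tag{7}
\end{equation}
```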

Model Training
Prior to being fed into the neural network model, location sequences are scanned by a sliding window, determining the training features and the target variable. The window moves forward by one location until the end of each sequence, defining multiple segments of fixed length as input sequences to the deep learning model. The segment length represents the amount of past motion activity taken into account for learning to predict the future location (e.g., predicting the next location based on the last six hours of a user's mobility). Its choice, besides strongly depending on the applications and dataset restrictions, is closely related to the time resolution of the sequence, whereby a higher time resolution determines a larger number of locations describing the past motion activity.
The deep learning model is fed with a collection of these segments, where, for example, a window length equal to four locations would define a sequence $(LOC_{t-3}, LOC_{t-2}, LOC_{t-1}, LOC_t)$ as input features to the model and the location $LOC_{t+1}$ as the target variable. The model training maximizes the log probability, with respect to the weights of every layer (embedding, LSTM, and softmax), of observing the correct next location, given the sequence of past locations. The process relies on backpropagation and mini-batch stochastic training to determine in which direction the weights are adjusted.
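The sliding-window segmentation can be illustrated with a short Python sketch. The function name `sliding_windows` and the toy location labels are hypothetical; the window advances by one location at a time, and each segment is paired with the location that immediately follows it.

```python
def sliding_windows(sequence, window=4):
    """Return (input_segment, target) pairs from one location sequence."""
    pairs = []
    for end in range(window, len(sequence)):
        # The `window` locations before position `end` predict the one at `end`.
        pairs.append((tuple(sequence[end - window:end]), sequence[end]))
    return pairs

seq = ["L1", "L2", "L3", "L4", "L5", "L6"]
pairs = sliding_windows(seq, window=4)
# pairs[0] is (("L1", "L2", "L3", "L4"), "L5")
```

A sequence shorter than or equal to the window length produces no training pair, which is one reason the window length has to be chosen with the typical length of continuous traces in mind.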
The prediction of a location sequence is therefore based on the collective historical mobility of people, identifying the most likely next location as the one having the highest probability according to the output of the model.

Experiment
The current section introduces the dataset used for the prediction task and reports the description and results of the experiments conducted. A particular focus is given to the evaluation of results, which are compared to traditional approaches and are analyzed according to different motion characteristics. The proposed model was implemented and executed on TensorFlow (Google Brain, Mountain View, CA, USA), using an AWS EC2 p3.2xlarge GPU instance.

Dataset
To properly describe the general large-scale motion activity of foreign tourists, we used a real-world dataset comprising seven months of anonymized mobile phone call detail records (CDRs) of roamers in Italy. In order to present meaningful findings, it is indeed important, especially when dealing with wide territories, to make use of a sufficiently large and complete dataset, whose trajectories redundantly cover the study area. CDRs have been widely used in human mobility studies [74–77], reporting the detected mobile phone activities enriched with the time stamp and the position of the device in terms of the coverage area of the principal antenna. We only took into account short-term visitors, recorded to be located in the country for a maximum of two weeks. In addition, we discarded those users that appeared to be completely stationary. Foreign visitors' mobility was therefore represented by short traces and non-repetitive behaviors.
The erratic profile of mobile activity, represented by sparse connection events, may critically fragment mobility traces, making it difficult to create continuous location sequences. To limit the fragmentation problem and define proper trajectories, we pre-processed traces into sequences unfolded in 1 h time steps; the prediction problem is formulated as predicting the location of a user in the next hour. In particular, if more than one track point was recorded in the same hour, the location associated with the majority of those recordings was chosen to identify the current position of the user. Given the wide territory, the choice of the time step unit, and our focus on large-scale movements, a minimum spatial resolution of 2 km was selected. Reference points were defined as the antennas subjected to the highest number of connections within the minimum spatial resolution, projecting the other ones to the closest reference point. Furthermore, we discarded very rare locations, identified by just a few tens of recorded events. Being mostly randomly visited, they are not significantly involved in the overall travel behavior of foreign visitors in Italy. Nevertheless, specific characteristics of different datasets may influence parameters such as time and space resolution, and a choice of different values can be suitable for different applications.
The final dataset consists of 1 h encoded sequences of 5903 possible unique locations over the Italian territory. To appropriately focus on short motion behaviors and to make complete and proper utilization of the dataset, represented by relatively short continuous traces, we set a window length equal to 6 h (6 locations), yielding a total of 13 million trajectory segments (with a median displacement per segment of 36.1 km) generated by 1.4 million users. We believe this large amount of data is representative of the overall real motion behavior of foreign tourists.

Experimental Settings
We designed the neural network model using an embedding size of 100 dimensions and a block of two LSTM layers having a hidden size of 4000 neurons each. The training process was based on a cross-entropy cost function, mini-batches, and the Adam optimizer [78]. To evaluate the performance of the model on previously unseen data, we randomly split the dataset into a training set and a test set, containing 80% and 20% of the users, respectively.
For a better evaluation of the results, we compared the achieved prediction accuracy with traditional approaches involving the use of Markov modeling, which is widely applied in location prediction problems. Locations are represented as states and movements between locations as state transitions. The creation of a transition matrix identifies the most likely next destinations for each current location [33].
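To make the transition-matrix idea concrete, the following Python sketch counts first-order transitions over the sequences of all users and ranks the successors of a given location by frequency. It is a generic illustration of a first-order Markov predictor built on collective data, not the exact baseline implementation used in the experiments; the function names and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count first-order transitions over all users' location sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts

def predict_next(counts, current, k=1):
    """Return up to k most frequent successors of `current` (empty if unseen)."""
    if current not in counts:
        return []
    return [loc for loc, _ in counts[current].most_common(k)]

train = [["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"]]
counts = fit_transitions(train)
best = predict_next(counts, "B")         # most frequent successor of B
top3 = predict_next(counts, "B", k=3)    # fewer than 3 if few successors exist
```

Fitting the same counts on the sequences of a single user only would give a personal variant, which has very little to count when that user's trace is short and non-repetitive.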

Results
Table 1 reports the comparison results in terms of accuracy and accuracy in top 3 (if the correct label corresponds to one of the top three predicted locations, the accuracy is 1, otherwise it is 0; the result is the average over the testing trajectories). Our model (LSTM) outperformed the Markov approaches, yielding a 5% improvement compared to the best baseline, the global Markov model (GMM), a 10% improvement compared to the variable-order Markov model (VGMM), and a 33% improvement compared to the personal Markov model (PMM). The accuracy in top 3 confirmed this trend, showing a 7% improvement of our model with respect to GMM, 8% to VGMM, and 47% to PMM. Reasonably, PMM, which was solely based on individual mobility and ignored the collective motion behavior, had the lowest scores in this regime of short and non-repetitive traces. GMM and VGMM, which considered the collective mobility of all users, greatly improved performance, with the first-order model surpassing the variable-order model. LSTM determined a further increment, exceeding the best baseline by 2.5 percentage points in terms of accuracy and 5 percentage points in terms of accuracy in top 3.
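The two metrics can be computed as in the following Python sketch, where each test case contributes 1 if the true next location is among the first k ranked predictions and 0 otherwise. The function name and the toy predictions are hypothetical and serve only to show the arithmetic.

```python
def top_k_accuracy(ranked_predictions, targets, k=3):
    """Fraction of cases whose true location is among the first k predictions."""
    if not targets:
        raise ValueError("no test cases")
    hits = sum(1 for ranked, true in zip(ranked_predictions, targets)
               if true in ranked[:k])
    return hits / len(targets)

# Each inner list is a ranking of candidate next locations, best first.
ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "D", "E"], ["D", "E", "F"]]
truth = ["A", "C", "A", "E"]
acc1 = top_k_accuracy(ranked, truth, k=1)   # plain accuracy
acc3 = top_k_accuracy(ranked, truth, k=3)   # accuracy in top 3
```

In this toy case only the first prediction is correct at rank 1 (accuracy 0.25), while three of the four true locations appear among the top three candidates (accuracy in top 3 of 0.75).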
Moreover, we analyzed how different trajectory characteristics affect prediction. The idea was to evaluate the influence of different values of motion features, such as the traveled distance and radius of gyration, on the prediction performance.
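For reference, the two motion features can be computed from the coordinates of a segment as in the sketch below. It assumes common definitions that the text does not spell out: traveled distance as the sum of great-circle distances between consecutive points, and radius of gyration as the root mean square distance of the points from their mean position (with the mean taken directly on latitude and longitude, a simplification that is reasonable only for small areas). All function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def traveled_distance_km(points):
    """Sum of distances between consecutive points of a segment."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))

def radius_of_gyration_km(points):
    """Root mean square distance of the points from their mean position."""
    lat_c = sum(p[0] for p in points) / len(points)
    lon_c = sum(p[1] for p in points) / len(points)
    squared = [haversine_km(p, (lat_c, lon_c)) ** 2 for p in points]
    return math.sqrt(sum(squared) / len(squared))

# Two points one degree of latitude apart on the same meridian.
segment = [(45.0, 9.0), (46.0, 9.0)]
dist = traveled_distance_km(segment)
rog = radius_of_gyration_km(segment)
```

For this two-point segment the traveled distance is about 111 km and the radius of gyration is half of it, since both points lie at the same distance from their midpoint.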
Table 2 shows the accuracy and accuracy in top 3 (in brackets) for different values of traveled distance within the six hours prior to prediction. Five bins were selected: ≤10 km, 10-25 km, 25-50 km, 50-100 km, and ≥100 km. In terms of accuracy, despite an overall tendency of decreasing performance as the traveled distance increases, PMM always performed very poorly, while GMM and VGMM achieved remarkable results for mid and short distances, respectively. In particular, GMM substantially outperformed VGMM for mid-range values (10-100 km), but was surpassed by the latter for very short distances (≤10 km). LSTM always exceeded every baseline, even if it only slightly outperformed GMM for mid-short distance values (10-50 km). It is worth noting that LSTM largely outperformed the other methods for very long distances (≥100 km). Moreover, its accuracy in top 3 was consistently much higher than that of every baseline for each distance bin.
Table 3 reports the accuracies for different values of radius of gyration (ROG), in bins of ≤3 km, 3-10 km, 10-32 km, and ≥32 km. These results reinforce the observations reported in the previous case, such as the general tendency of decreasing performance as the ROG value increases, the overall poor results of PMM, the good results of VGMM for very small values (≤3 km), and the remarkable performance of GMM for mid-range values (3-32 km). Again, LSTM always outperformed the baselines, only slightly beating the GMM accuracy for the 3-10 km bin, but greatly outperforming the other methods for very large ROG values (≥32 km). As in the traveled distance case, its accuracy in top 3 was consistently much higher than the baselines for each of the ROG bins.
In addition, we observed the prediction variability at different hours of the day. Figure 5 displays the accuracy and accuracy in top 3 of the four methods over time, starting from midnight. Rush hours in the afternoon appeared to be more predictable than the ones in the morning, while accuracies significantly increased in the evening and night due to the higher regularity of mobility patterns during these hours. LSTM was shown to outperform the baselines for every hour of the day.
Performance was further explored with respect to the imbalance of the dataset, by evaluating results corresponding to popular and rare locations. Table 4 reports the accuracies for different ranges of location occurrences in the data, defining frequently visited locations and less visited ones. The columns from left to right identify specific groups of locations, where each location of each group represents, respectively, over 0.5% of the whole dataset, between 0.1% and 0.5%, between 0.05% and 0.1%, and less than 0.05%. As expected, there is a general drop in performance when passing from popular locations to rare ones. However, the superiority of LSTM is once again clearly exhibited.
Finally, we focused on the prediction errors to study the performance of our model in the particular case when it was not able to correctly identify the future visited location. We compared LSTM with GMM, the best baseline in terms of accuracy, to assess how their predicted locations differed when a misprediction occurred in both models. Figure 6 reports the bar graphs representing the error distance distribution of the segments that were wrongly predicted by both models. The error distance was calculated as the absolute distance between the wrongly predicted location and the real future location (to calculate the error distance of wrong predictions in top 3, we considered the predicted location, within the first three, having the shortest distance to the real location). The bar graphs highlight the overall tendency of LSTM to make mistakes with a shorter error distance than GMM.
We also studied the difference in error distance between the two prediction models, analyzing the corresponding mispredictions on the same segment. The bar graphs in Figure 7 display the difference error_distance(GMM) − error_distance(LSTM) for wrong predictions and wrong predictions in top 3; a negative value indicates that the baseline provided a shorter error distance on a wrongly predicted segment; a positive value is in favor of our model. As depicted by the high bars on the right part of both graphs, there were a remarkable number of samples on which GMM tended to make prediction mistakes on the order of a few tens of km larger than LSTM. Overall, our model, besides the higher prediction accuracy, also presented better results in terms of error distance.
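The error distance and the difference error_distance(GMM) − error_distance(LSTM) can be sketched as follows for a single segment that both models predicted wrongly. The location coordinates, the use of the haversine distance, and all names are hypothetical; the sketch only mirrors the definitions given in the text.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def error_distance_km(predicted, true_location, coords):
    """Distance from the true location to the closest wrongly predicted candidate.

    With a single candidate this is the plain error distance; with the top
    three candidates, the one nearest to the true location is used.
    """
    return min(haversine_km(coords[p], coords[true_location]) for p in predicted)

# Hypothetical location coordinates (lat, lon) and wrong predictions.
coords = {"A": (46.0, 11.0), "B": (46.0, 11.5), "C": (46.0, 12.0),
          "D": (46.0, 13.0), "E": (46.0, 14.0)}
true_loc = "A"
gmm_ranked = ["D", "E", "C"]   # all three differ from the true location
lstm_ranked = ["B", "C", "D"]

# Positive differences favor LSTM, negative differences favor GMM.
diff_top1 = (error_distance_km(gmm_ranked[:1], true_loc, coords)
             - error_distance_km(lstm_ranked[:1], true_loc, coords))
diff_top3 = (error_distance_km(gmm_ranked, true_loc, coords)
             - error_distance_km(lstm_ranked, true_loc, coords))
```

In this toy case both differences are positive, meaning the LSTM mistakes land closer to the true location than the GMM mistakes.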

Discussion
We proposed a method to predict individual mobility traces of short-term foreign tourists leveraging the collective large-scale motion behavior of people and a deep learning-based methodology adapted to process motion trajectories. The model relies on a recurrent neural network architecture composed of embedding and LSTM layers. We assessed the feasibility of this methodology on short, non-repetitive traces, revealing its potential for human mobility studies and applications.
In particular, our method was shown to outperform the widely used Markov model approaches based on location transition probabilities. The results showed that a probabilistic approach built on the motion behavior of a single individual performs very poorly in this mobility regime, proving the need for collective motion information. This collective mobility, however, consists of non-repetitive traces that clearly influence prediction performance; the simpler first-order Markov model generally outperformed the variable-order model based on the longest common suffix. LSTM, specifically designed to find patterns along series, outperformed every baseline, demonstrating a higher capability of correctly predicting individual mobility traces, represented as ordered sequences of locations.
We also observed how predictability varied for different trajectory characteristics. Despite the general tendency of decreasing performance for longer traveled distances and larger explored areas (local movements were more predictable than long-distance movements), our model always achieved a better accuracy than the baseline approaches. Reasonably, local movements rely on a restricted set of likely future locations, whereas long-distance movements are more unpredictable since the broad explored area could lead to a large number of possible future visited locations. However, our model achieved the largest accuracy gap over the baselines precisely for very high values of traveled distance and ROG, showing a particular potential for long distances and large covered areas. Moreover, its accuracy in top 3 was always significantly higher than that of the other models, independently of trajectory characteristics. This also includes predictability over time, where results were split on the basis of the hour of the day. Besides the fact that our methodology consistently performed better than the comparison methods, we observed that rush hours in the morning were generally less predictable than rush hours in the afternoon. This is because the traces preceding the early morning hours contain less meaningful past information with regard to future activities. Due to the higher stationarity and regularity (individual and collective) during the night hours, trajectories sharing the same locations during the night can easily lead to different destinations in the morning; therefore, the recent past motion activity becomes less important in predicting the next location. However, the recently visited locations gain more importance for predicting the afternoon hours because they carry information about motion behavior in the morning, which is more often meaningful and indicative of future movements. Finally, predictability increases in the night due to the intrinsically higher regularity of mobility patterns during these hours, which is also reflected in the better performance of the variable-order Markov model over the first-order model in the late night and morning hours, and for small values of traveled distance and ROG.
Furthermore, another meaningful performance indicator was defined by assessing the results in relation to the class imbalance, to observe how the model behaves with respect to frequent locations and rare locations. While better results were expected for those locations that are often visited, it was worth verifying that the model did not totally drop in performance for very rare locations. In general, besides a tendency to obtain very accurate predictions for popular locations, LSTM was shown to still outperform the baselines, achieving acceptable results even for very rare locations.
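The grouping of locations into frequent and rare classes can be sketched as below, using the occurrence shares reported for Table 4 (over 0.5%, 0.1% to 0.5%, 0.05% to 0.1%, and below 0.05% of the dataset). How values falling exactly on a boundary are assigned is an assumption, as are the function names and the toy visit list.

```python
from collections import Counter

def occurrence_group(share):
    """Map a location's share of all records (a fraction in [0, 1]) to a group."""
    if share > 0.005:
        return ">0.5%"
    if share >= 0.001:
        return "0.1-0.5%"
    if share >= 0.0005:
        return "0.05-0.1%"
    return "<0.05%"

def group_locations(visits):
    """Assign every location in a flat list of visits to its occurrence group."""
    counts = Counter(visits)
    total = len(visits)
    return {loc: occurrence_group(n / total) for loc, n in counts.items()}

# Hypothetical dataset of 10,000 visit records over four locations.
visits = ["A"] * 9960 + ["B"] * 30 + ["C"] * 7 + ["D"] * 3
groups = group_locations(visits)
```

Accuracy can then be computed separately on the test samples whose true next location falls in each group.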
Another meaningful aspect is related to the prediction error. While the main goal is to correctly detect the next location, it is also important, when the prediction is wrong, to assess how wrong it is. Comparing our model with the best baseline, we verified that the error distance of our methodology is generally smaller, in particular a few tens of kilometers smaller for a large number of observations, whereas the error is far more rarely strongly in favor of the Markov model. This shows that LSTM makes less serious mistakes in terms of error distance than the Markov model, further emphasizing its superiority.
In conclusion, the presented deep learning methodology shows advantages in location prediction of non-repetitive traces generated by short-term foreign tourists. This fits in the field of deep learning-based artificial intelligence for smart city research and smart tourism, e.g., for enhancing user experiences or providing advanced decision making. In particular, this work brings a contribution to the computer science side of the variety of disciplines involved in smart city research [79], specifically falling into the field of analytics technologies, comprising decision-making oriented approaches to discover hidden patterns over big data. These approaches have recently gained critical interest and development, especially for social impact implications [80,81]. Nonetheless, their contribution is only a facet of the multi-disciplinary reality of smart city and smart tourism, and synergies with the other disciplines need to be carefully evaluated to guarantee valuable outcomes [82]. In any case, the proposed research opens a wide variety of potentially suitable applications, ranging from personalized location-based services, to crowd control, to destination planning and management. The most straightforward implementation option is related to the optimization of the quality of individual touristic experiences. Personalized information and recommendations can be provided to a specific tourist along the path, highlighting optional spots and attractions within the next visited area predicted by the model. In addition, collecting the predictions of individual spatial choices can reveal potentially crowded areas, giving rise to congestion warnings for those tourists who were forecast to visit those areas. Combining individual predictions can indeed be used to study the future collective spatial distribution of tourists, which is certainly important for several tasks, including the adjustment of the supply of facilities and services, and sustainable countermeasures complying with real-time crowd control.
More broadly, this study fits in the background of trajectory prediction employing machine learning methodologies, particularly contributing to highlighting the potential of deep learning for human mobility studies, and presenting recurrent network models as a promising tool for pattern recognition in trajectory analysis.

Conclusions
This paper presented a deep learning model to mine human motion patterns, aimed at predicting short-term foreign tourists' next location from place-based trajectories. The model was trained on the collective behavior of users to capture the dependency of track points and infer the latent patterns of motion traces to predict individual trajectories. The process follows a purely data-driven perspective, whereby the model is able to grasp mobility patterns directly from location sequences, without requiring any manual feature extraction or external information. We initially transformed raw traces into sequences of locations unfolding in fixed time steps, and then applied a deep neural network model composed of embedding and LSTM layers to predict the next location in the sequence. Adopted in the context of short non-repetitive traces, our methodology was shown to outperform traditional approaches, expressing a potential that is worth examining in depth.
Possible extensions of this paper can explore the augmentation of trajectory data with further information. One research direction may consist of explicitly integrating time information in the sequence and assessing possible performance improvements. In addition, other factors can be taken into consideration, including tourist characteristics such as nationality or age. Furthermore, it could be appropriate to study tourists' mobility at a smaller scale, investigating the predictability of finer traces in time and space (e.g., in an urban environment); in this case, GPS data would allow more detailed resolutions than telecom data. Lastly, the same methodology could be tested for different use cases dealing with short and non-repetitive traces, not limited to tourism analysis.
In conclusion, the use of recurrent network architectures should be further explored in the field of human mobility, since the current promising results can potentially translate into successful applications in a variety of tasks related to trajectory analysis and motion behavioral studies.

Figure 1. Exemplifying overview of the deep neural network model using a block of two long short-term memory (LSTM) layers and a four-location trajectory.

Figure 2. Embedding layer representation: from a sequence of discrete locations to a sequence of dense vectors.

Figure 3 displays a visual representation of the LSTM block; the example shows the last two steps of an embedding sequence and a block of two LSTM layers.

Figure 3. Visual representation of the last two steps of an LSTM block composed of two LSTM layers: the lower vectors represent the input embeddings; the vector on the upper right represents the final trajectory characterization.

Figure 4. Softmax layer representation transforming the output vector of the LSTM block into the probability distribution of the potential predicted location.

Figure 5. Prediction accuracy (on the left) and accuracy in top 3 (on the right) with respect to the hour of the day.

Figure 6. Bar graphs representing the error distance distribution of LSTM and global Markov model (GMM) when both models predicted wrongly (wrong predictions in the left graph, wrong predictions in top 3 in the right graph).

Figure 7. Bar graphs representing the difference of error distance between GMM and LSTM when both models predicted wrongly (wrong predictions in the left graph, wrong predictions in top 3 in the right graph).
We reported three different Markov model types as comparison baselines for our methodology:
- Personal Markov model. Transition probabilities were calculated by counting each single user's transitions, modeling individual movement patterns.
- Global Markov model. First-order probability distributions were calculated by counting the collective state transitions of all users, modeling collective movement patterns.
- Variable-order global Markov model. The principle of the longest match was applied to select which global Markov model order to adopt to calculate the transition probabilities; for a given location sequence, the collective prediction probability distribution was computed on the set of training sequences matching its longest suffix.
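The longest-match principle of the variable-order model can be sketched as follows. The sketch stores next-location counts for every context of length 1 up to a cap `max_order` and, at prediction time, backs off from the longest suffix of the history that was seen in training. The cap, the function names, and the toy sequences are assumptions made for illustration, not details of the baseline actually used.

```python
from collections import Counter, defaultdict

def fit_variable_order(sequences, max_order=3):
    """Count next locations for every context of length 1..max_order."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(1, len(seq)):
            for order in range(1, max_order + 1):
                if i - order < 0:
                    break  # not enough history for this order
                context = tuple(seq[i - order:i])
                counts[context][seq[i]] += 1
    return counts

def predict_longest_suffix(counts, history, max_order=3, k=1):
    """Predict from the longest suffix of `history` observed in training."""
    for order in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-order:])
        if context in counts:
            return [loc for loc, _ in counts[context].most_common(k)]
    return []  # no suffix of the history was seen in training

# Hypothetical toy sequences of location IDs.
train_sequences = [["A", "B", "C"], ["D", "B", "E"], ["F", "B", "E"]]
model = fit_variable_order(train_sequences)

print(predict_longest_suffix(model, ["A", "B"]))  # ['C'] (second-order match)
print(predict_longest_suffix(model, ["Z", "B"]))  # ['E'] (falls back to first order)
```

The first query shows how the longer context changes the prediction: a first-order model would answer E after B, while the matched suffix (A, B) points to C.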

Table 1. Overall performance comparison between our methodology (LSTM) and the Markov baseline approaches, namely personal Markov model (PMM), global Markov model (GMM), and variable-order global Markov model (VGMM).

Table 2. Accuracy (and accuracy in top 3 in brackets) comparison for different values of traveled distance.

Table 3. Accuracy (and accuracy in top 3 in brackets) comparison for different values of radius of gyration.

Table 4. Accuracy (and accuracy in top 3 in brackets) comparison for visited locations in different ranges of occurrence in the data. The percentage value in the first row refers to the share of occurrences of each location in that column with respect to the whole dataset.