# LSTM-Based Deep Learning Model for Predicting Individual Mobility Traces of Short-Term Foreign Tourists

## Abstract

## 1. Introduction

## 2. Related Work

## 3. Methodology

### 3.1. Trajectory Pre-Processing

### 3.2. Deep Learning Model for Trajectory Prediction

#### 3.2.1. Embedding Layer

#### 3.2.2. LSTM Block

#### 3.2.3. Softmax Layer

#### 3.2.4. Model Training

## 4. Experiment

### 4.1. Dataset

### 4.2. Experimental Settings

We compared our model against three Markov baselines:

- Personal Markov model. Transition probabilities were calculated separately for each user by counting that user's own transitions, thereby modeling individual movement patterns.
- Global Markov model. First-order transition probabilities were calculated by counting the state transitions of all users together, thereby modeling collective movement patterns.
- Variable-order global Markov model. The order of the global Markov model was selected by the longest-match principle: for a given location sequence, the collective prediction probability distribution was computed from the training sequences that match its longest suffix.
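The three baselines can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes trajectories are lists of hashable location IDs, that the personal model is first-order (its order is not stated above), that candidates are ranked by raw transition counts, and that the variable-order model considers suffixes up to a hypothetical `max_order` cap.

```python
from collections import Counter, defaultdict


def first_order_counts(sequences):
    """Count prev -> next transitions over a list of location sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def ranked(counter, k):
    """Top-k next locations by count; most_common keeps first-seen order on ties."""
    return [loc for loc, _ in counter.most_common(k)]


class PersonalMarkov:
    """PMM: a separate transition table per user (first order is an assumption)."""

    def __init__(self, sequences_by_user):
        self.tables = {user: first_order_counts(seqs)
                       for user, seqs in sequences_by_user.items()}

    def predict(self, user, history, k=1):
        table = self.tables.get(user, {})
        return ranked(table.get(history[-1], Counter()), k)


class GlobalMarkov:
    """GMM: one first-order transition table pooled over all users."""

    def __init__(self, sequences):
        self.table = first_order_counts(sequences)

    def predict(self, history, k=1):
        return ranked(self.table.get(history[-1], Counter()), k)


class VariableOrderGlobalMarkov:
    """VGMM: pooled counts for every context length up to max_order (hypothetical cap);
    prediction uses the longest suffix of the history that was seen in training."""

    def __init__(self, sequences, max_order=3):
        self.max_order = max_order
        self.table = defaultdict(Counter)
        for seq in sequences:
            for i in range(1, len(seq)):
                for n in range(1, min(self.max_order, i) + 1):
                    self.table[tuple(seq[i - n:i])][seq[i]] += 1

    def predict(self, history, k=1):
        for n in range(min(self.max_order, len(history)), 0, -1):
            context = tuple(history[-n:])
            if context in self.table:  # longest matching suffix wins
                return ranked(self.table[context], k)
        return []  # no suffix of the history was observed in training


# Usage with toy data (hypothetical location IDs)
by_user = {"u1": [["A", "B", "C"], ["A", "B", "D"]], "u2": [["A", "C", "B"]]}
pooled = [seq for seqs in by_user.values() for seq in seqs]
pmm = PersonalMarkov(by_user)
gmm = GlobalMarkov(pooled)
vgmm = VariableOrderGlobalMarkov(pooled, max_order=2)
print(pmm.predict("u1", ["A"]), gmm.predict(["A"]), vgmm.predict(["Z", "C"]))  # ['B'] ['B'] ['B']
```

In the last call the context `("Z", "C")` was never observed, so the variable-order model backs off to the suffix `("C",)`.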

### 4.3. Results

### 4.4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

**Figure 1.** Illustrative overview of the deep neural network model, shown with a block of two long short-term memory (LSTM) layers and a four-location trajectory.

**Figure 2.** Embedding layer: a sequence of discrete locations is mapped to a sequence of dense vectors.

**Figure 3.** The last two steps of an LSTM block composed of two LSTM layers. The lower vectors are the input embeddings; the vector at the upper right is the final trajectory characterization.

**Figure 4.** Softmax layer, which transforms the output vector of the LSTM block into a probability distribution over the candidate next locations.
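Figures 1–4 describe a pipeline of embedding layer, LSTM block, and softmax layer. The forward pass of such a pipeline can be sketched in dependency-free Python as below. This is an illustration under assumptions, not the authors' implementation: the dimensions (`embed_dim`, `hidden_dim`), the uniform random initialization, and the zero initial states are hypothetical choices, the weights are untrained, and training (Section 3.2.4) is omitted.

```python
import math
import random


def matvec(W, x):
    """Matrix-vector product for lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMLayer:
    """Standard LSTM layer: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.params = {gate: (rand_matrix(hidden_dim, input_dim, rng),   # W: input weights
                              rand_matrix(hidden_dim, hidden_dim, rng),  # U: recurrent weights
                              [0.0] * hidden_dim)                        # b: bias
                       for gate in "ifog"}

    def _pre(self, gate, x, h):
        W, U, b = self.params[gate]
        return [wx + uh + bk for wx, uh, bk in zip(matvec(W, x), matvec(U, h), b)]

    def run(self, inputs):
        """Return the hidden state at every step of the input sequence."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in inputs:
            i = [sigmoid(z) for z in self._pre("i", x, h)]
            f = [sigmoid(z) for z in self._pre("f", x, h)]
            o = [sigmoid(z) for z in self._pre("o", x, h)]
            g = [math.tanh(z) for z in self._pre("g", x, h)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class NextLocationModel:
    """Embedding -> stacked LSTM block -> softmax over all candidate locations."""

    def __init__(self, num_locations, embed_dim=8, hidden_dim=16, num_layers=2, seed=0):
        rng = random.Random(seed)
        self.embedding = rand_matrix(num_locations, embed_dim, rng)  # one dense row per location ID
        dims = [embed_dim] + [hidden_dim] * num_layers
        self.layers = [LSTMLayer(dims[n], dims[n + 1], rng) for n in range(num_layers)]
        self.W_out = rand_matrix(num_locations, hidden_dim, rng)
        self.b_out = [0.0] * num_locations

    def predict_proba(self, trajectory):
        seq = [self.embedding[loc] for loc in trajectory]  # embedding lookup
        for layer in self.layers:  # each layer consumes the previous layer's hidden states
            seq = layer.run(seq)
        summary = seq[-1]  # last hidden state of the top layer: trajectory characterization
        logits = [z + b for z, b in zip(matvec(self.W_out, summary), self.b_out)]
        return softmax(logits)


model = NextLocationModel(num_locations=5)
probs = model.predict_proba([0, 3, 1, 4])  # a four-location trajectory, as in Figure 1
print(max(range(5), key=probs.__getitem__))  # index of the most likely next location
```

A practical model would be built with a deep learning framework rather than hand-written loops; the sketch only makes the data flow between the three components explicit.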

**Figure 5.** Prediction accuracy (left) and accuracy in top 3 (right) by hour of the day.

**Figure 6.** Bar graphs of the error distance distribution of LSTM and the global Markov model (GMM) in cases where both models predicted incorrectly (left: wrong predictions; right: wrong predictions in top 3).

**Figure 7.** Bar graphs of the difference in error distance between GMM and LSTM in cases where both models predicted incorrectly (left: wrong predictions; right: wrong predictions in top 3).
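The error-distance analysis in Figures 6 and 7 can be approximated with a sketch like the one below. It assumes that each location has a (latitude, longitude) coordinate, that the error distance is the great-circle (haversine) distance between the predicted and the true location, and that the Figure 7 difference is taken as GMM minus LSTM; the bin edges and coordinates are hypothetical, and only the top-1 case is shown because the text does not define the top-3 error distance.

```python
import math
from bisect import bisect_right

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def paired_error_distances(coords, truths, preds_gmm, preds_lstm):
    """(GMM error, LSTM error) in km for the cases where both top-1 predictions are wrong."""
    pairs = []
    for truth, gmm_pred, lstm_pred in zip(truths, preds_gmm, preds_lstm):
        if gmm_pred != truth and lstm_pred != truth:
            pairs.append((haversine_km(coords[gmm_pred], coords[truth]),
                          haversine_km(coords[lstm_pred], coords[truth])))
    return pairs


def histogram(values, edges):
    """Counts per bin [edges[j], edges[j+1]); the last bin is open-ended and
    values below edges[0] are put in the first bin."""
    counts = [0] * len(edges)
    for v in values:
        counts[max(bisect_right(edges, v) - 1, 0)] += 1
    return counts


# Hypothetical coordinates and predictions
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 0.5)}
pairs = paired_error_distances(coords, ["A", "B"], ["B", "B"], ["C", "A"])
diffs = [gmm_err - lstm_err for gmm_err, lstm_err in pairs]  # assumed sign: GMM minus LSTM
print(histogram([lstm_err for _, lstm_err in pairs], [0, 10, 50, 100]))  # [0, 0, 1, 0]
```

Under this sign convention, a positive difference means the wrong LSTM prediction was geographically closer to the true location than the wrong GMM prediction.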

**Table 1.** Overall performance of our methodology (LSTM) compared with the Markov baselines: personal Markov model (PMM), global Markov model (GMM), and variable-order global Markov model (VGMM).

Model | Accuracy | Accuracy in Top 3 |
---|---|---|
PMM | 0.3373 | 0.3717 |
GMM | 0.4822 | 0.6508 |
VGMM | 0.4553 | 0.6445 |
LSTM | 0.5076 | 0.7013 |
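Accuracy and accuracy in top 3 can be read as top-k hit rates over the test predictions. A minimal sketch follows, assuming each model returns a ranked list of candidate locations per test case; the per-group variant illustrates how breakdowns such as Figure 5 (hour of the day) or Tables 2–4 could be computed, with the group labels as hypothetical inputs.

```python
from collections import defaultdict


def accuracy_at_k(ranked_predictions, truths, k=1):
    """Fraction of cases whose true next location is among the top-k ranked predictions."""
    if not truths:
        raise ValueError("no test cases")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


def accuracy_by_group(ranked_predictions, truths, groups, k=1):
    """Accuracy@k computed separately for each group label (e.g. hour of the day)."""
    buckets = defaultdict(lambda: ([], []))
    for ranked, truth, group in zip(ranked_predictions, truths, groups):
        buckets[group][0].append(ranked)
        buckets[group][1].append(truth)
    return {g: accuracy_at_k(r, t, k) for g, (r, t) in buckets.items()}


# Hypothetical ranked outputs, ground truth, and hour of each prediction
ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
truths = ["A", "A", "B"]
hours = [9, 9, 14]
print(accuracy_at_k(ranked, truths, k=1))        # 0.3333333333333333
print(accuracy_at_k(ranked, truths, k=3))        # 1.0
print(accuracy_by_group(ranked, truths, hours))  # {9: 0.5, 14: 0.0}
```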

**Table 2.** Accuracy (accuracy in top 3 in parentheses) for different ranges of traveled distance.

Model / Traveled distance | ≤10 km | 10–25 km | 25–50 km | 50–100 km | ≥100 km |
---|---|---|---|---|---|
PMM | 0.4645 (0.5088) | 0.4240 (0.4901) | 0.3260 (0.3639) | 0.2613 (0.2796) | 0.1665 (0.1689) |
GMM | 0.5495 (0.7805) | 0.5648 (0.7412) | 0.4988 (0.6534) | 0.4494 (0.5845) | 0.3391 (0.4582) |
VGMM | 0.5788 (0.7945) | 0.5033 (0.7201) | 0.4312 (0.6270) | 0.3979 (0.5656) | 0.3212 (0.4630) |
LSTM | 0.5938 (0.8172) | 0.5696 (0.7933) | 0.5061 (0.7036) | 0.4633 (0.6293) | 0.3803 (0.5270) |
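One plausible way to assign a trajectory to a Table 2 column is sketched below. The text does not define traveled distance, so this assumes it is the sum of great-circle distances between consecutive (latitude, longitude) points of a trajectory and that a value exactly on a boundary belongs to the lower band; the trajectory itself is hypothetical.

```python
import math
from bisect import bisect_left

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def traveled_distance_km(points):
    """Sum of great-circle distances between consecutive points (assumed definition)."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


DIST_EDGES_KM = [10, 25, 50, 100]
DIST_LABELS = ["<=10 km", "10-25 km", "25-50 km", "50-100 km", ">=100 km"]


def distance_band(d_km):
    """Map a distance to a Table 2 column; boundary values go to the lower band (assumption)."""
    return DIST_LABELS[bisect_left(DIST_EDGES_KM, d_km)]


trip = [(0.0, 0.0), (0.0, 0.1), (0.0, 0.2)]  # hypothetical trajectory
d = traveled_distance_km(trip)
print(round(d, 1), distance_band(d))  # 22.2 10-25 km
```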

**Table 3.** Accuracy (accuracy in top 3 in parentheses) for different ranges of radius of gyration (ROG).

Model / ROG | ≤3 km | 3–10 km | 10–32 km | ≥32 km |
---|---|---|---|---|
PMM | 0.4539 (0.5213) | 0.3650 (0.4078) | 0.2974 (0.3089) | 0.1880 (0.1899) |
GMM | 0.5496 (0.7859) | 0.5246 (0.6880) | 0.4719 (0.6038) | 0.3548 (0.4729) |
VGMM | 0.5661 (0.7923) | 0.4578 (0.6668) | 0.4218 (0.5846) | 0.3371 (0.4781) |
LSTM | 0.5891 (0.8229) | 0.5299 (0.7480) | 0.4849 (0.6426) | 0.3955 (0.5404) |
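The radius of gyration used in Table 3 is commonly defined as the root-mean-square distance of a user's visited points from their center of mass. The sketch below follows that common definition; it is an assumption, not the paper's stated formula. The centroid is taken as the plain mean of latitudes and longitudes (adequate for a region the size of a country, but wrong across the antimeridian), and boundary values are assigned to the lower band.

```python
import math
from bisect import bisect_left

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration_km(points):
    """Root-mean-square great-circle distance of the visited points from their centroid."""
    n = len(points)
    # Simple lat/lon mean as centroid (assumption; not valid across the antimeridian)
    centroid = (sum(lat for lat, _ in points) / n, sum(lon for _, lon in points) / n)
    return math.sqrt(sum(haversine_km(p, centroid) ** 2 for p in points) / n)


ROG_EDGES_KM = [3, 10, 32]
ROG_LABELS = ["<=3 km", "3-10 km", "10-32 km", ">=32 km"]


def rog_band(r_km):
    """Map a radius of gyration to a Table 3 column (boundary values go to the lower band)."""
    return ROG_LABELS[bisect_left(ROG_EDGES_KM, r_km)]


visits = [(0.0, 0.0), (0.0, 0.02)]  # hypothetical visited points
rg = radius_of_gyration_km(visits)
print(round(rg, 2), rog_band(rg))  # 1.11 <=3 km
```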

**Table 4.** Accuracy (accuracy in top 3 in parentheses) for visited locations grouped by how often they occur in the data. The percentages in the header row give each location's share of all occurrences in the whole dataset.

Model / Share of data | ≥0.5% | 0.1–0.5% | 0.05–0.1% | ≤0.05% |
---|---|---|---|---|
PMM | 0.5169 (0.5485) | 0.3809 (0.4147) | 0.3280 (0.3600) | 0.2624 (0.2986) |
GMM | 0.6872 (0.9305) | 0.5398 (0.7659) | 0.4745 (0.6511) | 0.3925 (0.5095) |
VGMM | 0.7172 (0.9146) | 0.5448 (0.7624) | 0.4462 (0.6456) | 0.3336 (0.5049) |
LSTM | 0.7372 (0.9459) | 0.6024 (0.8210) | 0.5039 (0.7151) | 0.3925 (0.5660) |
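The occurrence share behind the Table 4 columns can be computed by counting every location occurrence in the dataset and dividing by the total. A minimal sketch follows, assuming trajectories are lists of location IDs and that each test case is grouped by the share of its true next location; the threshold handling at the band boundaries is an assumption.

```python
from collections import Counter


def location_shares(sequences):
    """Share of each location among all location occurrences in the dataset."""
    counts = Counter(loc for seq in sequences for loc in seq)
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()}


def share_band(share):
    """Map a share (0-1) to a Table 4 column; boundary handling is an assumption."""
    if share >= 0.005:
        return ">=0.5%"
    if share >= 0.001:
        return "0.1-0.5%"
    if share > 0.0005:
        return "0.05-0.1%"
    return "<=0.05%"


shares = location_shares([["A", "A", "B"], ["A", "C"]])  # hypothetical toy data
print(shares)  # {'A': 0.6, 'B': 0.2, 'C': 0.2}
```

With the band of each test case's true location as its group label, per-band accuracy follows directly from a grouped top-k hit rate.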

