An Attention-Based Spatiotemporal Gated Recurrent Unit Network for Point-of-Interest Recommendation

Abstract: Point-of-interest (POI) recommendation is one of the fundamental tasks for location-based social networks (LBSNs). Most existing methods are based on collaborative filtering (CF), Markov chains (MC) and recurrent neural networks (RNN). However, it is difficult to capture users' dynamic preferences using CF-based methods. MC-based methods suffer from strong independence assumptions. RNN-based methods are still in the early stage of incorporating spatiotemporal context information, and the user's main behavioral intention in the current sequence is not emphasized. To solve these problems, we propose an attention-based spatiotemporal gated recurrent unit (ATST-GRU) network model for POI recommendation in this paper. We first design a novel variant of GRU, which acquires the user's sequential preference and spatiotemporal preference by feeding the continuous geographical distance and time interval information into the GRU network at each time step. Then, we integrate an attention model into our network, which is a personalized process and can capture the user's main behavioral intention in the user's check-in history. Moreover, we conduct an extensive performance evaluation on two real-world datasets: Foursquare and Gowalla. The experimental results demonstrate that the proposed ATST-GRU network significantly outperforms the existing state-of-the-art POI recommendation methods on two commonly used evaluation metrics.


Introduction
With the prevalence of smart devices and location-based social network (LBSN) services, people can easily share their locations and check-in information with others in LBSNs [1,2]. The huge volume of users' historical check-in data brings opportunities for researching human mobility behavior, and point-of-interest (POI) recommendation has become one of the important tasks in LBSNs. As shown in Figure 1, POI recommendation may help a user find a place of interest after dinner, or provide a user with discount information about nearby shopping malls. Therefore, POI recommendation can not only meet users' personalized preferences for visiting new places but also help LBSN service providers implement intelligent location-aware online advertising services [3,4]. The task of POI recommendation is investigated in different settings, such as general POI recommendation [5,6], out-of-town recommendation [7], next POI recommendation [3,8,9], tour recommendation for groups [10,11], and so on. In this work, we focus on next POI recommendation by modeling check-in sequences and incorporating spatiotemporal influence in a personalized way. This task is more significant since it can predict users' next movement behaviors. As illustrated in Figure 1, a user's historical check-in sequence contains geographical and temporal information. Given all users' check-in sequences, the task of POI recommendation is to recommend the next POI that the user is likely to be interested in visiting. Specifically, according to the prediction scores, we can recommend the top-k POIs to a user, where a higher predicted score indicates that the user is more likely to go.
Nowadays, POI recommendation has been extensively studied in both academia and industry [5-13]. POI recommendation differs from general recommendation tasks (e.g., goods, movies, news recommendation) because of its strong spatiotemporal dependence. The spatial and temporal contextual information is precisely the basis for modeling user movement behaviors [8]. Tobler's first law of geography [14] states that "everything is related to everything else, but near things are more related than distant things". For example, in real life people more often visit nearby places, such as cinemas and restaurants. That is, adjacent POIs are more geographically relevant than distant POIs. Geographic factors significantly impact the user's movement behavior, and geographic location information can effectively improve the recommendation quality [12].
Due to the time sensitivity of POI recommendation, temporal information also plays a critical role [2], because physical constraints on check-in activities can lead to specific patterns. For instance, some people like to go to a cinema on weekend nights to see a movie. The temporal influences in a POI recommender system typically show in three aspects: periodicity, non-uniformness, and consecutiveness [15]. Besides, sequential influence is also essential in POI recommendation since individual movement behaviors commonly exhibit sequential patterns [16].
In brief, the influences of spatial, temporal and sequential factors are crucial for analyzing individual behaviors for personalized POI recommendation. So far, many studies have considered these factors to improve the performance of POI recommendation algorithms, such as collaborative filtering (CF) [4] and Markov chain (MC) [3] methods. However, it is difficult to process sequence data and to capture users' dynamic preferences using CF-based methods, and MC-based methods depend on strong independence assumptions among different factors. Therefore, there are still many challenges regarding how to integrate information from various features to accurately model users' complex behavioral preferences and to recommend reliable POIs to users.
Recently, the recurrent neural network (RNN) [17] and its variants (e.g., long short-term memory (LSTM) [18] and the gated recurrent unit (GRU) [19,20]) have been successfully applied to sequential recommender systems [21]. The hidden states of RNNs naturally carry both sequential and temporal characteristics, which makes them adequate for modeling sequential correlations and temporal dynamics in POI recommender systems [8,22], and they can better capture long-term dependencies. However, the existing RNN-based POI recommendation methods have difficulty in alleviating the cold-start problem. An excellent choice is to apply an RNN and incorporate additional spatiotemporal contextual information, such as continuous geographical distance and time interval. Most RNN methods rely on the last hidden layer activation vector when calculating the output of the network, and this limits the ability to understand and learn the main intention of user check-in behavior from the hidden states [21]. In other words, the historical check-in behaviors of a user are not equally important for predicting the next behavior, and we need to focus on the main information.
In view of the above analysis, we propose a novel attention-based spatiotemporal GRU network (ATST-GRU) for POI recommendation. Figure 2 delineates the architecture of the network. First, considering that the GRU network is a more robust variant of the RNN which works better in capturing long-term dependencies and alleviating the exploding or vanishing gradient problems [23], we use an extended GRU network to model check-in sequences by considering geographical distances between consecutive POIs and time intervals between consecutive check-in behaviors. Such a network structure can better explore the spatiotemporal and sequential influences and alleviate the problems of data heterogeneity and sparsity. Then, inspired by the attention mechanism in neural networks [24,25], we further improve our method by introducing an attention model, which can identify the most pertinent parts of the user's check-in behavior. Next, the parameters of ATST-GRU are learned by the Bayesian personalized ranking (BPR) [26] framework and the back propagation through time (BPTT) algorithm [27]. Finally, extensive experiments are conducted on two public datasets (Foursquare and Gowalla), and the results are compared with several state-of-the-art POI recommendation methods to evaluate the model. The main contributions of this work are as follows:

• A novel spatiotemporal gated recurrent unit (ST-GRU) network model is proposed in this paper, which naturally combines continuous values of spatial and temporal context information into the GRU network to capture the user's spatiotemporal preferences and alleviate the problems of data heterogeneity and sparsity;
• An attention-based method is introduced into the ST-GRU network for POI recommendation, named the ATST-GRU model. It can automatically pay more attention to critical information and extract the user's main purpose, which significantly strengthens the user's long-term interest;
• Extensive experiments on two real-world datasets show that ATST-GRU is effective and significantly outperforms state-of-the-art methods.
The rest of this paper is organized as follows: the related methods are briefly reviewed in Section 2. The details of our ATST-GRU network are described in Section 3. Experiments and results of the proposed method are illustrated in Section 4. Finally, conclusions are summarized in Section 5.


Related Work
In this section, we review several methods for POI recommendation, including CF based methods, MC based methods and deep learning (DL) based methods.
Geographical influence is one of the important factors for POI recommendation. Most researchers have modeled geographical influence by treating distance as a penalty [41] or by building a distance distribution model, such as a power-law distribution [2,42], a multi-center Gaussian distribution [4] or a personalized nonparametric distribution [43]. Ye et al. [5] proposed a user-based CF framework for POI recommendation which models geographical influence with a power-law distribution. Cheng et al. [4] proposed a multi-center Gaussian model to capture the spatial clustering phenomenon. In addition, Zhang et al. [44] developed a kernel density estimation approach to capture personalized geographical influence. Lian et al. [12] proposed an MF-based POI recommendation method which captures the spatial clustering phenomenon by means of two-dimensional kernel density estimation. Li et al. [35] proposed a ranking-based geographical factorization method, which exploits both geographical and temporal contexts for POI recommendation.
Temporal information has proved to be another important type of context for POI recommendation and has attracted significant attention from researchers. For example, Yuan et al. [2] incorporated temporal information into a user-based CF recommender by dividing time into 24 time slots. Furthermore, Gao et al. [32] proposed an MF-based location recommendation framework which investigated the temporal properties of users' check-in behavior.
Besides, social information and other POI characteristics have also been studied for POI recommendation. For instance, Gao et al. [1] proposed a social-historical model which integrated social and historical effects and assessed the role of social correlation in users' check-in behavior. Yang et al. [37] fused spatial information, social information and user tips with a location-based social matrix factorization algorithm.
However, most of these methods fail to study the spatiotemporal sequential influence of the user's check-in history, which is very important for mining users' dynamic behavior and preferences.

Markov Chain Based Methods
Since historical check-in information in different time periods and spatial locations has different effects on users' behavior, sequential influence should be considered for POI recommendation. Most existing studies employ the properties of a Markov chain to model the sequential influence [45-51]. For instance, Rendle et al. [45] first proposed a state-of-the-art personalized Markov chain model, namely FPMC, which implemented the recommendation task for sequence data in an MF-based approach. Rather than merely modeling temporal information, Cheng et al. [3] employed FPMC to model personalized POI transitions and considered users' movement constraints. Mathew et al. [46] proposed a hybrid method to predict human movement with a hidden Markov model (HMM). Chen et al. [47] proposed a POI recommendation method with Markov modeling, which considered both individual and collective movement patterns in making predictions. Ye et al. [16] attempted to model the underlying user movement pattern by using check-in category information and proposed a mixed hidden Markov model to predict the most likely next location. Zhang et al. [48] proposed a novel HMM-based group-level mobility modeling framework. Similarly, some other POI recommendation methods based on Markov chains have also been proposed [49-51]. However, the drawbacks of MC-based methods are their strong Markov assumptions among different components and their difficulty in modeling long-term dependencies. Although the Personalized Ranking Metric Embedding (PRME) method [52] learns a personalized metric embedding and models sequential POI transitions, it merely models short-term transition patterns within users' movements.

Deep Learning Based Methods
Deep learning has been successfully applied to POI recommendation systems in recent years. Many such methods have been introduced or used in POI recommendation, such as Word2vec [53], the multilayer perceptron (MLP) [54,55], the convolutional neural network (CNN) [56,57] and the deep neural network (DNN) [58]. Zhao et al. [34] proposed a temporal POI embedding model by introducing the word2vec framework, which incorporated both sequential and spatial-temporal context influence. Yang et al. [54] developed a deep neural architecture called Preference and Context Embedding (PACE) to bridge collaborative filtering and semi-supervised learning for POI recommendation. Wang et al. [56] proposed a novel POI recommender system, which used a CNN to learn user and POI latent features from images. Ding et al. [58] proposed a DNN-based POI recommendation framework, which incorporated co-visiting patterns, geographical influence, and categorical correlation to alleviate the data sparsity issue.
Recently, recurrent neural networks (RNNs) [17] have become more and more powerful in modeling the sequential history and transitions of the user's movement. Moreover, they have been successfully applied to many fields, such as sequential click prediction [59], session-based recommendation [23] and mobility prediction [8]. With the help of gated units such as the gated recurrent unit (GRU) [19] and long short-term memory (LSTM) [18], they can better capture long-term dependencies. Hidasi et al. [23] first applied a recurrent neural network with GRU to sequence recommendation, and their experimental results showed a significant improvement over traditional methods. Spatial and temporal contextual information has shown its importance in different tasks. Liu et al. [8] employed a spatial and temporal recurrent neural network (ST-RNN) to model spatiotemporal contextual information with continuous values for location prediction. However, it ignores long-term dependencies, and the standard RNN may suffer from the exploding or vanishing gradient problem. Zhao et al. [60] proposed a new variant of LSTM, which implemented time gates and distance gates in LSTM to capture the spatiotemporal relation between successive check-ins. Cui et al. [22] proposed a Distance-to-Preference (Distance2Pre) network for next POI prediction, which modeled check-in sequences and successive distances to acquire the user's sequential preference and spatial preference. However, the above methods are unable to capture the different contributions of each POI in the historical check-in sequence. In other words, it is difficult for them to extract the user's main intentions in the current sequence.
A large amount of research has benefited from the attention mechanism in recent years [61-63], which not only enhances the ability of neural networks to capture long-term dependencies but also improves their interpretability. Based on the seq2seq model [64], Bahdanau et al. [65] introduced an attention mechanism into the neural machine translation task. Vaswani et al. [61] proposed a new simple network architecture to encode an input sequence into an output sequence using the attention mechanism. Feng et al. [66] proposed an attentional GRU network for predicting users' movement from sparse data. Unlike existing studies, our work feeds the contextual information of geographical distances and time intervals into a more robust GRU network to capture the user's sequential preference and spatiotemporal preference. Besides, an attention model is introduced to capture the user's main intentions.

Proposed ATST-GRU Model
In this section, we first formulate the problem and introduce the basic GRU model. Then we present the proposed ST-GRU and ATST-GRU networks. Finally, we train our model with the BPR framework and the BPTT algorithm.

Problem Statement
For convenience of expression, we give several important definitions, and the essential notations are listed in Table 1.

Gated Recurrent Unit
The primary challenge of the POI recommendation task is modelling the user's sequential preference and spatiotemporal preference, which can be considered a sequence modelling problem, for which RNN architectures are a good choice. In this paper, we choose the GRU rather than a standard RNN because it deals better with the vanishing and exploding gradient problems. Hidasi et al. [23] demonstrated that GRU outperformed LSTM in sequence-based recommendation. The hidden unit of GRU contains a reset gate $r^u_{t_k}$ and an update gate $z^u_{t_k}$ to control the flow of information, as in the standard GRU formulation [19,20].
In the GRU network, the prediction of the next POI can be calculated as the inner product of the user and POI representations. In our network, we regard the last hidden vector $h^u_{t_N}$ as the representation of the user. Like MF-based POI recommendation approaches [4,12,35], a user's preference for a POI, considering sequential preference, is denoted as:
$$o_{u,t_{N+1},v_k} = \left(h^u_{t_N}\right)^{T} v_k \qquad (2)$$
where $o_{u,t_{N+1},v_k}$ represents the predicted probability that user $u$ visits POI $v_k$ at time point $t_{N+1}$.
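The gating and the inner-product scoring described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the textbook GRU update, the parameter names `W*`, `U*`, `b*` and the convention that the update gate weights the candidate state are assumptions, and the POI names and random embeddings are toy data. Only the initialization range (−0.5 to 0.5) is taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x, h_prev, p):
    """One textbook GRU step; p holds hypothetical weights W* (input), U* (hidden), b*."""
    def gate(name, hidden, act):
        pre = [a + b + c for a, b, c in zip(matvec(p["W" + name], x),
                                            matvec(p["U" + name], hidden),
                                            p["b" + name])]
        return [act(v) for v in pre]

    z = gate("z", h_prev, sigmoid)                      # update gate
    r = gate("r", h_prev, sigmoid)                      # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]    # reset gate applied to old state
    c = gate("c", reset_h, math.tanh)                   # candidate state
    # Assumed convention: z weights the candidate, (1 - z) the previous state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, c)]


def init_params(dim, rng):
    """Random square weights in (-0.5, 0.5) and zero biases for the three gates."""
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    p = {}
    for g in "zrc":
        p["W" + g], p["U" + g], p["b" + g] = mat(), mat(), [0.0] * dim
    return p


def top_k(h_last, poi_embeddings, k):
    """Rank POIs by inner product with the user representation h_last."""
    scores = {poi: sum(a * b for a, b in zip(h_last, emb))
              for poi, emb in poi_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


rng = random.Random(0)
dim = 4
params = init_params(dim, rng)
pois = {name: [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        for name in ["cafe", "cinema", "mall", "park"]}

h = [0.0] * dim
for visited in ["cafe", "mall", "cinema"]:   # a toy check-in sequence
    h = gru_step(pois[visited], h, params)

recommended = top_k(h, pois, k=2)
print(recommended)
```

The last hidden state `h` plays the role of the user representation, and `top_k` corresponds to ranking POIs by the score in Formula (2).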

Spatiotemporal GRU Network
Spatial and temporal contextual information is the basis for mining user movement behavior patterns; it helps us to understand the behavioral background more precisely and thus improve user behavior modeling. General sequence modeling only considers the order relationship between check-in behaviors, ignoring the continuous geographical distance and time interval information. However, these continuous time interval and geographical distance values are crucial for modeling user behavior and mining users' preferences in personalized POI recommendation systems.
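As an illustration of the two continuous context values, the sketch below derives the geographical distance and the time interval between consecutive check-ins. The haversine great-circle distance and the hour unit are assumptions made here for the example (the text does not state which distance measure or unit is used), and the coordinates and timestamps are made-up toy values.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatiotemporal_intervals(checkins):
    """checkins: list of (timestamp, lat, lon) sorted by time.

    Returns one (distance_km, hours) pair per pair of consecutive check-ins.
    """
    out = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(checkins, checkins[1:]):
        distance = haversine_km(la0, lo0, la1, lo1)
        hours = (t1 - t0).total_seconds() / 3600.0
        out.append((distance, hours))
    return out


# Toy check-in sequence (hypothetical coordinates).
checkins = [
    (datetime(2010, 8, 1, 18, 0), 1.3000, 103.8000),
    (datetime(2010, 8, 1, 20, 30), 1.3000, 103.8500),
    (datetime(2010, 8, 2, 9, 0), 1.3500, 103.8500),
]
intervals = spatiotemporal_intervals(checkins)
for distance, hours in intervals:
    print(round(distance, 2), "km,", hours, "h")
```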
We argue that spatial and temporal context can work as implicit information to guide the learning of the gate mechanism. We propose to add the continuous geographical distance and time interval information into the basic GRU network, which more naturally captures personalized spatiotemporal preferences for POI recommendation. Figure 3 illustrates the architecture of the ST-GRU network. At each time step, each ST-GRU unit takes an embedded vector $v^u_{t_k}$, a spatial context vector $s^u_{t_k}$ and a temporal context vector $g^u_{t_k}$ as inputs. In this way, the output of ST-GRU is a hidden layer vector $h^u_{t_k}$, which indicates the combined influence of POIs and spatiotemporal context information, and the user's preferences in each hidden state are greatly enhanced.
However, if we learn a distinct matrix for each continuous geographical distance and time interval, the ST-GRU network will face the data sparsity problem. We therefore partition the continuous geographical distance and time interval values into discrete bins and use linear interpolation to obtain their transition matrices as follows:
$$W_s = \frac{W_{L(\delta_s)}\left(U(\delta_s) - \delta_s\right) + W_{U(\delta_s)}\left(\delta_s - L(\delta_s)\right)}{U(\delta_s) - L(\delta_s)}$$
$$W_g = \frac{W_{L(\delta_g)}\left(U(\delta_g) - \delta_g\right) + W_{U(\delta_g)}\left(\delta_g - L(\delta_g)\right)}{U(\delta_g) - L(\delta_g)}$$
where $U(\delta_s)$ and $L(\delta_s)$ denote the upper bound and lower bound values of a specific geographical distance $\delta_s$. Similarly, $U(\delta_g)$ and $L(\delta_g)$ indicate the upper bound and lower bound values of a specific time interval $\delta_g$. $W_{U(\delta_s)}$ and $W_{L(\delta_s)}$ are the spatial factor matrices, and $W_{U(\delta_g)}$ and $W_{L(\delta_g)}$ are the temporal factor matrices.
Finally, the POI recommendation for a target user is calculated as the dot product of the user and POI representations, similarly to Formula (2), which gives the prediction of whether user $u$ would go to a location $v_k$ at time $t_{N+1}$ (Formula (6)). Here, $p_u$ is the permanent representation of a user, which is specifically designed to indicate a user's profile and long-term preference, and $h^u_{t_N}$ is the dynamic representation of a user, which captures a user's dynamic interests under specific spatial and temporal contexts. $W_N$ and $W_p$ are the parameters of the output layer. $W_v$, $W_s$, and $W_g$ are transition matrices.
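The binning and linear interpolation of the transition matrices can be sketched as follows. This is a minimal illustration under stated assumptions: the bin boundaries, the tiny 2x2 matrices, the function name `interpolate_matrix` and the clamping of out-of-range values are all hypothetical choices, not details given in the text.

```python
import bisect


def interpolate_matrix(delta, bounds, matrices):
    """Linearly interpolate a transition matrix for a continuous value delta.

    bounds: sorted bin boundaries; matrices[i] is the matrix learned at bounds[i].
    """
    # Assumption: values outside the covered range are clamped to the nearest bound.
    delta = min(max(delta, bounds[0]), bounds[-1])
    hi = bisect.bisect_left(bounds, delta)       # first boundary >= delta
    if bounds[hi] == delta:                      # exactly on a boundary
        return [row[:] for row in matrices[hi]]
    lo = hi - 1
    lower, upper = bounds[lo], bounds[hi]        # L(delta) and U(delta)
    w_lower = (upper - delta) / (upper - lower)  # weight of the lower-bound matrix
    w_upper = (delta - lower) / (upper - lower)  # weight of the upper-bound matrix
    return [[w_lower * a + w_upper * b for a, b in zip(row_lo, row_hi)]
            for row_lo, row_hi in zip(matrices[lo], matrices[hi])]


# Hypothetical distance bins (km) with one small matrix per boundary.
distance_bounds = [0.0, 1.0, 5.0]
spatial_matrices = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]

W_s = interpolate_matrix(3.0, distance_bounds, spatial_matrices)
print(W_s)
```

Only one matrix per bin boundary has to be learned, which is how the interpolation sidesteps the sparsity that a distinct matrix per distance or interval value would cause.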

Attention-Based Spatiotemporal GRU Network
Intuitively speaking, when we predict the user's next behavior, not all of the user's historical check-in behaviors contribute equally. Moreover, previous methods have not been able to capture the user's main intention adequately. Therefore, in our model, we introduce an attention mechanism to capture the user's main purpose in the current sequence, which allows different parts of the sequence of past check-in behaviors to be dynamically selected and linearly combined by the decoder. In other words, the attention mechanism helps us select only the relevant and important POIs for next POI recommendation at each time step, and all the previous hidden states can be utilized through a weighted sum over the visited POIs. Figure 4 illustrates the architecture of the ATST-GRU network. In this encoding scheme, we use ST-GRU as the basic component, where the weighted sum of hidden states is interpreted as the user's main intention feature.
ISPRS Int. J. Geo-Inf. 2019, 8, 355
The user's main intention feature $c^u_{t_N}$ is the weighted sum of the hidden states:
$$c^u_{t_N} = \sum_{k=1}^{N} \alpha_{t_N t_k}\, h^u_{t_k}$$
where the weighting factors $\alpha_{t_N t_k}$ determine which part of the historical check-in sequence should be emphasized or ignored when making next behavior predictions in the POI recommendation model, and which in turn are a function of the hidden states as follows:
$$\alpha_{t_N t_k} = m\left(h^u_{t_N}, h^u_{t_k}\right) = A_0^{T}\, \sigma\left(A_1 h^u_{t_N} + A_2 h^u_{t_k}\right)$$
where the function $m\left(h^u_{t_N}, h^u_{t_k}\right)$ is used to calculate the similarity between the final hidden state $h^u_{t_N}$ and the representation of the previously visited POI $h^u_{t_k}$. $\sigma$ denotes the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$. $A_0$, $A_1$ and $A_2$ are used to transform $h^u_{t_N}$ and $h^u_{t_k}$ into a common latent space.
Similarly to Formula (6), the prediction of whether user $u$ would go to a location $v_k$ at time $t_{N+1}$ is computed from the user and POI representations, where $W_N$ and $W_p$ are the parameters of the output layer. $W_v$, $W_s$, and $W_g$ are transition matrices.
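The attention step can be sketched in a few lines. This is an illustrative reading rather than the authors' code: it assumes the similarity takes the additive form `A0 . sigmoid(A1 h_N + A2 h_k)` with unnormalized weights, and the hidden states and the matrices `A0`, `A1`, `A2` below are toy values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_context(hidden_states, A0, A1, A2):
    """Return (weights, context): one weight per hidden state and their weighted sum.

    Assumed form of the similarity: A0 . sigmoid(A1 h_last + A2 h_k).
    """
    h_last = hidden_states[-1]
    proj_last = matvec(A1, h_last)
    weights = []
    for h_k in hidden_states:
        proj_k = matvec(A2, h_k)
        act = [sigmoid(a + b) for a, b in zip(proj_last, proj_k)]
        weights.append(sum(a0 * x for a0, x in zip(A0, act)))
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states))
               for i in range(len(h_last))]
    return weights, context


hidden_states = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]   # toy hidden states
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# With zero projections every state gets the same weight, so the context is a plain sum.
uniform_w, uniform_c = attention_context(hidden_states, [1.0, 1.0], zeros, zeros)
# With non-trivial projections the weights differ between check-ins.
weights, context = attention_context(hidden_states, [1.0, 1.0], identity, identity)
print(uniform_w, uniform_c)
print(weights, context)
```

Whether the weights are additionally normalized (for example with a softmax) is not stated in the text, so the sketch leaves them unnormalized.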

Network Learning
In this subsection, we train our ATST-GRU network under the Bayesian Personalized Ranking (BPR) framework by using the backpropagation through time (BPTT) algorithm.These methods have been successfully used for network training of RNN based recommendation models [8,22].BPR is a pairwise ranking framework that is widely used to process implicit feedback data.The basic assumption of BPR is that a user prefers previously visited POIs than negative ones.In the BPR framework, at each sequential position k, the objective of ATST-GRU is to maximize the following probability: where v and v denote a positive location and a negative location, respectively.Additionally, a negative location is randomly chosen from location sets that users have not visited.g(•) is a nonlinear sigmoid function g(x) = 1/(1 + e −x ).
Finally, by incorporating the negative log-likelihood, we can solve the objective function for POI recommendation as follows:

$$J = -\sum \ln p(u, t_k, v \succ v') + \frac{\lambda}{2}\,\lVert\Theta\rVert^{2}$$

where $\Theta = \{U, W, b, A_0, A_1, A_2\}$ is the set of parameters. U represents the set of weight matrices $U_z$, $U_r$, and $U_c$; W and b are defined similarly. λ is the regularization parameter. We then use stochastic gradient descent (SGD) and BPTT to optimize the network parameters in this study. The parameters were initialized in the range (−0.5, 0.5). The size of each batch was set to 100. The regularization parameter and the initial learning rate were set to 0.001 and 0.01, respectively. Our model was trained on a GeForce GTX TitanX GPU, and the code used in our experiments was written in Python 3.5 with Theano. The learning algorithm of ATST-GRU is summarized in Algorithm 1.
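As a rough illustration of the BPR objective described above, the following NumPy sketch draws a negative location and evaluates the regularized negative log-likelihood for a batch of score pairs. The function names (`sample_negative`, `bpr_probability`, `bpr_loss`) and the toy inputs are assumptions made for this example and are not taken from the authors' Theano code; only the default regularization weight of 0.001 follows the setting reported above. The sketch computes the loss value only; the gradient computation via BPTT is omitted.

```python
import numpy as np


def sample_negative(num_pois, visited, rng):
    """Draw one POI id uniformly from the POIs the user has not visited."""
    candidates = np.setdiff1d(np.arange(num_pois), np.fromiter(visited, dtype=int))
    return int(rng.choice(candidates))


def bpr_probability(pos_scores, neg_scores):
    """Pairwise preference probability g(o_pos - o_neg), g being the sigmoid."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    return 1.0 / (1.0 + np.exp(-diff))


def bpr_loss(pos_scores, neg_scores, params, lam=0.001):
    """Negative log-likelihood of the BPR probability plus L2 regularization."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -ln g(diff) = ln(1 + exp(-diff)); logaddexp keeps this numerically stable.
    nll = np.sum(np.logaddexp(0.0, -diff))
    reg = 0.5 * lam * sum(np.sum(p ** 2) for p in params)
    return nll + reg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    negative = sample_negative(num_pois=10, visited={0, 1, 2}, rng=rng)
    print("sampled negative POI:", negative)
    print("loss:", bpr_loss([1.2, 0.3], [0.4, 0.9], [np.ones((2, 2))]))
```

A larger margin between the positive and negative scores drives the first term toward zero, so minimizing this loss is equivalent to maximizing the pairwise probability.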

Experimental Results and Analysis
In this section, we conduct empirical experiments on two publicly-available datasets to validate the effectiveness of the proposed method. First, we introduce the datasets, baseline methods and evaluation metrics. Then we compare ATST-GRU with some state-of-the-art POI recommendation methods. Finally, we study the effects of different model parameters.

Datasets
We applied two widely-used publicly-available LBSN datasets, Foursquare and Gowalla, to evaluate the performance of different methods; the datasets were preprocessed in [2]. In the Foursquare dataset, all check-in data were collected in Singapore from August 2010 to July 2011. In the Gowalla dataset, all check-in data were collected in California and Nevada from February 2009 to October 2010. Figure 5 presents the check-in distribution in the Foursquare and Gowalla datasets, where the locations are concentrated in some geographical regions. Moreover, we randomly selected several users' check-in sequences from the two datasets and visualized them on the map. As shown in Figure 6, the map visualization shows that people prefer to visit nearby POIs and that the visited POIs often form spatial clusters. In particular, we can observe that people may have different moving patterns and that different users have different preferences for travel distance. This further supports that considering the spatial impact can effectively improve the POI recommendation performance. Also, to alleviate data sparsity and cold start problems, we removed POIs checked in by fewer than five users and users who have checked in fewer than five POIs. After pre-processing, the basic statistics of the two datasets are summarized in Table 2. Inspired by previous studies [22,26], we employed the leave-one-out evaluation. We used the last POI of each user's check-in sequence as the test data and the remaining POIs as the training data.

Baseline Methods
We compared the effectiveness of our proposed ST-GRU and ATST-GRU models with the following state-of-the-art POI recommendation approaches.
• BPR [26]. This method is a generic optimization criterion and learning algorithm for personalized ranking, and we applied BPR in POI recommendation;
• GRU [19]. The GRU network is a more robust variant of RNN which works better in capturing long term dependencies, and we applied GRU in POI recommendation;
• FPMC-LR [3]. This method is a state-of-the-art Markov chain method that models personalized sequential transitions for POI recommendation;
• PRME-G [52]. This method is a state-of-the-art metric embedding method for POI recommendation, which integrates geographical influence and sequential information;
• Rank-GeoFM [35]. It is a state-of-the-art ranking-based factorization model for POI recommendation, which incorporates the geographical influence and temporal influence;
• ST-RNN [8]. This is a state-of-the-art RNN-based model for successive POI recommendation. It incorporates both local temporal and spatial transition context within the RNN architecture;
• DeepMove [66]. It is a state-of-the-art attentional RNN model which captures the sequential transitions by jointly embedding the multiple factors.

Evaluation Metrics
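To evaluate the performance of the above methods, we applied two popular evaluation metrics called Recall@k and F1-score@k. For a user u, they are defined as follows:

$$\mathrm{Recall@}k = \frac{|R(u) \cap T(u)|}{|T(u)|}, \qquad \mathrm{Precision@}k = \frac{|R(u) \cap T(u)|}{k}$$

$$F_1\mathrm{@}k = \frac{2 \cdot \mathrm{Precision@}k \cdot \mathrm{Recall@}k}{\mathrm{Precision@}k + \mathrm{Recall@}k}$$

where k indicates the number of POIs recommended to the user, and the values are averaged over all users. We reported R@k and F1@k with k = 5, 10, 15 and 20 in our experiments. R(u) indicates the Top-k list recommended to the user. T(u) represents the set of POIs the user actually visited.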
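To make the evaluation protocol concrete, the sketch below shows one way the leave-one-out split and the two metrics could be computed. It is a minimal illustration under stated assumptions: the helper names (`leave_one_out`, `metrics_at_k`, `evaluate`) are hypothetical, POIs are represented by integer ids, and the per-user values are averaged with equal weight, which may differ in detail from the averaging used in the reported experiments.

```python
def leave_one_out(sequence):
    """Split one user's chronological check-in list into (train, test).

    The last check-in is held out as test data; the rest is training data.
    """
    return list(sequence[:-1]), list(sequence[-1:])


def metrics_at_k(ranked_pois, visited_pois, k):
    """Return (recall, precision, f1) at cutoff k for a single user."""
    truth = set(visited_pois)
    hits = len(set(ranked_pois[:k]) & truth)
    recall = hits / len(truth) if truth else 0.0
    precision = hits / k
    # F1 is defined as 0 when there are no hits (precision = recall = 0).
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return recall, precision, f1


def evaluate(rankings, ground_truth, k):
    """Average the per-user metrics over all users in ground_truth."""
    per_user = [metrics_at_k(rankings[u], ground_truth[u], k) for u in ground_truth]
    n = len(per_user)
    return tuple(sum(m[i] for m in per_user) / n for i in range(3))


if __name__ == "__main__":
    train, test = leave_one_out([3, 7, 7, 2])
    print(train, test)
    rankings = {"a": [5, 2, 9], "b": [5, 2, 9]}   # toy ranked lists, best first
    held_out = {"a": [2], "b": [9]}               # toy held-out check-ins
    print(evaluate(rankings, held_out, k=2))
```

Under leave-one-out evaluation each user has a single held-out POI, so the per-user Recall@k is either 0 or 1 and the averaged Recall@k coincides with the hit rate.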

Comparison and Results
Figure 7 shows the performance of all methods on the Foursquare and the Gowalla datasets. We made the following observations: First, we explored the traditional baselines BPR, GRU, FPMC-LR, PRME-G and Rank-GeoFM. On both datasets, we can see that BPR and GRU fell behind the other algorithms since they did not take into account other useful information such as geographical influence and temporal information. Besides, GRU performed slightly better than BPR, indicating that modeling sequential influence is effective for POI recommendation. Compared with BPR and GRU, FPMC-LR and PRME-G employed both geographical and sequential information in LBSNs, and their performance on the two datasets was better. Specifically, we observed that Rank-GeoFM performed clearly better than FPMC-LR and PRME-G. There are two possible reasons for this result: by incorporating both geographical influence and temporal context, Rank-GeoFM could better capture spatiotemporal preference and better deal with the data sparsity problem. Therefore, the above analysis also supports that spatial influence, temporal influence, and sequential influence are critical factors in improving POI recommendation performance.
In the following, we compare the above traditional methods with RNN-based methods (i.e., ST-RNN, ST-GRU, DeepMove). Firstly, ST-RNN and ST-GRU significantly outperformed the above traditional methods, because the RNN-based structure combined with spatiotemporal context information can better capture the user's sequential preferences and spatiotemporal behavior preferences. We can see that ST-GRU greatly outperformed GRU, indicating that ST-GRU benefits from considering the spatiotemporal features. Besides, we observe that ST-GRU outperformed ST-RNN, which may be due to the advantage of GRUs over RNNs, i.e., GRU is a more robust network structure which works better in capturing long term dependencies and alleviating the exploding or vanishing gradients problem. Specifically, DeepMove obtained much better performance than ST-RNN and ST-GRU as it introduces the attention mechanism and considers period influence. The above discussion indicates that using an RNN-based network structure and an attention model can effectively improve the performance of POI recommendation.
In summary, the experimental results suggest that the proposed ATST-GRU can successfully capture the user's sequential preference, spatiotemporal preference, and main behavioral intention, leading to a superior performance for POI recommendation.

Influence of Embedding Dimension Size
We further studied the effect of the embedding dimension size on the performance of our ATST-GRU network. In general, a higher number of embedding dimensions may enhance the performance of the model. However, it may also lead to over-fitting. Here, we varied the dimension number from 10 to 150 and evaluated the network's generalization using Recall@k and F1-score@k with k = 10, 20 for each case. Figure 8 shows the Recall@k and F1-score@k values for different dimension numbers on the two datasets. We observed that our ATST-GRU network achieved stable performance in the range of 70-150 and 90-150 on the Foursquare and Gowalla datasets, respectively. Therefore, we set the number of embedding dimensions to 70 on the Foursquare dataset and 90 on the Gowalla dataset in our experiments.

Influence of Different Spatial and Temporal Window Widths
Spatial and temporal window widths are essential factors affecting the performance of ATST-GRU, so we also ran a batch of experiments on the two datasets with different spatial and temporal window width settings. Table 3 illustrates the performance of ATST-GRU evaluated by Recall@20 with varying window widths. On the Foursquare dataset, the best prediction performance was achieved with a spatial window width of 0.3 km and a temporal window width of 12 h. On the Gowalla dataset, the best Recall@20 was obtained with a spatial window width of 0.1 km and a temporal window width of 48 h. Moreover, we observe that ATST-GRU outperformed state-of-the-art methods even when the spatial and temporal window widths were not optimal. The results further suggest the superiority of the proposed ATST-GRU for POI recommendation.
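The window-width study above amounts to an exhaustive search over pairs of spatial and temporal widths. The sketch below shows that procedure in generic form. Everything in it is an assumption for illustration: `grid_search_windows` is a hypothetical helper, `evaluate_fn` stands in for training and evaluating the model at one setting (returning Recall@20), and the candidate widths and the toy scoring function are invented and do not reproduce the values in Table 3.

```python
from itertools import product


def grid_search_windows(evaluate_fn, spatial_widths_km, temporal_widths_h):
    """Evaluate every (spatial, temporal) width pair and return the best one.

    evaluate_fn(spatial_km, temporal_h) must return a score to maximize,
    e.g. Recall@20 on held-out data.
    """
    results = {}
    for d_km, t_h in product(spatial_widths_km, temporal_widths_h):
        results[(d_km, t_h)] = evaluate_fn(d_km, t_h)
    best = max(results, key=results.get)
    return best, results


def toy_recall(d_km, t_h):
    """Stand-in for a real train-and-evaluate run; peaks at (0.3 km, 12 h)."""
    return 0.3 - abs(d_km - 0.3) - abs(t_h - 12) / 100.0


if __name__ == "__main__":
    best, table = grid_search_windows(toy_recall, [0.1, 0.3, 0.5], [6, 12, 24, 48])
    print("best (spatial km, temporal h):", best)
```

In practice each call to `evaluate_fn` is a full training run, so the grid is usually kept small.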

Conclusions
In recent years, POI recommendation based on deep learning has attracted wide attention in academia and industry. In contrast to general POI recommendation, we focus on the next POI recommendation task in this work, and comprehensively utilize the user's check-in sequence information and spatiotemporal information to mine the user's movement behavior rules and preferences. To this end, an attention-based spatiotemporal GRU (ATST-GRU) network is proposed to tackle the POI recommendation problem in this paper. ATST-GRU introduces spatial-temporal factors into the gate mechanism of GRU to model the spatiotemporal contextual information and sequential nature. Such a network structure can better capture the user's spatiotemporal preferences and alleviate the sparsity of the data. More specifically, the contextual attention-based modeling can capture the important information in the user's historical behavior, which greatly enhances the modeling of the user's main interest. Besides, we validate the effectiveness of ATST-GRU by using two real-life mobility datasets (i.e., Foursquare and Gowalla). The experimental results show that ATST-GRU outperforms other state-of-the-art methods for POI recommendation.
In the future, we will focus on extending the current study by considering other check-in features (e.g., the semantic context of POI, user comments, social relations) or other more advanced neural networks (e.g., graph neural networks). These may further improve the performance of POI recommendation.

Figure 1. Diagram of a user's check-in sequence.

Figure 2. Architecture of the proposed method for point-of-interest (POI) recommendation.

Definition 1. POI. A point-of-interest (POI) is a uniquely identified spatial location. In this paper, we use v to represent a POI, and $Q = \{v_1, v_2, \cdots\}$ represents the set of POIs. Each POI v has a unique identifier and geographical coordinates, which include geographical latitude and geographical longitude.
Definition 2. Check-in sequence. A check-in sequence represents a user's historical check-ins arranged in chronological order, denoted by $C^u = \{c^u_{t_1}, c^u_{t_2}, \cdots, c^u_{t_N}\}$.
Definition 3. POI recommendation. Given a set of users' check-in sequences $C^U$ and a set of POIs Q, the POI recommendation task is to recommend top-k POIs that user u would be interested in.
$W_r$, $W_c$ are transition matrices and $b_z$, $b_r$, $b_c$ are the biases. $v^u_{t_k} \in \mathbb{R}^d$ is the input vector of user u, and $t_k$ is the time step. $\tilde{h}^u_{t_k}$ is the candidate state activated by element-wise $\tanh(\cdot)$, and $h^u_{t_k}$ is the hidden vector. σ denotes the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$, and $\odot$ represents the element-wise multiplication between two vectors.

Figure 3. The architecture of ST-GRU network.

Figure 5. All users' check-in distribution of POIs in the two datasets.

Figure 6. Different users' check-in sequences on the map. (a-c) and (d-f) are from the Foursquare and Gowalla datasets, respectively.

Figure 8. Effect of the number of dimensions in ATST-GRU.

Table 1. Notations in this paper.

$W_{sz}$, $W_{sr}$, $W_{sh}$ and $W_{gz}$, $W_{gr}$, $W_{gh}$ are transition matrices for s u

Table 2. Basic statistics of Foursquare and Gowalla datasets.

Table 3. Performance of ATST-GRU with varying window widths, evaluated by Recall@20.