Understanding User Preferences in Location-Based Social Networks via a Novel Self-Attention Mechanism

Abstract: The check-in behaviors of users are ubiquitous in location-based social networks in urban living. Understanding user preferences is critical to improving the recommendation services of social platforms. In addition, high-quality recommendation is also beneficial to sustainable urban living, since users can easily find the point of interest (POI) to visit, avoiding unnecessary consumption such as excess time spent searching or driving. To capture user preferences from their check-in behaviors, advanced methods transform historical records into graph-structured data and further leverage graph deep learning-based techniques to learn user preferences. Despite their effectiveness, existing graph deep learning-based methods are limited in capturing the deep structural information of a graph due to inherent limitations, such as the over-smoothing problem in graph neural networks, which leads to suboptimal performance. To address these issues, we propose a novel method built on the Transformer architecture, named the spatiotemporal aware transformer (STAT), equipped with a novel geographically aware attention mechanism. In addition, a new temporally aware sampling strategy is developed to reduce the computational cost and enable STAT to deal with large graphs. Extensive experiments on real-world datasets demonstrate the superiority of STAT over state-of-the-art POI recommendation methods.


Introduction
With the popularization of intelligent device applications and the continuous progress of mobile positioning service technology, users are far more likely to share their living experiences on social platforms than a decade ago. For instance, a user could share a photo of a scenic spot or a comment about a restaurant with their friends on location-based social networks (LBSNs) [1]. A place that a user is interested in visiting is called a point of interest (POI).
Various types of urban life-related points of interest (shopping malls, restaurants, parks, museums, tourist attractions, entertainment venues, etc.) have emerged in large numbers on the Internet. This abundance of choices not only enriches people's lives, but also brings about the problem of "choice paralysis". POI recommendation is a personalized recommendation service based on contextual information and location awareness: it relates users to points of interest and aims to recommend new, interesting locations to users. POI recommendations based on location-based social networks play an important role in providing better lives and services for people. Hence, point of interest (POI) recommendation is one of the most important services in LBSNs, whose goal is to recommend POIs to users according to their check-in records [2][3][4]. A great POI recommendation service can improve the lives of urban users, since it helps them easily find their POIs and avoid long searches, which is also beneficial to sustainable urban living.
However, the complex contextual information in the POI recommendation scenario, including geographical context and temporal context, makes the POI recommendation task more challenging than other conventional recommendation tasks, such as movie recommendation and music recommendation.
Such a challenging task has attracted great attention in the field of recommendation systems. Various techniques [5][6][7][8][9] have been proposed to enhance the performance of this task, such as matrix factorization-based methods and collaborative filtering-based methods. Among a diversity of techniques, deep learning-based methods have shown remarkable performance and exhibited high flexibility, which has attracted recent attention. Graph embedding-based models and graph neural network-based models are two representative approaches of deep learning-based methods.
The graph embedding-based models [6] aim to learn the low-dimension representations of users and POIs based on the graphs constructed according to the check-in records of users. By introducing different contextual information to generate the corresponding graphs, graph embedding-based models can flexibly integrate various information to learn the representation of nodes.
The goal of graph neural network-based models [8] is to leverage advanced graph deep learning techniques to learn the representations of users and POIs from both topology features and semantic features. Similar to the former, methods in this category also utilize generated graphs to capture the influences of different contextual factors. Benefiting from a message-passing mechanism, graph neural network-based models can extract more deep information from a graph's structural data, resulting in superior performance.
Despite their effectiveness, the above two types of methods share a common weakness: they are unable to capture the long-range dependencies between users and POIs. Since these methods treat users and POIs as nodes of graphs, they only leverage topology information for learning node representations. For instance, most graph embedding-based methods regard immediate neighbors as positive nodes, so only the information of immediate neighbors is exploited, while graph neural network-based methods depend on a message-passing mechanism, which has been proven to suffer from the over-smoothing problem when aggregating information from remote nodes.
In light of the above limitations, in this paper we propose a novel recommendation model built on a Transformer architecture named spatiotemporal aware transformer (STAT). By introducing a self-attention mechanism, the STAT can carefully capture long-range dependencies via a full attention matrix. To enhance the expressiveness of the original self-attention mechanism, we propose a novel geographically aware attention mechanism that integrates geographical information into an attention matrix, which better preserves interactions between POIs. Moreover, for generalizing the transformer to large-scale social networks, we develop a temporally aware node sampling strategy that utilizes the temporal factor to sample relevant POIs for model training. To validate the effectiveness of the STAT, we conduct extensive experiments on widely used real-world datasets. The experimental results demonstrate the superiority of our proposed STAT compared to the representative graph-embedding methods and graph neural network-based methods.
The main contributions of this paper are summarized as follows:
• We propose a Transformer-based model named STAT, which leverages a novel geographically aware attention mechanism to learn the representations of POIs for the recommendation of urban life-related services.
• We develop a temporally aware sampling strategy that samples relevant POIs according to the check-in timestamp, which carefully preserves the influence of the temporal factor.
• We conduct extensive experiments on real-world urban life-related datasets, demonstrating our proposed STAT's effectiveness.

Related Work
In this section, we briefly review recent related works from the perspective of utilizing temporal factors and geographical factors.

Temporal Factor
The check-in behaviors of users are influenced by time factors to a large extent. For example, users tend to visit a restaurant, not a bar, to have a meal and a drink at twelve noon. In addition, users may share similar check-in habits in the same temporal context. Thus, exploiting the influence of temporal factors is crucial to capturing user preferences.
A general idea is to divide time into several timestamps to learn the representations of users based on different time patterns [10]. Christoforidis et al. [2] constructed two types of directed bipartite graphs to represent the interaction of a user or a POI with a specific timestamp and further leveraged a graph embedding method to learn the representation vectors of users and POIs. Yuan et al. [4] proposed a collaborative recommendation model, which incorporates temporal information for generating the recommended POIs. Kefalas et al. [5] considered the temporal dimension and measured the impact of time on various time intervals for learning user preferences. Xie et al. [6] proposed a unified graph model to explore semantic vectors from a temporal context. Different from the above studies, Doan et al. [7] utilized the long short-term memory (LSTM) recurrent neural network to simultaneously capture both the sequential and temporal features of users' representations. Gao et al. [9] proposed a temporal state to represent the specific hour of the day and further utilized different temporal states to learn the check-in habits of users. Dai et al. [11] developed a spatiotemporal neural network framework to utilize the check-in history and social ties of users for recommending personalized POIs. Wang et al. [12] proposed a graph-enhanced spatial-temporal network that leverages the recurrent neural network (RNN) to learn user-specific temporal dependencies. Wang et al. [13] developed a time-aware position encoder to consider the temporal intervals among POIs separately.

Geographical Factor
The geographical factor is also one of the most important factors in capturing unique user preferences. Due to the cost of check-in behavior, users rarely consider POIs that are far from their current location. Moreover, users tend to visit places near where they live; for instance, users may go to a nearby restaurant when they visit a mall. Hence, considering the geographical factor is beneficial to providing meaningful POIs for users.
Ye et al. [14] utilized the naïve Bayesian classifier to explore the geographical influence of POI recommendations based on the collaborative recommendation algorithm. Sun et al. [15] leveraged a geo-dilated RNN for short-term preference learning. Liu et al. [16] proposed a geographical probabilistic factor framework that leveraged matrix factorization techniques to integrate the geographical factor into the model to learn the user's presentations. Liu et al. [17] characterized the geographical clustering phenomenon more precisely based on the location visual content, which can improve the recommendation performance. Huo et al. [18] developed a geographical location privacy-preserving method based on Laplacian distributed noise to preserve the privacy of users. Su et al. [19] utilized the graph convolutional network (GCN) to integrate social relationships and geographical influence. Zhang et al. [20] proposed a personalized geographical influence method that jointly learns the geographical and diversity preferences of users. Liu et al. [21] developed a geographical-temporal awareness hierarchical attention network that utilizes the attention mechanism to capture the subtle POI-POI relationships from a multi-contextual perspective. For more works about POI recommendation, please refer to the recent surveys [22,23].

Preliminaries
In this section, we first introduce the definitions and notations used in this paper. Then, we provide a description of the Transformer architecture.

Definitions
Definition 1 (POI). A POI v contains several types of features, including geographical features, category features and attribute features. l_v denotes the geographical information of POI v, which is described by longitude and latitude. Category features and attribute features determine the properties of POI v; for instance, "restaurant" is the category information of POI v and the decoration style is one of its attribute features. In this paper, we regard the category information and attribute information as semantic features, and we utilize the widely used embedding technique Word2Vec [24] to generate the semantic features X ∈ R^{m×d}, where d denotes the dimension of the feature vector and m denotes the number of POIs.

Definition 2 (check-in record).
A check-in record c = (u, v, t) captures one check-in behavior of a user u, where t denotes the check-in timestamp. In this paper, we divide a day into 24 hourly timestamps.

Definition 3 (top-k POI recommendation).
Given the history records C_u of user u, the goal of POI recommendation is to provide a recommended POI list {v_0, ..., v_{k−1}} of length k according to the query (u, l, t), where l denotes the current location, also described by longitude and latitude, and t denotes the current timestamp.

Transformer
The Transformer architecture [25] follows an encoder-decoder structure. In this paper, we utilize the Transformer's encoder to learn the representations of POIs. Next, we introduce the components of the encoder.
The Transformer's encoder is composed of a multi-head attention (MHA) module and a feed-forward neural network (FFN) module. For brevity, we introduce single-head attention (SHA), which easily extends to MHA. Given an input feature matrix H ∈ R^{n×d}, the output of SHA is calculated as follows:

A = (H W_Q)(H W_K)^T / √(d_w),   (1)
SHA(H) = ρ(A) H W_V,   (2)

where W_Q, W_K and W_V are learnable parameter matrices, all of size d × d_w. ρ(·) denotes the softmax function [25] and A represents the attention matrix, which preserves the interactions of all item pairs. The FFN module contains two linear layers and a nonlinear activation function:

FFN(Z) = ζ_2(σ(ζ_1(Z))) + Z,   (3)

where ζ_1(·) and ζ_2(·) denote linear layers with different parameters, a basic building block of a neural network. σ(·) denotes the nonlinear activation function, and the residual connection technique is adopted to enhance the expressiveness of the FFN module's output.
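The encoder's computation can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; ReLU is assumed for the activation σ(·), and biases are omitted for brevity:

```python
import numpy as np

def softmax(x):
    """Row-wise softmax, playing the role of ρ(·)."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def single_head_attention(H, Wq, Wk, Wv):
    """Full attention matrix over all item pairs, then a weighted sum of values."""
    A = softmax((H @ Wq) @ (H @ Wk).T / np.sqrt(Wq.shape[1]))
    return A @ (H @ Wv)

def ffn(Z, W1, W2):
    """Two linear layers, a ReLU nonlinearity, and a residual connection."""
    return np.maximum(Z @ W1, 0.0) @ W2 + Z

rng = np.random.default_rng(0)
n, d, dw = 5, 8, 8                      # n items, input dim d, attention dim d_w
H = rng.normal(size=(n, d))
Z = single_head_attention(H, rng.normal(size=(d, dw)),
                          rng.normal(size=(d, dw)), rng.normal(size=(d, dw)))
out = ffn(Z, rng.normal(size=(dw, dw)), rng.normal(size=(dw, dw)))
print(out.shape)  # (5, 8)
```

Note that the attention matrix has one entry per item pair, which is the source of the quadratic cost addressed later by the sampling strategy.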

Methodology
In this section, we detail our proposed spatiotemporal aware transformer (STAT) method, which consists of two main designs: a geographically aware attention mechanism and a temporally aware sampling strategy. We introduce them in turn. The framework of STAT is shown in Figure 1.


Geographically Aware Attention Mechanism
As mentioned in Section 3.1, for each POI, we first constructed the semantic features, resulting in a feature matrix X. Recent advanced graph deep learning-based methods transform check-in records into graph structural data, and further leverage popular graph deep learning techniques, such as graph embeddings and graph neural networks, to learn the representations of users and POIs. However, these techniques limit the model to capturing the long-range dependencies of POIs due to their fixed graph structure.
In light of this limitation, in this paper, we leveraged the self-attention mechanism of Transformer architecture to learn the interactions between POIs. Since the self-attention mechanism regards all input items as connected, the learned representations of POIs can obtain more meaningful global information.
However, the original self-attention mechanism was developed for learning the representations of words in a sentence, which does not carry extra information except semantic features. The POIs, discussed in Section 3.1, have complex information such as semantic information and geographical information, and the geographical information is important to learning the representations of POIs. Unfortunately, the calculation of the attention matrix is unable to capture this key information.
To enable the self-attention mechanism to preserve the geographical information of POIs, we propose a novel geographically aware attention mechanism. Intuitively, the relation between POIs is sensitive to the geographical distance between them. According to Tobler's first law of geography, which states "Everything is related to everything else, but near things are more related than distant things" [26], we calculated the distances of all POI pairs as a bias to strengthen the attention matrix to capture the geographical influence:

A_{ij} = (X_i W_Q)(X_j W_K)^T / √(d_w) + φ(ς(v_i, v_j)),   (4)

where X_i and X_j are the semantic feature vectors of POI v_i and POI v_j, ς(v_i, v_j) is a distance calculation function determined by the longitude and latitude of the POIs, and φ(·) denotes a learnable projection layer that transforms the distance into a scalar value. In this way, the geographical information is carefully preserved in the attention matrix, which enables the self-attention mechanism to capture the influence of the geographical factor. Based on the geographically aware attention matrix, we obtained the representations of POIs H^V ∈ R^{m×d_w} according to Equations (2) and (3).
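As an illustration, the distance-biased attention logits described above can be sketched as follows. The haversine formula is one possible choice for the distance function ς(·,·), and `w_phi`/`b_phi` are hypothetical stand-ins for the learnable projection φ(·):

```python
import numpy as np

def haversine_km(p, q):
    """One choice for the distance function: great-circle distance in km
    between two (longitude, latitude) pairs."""
    lon1, lat1, lon2, lat2 = map(np.radians, (p[0], p[1], q[0], q[1]))
    a = np.sin((lat2 - lat1) / 2) ** 2 \
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def geo_attention_logits(X, Wq, Wk, coords, w_phi=-0.01, b_phi=0.0):
    """Semantic attention logits plus a scalar projection of pairwise distance.
    A negative weight makes nearby POIs attend to each other more strongly,
    in the spirit of Tobler's first law."""
    m = X.shape[0]
    dist = np.array([[haversine_km(coords[i], coords[j]) for j in range(m)]
                     for i in range(m)])
    sem = (X @ Wq) @ (X @ Wk).T / np.sqrt(Wq.shape[1])
    return sem + (w_phi * dist + b_phi)

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4))
coords = [(139.70, 35.66), (139.77, 35.68), (-73.98, 40.75)]  # two Tokyo POIs, one New York POI
A_hat = geo_attention_logits(X, rng.normal(size=(4, 4)), rng.normal(size=(4, 4)), coords)
```

A softmax over each row of `A_hat` then yields the biased attention weights used in place of the plain attention matrix.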

Temporally Aware Sampling Strategy
Another limitation of generalizing the Transformer to POI recommendation is the huge computational cost of the self-attention mechanism, whose computational and storage complexity is quadratic in the number of POIs. In addition, in real-world applications, the number of POIs is usually very large, making it hard to train a Transformer-based method. On the other hand, not all POIs have close connections. As discussed in Section 2.1, POIs that have been visited at the same timestamp have more similar features. To preserve this similarity and reduce the training cost, in this paper, we propose a temporally aware sampling strategy, which samples similar POIs according to check-in timestamps.
Specifically, for each POI v, we sampled a list N_{v,t} = {v_0, ..., v_{s−1}} at timestamp t, where s denotes the length of the list. Such a POI list can be regarded as a sentence in the field of natural language processing. We sampled several lists for POI v according to the check-in history record of v.
In this way, the training cost of the self-attention mechanism was reduced from O(m^2) to O(s^2), which is affordable for most training environments. More importantly, the sampling process was conducted before the model training stage, so the mini-batch training technique could be employed, which guaranteed the scalability of the proposed model.
Moreover, the temporally aware sampling strategy filtered out irrelevant POIs based on check-in history records, which also enhanced the effectiveness of the self-attention mechanism and captured the influence of the temporal factor.
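A minimal sketch of the sampling step, under the assumption that co-visitation within the same hourly timestamp defines relevance (the helper names are ours, not the paper's):

```python
import random
from collections import defaultdict

def build_time_index(checkins):
    """Group check-ins (user, poi, hour) by the 24 hourly timestamps."""
    index = defaultdict(set)
    for _user, poi, hour in checkins:
        index[hour].add(poi)
    return index

def sample_poi_list(index, poi, hour, s, rng):
    """For POI v at timestamp t, draw a length-s list N_{v,t} of POIs visited
    at the same timestamp. Each list acts like a 'sentence', so self-attention
    runs in O(s^2) instead of O(m^2)."""
    candidates = sorted(p for p in index[hour] if p != poi)
    return [poi] + rng.sample(candidates, min(s - 1, len(candidates)))

checkins = [(0, 10, 12), (1, 11, 12), (2, 12, 12), (3, 13, 12), (4, 14, 20)]
index = build_time_index(checkins)
lst = sample_poi_list(index, 10, 12, s=3, rng=random.Random(42))
print(len(lst), lst[0])  # 3 10
```

Because sampling happens before training, the lists can be batched like sentences in standard mini-batch pipelines.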

User Preference Estimation
Through the above components, we obtained the representation vectors of the POIs. Then, we leveraged them to estimate the representations of users, given the fact that the visited POIs can reflect user preferences. For instance, if a user loves food, there will be a large number of check-in records at restaurants. To calculate the representations of users, a naïve idea is to sum all the representation vectors of visited POIs to represent the preference of the target user u:

H^U_u = Σ_{(u,v,t) ∈ C_u} H^V_v,   (5)

where H^U_u ∈ R^{1×d_w} denotes the representation of user u. However, such a simple design makes it hard to capture dynamic user preferences, since user preferences change over time.
To address this limitation, in this paper, we propose a user preference estimation method based on the check-in timestamp:

H^U_u = Σ_{(u,v,t) ∈ C_u} γ(t − t_0) H^V_v,   (6)

where t denotes the timestamp of the check-in record and t_0 denotes the current timestamp. γ(·) represents the sigmoid function. The motivation of this strategy is that the older a check-in record is, the smaller its impact on user preferences. In this way, we can carefully preserve the dynamics of user preferences.
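The decayed aggregation can be illustrated as follows; the sigmoid weighting of the time gap is a sketch of our reading of Equation (6), with timestamps measured in arbitrary units:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def user_preference(H_V, visits, t0):
    """Weight each visited POI's vector by γ(t − t0), so older check-ins
    (t far below the current time t0) contribute less to the user vector."""
    pref = np.zeros(H_V.shape[1])
    for poi, t in visits:
        pref += sigmoid(t - t0) * H_V[poi]
    return pref

H_V = np.eye(2)                       # toy POI representations
old, recent = (0, -10.0), (1, -0.1)   # (poi id, check-in time), current time t0 = 0
pref = user_preference(H_V, [old, recent], t0=0.0)
print(pref[1] > pref[0])  # True: the recent check-in dominates
```

The weight never reaches zero, so very old check-ins still leave a small trace in the user representation rather than being discarded outright.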

Parameter Learning
In this paper, we adopted a widely used optimization method, Bayesian personalized ranking [27], for learning the model's parameters. Specifically, we constructed the following objective function:

L = − Σ_{(u,v_i,t) ∈ C} ln ψ(H^U_u (H^V_{v_i})^T − H^U_u (H^V_{v_j})^T) + μ ||Θ||^2,   (7)

where v_j denotes a random POI that user u has not visited, Θ represents the model's parameters, μ denotes the regularization coefficient, and ψ(·) denotes the sigmoid function. By minimizing Equation (7), we learnt the model's parameters with the stochastic gradient descent method.
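A sketch of one BPR training term, assuming the interest score is the inner product of the user and POI vectors (the regularization term μ‖Θ‖² would be added separately by the optimizer):

```python
import numpy as np

def bpr_term(h_u, h_pos, h_neg):
    """One summand of the BPR objective: −ln ψ(score_pos − score_neg), with
    ψ the sigmoid. The loss shrinks when the visited POI outscores the
    random unvisited one."""
    diff = h_u @ h_pos - h_u @ h_neg
    return -np.log(1.0 / (1.0 + np.exp(-diff)))

h_u = np.array([1.0, 0.0])
good = bpr_term(h_u, np.array([2.0, 0.0]), np.array([0.0, 1.0]))  # pos outscores neg
bad = bpr_term(h_u, np.array([0.0, 1.0]), np.array([2.0, 0.0]))   # ranking inverted
print(good < bad)  # True
```

Minimizing this term with stochastic gradient descent pushes visited POIs above randomly sampled unvisited ones in the ranking.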

Using STAT for Recommendation
For a POI recommendation service request S = (u, r, t), where u denotes the target user, r denotes the current location of user u and t denotes the timestamp of the request, we first estimated the user's preference at timestamp t via Equation (6). Then, we calculated the interest scores between the target user u and the POIs that had not appeared in u's check-in records. Considering the cost of check-in behaviors, we filtered out the POIs more than 200 km away from the current location r, which is roughly a two-hour drive. Finally, we ranked the POIs according to their interest scores and kept the top-k items as the recommended POIs.
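The serving step can be sketched as follows; the haversine distance and the inner-product interest score are our assumptions about the distance function and scoring rule:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km from longitude/latitude."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = math.sin((lat2 - lat1) / 2) ** 2 \
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def recommend(user_vec, poi_vecs, poi_coords, visited, query_loc, k, max_km=200.0):
    """Score unvisited POIs within max_km of the query location, keep top-k."""
    scored = []
    for poi, (vec, (lon, lat)) in enumerate(zip(poi_vecs, poi_coords)):
        if poi in visited:
            continue
        if haversine_km(query_loc[0], query_loc[1], lon, lat) > max_km:
            continue
        score = sum(a * b for a, b in zip(user_vec, vec))
        scored.append((score, poi))
    return [poi for _, poi in sorted(scored, reverse=True)[:k]]

user_vec = [1.0, 0.0]
poi_vecs = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]
poi_coords = [(139.70, 35.66), (139.77, 35.68), (-73.98, 40.75)]  # POI 2 is far away
recs = recommend(user_vec, poi_vecs, poi_coords, visited=set(),
                 query_loc=(139.75, 35.67), k=2)
print(recs)  # [0, 1]: POI 2 is filtered out by the 200 km rule
```

The distance filter prunes most candidates before scoring, which keeps the serving cost low even when the POI catalog is large.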

Experiments
In this section, we first introduce the experimental settings of this paper, including dataset, baselines, evaluation metrics and parameter settings. Then, we introduce the designs of the experiments. Finally, we report the experimental results and provide the discussions.
Dataset
The Foursquare dataset was collected from the famous location-based social network Foursquare, which contains user check-in records in New York and Tokyo from 2010 to 2014. Each check-in record contains rich data elements, including the information of users and POIs, the check-in timestamp and POI categories. Information on users' friendships is also included in the dataset.
The Gowalla dataset was collected from the mobile social media platform Gowalla, which contains the check-in records from the mobile users from February 2009 to October 2010. In Gowalla, each record only contains the information of users, POIs and timestamps of check-ins.
The statistics of the datasets are reported in Table 1. In practice, we removed users and POIs that had fewer than 20 check-in records. We partitioned the datasets along the time series: following recent works [28,29], the first 80% of records were selected as the training set and the rest were regarded as the test set.

Baseline
In this paper, we chose five representative methods for performance comparison: GE, STA, GT-HAN, GPR and GNN-POI. The first two are embedding-based models, while the latter three are attention- or GNN-based methods.
GE [6]: GE is a graph embedding-based method that jointly learns the representations of POIs from multi-context features, including geographical features, temporal features and semantic features. Then, GE calculates the recommendation scores of POIs based on the above-learned features.
STA [30]: STA is a translation-based embedding method that leverages geographical and temporal information to learn the representations of users and POIs. Specifically, STA constructs the translation relationship of users, POIs and contextual information and further introduces the translation-based framework to learn the representations of users and POIs.
GT-HAN [21]: GT-HAN is a hybrid model based on attention networks, where it first utilizes a geographical-temporal attention network to learn the representations of POIs from multi-contextual information and leverages a context-specific co-attention network to learn user preferences.
GPR [29]: GPR is a graph neural network-based method that extracts user preferences from check-in graphs constructed using geographical information. Specifically, GPR utilizes a graph auto-encoder to learn two types of geographical influences, ingoing influences and outgoing influences, for learning complex geographical influences from users' check-in networks.
GNN-POI [28]: GNN-POI is a hybrid model that leverages graph neural networks to learn node representations from a topological structure and utilizes bidirectional long short-term memory to model users' sequential check-in behavior, which comprehensively considers the influence of temporal and geographical factors.

Evaluation Metric
In this paper, we evaluated the performance of the recommendation task via three evaluation metrics, following [28,31]: precision [32] at k, recall [32] at k and normalized discounted cumulative gain (NDCG) at k, where k denotes the length of the recommendation list. We varied k over {5, 10, 15, 20} in the experiments to observe the performance of the models.
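For concreteness, the three metrics can be computed as follows (these are the standard binary-relevance definitions; the paper's exact variants may differ slightly):

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that were actually visited."""
    return len(set(recommended[:k]) & relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of the visited POIs that appear in the top-k recommendations."""
    return len(set(recommended[:k]) & relevant) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """DCG of the top-k list (binary relevance) normalized by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, p in enumerate(recommended[:k]) if p in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / idcg

recommended, relevant = [7, 3, 9, 1, 5], {7, 1}
print(precision_at_k(recommended, relevant, 5))  # 0.4
print(recall_at_k(recommended, relevant, 5))     # 1.0
```

Unlike precision and recall, NDCG rewards placing the relevant POIs near the top of the list, not merely anywhere within it.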

Performance Comparison
In this section, we evaluated all models on the Foursquare and Gowalla datasets via the mentioned evaluation metrics. Specifically, we ran each model ten times with different random seeds and reported the mean values of each evaluation metric. The results are shown in Figures 2-7.
From the figures, we can observe that the values are relatively low on each evaluation metric. This may be because of the high data sparsity of the real-world datasets; recent work [28,29] has made similar observations. In addition, we can observe that our proposed STAT consistently outperforms all baselines on all datasets under the three evaluation metrics, which indicates the superiority of STAT. In particular, STAT beats GPR and GNN-POI, which implies that introducing the self-attention mechanism leads to better model performance than the message-passing mechanism. Moreover, we also observed that GNN-based methods outperform embedding-based methods, which indicates that introducing semantic features can assist in learning more expressive representations of users and POIs.

Study of the Parameter Sensitivity
In this section, we study the parameter sensitivity of STAT. There is one key parameter of STAT, the number of sampling nodes s. Intuitively, a large value of s brings more information, since the sampling list contains more nodes for aggregating information.
To validate the impact of s on the model's performance, we conducted experiments on the Foursquare dataset by fixing k = 20 and varying s over {20, 40, 60, 80}. The results are reported in Table 2. We can observe that the performance of the model increases first and then decreases as s grows, and STAT achieves the best performance when s = 40. This may be because when s is large, the sampling list may contain only partially relevant nodes, which introduces noisy information and hinders the model's performance. Additionally, a small value of s already achieves competitive performance. In this paper, we used the grid search method to determine the value of s for each dataset.

Ablation Study
In this section, we study the influence of different user preference estimation methods on model performance. As discussed in Section 4.3, STAT utilizes a time-aware method to calculate user preference via Equation (6), which can capture dynamic user preferences. Hence, we provide a variant of STAT named STAT-T, in which we used Equation (5) to learn the user's preference, thereby ignoring the influence of the temporal factor on user preference. We ran the above two methods on the Foursquare dataset; the results are shown in Figures 8-10.
Intuitively, user preferences change over time: recent check-in behaviors better represent a user's preferences, while early check-in records generally have little influence on them. From the results, we observe that STAT outperforms STAT-T. This phenomenon indicates that our developed time-aware user preference estimation method can better capture user preferences, demonstrating that the temporal factor is beneficial to accurately obtaining user preferences.

Efficiency Study
In this section, we conduct experiments to analyze the training cost of our proposed STAT. Since the baselines are implemented in different programming languages (for instance, the official implementation of GE is based on C++), we only report the running time and memory cost of STAT; our implementation is based on Python and the PyTorch framework. The experiments were conducted on a Linux server with one i9-10900K CPU and one RTX 2080 Ti GPU. The cost of the original Transformer was too large to afford. Benefitting from our proposed temporally aware sampling strategy, we utilized the mini-batch training method for training the model on large-scale location-based social networks, guaranteeing the scalability of STAT. The experimental results are reported in Table 3. As mentioned in Section 4.2, the complexity of STAT is mainly related to the length of the sampling sequence. Hence, the memory cost was reduced to an affordable value. If we used the original Transformer model, we would encounter the out-of-memory problem on these datasets, since its complexity is quadratic in the number of POIs.

Conclusions
In this paper, we propose STAT, a Transformer-based POI recommendation model, which takes the geographical factor and the temporal factor into account to learn the representations of POIs. Specifically, STAT develops a novel geographically aware attention mechanism that integrates the geographical influence into the self-attention mechanism to enhance the expressiveness of the attention matrix in the POI recommendation scenario. In addition, to generalize the Transformer architecture to large-scale location-based social networks, STAT proposes a temporally aware sampling strategy that samples several relevant nodes based on the check-in timestamp. In this way, the influence of the temporal factor is carefully preserved. Moreover, STAT develops a time-aware user preference estimation method to capture dynamic user preferences. To validate the effectiveness of STAT, we conducted extensive experiments on real-world datasets via three widely used evaluation metrics. The experimental results indicate that STAT consistently achieved the highest values on the different evaluation metrics compared to the baselines. For instance, STAT obtained a recall of 7.3% on Foursquare when the length of the recommendation list was 5, while the second-best performance was 6.8% from GPR. We also conducted an efficiency study to analyze the training cost of STAT, and the results show that our proposed STAT has good scalability for large-scale networks.
The advantage of STAT is that it leverages the self-attention mechanism to learn the representations of POIs from complex contextual information. However, STAT only considers the temporal factor and the geographical factor. Other important factors, such as the social information of users and the category information of POIs, are not involved. One potential future work is to develop a reasonable framework that jointly models the above factors to better learn user preferences.