Trajectory- and Friendship-Aware Graph Neural Network with Transformer for Next POI Recommendation

Yu, Chenglin; Shi, Lihong; Zhao, Yangyang

doi:10.3390/ijgi14050192


by Chenglin Yu, Lihong Shi and Yangyang Zhao *

The Research Center for Geospatial Big Data Application, Chinese Academy of Surveying and Mapping, Beijing 100830, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2025, 14(5), 192; https://doi.org/10.3390/ijgi14050192

Submission received: 25 February 2025 / Revised: 23 April 2025 / Accepted: 1 May 2025 / Published: 3 May 2025

(This article belongs to the Special Issue Advances in AI-Driven Geospatial Analysis and Data Generation (2nd Edition))


Abstract

Next point-of-interest (POI) recommendation aims to predict users’ future visitation intentions based on historical check-in trajectories. However, this task faces significant challenges, including coarse-grained user interest representation, insufficient social modeling, sparse check-in data, and the insufficient learning of contextual patterns. To address these challenges, we propose TraFriendFormer, a model that combines check-in trajectory information with user friendship relationships and uses a Transformer architecture for prediction. Our approach begins with the construction of trajectory flow graphs, on which graph convolutional networks (GCNs) globally capture POI correlations across both spatial and temporal dimensions. In parallel, we design an integrated social graph that combines explicit friendships with implicit interaction patterns, in which GraphSAGE aggregates neighborhood information to generate enriched user embeddings. Finally, we fuse the POI embeddings, user embeddings, timestamp embeddings, and category embeddings and input them into the Transformer architecture. Through the self-attention mechanism, the model captures the complex temporal relationships in the check-in sequence. We validate the effectiveness of TraFriendFormer on two real-world datasets (FourSquare and Gowalla). The experimental results show that TraFriendFormer achieves an average improvement of 10.3% to 37.2% in metrics such as Acc@k and MRR compared to the selected state-of-the-art baselines.

Keywords: next POI recommendation; graph neural networks; social influence; transformer

1. Introduction

With the advancement of smartphones and wireless communication technologies, location-based social networks (LBSNs), such as Foursquare, Gowalla, and Yelp, have rapidly developed globally. Social platforms provide users with the convenience of sharing geographic locations, recording life events, and checking into points of interest (POIs), thus enabling people to explore the world around them more effectively. Meanwhile, the large amount of check-in data on these platforms—comprising geographical information (e.g., longitude and latitude), check-in timestamps, and social relationship data—has driven research into next POI recommendation systems. These systems aim to predict the next most likely POI that a user will visit based on their historical check-in sequences and other multimodal information. Such systems can enhance users’ exploration efficiency and service experience, thus attracting significant attention in both academic and commercial fields [1].

Previous next POI recommendation methods treat user check-ins as a continuous sequence, predicting the next location by capturing dynamic user preferences and geographical factors. From traditional machine learning methods, such as Markov chains [2,3], matrix factorization [4,5], and Bayesian personalized ranking [6,7,8], to deep learning-based methods, such as the use of recurrent neural networks [9,10,11,12], these methods, although capable of making predictions based on historical behavior data (such as check-in records), often overlook the social relationships between users and the underlying patterns behind their behavior. Specifically, the current next POI recommendation methods still face the following challenges.

(1) Coarse user interest representation and neglect of intensity differences: Existing methods typically model a user’s interest in a POI as a binary variable (checked in vs. not checked in), thereby overlooking variations in interest intensity across different POIs. In practice, POIs that are visited more frequently provide a stronger signal of a user’s preferences. Consequently, leveraging check-in frequencies to model user interests at finer granularity is critical in enhancing the recommendation accuracy.
(2) Single-faceted social modeling: Most existing methods rely solely on explicit friendship ties or compute user similarity based on behavioral patterns, without constructing a unified social graph that can capture latent influence mechanisms within the network. In reality, social networks encompass not only explicit friendships but also a wealth of implicit connections formed by shared interests and behaviors. Integrating both types of relationships into a comprehensive social graph is essential to fully exploit social information and improve the recommendation performance.
(3) Lack of a unified framework for contextual integration: Factors such as user preferences, check-in timestamps, POI categories, and social influences all play critical roles in POI recommendation. However, existing models typically address only one or two of these aspects and lack a generalizable framework that cohesively integrates diverse contextual information.

Therefore, determining how to integrate check-in trajectories and social information to improve the accuracy of next POI prediction has become a critical challenge in recommendation systems. To address this issue, we propose a next POI prediction method based on graph neural networks and the Transformer architecture. To address Challenge (1), we first construct a global POI graph based on all users’ check-in sequences and use a graph convolutional network (GCN) to learn the representation of each POI node, capturing deep information from user trajectories. Next, we generate weighted POI embedding vectors as user representation vectors by using the check-in frequency of each POI in the user check-in matrix as weights. This method allows for a more accurate reflection of users’ interest preferences and alleviates the issues associated with binary representations in the check-in matrix, as detailed in Section 4.1.

To address Challenge (2), we calculate the cosine similarity between the weighted POI embeddings of different users to obtain the similarity between users, which is then treated as an implicit friendship. We introduce user friendship relationships as explicit friendships, and, based on both implicit and explicit friendships, we construct a user social graph. This graph structure comprehensively represents the relationships between users and captures the potential influence of social networks on user behavior.

Building on this, we use the GraphSAGE [13] model for embedding learning, propagating information through graph convolutions in the social graph structure to further optimize the user embeddings. The underlying assumption of the user social graph is that a user’s behavior is influenced not only by their own historical actions but also by the actions of others within their social circle. For example, POIs recommended by friends are more likely to attract a user’s attention than those recommended by strangers. As shown in Figure 1, two users with a friendship relationship exhibit highly similar check-in sequences. Therefore, social recommendation systems must fully leverage user similarities and social network information to improve the next POI recommendation accuracy. This approach effectively combines social relationships and behavioral data, thus enhancing the performance of social recommendation systems, as detailed in Section 4.2.

Finally, to address Challenge (3), we adopt the Transformer architecture for next POI prediction. The Transformer’s encoder–decoder structure is well suited to sequential data and complex contextual relationships. We fuse POI embeddings, user embeddings, timestamp embeddings, and POI category embeddings and input them into the Transformer for next POI prediction, as detailed in Section 4.3.

In summary, the main contributions of this study are as follows.

We propose a novel model framework, TraFriendFormer, which is unique in its joint modeling of user check-in trajectories and their friendship relationships. This model utilizes graph neural networks and the Transformer architecture, effectively capturing user interests and the influence of friendships on their check-in behavior.
We develop a method for the construction of user representation vectors using weighted POI embedding vectors. These POI embeddings, generated with check-in frequencies as weights, serve as user representation vectors, capturing the intensity of user interest in different POIs.
We propose a novel method for the construction of a user social graph, which simultaneously considers user friendships and behavioral similarities to model the impact of social relationships on user check-ins. We use the GraphSAGE model to encode and embed the social graph.
We conduct extensive evaluation experiments on two real-world datasets to confirm the effectiveness and superiority of our model. The experimental results show that the proposed method outperforms baseline models across multiple metrics, thus demonstrating superior performance in terms of prediction accuracy and user preference capture. Additionally, we use ablation studies to validate the effectiveness of user friendship relationships.

2. Related Work

2.1. Next POI Recommendation Based on Graph Neural Networks

In recent years, with the rapid development of deep learning technologies and the promotion of complex data relationships in social networks, graph neural networks (GNNs) have garnered widespread attention [14]. GNNs are effective tools to capture relationships between nodes in complex networks and generate high-dimensional, non-linear node representations, thereby obtaining high-quality node information. Monti et al. constructed user information as a graph structure and utilized GNNs to extract useful information from the nodes of both users and items in the network, which was then used to evaluate user preferences [15]. Ying et al. proposed a method based on graph neural networks (GNNs) that utilizes random walks to better explore the relationships between nodes. By traversing multiple paths, more node information within these paths can be obtained, thus capturing the structural relationships between nodes in the network more effectively [16]. Wang et al. introduced a neural graph collaborative filtering (NGCF) recommendation model, which encodes collaborative information in the form of higher-order connectivity by performing embedding propagation [17]. Z. Zhong et al. proposed a hybrid graph convolutional network (GCN) model for POI recommendation based on a multi-head attention mechanism. This model constructs a spatial graph based on the geographical distances between POIs and uses GCNs to learn the higher-order connectivity between POIs. It not only considers spatial constraints but also overcomes the data sparsity problem [18]. To fully leverage POI feature information, Wu et al. employed GCNs to model and learn the adjacent geographical locations of each POI. By integrating the features of adjacent POIs, this model learns the impact of geographical locations on user preferences [18,19]. Tang et al. proposed a region-aware POI recommendation model, which constructs a semantic space graph and uses GNNs to model the relationships between POIs within the semantic space, extracting POI regional features and user preferences [20]. Abu et al. introduced an attention-based random walk method, which constructs a co-occurrence matrix to calculate attention weights and uses random walks to learn embedding information between nodes [21]. Li et al. designed a POI sequence-to-graph enhanced recommendation model that jointly learns POI embeddings and infers temporal preferences, but it fails to represent global POI information [22]. Lim et al. proposed a hierarchical multi-task graph recurrent network (HMT-GRN) method to alleviate the data sparsity issue, yet it also does not consider global information [23].

The models mentioned above have made progress in addressing issues such as sparsity, mining higher-order adjacency information, and integrating spatiotemporal or attention mechanisms. However, they all overlook global transition patterns within users’ historical check-in sequences and lack the integration of multi-source contexts, such as trajectory flows, social relationships, and temporal categories.

2.2. Next POI Recommendation Based on User Social Networks

In next POI recommendation systems, relying solely on historical trajectories may lead to a decrease in recommendation accuracy [24,25]. As social beings, human decision-making is often influenced by factors such as social status, friends, neighbors, and culture, and these social influences play a significant role in users’ visits to POIs. In next POI recommendation systems, some users tend to explore new places that they have not visited before. This behavior complicates the prediction of their next move based only on historical trajectories. Therefore, when a user’s friends give positive feedback on a particular POI, the user is more likely to visit this place [26]. To address the cold start problem, social circle information can also help the model to capture the preferences of new users. The system can improve the prediction performance by recommending POIs frequently visited by users’ friends. Thus, some methods enhance their recommendation performance by considering users’ social information [27,28,29]. Kosar et al. incorporated social, geographic, and temporal information into matrix factorization (MF) and modeled predictions based on user similarities in co-check-ins and their friendship relationships [30]. Christoforidis et al. combined social influences with spatial and temporal contexts, integrating graphs into a unified predictive model [31]. Kefalas et al. attempted to use user reviews to capture social influences, considering that combining social influences with spatial and temporal contexts can significantly improve the recommendation quality [32]. Fan et al. proposed a graph neural network model for social recommendations, which allows for the deeper modeling and learning of the user’s social network [33]. L. Lim et al. constructed an undirected user–user graph, in which nodes represent users and edges represent the similarity between users [34]. Huang et al. constructed various types of user–user social graphs based on users’ family and occupational backgrounds to improve the recommendation effectiveness [35].

In summary, recent methods for next POI recommendation based on user social information have made progress by combining friendship relationships, social influences, and geographic–temporal contexts, achieving some success in alleviating cold start and data sparsity issues. However, these approaches often rely solely on explicit social networks or simple similarity measures, lacking the exploration of deeper social relationships. They also fail to capture global transition patterns in users’ POI visits and do not unify the integration of trajectory flows, social graphs, and spatiotemporal contexts.

3. Problem Formulation

We assume that $U = \{u_{1}, u_{2}, u_{3}, \dots, u_{m}\}$ represents the user set, $L = \{l_{1}, l_{2}, l_{3}, \dots, l_{n}\}$ represents the POI set, and $T = \{t_{1}, t_{2}, t_{3}, \dots, t_{k}\}$ represents the timestamps, where $m$ and $n$ denote the number of users and POIs, respectively. $u_{i}$ represents the $i$-th user, and $l_{j}$ represents the $j$-th POI, with each POI $l_{j} \in L$ containing $\langle lat, lon, cat \rangle$, which represent the latitude, longitude, and category information of the POI, respectively.

Check-in: A check-in made by each user is represented as a tuple $s = \langle u_{i}, l_{j}, t_{k} \rangle$, which indicates that user $u_{i}$ visited POI $l_{j}$ at time $t_{k}$. The check-in frequency of a user for a particular POI is represented by a matrix $R \in \mathbb{R}^{m \times n}$, where each entry $r_{u, l} \in R$ represents the number of times that user $u$ checked in at POI $l$.

Trajectory: The entire check-in sequence of user $u$ is sorted by time as $S^{u} = (s_{1}^{u}, s_{2}^{u}, \dots, s_{n}^{u})$, where $s_{m}^{u} \in S^{u}$ represents the $m$-th check-in record. For a more fine-grained analysis, we divide each user’s check-in sequence $S^{u}$ into weekly intervals, i.e., $S^{u} = (S_{1}^{u}, S_{2}^{u}, \dots, S_{k}^{u})$, where $S_{p}^{u}$ represents the check-in records of user $u$ during the $p$-th week.

Next POI recommendation: Given a user’s check-in sequence $S^{u} = (s_{1}^{u}, s_{2}^{u}, \dots, s_{n}^{u})$, the goal of next POI recommendation is to predict the most likely POI that the user will visit next, denoted as $s_{n + 1}^{u}$. Specifically, the next POI recommendation system needs to select a set of POIs from $L$ that best matches the user’s next visit behavior.
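The definitions above can be illustrated with a small sketch. It is not the authors' preprocessing code: the toy check-ins, the function names `frequency_matrix` and `weekly_trajectories`, and the use of ISO calendar weeks to form the weekly intervals are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical toy check-ins, each a tuple <user, POI, timestamp>.
checkins = [
    ("u1", "l1", datetime(2025, 1, 6, 9, 0)),
    ("u1", "l2", datetime(2025, 1, 6, 12, 30)),
    ("u1", "l1", datetime(2025, 1, 14, 9, 5)),
    ("u2", "l2", datetime(2025, 1, 7, 18, 0)),
]


def frequency_matrix(checkins):
    """Sparse form of R: R[(u, l)] is the number of check-ins of user u at POI l."""
    return Counter((u, l) for u, l, _ in checkins)


def weekly_trajectories(checkins):
    """Sort check-ins by time and group them per user and ISO (year, week).

    Using ISO calendar weeks is an assumption; the text only says "weekly intervals".
    """
    weeks = defaultdict(list)
    for u, l, t in sorted(checkins, key=lambda s: s[2]):
        iso = t.isocalendar()  # (ISO year, ISO week, weekday)
        weeks[(u, iso[0], iso[1])].append((l, t))
    return dict(weeks)


R = frequency_matrix(checkins)
S = weekly_trajectories(checkins)
```

Storing $R$ as a sparse counter rather than a dense $m \times n$ array is a convenience for the sketch; most users visit only a small fraction of POIs.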

4. Methods

The proposed TraFriendFormer model is illustrated in Figure 2 and consists of three components. (1) Trajectory Learning: We construct a global trajectory flow graph and use a graph convolutional network (GCN) to learn POI embeddings. Based on user check-ins, we build a weighted interest representation vector for each user. (2) User Embedding Learning: We combine implicit friendships (based on user representation vectors) and explicit friendships (based on user social relationships) to construct a user social graph. The GraphSAGE model is used to learn user embeddings. (3) Encoder–Decoder: In addition to POI embeddings and user embeddings, we also consider timestamp embeddings and POI category embeddings, which are fused into a check-in embedding vector. Finally, we use a Transformer encoder–decoder framework to output POI predictions.

4.1. POI Embedding and User Representation Learning

4.1.1. POI Embedding

A user’s check-in records are the most direct features that describe their interest preferences. Different check-in records reflect the user’s behavioral patterns in both space and time, such as frequently visited locations or occasional exploratory behaviors. To explore user behavior patterns, we construct a global POI graph based on the check-in records of all users. By modeling the access relationships between POIs, we capture the spatial transition patterns within the user’s trajectory.

As discussed in Section 2, existing approaches that use graph neural networks (GNNs) or user social information have not combined user trajectories with user friendship relationships, which leaves the potential of social networks in recommendation underutilized. Furthermore, existing social information-based recommendation methods typically model user similarity through simple social matrices and lack effective strategies for integrating user behavior data with social information.

We adopt the POI embedding method proposed by Song et al. [36], in which a global POI trajectory flow graph $G_{poi}$ is constructed, and a graph convolutional network (GCN) is used to learn the embedding representation of each POI. This method derives the transition relationships between POIs from users’ check-in records, with the number of visits between two POIs used as the edge weight of the graph. Specifically, the weighted directed graph $G_{poi} = (V, E, l, 𝓌)$ is defined, where $V \subseteq L$ represents the nodes in the graph; $E$ represents the edges in the graph; $l$ represents the node attributes, including $\langle lat, lon, cat, freq \rangle$, where $freq$ indicates the frequency of the node’s appearance; and $𝓌$ represents the edge weights, which are equal to the number of visits between the two nodes.
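As a rough illustration of how such a trajectory flow graph could be assembled, the sketch below counts node frequencies and directed edge weights from a list of trajectories. It assumes that a "visit between two nodes" means two consecutive check-ins within one trajectory; the function name `build_flow_graph` and the toy POI names are hypothetical.

```python
from collections import Counter


def build_flow_graph(trajectories):
    """Return (node_freq, edge_weight) for a weighted directed POI graph.

    node_freq[l]        number of check-ins at POI l (the freq attribute)
    edge_weight[(a, b)] number of consecutive transitions a -> b
    """
    node_freq = Counter()
    edge_weight = Counter()
    for traj in trajectories:
        node_freq.update(traj)
        # Each pair of consecutive check-ins contributes one unit of edge weight.
        for a, b in zip(traj, traj[1:]):
            edge_weight[(a, b)] += 1
    return node_freq, edge_weight


# Hypothetical trajectories, each a time-ordered list of POI identifiers.
trajectories = [["cafe", "office", "gym"], ["cafe", "office"], ["gym", "cafe"]]
node_freq, edge_weight = build_flow_graph(trajectories)
```

Because the graph is directed, `("cafe", "office")` and `("office", "cafe")` are distinct edges with separate weights.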

Once the trajectory flow graph $G_{poi}$ is constructed, we use a graph convolutional network (GCN) [14] to learn the embedding representations of POIs. $A \in \mathbb{R}^{N \times N}$ denotes the adjacency matrix of the graph $G_{poi}$, and $I_{N}$ represents the identity matrix. The specific calculation is as follows:

$$\tilde{A} = A + I_{N} \tag{1}$$

$$\tilde{L} = (D + I_{N})^{-1} \tilde{A} \tag{2}$$

Here, $\tilde{A}$ is the adjacency matrix of the graph $G_{poi}$ with self-loops added, $D$ is the degree matrix, and $\tilde{L}$ represents the normalized Laplacian matrix. The node embeddings are updated through the multi-layer GCN propagation rule as follows:

$$H^{(k + 1)} = \sigma (\tilde{L} H^{(k)} W^{(k)} + b^{(k)}) \tag{3}$$

In this equation, $H^{(k)}$ represents the input signal of the $(k + 1)$-th layer, $H^{(0)} = X \in \mathbb{R}^{N \times D}$ is the input feature matrix of the nodes, $W^{(k)} \in \mathbb{R}^{D \times D}$ is the weight matrix applied to $H^{(k)}$, $b^{(k)} \in \mathbb{R}^{N \times D}$ is the bias matrix, and $\sigma$ denotes the ReLU activation function. Finally, the output of the GCN module is expressed as follows:

$$E_{l} = \tilde{L} H^{(k + 1)} W^{(k + 1)} + b^{(k + 1)} \tag{4}$$

In this equation, $E_{l} \in \mathbb{R}^{N \times D}$ is the matrix representation of all POIs, with the $i$-th row being the embedding vector $e_{l_{i}}$ of POI $l_{i}$. The POI embeddings obtained through the GCN express the common check-in patterns of all users, which will later be fused with the user embedding vectors and sequentially input into the Transformer module for prediction.
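Equations (1) to (4) can be traced on a three-node toy graph with the plain-Python sketch below. It is illustrative only and rests on several assumptions: $D$ is taken as the diagonal matrix of weighted row sums of $A$, the bias is a vector broadcast over all rows, the weights are a square identity matrix rather than learned parameters, and all names are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalized_adjacency(A):
    """Compute (D + I)^-1 (A + I), as in Equations (1) and (2)."""
    n = len(A)
    L_norm = []
    for i in range(n):
        degree = sum(A[i]) + 1.0  # (D + I)_ii, assuming D_ii is the row sum of A
        L_norm.append([(A[i][j] + (1.0 if i == j else 0.0)) / degree for j in range(n)])
    return L_norm


def gcn_layer(L_norm, H, W, b, activate=True):
    """One propagation step L_norm @ H @ W + b, followed by ReLU if requested."""
    Z = matmul(matmul(L_norm, H), W)
    Z = [[z + b_j for z, b_j in zip(row, b)] for row in Z]
    if activate:
        Z = [[max(0.0, z) for z in row] for row in Z]
    return Z


# Toy graph: edge weights are transition counts between three POIs.
A = [[0, 2, 0], [0, 0, 1], [1, 0, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # input node features
W = [[1.0, 0.0], [0.0, 1.0]]              # identity weights, for readability only
b = [0.0, 0.0]

L_norm = normalized_adjacency(A)
H1 = gcn_layer(L_norm, X, W, b)                    # hidden layer, Equation (3)
E_l = gcn_layer(L_norm, H1, W, b, activate=False)  # output layer, Equation (4)
```

With this normalization every row of `L_norm` sums to 1, so each layer replaces a node's features with a weighted average over itself and its outgoing neighbors before the linear map is applied.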

4.1.2. User Representation Learning

The embedding vectors of points of interest (POIs) derived from the graph convolutional network (GCN) module serve as general representations. To accurately capture user preferences, we construct weighted interest representation vectors by combining POI embeddings with users’ check-in behaviors. Specifically, we use check-in frequencies as weights, mapping user–POI relationships into a weighted average vector space. This method not only reflects users’ intensity of interest in specific POIs but also provides a more precise representation of their preferences while addressing the issue of information scarcity due to sparse check-in data.

The process begins with extracting the check-in record $r_{u_{i}} = R [i, :] \in \mathbb{R}^{n}$ from the check-in matrix $R$, which denotes the number of check-ins made by user $u_{i}$ at all POIs. Based on $r_{u_{i}}$, the corresponding POI embeddings $E_{u_{i}} = \{E_{l} [j, :] \mid r_{u_{i}} [j] > 0\}$ are selected, and the check-in count for each POI $w_{j} = r_{u_{i}} [j]$ is retrieved. The user representation vector $h_{u_{i}} \in \mathbb{R}^{D}$ is computed as follows:

$$h_{u_{i}} = \frac{\sum_{j \in L_{u_{i}}} w_{j} \cdot E_{l} [j, :]}{\sum_{j \in L_{u_{i}}} w_{j}} \tag{5}$$

In this equation, $L_{u_{i}}$ represents the set of POIs at which user $u_{i}$ has check-in records. If a user has no check-in history, their vector representation is omitted. The generated user representation vectors serve as fundamental data for the subsequent user similarity calculations and social graph construction.
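A minimal sketch of Equation (5) follows, using hypothetical two-dimensional POI embeddings. The function name `user_vector` and the choice to return `None` for users without check-ins are assumptions made for illustration.

```python
def user_vector(counts, poi_emb):
    """Frequency-weighted mean of POI embeddings, as in Equation (5).

    counts:  {poi_index: check-in count} for one user (one row of R, stored sparsely)
    poi_emb: list of POI embedding rows
    Returns None when the user has no check-ins, mirroring the omission rule.
    """
    visited = {j: w for j, w in counts.items() if w > 0}
    total = sum(visited.values())
    if total == 0:
        return None
    dim = len(poi_emb[0])
    return [sum(w * poi_emb[j][d] for j, w in visited.items()) / total for d in range(dim)]


# Hypothetical embeddings for three POIs.
poi_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Three check-ins at POI 0 and one at POI 1.
h_u = user_vector({0: 3, 1: 1}, poi_emb)
```

For this user the weighted vector is `[0.75, 0.25]`, whereas a binary visited/not-visited representation of the same two POIs would average to `[0.5, 0.5]` and lose the stronger preference for POI 0.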

4.2. Social Graph Modeling and User Embedding Learning

Humans are social beings, and social information significantly influences users’ interests and behaviors. To comprehensively model the relationships between users, we construct a user social graph by integrating both user behavioral preference similarity and friendship relationships and further learn user embedding representations using the GraphSAGE network.

4.2.1. Social Graph Modeling

Inspired by Kosar et al. [30], we treat user behavioral similarity as implicit friendships and friendship relationships as explicit friendships, thus collectively constructing a user social graph $G_{user} = (U, E_{user})$, where $E_{user}$ represents the edges between nodes. To obtain implicit friendships, we use the cosine similarity method to calculate the similarity between user representation vectors, computed as follows:

$$\mathrm{sim} (u_{i}, u_{j}) = \cos (h_{u_{i}}, h_{u_{j}}) = \frac{h_{u_{i}} \cdot h_{u_{j}}}{\| h_{u_{i}} \| \| h_{u_{j}} \|} \tag{6}$$

In this equation, $h_{u_{i}}$ and $h_{u_{j}}$ represent the weighted interest vectors for users $u_{i}$ and $u_{j}$ (generated in Section 4.1.2), respectively, and $\| \cdot \|$ denotes the $\ell_{2}$ norm of the vector. The more similar the behaviors of two users, the closer the value of $\mathrm{sim} (u_{i}, u_{j})$ will be to 1. For a user, explicit friendships may directly influence the next POI visited through recommendations, while implicit friendships, based on behavioral similarity, provide a valuable reference for the prediction of the user’s next POI.

For explicit friendships, i.e., the direct friendship relationships between users, we use the set $M_{u} = \{u_{a}, u_{b}, \dots, u_{x}\}$ to represent the direct friends of user $u$. Combining both explicit and implicit friendships, we then construct the user social graph $G_{user}$, introducing a hyperparameter $𝓌_{f}$ to balance the influence of explicit friendships and implicit similarity on user behavior. Specifically, we set the weight of implicit friendships as $𝓌_{f}$ and that of explicit friendships as $(1 - 𝓌_{f})$. The probability of interaction between two users is computed as follows:

$$P (u_{i}, u_{j}) = 𝓌_{f} \cdot \mathrm{sim} (u_{i}, u_{j}) + (1 - 𝓌_{f}) \cdot \frac{1}{| M_{u_{i}} |} \tag{7}$$

In this equation, $P (u_{i}, u_{j})$ represents the interaction weight between users $u_{i}$ and $u_{j}$, reflecting both their behavioral similarity and their explicit friendship. We use $P (u_{i}, u_{j})$ to construct the user social graph $G_{user} = (U, E_{user})$, where nodes represent users and edges $E_{user}$ indicate social relationships between users. A threshold $\theta$ is introduced, and, if the interaction weight $P (u_{i}, u_{j})$ exceeds $\theta$, an edge is established between the two users; if $P (u_{i}, u_{j})$ is less than $\theta$, the edge is filtered out. The user social graph is shown in Figure 3.
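The edge construction described in this subsection can be sketched as follows. This is a hedged illustration rather than the authors' implementation. In particular, it assumes that the explicit-friendship term of Equation (7) contributes only when $u_{j}$ is actually a direct friend of $u_{i}$, which the equation as printed does not state, and it treats edges as directed because the friend-count term depends on $u_{i}$. The names `cosine`, `interaction_weight`, and `build_social_edges` and all toy values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors, as in Equation (6); 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def interaction_weight(i, j, vectors, friends, w_f):
    """Blend implicit similarity and explicit friendship, following Equation (7).

    Assumption: the explicit term 1/|M_i| is used only when j is in M_i.
    """
    sim = cosine(vectors[i], vectors[j])
    friend_set = friends.get(i, set())
    explicit = 1.0 / len(friend_set) if j in friend_set else 0.0
    return w_f * sim + (1.0 - w_f) * explicit


def build_social_edges(vectors, friends, w_f, theta):
    """Keep a directed edge (i, j) whenever the interaction weight exceeds theta."""
    edges = {}
    for i in vectors:
        for j in vectors:
            if i != j:
                p = interaction_weight(i, j, vectors, friends, w_f)
                if p > theta:
                    edges[(i, j)] = p
    return edges


# Hypothetical user interest vectors and friend lists.
vectors = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
friends = {"a": {"c"}, "b": set(), "c": {"a"}}
edges = build_social_edges(vectors, friends, w_f=0.5, theta=0.4)
```

In this toy example users `a` and `b` are linked purely by behavioral similarity, while `a` and `c` are linked purely by their explicit friendship, which is the complementary behavior the combined graph is meant to capture.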

4.2.2. User Embedding Learning

After obtaining the user social graph $G_{user}$, our goal is to further learn the relationships between users and represent them as vectors. To achieve this, we introduce the GraphSAGE (SAmple and aggreGatE) model [13]. Unlike graph convolutional networks (GCNs), which learn the embedding vectors of each node over the entire graph, GraphSAGE effectively aggregates the features of a node’s $k$-hop neighbors along with its own features through a set of aggregator functions. This allows for the generation of embedding representations that are suitable for both training nodes and unseen nodes during testing.

In our implementation, GraphSAGE consists of two SAGEConv layers that model the local structure of the user social graph. After each convolution, the user embeddings are further optimized through non-linear activation functions and feature propagation. The specific computations are as follows:

$$h_{u}^{(k)} = \sigma \left( W^{(k)} \cdot \mathrm{CONCAT} \left( h_{u}^{(k - 1)}, \mathrm{AGGREGATE}_{k} \left( \{ h_{v}^{(k - 1)}, \forall v \in N_{u} \} \right) \right) \right) \tag{8}$$

$$e_{u} = \frac{h_{u}^{(k)}}{\| h_{u}^{(k)} \|} \tag{9}$$

Here, $h_{u}^{(k)} \in \mathbb{R}^{D}$ represents the embedding of user $u$ after aggregating the features of $k$-hop neighbors, $\sigma$ is the LeakyReLU activation function, $W^{(k)}$ is the learnable weight matrix, $\mathrm{CONCAT}$ denotes vector concatenation, and $N_{u}$ is the set of neighboring nodes of user $u$. $\mathrm{AGGREGATE}_{k}$ is the aggregation function, for which we choose mean aggregation, i.e., the average of the vectors of user $u$’s neighbors. At each iteration, the aggregation function allows the node to collect information from its neighbors, and, as the depth of iteration increases, it gathers more information. Finally, $h_{u}^{(k)}$ is $\ell_{2}$-normalized to prevent numerical instability, yielding the user embedding vector $e_{u}$, which will be passed along with the POI embedding vectors to the subsequent Transformer model to predict the next POI.
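One mean-aggregation step of Equation (8) and the normalization of Equation (9) can be sketched in plain Python as below. This is not the SAGEConv implementation used in the paper. The fixed weight matrix, the LeakyReLU slope of 0.01, and the use of a zero vector as the aggregate for isolated nodes are assumptions, and the names `sage_layer` and `l2_normalize` are hypothetical.

```python
import math


def sage_layer(h, neighbors, W, slope=0.01):
    """One GraphSAGE step with mean aggregation, following Equation (8).

    h:         {node: feature vector (list of floats)}
    neighbors: {node: list of neighbor nodes}
    W:         weight matrix as a list of rows, each of length 2 * input_dim
    slope:     LeakyReLU negative slope (assumed value)
    """
    out = {}
    for u, h_u in h.items():
        nbrs = neighbors.get(u, [])
        if nbrs:
            agg = [sum(h[v][d] for v in nbrs) / len(nbrs) for d in range(len(h_u))]
        else:
            agg = [0.0] * len(h_u)  # assumption: isolated nodes aggregate to zero
        z = h_u + agg  # CONCAT(self features, neighborhood mean)
        pre = [sum(w * x for w, x in zip(row, z)) for row in W]
        out[u] = [p if p > 0 else slope * p for p in pre]
    return out


def l2_normalize(v):
    """Scale a vector to unit l2 norm, as in Equation (9); zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


# Hypothetical user features and social-graph neighborhoods.
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # adds self and neighbor features

h1 = sage_layer(h0, neighbors, W)
e_a = l2_normalize(h1["a"])
```

Applying `sage_layer` a second time to `h1` would correspond to the two-layer setup described above, so that each user embedding also reflects two-hop neighbors.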

4.3. Fusion and Prediction Based on Transformer

In this section, we introduce the Transformer framework [37] to fuse multimodal embeddings and predict the next POI. This process consists of two steps: multimodal embedding fusion and prediction using the Transformer architecture. By combining multimodal embeddings (POI embeddings, user embeddings, timestamp embeddings, and POI category embeddings), we enhance the model’s ability to represent user behavior features. At the same time, we leverage the powerful sequential modeling capabilities of the Transformer to handle the complex dependencies within the check-in sequence. Ultimately, a multi-layer perceptron (MLP) is used to predict the user’s next action.

4.3.1. Fusion of Feature Embeddings

In addition to the previously obtained POI embedding vectors and user embedding vectors, we further introduce timestamp embeddings and category embeddings to more comprehensively describe user behavior; these reflect the temporal patterns of user check-ins and the categorical features of POIs, respectively.

Capturing the temporal patterns of user check-in behavior is crucial in modeling users’ visiting preferences. We employ Time2Vec [38] as the time encoding model, which embeds time characteristics into a high-dimensional vector space through both linear and periodic transformations. We divide a day into 48 time slots, with the $i$-th element of the embedding of time $t$ defined as follows:

$$e_{t} [i] = \begin{cases} \omega_{i} t + \varphi_{i}, & \text{if } i = 0 \\ F (\omega_{i} t + \varphi_{i}), & \text{if } 1 \leq i \leq k \end{cases} \tag{10}$$

Here, $\omega_{i}$ and $\varphi_{i}$ are trainable parameters, and $F$ represents the periodic activation function, specifically the sine function, which learns the periodic behaviors of users. For category embeddings, we train a simple embedding layer that projects POI categories into low-dimensional vectors, represented as $e_{c} \in \mathbb{R}^{\Omega}$, where $\Omega$ is the embedding dimension. The timestamp embeddings and category embeddings serve as supplementary features to the user’s check-ins, providing a more comprehensive understanding of the user behavior patterns and the semantic characteristics of POIs.
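The time encoding of Equation (10) can be sketched as follows. In the model the parameters $\omega_{i}$ and $\varphi_{i}$ are trained; here they are fixed, hypothetical values. The mapping of a clock time to one of 48 equal half-hour slots and the names `time_slot` and `time2vec` are also assumptions made for illustration.

```python
import math


def time_slot(hour, minute):
    """Index (0..47) of the half-hour slot, assuming 48 equal slots per day."""
    return hour * 2 + minute // 30


def time2vec(t, omega, phi):
    """Time2Vec-style encoding: element 0 is linear in t, elements 1..k use sine."""
    return [
        omega[i] * t + phi[i] if i == 0 else math.sin(omega[i] * t + phi[i])
        for i in range(len(omega))
    ]


# Hypothetical fixed parameters; in the model they would be learned.
omega = [2.0, math.pi]
phi = [1.0, 0.0]

slot = time_slot(9, 30)           # a check-in at 09:30
e_t = time2vec(0.5, omega, phi)   # encoding of the scalar time t = 0.5
```

The linear element lets the encoding represent non-periodic progression of time, while the sine elements can pick up daily or weekly rhythms once their frequencies are learned.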

After obtaining all of the embedding vectors, the next step is to fuse these vectors and input them sequentially into the Transformer. To represent each check-in, we first fuse the POI embeddings and user embeddings based on the embedding dimensions, and we similarly fuse the timestamp embeddings and category embeddings. We then concatenate the two fused vectors, as shown in the following formulas:

\begin{matrix} e_{p, u} = σ (w_{1} [e_{p}; e_{u}] + b_{1}) \end{matrix}

(11)

\begin{matrix} e_{c, t} = σ (w_{2} [e_{c}; e_{t}] + b_{2}) \end{matrix}

(12)

Here, $w_{1}$ and $w_{2}$ represent the weights, $b_{1}$ and $b_{2}$ represent the biases, and $[\cdot; \cdot]$ denotes vector concatenation, which doubles the dimension of the fused vector. Afterward, the fused vectors $e_{p, u} \in ℝ^{2 \times D}$ and $e_{c, t} \in ℝ^{2 \times Ω}$ are concatenated to form the vector for each check-in $s = 〈u_{i}, l_{j}, t_{k}〉$ as $e_{s_{i}} = [e_{p, u}; e_{c, t}] \in ℝ^{2 \times (Ω + D)}$. Next, for an input trajectory sequence of $k$ check-ins, $S^{u} = (s_{1}^{u}, s_{2}^{u}, \dots, s_{k}^{u})$, we stack the check-in vectors to form the input matrix $Z = [e_{s_{1}}; e_{s_{2}}; \dots; e_{s_{k}}] \in ℝ^{k \times d}$. To ensure that the model correctly utilizes the sequential order of the data, positional information must be incorporated into the sequence. To achieve this, we add positional encoding to each position in the sequence, generated using sine and cosine functions, as shown in the following formulas:

$$\mathrm{PE}_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{2i/d}}\right) \tag{13}$$

$$\mathrm{PE}_{(pos, 2i + 1)} = \cos\left(\frac{pos}{10000^{2i/d}}\right) \tag{14}$$

Here, $pos$ is the position index in the sequence, $i$ is the index of the dimension, and $d = 2 \times (D + Ω)$ is the embedding dimension. The positional encoding is then added to the input vector, resulting in the final sequence embedding vector:

$$X = Z + \mathrm{PE} \tag{15}$$

Here, $X \in ℝ^{k \times d}$ serves as the input to the Transformer, containing all feature information of the check-in records, as well as their positional information within the sequence.
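The fusion and positional encoding steps in Eqs. (11)–(15) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, ReLU stands in for the activation $σ$ in Eqs. (11) and (12) (which the text does not specify at this point), the weights are identity placeholders rather than learned values, and the toy sizes are $D = 2$ and $Ω = 1$.

```python
import math

def dense(x, w, b):
    """Compute ReLU(w x + b); ReLU is an assumed stand-in for sigma."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

def identity(n):
    """Placeholder weights; a trained model would learn these."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def check_in_vector(e_p, e_u, e_c, e_t, w1, b1, w2, b2):
    e_pu = dense(e_p + e_u, w1, b1)  # Eq. (11): fuse POI and user embeddings
    e_ct = dense(e_c + e_t, w2, b2)  # Eq. (12): fuse category and time embeddings
    return e_pu + e_ct               # [e_pu; e_ct], length 2 * (D + Omega)

def positional_encoding(k, d):
    """Eqs. (13)-(14): sine on even indices, cosine on odd indices."""
    pe = []
    for pos in range(k):
        row = []
        for j in range(d):
            angle = pos / (10000 ** (2 * (j // 2) / d))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

# Toy sizes: D = 2 and Omega = 1, so d = 2 * (D + Omega) = 6.
D, OMEGA = 2, 1
w1, b1 = identity(2 * D), [0.0] * (2 * D)
w2, b2 = identity(2 * OMEGA), [0.0] * (2 * OMEGA)
check_in = check_in_vector([0.1, 0.2], [0.3, 0.4], [0.5], [0.6], w1, b1, w2, b2)
Z = [check_in, check_in]  # k = 2 check-ins stacked row by row
PE = positional_encoding(len(Z), len(Z[0]))
X = [[z + p for z, p in zip(z_row, p_row)] for z_row, p_row in zip(Z, PE)]  # Eq. (15)
```

Although both rows of `Z` are identical in this toy example, the rows of `X` differ, which is exactly the positional information the encoder needs to distinguish the order of check-ins.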

4.3.2. Transformer Sequence Encoding

After completing the processes of feature embedding fusion and positional encoding, we pass the sequence encoding into the Transformer encoder to learn the temporal relationships and dependencies in user check-ins. Through the Transformer’s multi-head self-attention mechanism and its stacked structure, we can capture complex long-range interactions in the check-in sequence. The Transformer encoder consists of multiple layers, each containing two sub-layers: a multi-head self-attention layer and a feed-forward neural network. Additionally, residual connections and layer normalization are applied after each sub-layer to enhance the model’s stability.

Given the input sequence $X \in ℝ^{k \times d}$, we linearly project it into $h$ different spaces, apply attention in each, and then concatenate the results to generate the output. The specific calculation is as follows:

$$\mathrm{MultiHead}(X^{l}) = [head_{1}; head_{2}; \dots; head_{h}] W^{O} \tag{16}$$

$$head_{i} = \mathrm{Attention}(X^{l} W_{q}, X^{l} W_{k}, X^{l} W_{v}) \tag{17}$$

Here, $head_{i}$ represents the projection of each head, and $W_{q}, W_{k}, W_{v} \in ℝ^{d \times d / h}$ are the learnable weight matrices, with $h$ being the number of attention heads. The attention function is the scaled dot-product attention, which is calculated as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{⊤}}{\sqrt{d / h}}\right) V \tag{18}$$

In this equation, $Q = X W_{q}$, $K = X W_{k}$, and $V = X W_{v}$ are the query, key, and value matrices, respectively. The $\mathrm{softmax}$ function ensures that the attention weights for each position sum to 1. To introduce non-linearity between different dimensions, a fully connected feed-forward network follows the attention sub-layer, which is represented as the following:

$$FC_{x}^{l} = σ(x W_{1} + b_{1}) W_{2} + b_{2} \tag{19}$$

Here, $W_{1}$ and $W_{2}$ represent weights, and $b_{1}$ and $b_{2}$ are biases. The activation function $σ$ is the ReLU activation function. Additionally, residual connections and layer normalization are applied around each of the two sub-layers. Therefore, the output of the encoder layer is as follows:

$$X^{l + 1} = \mathrm{LayerNorm}(A^{l} + FC_{x}^{l}) \tag{20}$$

Here, $A^{l}$ is the output of the attention sub-layer after the residual connection and layer normalization are applied.
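To make Eqs. (16)–(18) concrete, the following plain-Python sketch computes scaled dot-product attention and combines two heads. It is illustrative only: the helper names are hypothetical, the projection matrices simply select column slices of the input instead of being learned, $W^{O}$ is set to the identity, and the residual connection, layer normalization, and feed-forward network of Eqs. (19) and (20) are omitted.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Eq. (18); len(q[0]) plays the role of d / h."""
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, [list(col) for col in zip(*k)])  # Q K^T
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights

def multi_head(x, heads, w_o):
    """Eqs. (16)-(17); heads is a list of (W_q, W_k, W_v) triples."""
    outputs = [attention(matmul(x, wq), matmul(x, wk), matmul(x, wv))[0]
               for wq, wk, wv in heads]
    concat = [[val for out in outputs for val in out[r]] for r in range(len(x))]
    return matmul(concat, w_o)

# Toy input: k = 3 positions, d = 4, h = 2, so each head has d / h = 2 columns.
x = [[1.0, 0.0, 0.5, 0.5],
     [0.0, 1.0, 0.5, 0.0],
     [1.0, 1.0, 0.0, 0.5]]
first_half = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
second_half = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
heads = [(first_half,) * 3, (second_half,) * 3]
w_o = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
out = multi_head(x, heads, w_o)
_, attn_weights = attention(x, x, x)
```

Each row of `attn_weights` is a probability distribution over the positions in the sequence, and `out` has the same shape as `x`, which is what allows encoder layers to be stacked.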

4.3.3. Decoder Prediction and Loss

Since next POI recommendation is a sequence prediction task, the goal after sequence modeling with the Transformer is to predict the next POI directly. Therefore, we choose a multi-layer perceptron (MLP) head as the decoder. To optimize the prediction performance, we design a multi-task loss function that combines the predictions of the POI, timestamp, and category for training. The formulas are as follows:

$$Pre_{p} = X^{l^{*}} W_{p} + b_{p} \tag{21}$$

$$Pre_{t} = X^{l^{*}} W_{t} + b_{t} \tag{22}$$

$$Pre_{c} = X^{l^{*}} W_{c} + b_{c} \tag{23}$$

Here, $W_{p} \in ℝ^{d \times n}$, $W_{t} \in ℝ^{d \times 1}$, and $W_{c} \in ℝ^{d \times Γ}$ are the weight matrices, with $n$ being the number of candidate POIs and $Γ$ representing the number of POI categories, and $b_{p}$, $b_{t}$, and $b_{c}$ are the biases. For $Pre_{p} \in ℝ^{k \times n}$, we focus only on its last row to predict the next POI. Predicting the timestamp and category jointly with the POI makes the prediction more stable than predicting the next POI alone and yields a more comprehensive recommendation objective.

To optimize the prediction performance for the POI, timestamp, and category simultaneously, we combine the losses of the three predictions. Cross-entropy is used as the loss function for the POI and category predictions, while the mean squared error is used for timestamp prediction, with a time loss weight $η$. The final total loss is as follows:

$$L = L_{p} + L_{c} + η L_{t} \tag{24}$$
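A minimal sketch of the combined loss in Eq. (24) for a single prediction is given below, where $L_{p}$ and $L_{c}$ are cross-entropy losses and $L_{t}$ is a squared error. The function names and the toy logits are assumptions for illustration; the default `eta=10.0` follows the time loss weight reported in the experimental settings.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

def total_loss(poi_logits, poi_target, cat_logits, cat_target,
               t_pred, t_true, eta=10.0):
    l_p = cross_entropy(poi_logits, poi_target)  # POI loss
    l_c = cross_entropy(cat_logits, cat_target)  # category loss
    l_t = (t_pred - t_true) ** 2                 # squared error on the timestamp
    return l_p + l_c + eta * l_t                 # Eq. (24)

# Toy example: 2 candidate POIs, 4 categories, uniform logits.
loss = total_loss([0.0, 0.0], 0, [0.0, 0.0, 0.0, 0.0], 2, t_pred=0.5, t_true=0.4)
```

With uniform logits, the two cross-entropy terms reduce to $\ln 2$ and $\ln 4$, so the weight $η$ determines how strongly the small timestamp error contributes relative to the classification terms.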

5. Experiments

In this section, we evaluate the proposed model on real-world datasets.

5.1. Experimental Setup

5.1.1. Datasets

We evaluate our model on two publicly available datasets: FourSquare [39] and Gowalla [40]. All data used are publicly available and do not contain any personally identifiable information. Proper citations of the original authors are included. The FourSquare dataset contains global check-in data collected from April 2012 to January 2014. We selected check-in data from the New York region for our experiments. The Gowalla dataset contains global user check-ins from February 2009 to October 2010, and we filtered the dataset to include only users with mutual friendship relationships. For both datasets, we removed POIs with fewer than 10 check-ins and users with fewer than 10 check-in records. Regarding dataset preprocessing, we arranged users’ check-in records in chronological order and divided them into trajectories with one-week intervals. The datasets were then split into training, validation, and test sets with an 80%, 10%, 10% ratio. Statistical information for the preprocessed FourSquare and Gowalla datasets is shown in Table 1.
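The preprocessing steps described above can be sketched as follows. Several details are not stated in the text and are assumptions here: rare POIs are filtered before rare users, weekly windows are measured from each user's first check-in, and the 80/10/10 split is applied to an ordered list of trajectories. The function names and the `(user, poi, unix_time)` record layout are likewise hypothetical.

```python
from collections import Counter, defaultdict

DAY_SECONDS = 24 * 3600
WEEK_SECONDS = 7 * DAY_SECONDS

def filter_min_count(records, min_count=10):
    """Drop POIs, then users, with fewer than min_count check-ins."""
    poi_counts = Counter(poi for _, poi, _ in records)
    records = [r for r in records if poi_counts[r[1]] >= min_count]
    user_counts = Counter(user for user, _, _ in records)
    return [r for r in records if user_counts[r[0]] >= min_count]

def weekly_trajectories(records):
    """Sort check-ins chronologically and cut each user's history into weeks."""
    by_user = defaultdict(list)
    for user, poi, t in sorted(records, key=lambda r: r[2]):
        by_user[user].append((poi, t))
    trajectories = []
    for user, visits in by_user.items():
        start = visits[0][1]  # assumed origin of the weekly windows
        weeks = defaultdict(list)
        for poi, t in visits:
            weeks[(t - start) // WEEK_SECONDS].append((poi, t))
        trajectories.extend((user, weeks[w]) for w in sorted(weeks))
    return trajectories

def split_80_10_10(items):
    n_train = int(len(items) * 0.8)
    n_val = int(len(items) * 0.1)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# Toy data: u1 visits POI "a" once a day for 12 days; u2 visits "b" three times.
records = [("u1", "a", day * DAY_SECONDS) for day in range(12)]
records += [("u2", "b", day * DAY_SECONDS) for day in range(3)]
kept = filter_min_count(records)
trajectories = weekly_trajectories(kept)
train, val, test = split_80_10_10(list(range(10)))
```

In the toy data, the three check-ins of `u2` are removed by the filter, and the twelve check-ins of `u1` fall into two weekly trajectories.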

5.1.2. Baseline Models

We compare our model with the following baseline models.

PLSPL [41] uses attention mechanisms and LSTM to learn users’ long-term and short-term preferences.
STAN [42] utilizes self-attention layers to capture point-to-point effects between non-adjacent locations and non-continuous check-ins in a check-in trajectory.
GETNext [36] learns spatial transition information from users’ check-in records through a global trajectory flow graph and Transformer.
MTNet [43] uses mobile trees and their networks, applying a multi-task training strategy for the hierarchical modeling of user check-in records and personalized preference learning.

5.1.3. Evaluation Metrics

In this study, we use two commonly used evaluation metrics, Accuracy@$k$ (Acc@$k$) and the mean reciprocal rank (MRR), to assess the effectiveness of the recommendation models. Acc@$k$ measures the model’s ability to include the true POI within the top $k$ recommendations. MRR reflects whether the model places the true POI near the top of the recommendation list.
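The two metrics can be computed as in the sketch below, which follows their standard definitions: Acc@$k$ is the fraction of test cases whose true POI appears in the top $k$ ranked candidates, and MRR is the mean of the reciprocal rank of the true POI. The function names and the toy rankings are hypothetical, and a true POI missing from a ranked list is assumed to contribute zero to MRR.

```python
def acc_at_k(ranked_lists, targets, k):
    """Fraction of cases where the true POI is among the top-k candidates."""
    hits = sum(1 for ranked, y in zip(ranked_lists, targets) if y in ranked[:k])
    return hits / len(targets)

def mean_reciprocal_rank(ranked_lists, targets):
    """Mean of 1 / rank of the true POI (0 if it is not in the list)."""
    total = 0.0
    for ranked, y in zip(ranked_lists, targets):
        if y in ranked:
            total += 1.0 / (ranked.index(y) + 1)
    return total / len(targets)

# Toy example: the true POI "a" is ranked 1st, 2nd, and 3rd, respectively.
ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
acc1 = acc_at_k(ranked_lists, targets, 1)
acc2 = acc_at_k(ranked_lists, targets, 2)
mrr = mean_reciprocal_rank(ranked_lists, targets)
```

Acc@$k$ ignores where in the top $k$ the true POI falls, whereas MRR rewards rankings that place it closer to the first position.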

5.1.4. Experimental Settings

We developed the model using PyTorch v2.1.0 and conducted the experiments on a CentOS server configured with an NVIDIA GeForce RTX 4090 GPU and 48 GB of RAM. We maintained consistent hyperparameters across the FourSquare and Gowalla datasets. The embedding dimensions for POIs and users were set to 160, with the embedding lengths for the time and POI categories set to 48. The GCN model has three hidden layers, with 40, 80, and 160 channels in each layer. The GraphSAGE module has two hidden layers, with 40 and 160 channels in each layer. For the Transformer module, we stacked two encoder layers, with each layer consisting of two attention heads. Additionally, we used the Adam optimizer with a learning rate of 0.001 and a weight decay rate of 0.0005. We also enabled Dropout with a dropout rate of 0.3. Another important parameter is the time loss weight, which was set to 10 to balance the POI and category losses. Finally, we ran the model for 200 epochs with a batch size of 20.
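For reference, the reported settings can be collected in a single configuration object, as sketched below. The class and field names are hypothetical; only the values come from the text. The two derived properties assume $d = 2 \times (D + Ω)$ with $D = 160$ and $Ω = 48$, and a per-head dimension of $d / h$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    poi_user_dim: int = 160             # D: POI and user embedding dimension
    time_cat_dim: int = 48              # Omega: time and category embedding length
    gcn_channels: tuple = (40, 80, 160)
    sage_channels: tuple = (40, 160)
    encoder_layers: int = 2
    attention_heads: int = 2
    learning_rate: float = 1e-3
    weight_decay: float = 5e-4
    dropout: float = 0.3
    time_loss_weight: float = 10.0      # eta in Eq. (24)
    epochs: int = 200
    batch_size: int = 20

    @property
    def model_dim(self) -> int:
        """Fused check-in dimension d = 2 * (D + Omega)."""
        return 2 * (self.poi_user_dim + self.time_cat_dim)

    @property
    def head_dim(self) -> int:
        """Per-head dimension d / h used in Eq. (18)."""
        return self.model_dim // self.attention_heads

cfg = TrainConfig()
```

Under these assumptions, the Transformer input dimension is 416 and each of the two heads operates on 208 dimensions.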

5.2. Results

We first compare our model with the baseline models. Table 2 shows the performance of the selected models on the FourSquare and Gowalla datasets. Our model outperformed the others across all metrics, demonstrating its effectiveness in the next POI recommendation task. For example, on the FourSquare dataset, we achieved top-1 accuracy of 32.29%, which represents an average relative improvement of 10.3% over the baseline models. The Acc@10 and MRR metrics reached 69.99% and 45.18%, respectively, whereas the best baseline model, GETNext, achieved 67.32% and 43.98%, corresponding to relative improvements of 3.97% and 2.73%, respectively. We attribute these gains to fine-grained user modeling based on check-in frequency and to user social graph modeling, which integrates friendship relationships. This approach enables our model to better capture user check-in behavior patterns and the spatiotemporal dependencies in the global trajectory flow graph. On the Gowalla dataset, although all models performed poorly, our model achieved 12.81% and 27.96% for the Acc@1 and Acc@5 metrics, respectively, surpassing the second-best model (GETNext, at 11.48% and 26.65%) by relative margins of about 11.6% and 4.9%. This result indicates that, despite the sparse check-in behavior and complex spatiotemporal characteristics of the Gowalla dataset, our model still demonstrates strong adaptability in capturing user behavior and social information. It effectively mitigates the issues present in current models, such as coarse user interest representation, simplistic social influence modeling, and the lack of the integration of spatiotemporal contextual information.
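The relative improvements discussed above can be recomputed from the values in Table 2 as (ours − baseline) / baseline. The sketch below does this for FourSquare; the helper name `relative_gain` is hypothetical, and the average is assumed to be the mean of the per-baseline relative gains.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

# FourSquare Acc@1 values copied from Table 2.
baseline_acc1 = {"PLSPL": 0.2679, "STAN": 0.2935, "GETNext": 0.3125, "MTNet": 0.3012}
ours_acc1 = 0.3229
gains = {name: relative_gain(ours_acc1, value) for name, value in baseline_acc1.items()}
average_gain = sum(gains.values()) / len(gains)  # mean of the per-baseline gains

# Comparison with the strongest baseline (GETNext) on FourSquare.
acc10_gain = relative_gain(0.6999, 0.6732)
mrr_gain = relative_gain(0.4518, 0.4398)
```

The largest Acc@1 gain is over PLSPL and the smallest is over GETNext, which matches the ordering of the baselines in Table 2.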

Figure 4a,b further illustrate that, on both datasets, TraFriendFormer outperforms the other baseline models by an average of 10.3% to 37.2%. The largest improvement is seen compared to the PLSPL model, as our model combines user social relationships and check-in trajectories, thus capturing more nuanced user behavior patterns. Compared to the GETNext model, the improvement is smaller, primarily because GETNext already utilizes the Transformer architecture to model spatiotemporal dependencies. However, our model performs better in handling complex and sparse data due to the introduction of additional embedding layers and social information. This also validates the hypothesis that the check-in frequency more accurately reflects user preferences and that the social circle influences user behavior. It further demonstrates that integrating the spatiotemporal context can improve the accuracy of next POI recommendations.

The poor performance of all models on the Gowalla dataset could be attributed to the fact that it contains global user check-in data, which leads to a sparser distribution of POIs and the weaker connectivity of user social relationships. Additionally, the spatiotemporal characteristics of the data are more complex, which increases the difficulty involved in models learning from this dataset, leading to overall poorer results. In contrast, the FourSquare dataset is concentrated in the New York City area, in which the check-in behavior and POI distribution are more focused and user friendship relationships are denser, allowing the model to better capture check-in patterns and user preferences.

5.3. Ablation Study

To analyze the impacts of different modules in the model, we conducted an ablation study on the FourSquare dataset. Specifically, we designed six model variants: (1) a full model; (2) a model that removed user friendship information and used an embedding layer to learn user embeddings; (3) a model that did not use the GraphSAGE model to generate user embeddings, instead using the average check-in vector as the user embedding; (4) a model with no timestamp or category information; (5) a model that removed the trajectory flow graph and used an embedding layer to learn POI features; and (6) a model that used LSTM instead of the Transformer. The experimental results are shown in Table 3.

This table clearly shows that the full model outperforms all other variants. First, the modeling of user friendship relationships plays an important role in improving the recommendation accuracy and ranking quality. When user friendship information is removed, the model’s Acc@1 and MRR drop by 10.63% and 9.44% (relative to the full model), respectively, thus indicating the importance of user friendship information in recommendation tasks. User POI visits are often influenced by friends, and POIs recommended by friends are more likely to interest the user. By modeling user friendships, the model can capture the interest similarity and social influence between users, thus resulting in more accurate behavior preferences and improved user embedding quality and recommendation performance. Furthermore, when the average check-in vector is used instead of GraphSAGE, the model performance declines, confirming the advantages of graph neural networks in learning both explicit and implicit friendship information. Additionally, the modeling of the trajectory flow graph plays a crucial role in POI representation learning. After removing this module, the Acc@10 and MRR drop by 2.77% and 3.72%, respectively, indicating that the trajectory flow graph is key to capturing the spatiotemporal dependencies of POIs. Finally, replacing the Transformer with LSTM results in decreases of 11.47% and 6.88% in Acc@10 and MRR, respectively, highlighting the strength of the Transformer’s self-attention mechanism in capturing complex dependencies in the check-in sequence.

Overall, user friendship relationships, trajectory flow graphs, and Transformers are the core modules that enhance the model’s performance. These components significantly strengthen the model’s ability to model user check-in behavior.

5.4. Effects of Model Parameters

5.4.1. Effects of Threshold $θ$

To evaluate the impact of the threshold $θ$ on the performance of the TraFriendFormer model, we trained our model with $θ$ ranging from 0.1 to 0.9 while keeping all other parameters fixed (see Figure 5). Figure 5 clearly shows that the best performance of TraFriendFormer occurs with $θ$ = 0.4 on the FourSquare dataset and $θ$ = 0.5 on the Gowalla dataset. The results show that setting a threshold that is too small introduces excessive noisy edges when constructing the social graph, while setting a threshold that is too large leads to a sparse social graph. A moderate threshold retains enough connecting edges to reflect genuine social influence while avoiding the interference of noisy edges.

5.4.2. Effects of Embedding Dimension

In this section, we demonstrate the impact of the POI and user embedding dimensions on the model performance, where the POI and user embedding dimensions are set to be equal. Generally, larger embedding dimensions provide stronger expressive power, but excessively large dimensions may lead to overfitting. We varied the dimension $d$ from 40 to 240, with a step size of 40. The Acc@$k$ and MRR results of TraFriendFormer with different embedding dimensions are shown in Figure 6. As the embedding dimension increases, the performance gradually improves and, after 120 dimensions, starts to converge. The performance reaches its peak when $d$ = 160. Further increasing the dimension to 200–240 causes slight fluctuations or a minor decline in the metrics, indicating that excessively high dimensions may lead to overfitting and increase the computational overhead.

5.5. Analysis of Impact of User Friendship Count

To validate the model’s ability to learn user social relationships, we conducted a visualization analysis of the user embedding vectors trained on the FourSquare dataset. We selected four groups of users based on the number of friends: fewer than 100 (a), 100 to 200 (b), 200 to 400 (c), and more than 400 (d). Users with more than 200 friends had mutual friendships, and we calculated the cosine similarity of their embedding vectors, presenting the results via a heatmap. The analysis reveals that user groups with fewer friends exhibit lower similarity, as indicated by the more scattered color regions in the heatmap. This suggests that the social relationships between these users are weak, and their check-in behavior is less influenced by social factors, as shown in Figure 7a,b. Due to the sparsity of their social networks, the model finds it difficult to capture strong social dependencies, thus leading to weaker performance in POI recommendation for these users. In contrast, user groups with more friends show higher embedding similarity and a more structured pattern. In user groups with high social connectivity, the embedding vectors of users exhibit regularity in the spatial representation, bringing together users with similar interests and behaviors. This reflects the potential influence of social networks on user behavior, as shown in Figure 7c,d.
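The pairwise cosine similarities behind such a heatmap can be computed as in the following sketch. The function names and the toy two-dimensional embeddings are hypothetical; in the analysis above, the inputs would be the trained user embedding vectors of one friend-count group.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(embeddings):
    """Symmetric matrix of pairwise cosine similarities (the heatmap values)."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]

# Toy embeddings for three users.
user_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = similarity_matrix(user_embeddings)
```

Entries near 1 indicate users whose embeddings point in nearly the same direction, which is what appears as structured high-similarity blocks in the heatmaps of the well-connected groups.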

This result demonstrates that our model not only recognizes and aggregates user check-in behaviors but also uncovers potential areas of interest that overlap through social relationships. It highlights the model’s effectiveness in capturing the impact of social relationships on check-ins, with particularly significant performance in large-scale social networks.

6. Conclusions

In this study, we propose TraFriendFormer, a next POI recommendation model that combines trajectory information with user friendship relationships. It utilizes a trajectory flow graph to globally model users’ spatiotemporal check-in behaviors to capture trajectory dependencies. We also construct a user social graph through both explicit and implicit friendship relationships, using the GraphSAGE model to generate user embeddings and fully explore the interest similarities between users. To further improve the recommendation accuracy, we fuse user embeddings, POI embeddings, timestamp embeddings, and category embeddings, which are then input into the Transformer. The self-attention mechanism of the Transformer is employed to model the check-in sequence and capture the complex temporal relationships in user behavior. We conducted experiments on two real-world datasets (FourSquare and Gowalla), demonstrating that our model outperformed the other evaluated models in terms of Acc@$k$ and MRR. Ablation studies further validated the contribution of each module to the overall performance. The experimental results indicate that the combination of trajectory flow graphs, user friendship relationships, and the Transformer architecture plays a crucial role in enhancing the model’s performance.

For future work, we will further optimize the model architecture to reduce its complexity, accelerating the model’s convergence speed and thus lowering the computational costs, making it more feasible and efficient for practical applications. Additionally, we plan to incorporate more user social information, such as comments, opinions, and other semantic data, to further enhance the recommendation quality of the model.

Author Contributions

Conceptualization, Chenglin Yu and Yangyang Zhao; methodology, Chenglin Yu, Lihong Shi, and Yangyang Zhao; validation, Chenglin Yu and Yangyang Zhao; formal analysis, Chenglin Yu; investigation, Lihong Shi; resources, Yangyang Zhao; writing—original draft preparation, Chenglin Yu; writing—review and editing, Yangyang Zhao and Lihong Shi; supervision, Lihong Shi. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The publicly available data used in this study are accessible at https://sites.google.com/site/yangdingqi/home/foursquare-dataset (accessed on 23 April 2025).

Acknowledgments

The authors would like to thank the Research Center for Geospatial Big Data Application of the Chinese Academy of Surveying and Mapping. The authors also thank the editors and reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Islam, M.A.; Mohammad, M.M.; Das, S.S.S.; Ali, M.E. A survey on deep learning based Point-of-Interest (POI) recommendations. Neurocomputing 2022, 472, 306–325.
2. Cheng, C.; Yang, H.; Lyu, M.R.; King, I. Where You Like to Go Next: Successive Point-of-Interest Recommendation. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China, 3–9 August 2013.
3. Gambs, S.; Killijian, M.O.; del Prado Cortez, M.N.E. Next place prediction using mobility Markov chains. In Proceedings of the First Workshop on Measurement, Privacy, and Mobility; ACM: New York, NY, USA, 2012; pp. 1–6.
4. Lian, D.; Zhao, C.; Xie, X.; Sun, G.; Chen, E.; Rui, Y. GeoMF: Joint geographical modeling and matrix factorization for point-of-interest recommendation. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014.
5. Zhao, S.; King, I.; Lyu, M.R. Geo-Pairwise Ranking Matrix Factorization Model for Point-of-Interest Recommendation. In Neural Information Processing: 24th International Conference, ICONIP 2017, Guangzhou, China, 14–18 November 2017, Proceedings, Part V 24; Springer: Cham, Switzerland, 2017.
6. Cui, Q.; Tang, Y.; Wu, S.; Wang, L. Distance2Pre: Personalized Spatial Preference for Next Point-of-Interest Prediction. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery & Data Mining, Macau, China, 14–17 April 2019.
7. He, J.; Li, X.; Liao, L. Category-aware Next Point-of-Interest Recommendation via Listwise Bayesian Personalized Ranking. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017.
8. Zhao, S.; Zhao, T.; King, I.; Lyu, M.R. Geo-Teaser: Geo-Temporal Sequential Embedding Rank for Point-of-interest Recommendation. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017.
9. Huang, L.; Ma, Y.; Wang, S.; Liu, Y. An Attention-based Spatiotemporal LSTM Network for Next POI Recommendation. IEEE Trans. Services Comput. 2021, 14, 1585–1597.
10. Manotumruksa, J.; Macdonald, C.; Ounis, I. A Contextual Attention Recurrent Architecture for Context-Aware Venue Recommendation. In Proceedings of the 41st International ACM SIGIR Conference, Ann Arbor, MI, USA, 8–12 July 2018.
11. Zhao, P.; Luo, A.; Liu, Y.; Zhuang, F.; Zhou, X. Where to Go Next: A Spatio-Temporal Gated Network for Next POI Recommendation. IEEE Trans. Knowl. Data Eng. 2020, 34, 2512–2524.
12. Zhao, P.; Zhu, H.; Liu, Y.; Li, Z.; Sheng, V.S. Where to Go Next: A Spatio-temporal LSTM model for Next POI Recommendation. arXiv 2018, arXiv:1806.06671.
13. Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1025–1035.
14. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
15. Monti, F.; Bronstein, M.M.; Bresson, X. Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks. Adv. Neural Inform. Process. Syst. 2017, 30.
16. Ying, R.; He, R.; Chen, K.; Eksombatchai, P.; Hamilton, W.L.; Leskovec, J. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018.
17. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.S. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019.
18. Zhong, T.; Zhang, S.; Zhou, F.; Zhang, K.; Wu, J. Hybrid graph convolutional networks with multi-head attention for location recommendation. World Wide Web 2020, 23, 3125–3151.
19. Wu, S.; Zhang, Y.; Gao, C.; Bian, K.; Cui, B. GARG: Anonymous Recommendation of Point-of-Interest in Mobile Networks by Graph Convolution Network. Data Sci. Eng. 2020, 5, 433–447.
20. Tang, J.; Jin, J.; Miao, Z.; Zhang, B.; Zhang, J. Region-aware POI Recommendation with Semantic Spatial Graph. In Proceedings of the 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Dalian, China, 5–7 May 2021.
21. Abu-El-Haija, M.; Kumar, S.; Szabo, F.; Werlin, S.; Conwell, D.; Banks, P.; Morinville, V.D. Classification of Acute Pancreatitis in the Pediatric Population: Clinical Report from the NASPGHAN Pancreas Committee. J. Pediatr. Gastroenterol. Nutr. 2017, 984, 984–990.
22. Li, Y.; Chen, T.; Luo, Y.; Yin, H.; Huang, Z. Discovering collaborative signals for next POI recommendation with iterative Seq2Graph augmentation. arXiv 2021, arXiv:2106.15814.
23. Lim, N.; Hooi, B.; Ng, S.-K.; Goh, Y.L.; Weng, R.; Tan, R. Hierarchical Multi-Task Graph Recurrent Network for Next POI Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1133–1143.
24. Qin, Y.; Wu, H.; Ju, W.; Luo, X.; Zhang, M. A Diffusion model for POI recommendation. ACM Trans. Inf. Syst. 2023, 42, 54.
25. Wang, J.; Yang, B.; Liu, H.; Li, D. Global spatio-temporal aware graph neural network for next point-of-interest recommendation. Appl. Intell. 2022, 13, 16762–16775.
26. Chang, B.; Park, Y.; Park, D.; Kim, S.; Kang, J. Content-Aware Hierarchical Point-of-Interest Embedding Model for Successive POI Recommendation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18, Stockholm, Sweden, 13–19 July 2018.
27. Gao, R.; Li, J.; Li, X.; Song, C.; Zhou, Y. A personalized point-of-interest recommendation model via fusion of geo-social information. Neurocomputing 2018, 273, 159–170.
28. Li, J.; Wang, X.; Feng, W. A Point-of-Interest Recommendation Algorithm Combining Social Influence and Geographic Location Based on Belief Propagation. IEEE Access 2020, 8, 165748–165756.
29. Xiong, X.; Qiao, S.; Li, Y.; Han, N.; Zhang, Y. A point-of-interest suggestion algorithm in Multi-source geo-social networks. Eng. Appl. Artif. Intell. 2020, 88, 103374.
30. Seyedhoseinzadeh, K.; Rahmani, H.A.; Afsharchi, M.; Aliannejadi, M. Leveraging social influence based on users activity centers for point-of-interest recommendation. Inf. Process. Manag. Libr. Inf. Retr. Syst. Commun. Netw. Int. J. 2022, 59, 102858.
31. Christoforidis, G.; Kefalas, P.; Papadopoulos, A.N.; Manolopoulos, Y. RELINE: Point-of-interest recommendations using multiple network embeddings. Knowl. Inf. Syst. 2021, 63, 791–817.
32. Kefalas, P.; Manolopoulos, Y. A time-aware spatio-textual recommender system. Expert Syst. Appl. 2017, 78, 396–406.
33. Fan, W.; Ma, Y.; Li, Q.; He, Y.; Zhao, E.; Tang, J.; Yin, D. Graph Neural Networks for Social Recommendation. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 417–426.
34. Lim, N.; Hooi, B.; Ng, S.-K.; Wang, X.; Goh, Y.L.; Weng, R.; Varadarajan, J. STP-UDGAT: Spatial-Temporal-Preference User Dimensional Graph Attention Network for Next POI Recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 19–23 October 2020; pp. 845–854.
35. Huang, Z.; Ma, J.; Dong, Y.; Foutz, N.Z.; Li, J. Empowering Next POI Recommendation with Multi-Relational Modeling. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022.
36. Yang, S.; Liu, J.; Zhao, K. GETNext: Trajectory Flow Map Enhanced Transformer for Next POI Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1144–1153.
37. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
38. Kazemi, S.M.; Goel, R.; Eghbali, S.; Ramanan, J.; Sahota, J.; Thakur, S.; Wu, S.; Smyth, C.; Poupart, P.; Brubaker, M. Time2Vec: Learning a Vector Representation of Time. arXiv 2019, arXiv:1907.05321.
39. Yang, D.; Qu, B.; Yang, J.; Cudre-Mauroux, P. Revisiting User Mobility and Social Relationships in LBSNs: A Hypergraph Embedding Approach. In The World Wide Web Conference; ACM: New York, NY, USA, 2019.
40. Liu, Y. An experimental evaluation of point-of-interest recommendation in location-based social networks. Proc. VLDB Endow. 2017, 10, 1010–1021.
41. Wu, Y.; Li, K.; Zhao, G.; Qian, X. Personalized long-and short-term preference learning for next POI recommendation. IEEE Trans. Knowl. Data Eng. 2020, 34, 1944–1957.
42. Luo, Y.; Liu, Q.; Liu, Z. Stan: Spatio-temporal attention network for next location recommendation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2177–2185.
43. Huang, T.; Pan, X.; Cai, X.; Zhang, Y.; Yuan, X. Learning Time Slot Preferences via Mobility Tree for Next POI Recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 8535–8543.

Figure 1. The trajectories of two users with a friendship relationship on different dates, with the exact locations retrieved from Google Maps based on the check-in data of the two users from the real-world FourSquare dataset.

Figure 2. The proposed TraFriendFormer architecture.

Figure 3. User social graph.

Figure 4. Performance improvements of TraFriendFormer compared to baseline models. (a) Improvement on FourSquare. (b) Improvement on Gowalla.

Figure 5. The impact of threshold $θ$ on model performance on FourSquare (a) and Gowalla (b).

Figure 6. Effects of embedding dimension size $d$.

Figure 7. Visualization of embedding vectors for users with different friendship relationships. (a) Fewer than 100 friends. (b) Between 100 and 200 friends. (c) Between 200 and 400 friends. (d) More than 400 friends.

Table 1. Dataset statistics.

Dataset	Users	POIs	Check-ins
FourSquare	2282	4218	122,068
Gowalla	922	3151	85,357

Table 2. Performance comparison in terms of Acc@$k$ and MRR on two datasets.

Table 2. Performance comparison in terms of Acc

@ k

and MRR on two datasets.

Model	FourSquare				Gowalla			
	Acc@1	Acc@5	Acc@10	MRR	Acc@1	Acc@5	Acc@10	MRR
PLSPL	0.2679	0.5133	0.6057	0.3810	0.0714	0.1903	0.2373	0.1268
STAN	0.2935	0.5507	0.6226	0.4013	0.0925	0.2341	0.2737	0.1598
GETNext	0.3125	0.6048	0.6732	0.4398	0.1148	0.2665	0.3235	0.1904
MTNet	0.3012	0.5691	0.6347	0.4198	0.1075	0.2545	0.3099	0.1873
Ours	0.3229	0.6271	0.6999	0.4518	0.1281	0.2796	0.3460	0.2034
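
The relative margins behind Table 2 can be recomputed directly from the reported scores. The following is a minimal sketch that assumes relative improvement is defined as (ours − baseline) / baseline; the names `TABLE2`, `relative_gain`, and `gains_over_baselines` are illustrative and do not come from the paper, and the sketch does not attempt to reproduce whatever averaging the authors used for the summary figures in the abstract.

```python
# Scores copied from Table 2; column order is Acc@1, Acc@5, Acc@10, MRR.
METRICS = ("Acc@1", "Acc@5", "Acc@10", "MRR")
TABLE2 = {
    "FourSquare": {
        "PLSPL":   (0.2679, 0.5133, 0.6057, 0.3810),
        "STAN":    (0.2935, 0.5507, 0.6226, 0.4013),
        "GETNext": (0.3125, 0.6048, 0.6732, 0.4398),
        "MTNet":   (0.3012, 0.5691, 0.6347, 0.4198),
        "Ours":    (0.3229, 0.6271, 0.6999, 0.4518),
    },
    "Gowalla": {
        "PLSPL":   (0.0714, 0.1903, 0.2373, 0.1268),
        "STAN":    (0.0925, 0.2341, 0.2737, 0.1598),
        "GETNext": (0.1148, 0.2665, 0.3235, 0.1904),
        "MTNet":   (0.1075, 0.2545, 0.3099, 0.1873),
        "Ours":    (0.1281, 0.2796, 0.3460, 0.2034),
    },
}


def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, as a fraction (assumed definition)."""
    return (ours - baseline) / baseline


def gains_over_baselines(dataset: str) -> dict:
    """Map each baseline to the per-metric relative gain achieved by the 'Ours' row."""
    rows = TABLE2[dataset]
    ours = rows["Ours"]
    return {
        name: {m: relative_gain(o, b) for m, o, b in zip(METRICS, ours, scores)}
        for name, scores in rows.items()
        if name != "Ours"
    }


if __name__ == "__main__":
    for dataset in TABLE2:
        for name, gains in gains_over_baselines(dataset).items():
            line = ", ".join(f"{m}: {g:+.1%}" for m, g in gains.items())
            print(f"{dataset:10s} vs {name:8s} {line}")
```

Under this definition every gain is positive on both datasets, since the "Ours" row has the highest score in each column of Table 2.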

Table 3. Ablation studies on FourSquare.

Variant	Acc@1	Acc@5	Acc@10	MRR
Full Model	0.3229	0.6271	0.6999	0.4518
w/o Friendship	0.2886	0.5634	0.6273	0.4091
w/o GraphSAGE	0.2946	0.5573	0.6220	0.4116
w/o Time&Cat	0.3115	0.6089	0.6772	0.4430
w/o Graph	0.3083	0.6035	0.6805	0.4350
w/o Transformer	0.3025	0.5622	0.6197	0.4207
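
To read Table 3 at a glance, one can compute how far each ablated variant falls below the full model and which removal costs the most per metric. This is a hedged sketch over the reported FourSquare numbers only; the helper names `absolute_drops` and `most_damaging_removal` are hypothetical, and absolute differences are an assumed way of comparing the variants rather than the authors' own analysis.

```python
# Scores copied from Table 3 (FourSquare); column order is Acc@1, Acc@5, Acc@10, MRR.
METRICS = ("Acc@1", "Acc@5", "Acc@10", "MRR")
FULL_MODEL = (0.3229, 0.6271, 0.6999, 0.4518)
ABLATIONS = {
    "w/o Friendship":  (0.2886, 0.5634, 0.6273, 0.4091),
    "w/o GraphSAGE":   (0.2946, 0.5573, 0.6220, 0.4116),
    "w/o Time&Cat":    (0.3115, 0.6089, 0.6772, 0.4430),
    "w/o Graph":       (0.3083, 0.6035, 0.6805, 0.4350),
    "w/o Transformer": (0.3025, 0.5622, 0.6197, 0.4207),
}


def absolute_drops() -> dict:
    """Per-variant, per-metric drop relative to the full model (full minus ablated)."""
    return {
        variant: {m: full - score for m, full, score in zip(METRICS, FULL_MODEL, scores)}
        for variant, scores in ABLATIONS.items()
    }


def most_damaging_removal() -> dict:
    """For each metric, the variant whose removal causes the largest drop."""
    drops = absolute_drops()
    return {m: max(drops, key=lambda v: drops[v][m]) for m in METRICS}


if __name__ == "__main__":
    for metric, variant in most_damaging_removal().items():
        print(f"{metric}: largest drop for '{variant}'")
```

On these numbers, the largest drop depends on the metric: removing friendship information hurts Acc@1 and MRR the most, removing GraphSAGE hurts Acc@5 the most, and removing the Transformer hurts Acc@10 the most.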


© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

