1. Introduction
With the rapid proliferation of ubiquitous mobile sensing and online location-sharing platforms, location-based social networks (LBSNs) have become a fundamental component of modern urban information systems. They play a key role in delivering personalized and context-aware experiences in diverse applications, including smart tourism [1,2], intelligent transportation, and mobile commerce. As a core task in LBSNs, next Point-of-Interest (POI) recommendation aims to predict the next location a user is likely to visit based on their historical trajectories [3,4,5]. Predicting the next POI is a particularly dynamic challenge, as it requires modeling immediate user movement. Accurately capturing such mobility patterns is essential for enhancing the intelligence and responsiveness of graph-based urban computing applications.
Most current approaches to next POI recommendation formulate the task as a sequential prediction problem. Research has evolved from early Markov chain models [6,7] to more sophisticated neural approaches, ranging from recurrent neural networks (RNNs) [8,9,10] to self-attention mechanisms [11,12,13], which have progressively enhanced the modeling of user behavior sequences. Most existing approaches have focused on enriching these sequential models by incorporating critical spatio-temporal information, which has proven vital for accurate trajectory modeling [14,15]. Despite their success, most of these methods focus on the intra-sequence features of individual users while ignoring the high-order correlations among different users [3,8,12]. A core trend in graph learning is capturing high-order neighbor dependencies and modeling complex relational structures [16,17,18]. As a result, an increasing number of studies [4,19,20] have employed graph neural networks (GNNs) and their powerful extensions, such as hypergraph neural networks (HGNNs), to learn more expressive representations of users and POIs, thereby improving recommendation performance. Despite this significant progress, most existing methods still face two key challenges that remain insufficiently addressed.
(1) Entangled representations of user preferences. Many earlier works have failed to fully recognize that user interests naturally vary across multiple perspectives and evolve over time, often producing user representations that are confounded. In next POI recommendation scenarios, user–POI interactions are typically influenced by a mix of implicit contextual signals, including geographic distance, temporal patterns, and behavioral tendencies [5,21]. For example, a user may habitually purchase essentials from a nearby store while ignoring a preferred specialty shop located farther away. Nonetheless, existing graph-based and hypergraph-based approaches [18,22] often conflate multiple behavioral signals. They typically model user–POI interactions at a high level, without explicitly accounting for the nuanced factors that influence user decisions. Such representations obscure the distinct motivations behind user behavior, making it difficult to uncover fine-grained and context-aware user preferences. Therefore, identifying and modeling user intentions from multiple perspectives remains a critical challenge that requires urgent attention.
(2) Challenges in cross-perspective synergy. Many current approaches fall short in capturing the essential synergy across diverse perspectives, which restricts their ability to exploit complementary signals during representation learning. Complementarity refers to the integration of heterogeneous information from multiple perspectives to achieve a more holistic understanding of user behavior and improve recommendation accuracy. For the next POI recommendation task, several studies have explored multi-view or disentangled learning frameworks [23,24,25]. A common approach in these works is to construct representations for individual perspectives separately and then aggregate them in a straightforward manner for prediction. However, such strategies often overlook the relational nuances between perspectives. For instance, when a user frequently visits a location, considering only interaction frequency may ignore the influence of visit timing and geographical context. Furthermore, some existing works [18,26] address cross-perspective cooperation only at the output level, making it difficult to ensure any interactive reinforcement during the representation phase. Therefore, there is an urgent need for more sophisticated mechanisms to model the cooperation between different perspectives and foster cross-perspective interactive reinforcement.
To tackle these challenges, we propose a multi-relational hypergraph learning (MRHL) framework, which consists of two main components.
(1) Multi-view representation disentanglement. To address the first limitation, we propose an approach that disentangles representations by modeling the three crucial views of interaction frequency, time decay, and geographical proximity [20,23,25]. These three views are selected as representative behavioral perspectives because they jointly capture the intensity, temporal dynamics, and spatial constraints of user mobility. Moreover, the proposed framework is not restricted to these views and can be naturally extended to incorporate additional behavioral or contextual factors. Importantly, these views are not implemented as simple reweighted variants of a shared graph structure. Instead, each view is modeled as an independent hypergraph that encodes a distinct type of relational dependency, enabling the model to explicitly capture heterogeneous semantics across different behavioral dimensions. Specifically, unlike simple graphs that model relationships in a pairwise manner, we construct a distinct hypergraph for each view to capture high-order correlations among users and POIs. This design enables the model to represent group-wise behavioral patterns and joint effects that naturally arise in user mobility data, and it goes beyond conventional multi-view graph modeling by disentangling heterogeneous relational structures rather than implicitly mixing them within a unified adjacency space. Following this disentanglement, we introduce a novel cascaded enhancement mechanism to synthesize the final user representation, improving personalization and interpretability. Unlike conventional fusion strategies based on parallel aggregation or residual connections, our cascaded enhancement mechanism operates across heterogeneous relational views, each encoding distinct behavioral semantics. Specifically, the proposed fusion follows a sequential dependency paradigm, where representations from preceding views are incorporated as complementary context to enrich subsequent ones. This process can be interpreted as cross-view semantic augmentation rather than simple additive combination, enabling the model to capture complementary and interdependent patterns across behavioral perspectives.
(2) Cross-view synergistic enhancement. To address the second limitation, we further devise a cross-view contrastive learning strategy that promotes semantic consistency across different views through a self-supervised objective. Instead of simply aggregating multi-view representations, this strategy encourages the model to capture complementary and mutually reinforcing patterns across views. As a result, the learned representations can better preserve consistent user preference signals shared among different behavioral perspectives.
To systematically address the aforementioned challenges and guide our methodology, this paper aims to answer the following core research questions (RQs):
RQ1: How can we effectively disentangle the multi-perspective driving factors (e.g., interaction frequency, time decay, and geographical proximity) behind user check-in behaviors to capture fine-grained user intents?
RQ2: How can we design an effective mechanism to foster cross-view interactive reinforcement and synergy, thereby mitigating the data sparsity issue in POI recommendation?
RQ3: Does the proposed MRHL framework outperform existing state-of-the-art models, and what is the specific contribution of each disentangled view to the overall performance?
The primary contributions can be outlined as follows:
We propose three structurally diverse hypergraph representations constructed from the views of visit frequency, time decay, and geographical proximity. Furthermore, we enhance the aggregation and propagation strategies in the hypergraph convolutional network to mitigate the issues of entanglement and information sparsity in user representation learning.
We employ a cross-view contrastive learning strategy that leverages auxiliary tasks to enhance the supervision across different views, thereby effectively mitigating the difficulty of capturing complementary recommendation cues during training.
Comprehensive experiments conducted on three real-world datasets demonstrate the superior performance of our proposed MRHL model, compared to a range of cutting-edge approaches for next POI recommendation.
3. Preliminaries
3.1. Formalization of the Task
Let $\mathcal{U} = \{u_1, u_2, \ldots, u_U\}$ denote the set of users and $\mathcal{L} = \{l_1, l_2, \ldots, l_L\}$ denote the set of POIs, where U and L denote the total number of users and POIs, respectively. Each POI represents a specific geographic location defined by its longitude and latitude coordinates. Each user $u \in \mathcal{U}$ is associated with a chronological sequence of check-in records denoted as $S_u = \{(l_1, t_1), (l_2, t_2), \ldots, (l_n, t_n)\}$, where $l_i$ refers to the POI visited and $t_i$ denotes the corresponding timestamp of visiting location $l_i$.
To effectively capture user behavior, we generate training samples by segmenting each user's complete check-in sequence into multiple trajectory–target pairs. Specifically, for a user's check-in sequence $S_u$, we generate samples in an auto-regressive manner. For the i-th sample, the trajectory is represented as $S_u^i = \{(l_1, t_1), \ldots, (l_i, t_i)\}$. In the next POI recommendation task, the model aims to produce a ranked list of POIs from the entire candidate set $\mathcal{L}$, ensuring that the ground-truth POI appears at a high position in the ranking.
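The auto-regressive segmentation described above can be sketched as follows; `make_samples` is a hypothetical helper name and the POI labels are illustrative.

```python
def make_samples(checkins):
    """Split one user's chronological check-in sequence into
    trajectory-target pairs in an auto-regressive manner.
    checkins: time-ordered list of (poi, timestamp) tuples."""
    samples = []
    for i in range(1, len(checkins)):
        trajectory = checkins[:i]   # the first i check-ins
        target = checkins[i][0]     # the next POI to predict
        samples.append((trajectory, target))
    return samples

# A sequence of 4 check-ins yields 3 trajectory-target pairs.
pairs = make_samples([("l1", 1), ("l2", 2), ("l3", 3), ("l4", 4)])
```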
3.2. Constructed Hypergraphs
A hypergraph [24,32,34] is an extension of a conventional graph, distinguished by its ability to connect two or more vertices within a single hyperedge. Formally, a hypergraph is defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes the set of vertices and $\mathcal{E}$ denotes the set of hyperedges. To characterize the topology of a weighted hypergraph, a weighted incidence matrix $\mathbf{H} \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{E}|}$ is employed, in which $\mathbf{H}(v, e)$ quantifies the connection strength between vertex $v$ and hyperedge $e$. If vertex $v$ is contained in hyperedge $e$, $\mathbf{H}(v, e)$ represents the corresponding connection weight; otherwise, $\mathbf{H}(v, e) = 0$. The weight $\mathbf{H}(v, e)$ can be determined using various interaction metrics, such as interaction frequency, time decay, or geographical proximity.
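As a toy illustration of the weighted incidence matrix described above (sizes and weights are arbitrary, not taken from the paper):

```python
import numpy as np

# Toy weighted incidence matrix H for a hypergraph with 4 vertices and
# 2 hyperedges: H[v, e] > 0 iff vertex v belongs to hyperedge e, and the
# value is the connection weight (e.g., a frequency or decay score).
H = np.zeros((4, 2))
H[0, 0] = 2.0   # vertex 0 is in hyperedge 0 with weight 2
H[1, 0] = 1.0   # vertex 1 is in both hyperedges
H[1, 1] = 0.5
H[3, 1] = 3.0   # vertex 2 belongs to no hyperedge: its row stays zero

vertex_degree = H.sum(axis=1)   # weighted degree of each vertex
edge_degree = H.sum(axis=0)     # weighted degree of each hyperedge
```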
4. Methodology
In this section, we provide a comprehensive description of the proposed MRHL framework. Next POI prediction is inherently influenced by multiple heterogeneous factors, and the dominance of these factors may vary across users and time. Motivated by this observation, MRHL adopts a multi-view modeling strategy that explicitly factorizes user check-in behaviors into several complementary behavioral perspectives.
As depicted in Figure 1, the complete pipeline of our methodology consists of three primary steps:
Step 1: Multi-view Hypergraph Construction. We begin by constructing multiple factorized hypergraph representations derived from users’ check-in data, informed by three distinct metrics, i.e., interaction frequency, time decay, and geographical proximity. Instead of directly merging all relational information into a single hypergraph, we construct separate hypergraphs for each relation to avoid information interference and better preserve their heterogeneous characteristics.
Step 2: Disentangled Representation Learning. We then perform disentangled representation learning using a hypergraph convolutional network equipped with an enhanced aggregation–propagation mechanism to achieve feature decomposition.
Step 3: Cascaded Fusion and Cross-view Contrastive Learning. Next, based on the hypergraph structures, we integrate the learned representations through a cascaded enhancement strategy to capture multi-view user preferences. Furthermore, cross-view contrastive learning is employed to strengthen the supervisory signals and promote consistency across different views.
Finally, we present the prediction and optimization details. All notations used in this paper are summarized in Table 1.
Figure 1. Illustration of the proposed framework MRHL. (a) Hypergraph learning for interaction frequency, time decay, and geographical proximity. (b) Cascaded enhancement fusion & cross-view contrastive learning.
4.1. Multi-Relational Hypergraph Construction
In the context of next POI recommendation, the interactions between users and POIs exhibit diverse and intricate patterns, encompassing user–POI interaction frequency, time decay in sequential transitions, and the geographical proximity among POIs. Existing studies [16,18] typically employ conventional graph structures, where users and POIs are modeled as nodes and their pairwise connections are represented by edges. However, such graph formulations are inherently restricted to binary relations and fail to capture higher-order neighborhood dependencies under specific semantic contexts. Inspired by the structural flexibility of hypergraphs, we propose three heterogeneous hypergraph designs to comprehensively encode these multi-perspective relationships.
4.1.1. Interaction Frequency Hypergraph
The interaction frequency hypergraph is designed to capture high-order dependencies between users and POIs based on historical visiting frequencies. Specifically, we define the interaction frequency hypergraph as $\mathcal{G}_f = (\mathcal{V}, \mathcal{E}_f)$, where $\mathcal{V}$ denotes the set of POIs. Each user is associated with a hyperedge that summarizes the POIs visited in their historical sequence $S_u$, resulting in a hyperedge set $\mathcal{E}_f$ that covers all users. To quantify the interaction intensity between users and POIs, we construct a frequency-based incidence matrix $\mathbf{H}_f \in \mathbb{R}^{L \times U}$, where each column corresponds to a user (i.e., a hyperedge) and each row corresponds to a POI. Each entry $\mathbf{H}_f(l, u)$ records the total number of times user $u$ has visited POI $l$ across the entire visiting history; entries corresponding to POIs not visited by user $u$ are set to 0. It is worth noting that although an entire user sequence is conceptually treated as a hyperedge, the hypergraph is implemented as a fixed POI–user incidence matrix. Therefore, the size of each hyperedge is bounded by the number of POIs rather than the length of the user sequence, ensuring stable and efficient computation. This formulation enables the hypergraph to effectively encode both intra-sequence and cross-sequence relationships within user trajectories. By leveraging this enriched representation, the model can more accurately identify users with analogous visiting behaviors and better characterize their varying degrees of preference for different POIs, enhancing its ability to model user intent.
4.1.2. Time Decay Hypergraph
In conventional hypergraph structures, hyperedges are inherently undirected, which limits their ability to capture directional dependencies such as POI–POI sequential transitions. To address this limitation, we introduce a directed hypergraph to model these sequential relationships more effectively. Specifically, we design a time decay hypergraph $\mathcal{G}_t = (\mathcal{V}, \mathcal{E}_t)$, where the vertex set $\mathcal{V}$ represents all POIs and the hyperedge set $\mathcal{E}_t$ is constructed by aggregating temporal dependency contexts across all user trajectories. For each user $u$, every POI in the sequence $S_u$ is connected to all subsequent POIs, where the edge weight between two POIs is determined by a time decay function of $\Delta t_{i,j}$, the time interval in hours between visiting $l_i$ and $l_j$, with $i < j$. If multiple transitions occur between the same POI pair, their corresponding weights are summed. Consequently, the incidence matrix $\mathbf{H}_t$ encodes the membership strength of POIs to temporal context hyperedges, where rows correspond to source nodes and columns correspond to target nodes. Although the incidence matrix mathematically resides in an $\mathbb{R}^{L \times L}$ space, user mobility is inherently sparse; in practice, users only transition between a highly limited subset of locations. To ensure scalability and avoid the $O(L^2)$ memory explosion associated with dense structures, $\mathbf{H}_t$ is strictly implemented and stored using a sparse matrix format. By allocating memory only for actual non-zero transitions, the space complexity is drastically reduced to $O(N_{\mathrm{nz}})$, where $N_{\mathrm{nz}}$ is the number of valid non-zero interactions. This time decay hypergraph captures global transition patterns aggregated across all user trajectories while emphasizing temporally correlated dependencies. Unlike a conventional directed graph that models only pairwise transitions, this hypergraph formulation enables high-order message aggregation among multiple temporally related POIs simultaneously. Such a design allows the model to capture collective temporal context patterns beyond adjacent transitions, which cannot be fully represented by edge-based message-passing mechanisms.
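The weight accumulation for one trajectory can be sketched as below. The exact decay function is not reproduced here; an exponential form $e^{-\Delta t}$ is assumed purely for illustration, and `decay_weights` is a hypothetical helper.

```python
import math
from collections import defaultdict

def decay_weights(checkins, decay=lambda dt: math.exp(-dt)):
    """Accumulate decayed weights for every (earlier -> later) POI pair
    within one trajectory; checkins = [(poi, time_in_hours), ...].
    Weights of repeated transitions between the same pair are summed,
    and the result is kept sparse as a dict keyed by (source, target)."""
    W = defaultdict(float)
    for i in range(len(checkins)):
        for j in range(i + 1, len(checkins)):
            (li, ti), (lj, tj) = checkins[i], checkins[j]
            W[(li, lj)] += decay(tj - ti)
    return dict(W)

W = decay_weights([("a", 0.0), ("b", 1.0), ("a", 2.0)])
```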
4.1.3. Geographical Proximity Hypergraph
This hypergraph is designed to model the spatial correlations among POIs under distance constraints. Specifically, we construct a hypergraph $\mathcal{G}_g = (\mathcal{V}, \mathcal{E}_g)$ based on the distances between POIs, where $\mathcal{V}$ denotes the set of POIs. In $\mathcal{G}_g$, hyperedges are formed by connecting POIs whose pairwise Haversine distance does not exceed a predefined threshold $\delta$. The incidence matrix $\mathbf{H}_g$ quantitatively represents the spatial relationships between POIs, with each entry $\mathbf{H}_g(l_i, l_j)$ computed from the Haversine distance $d(l_i, l_j)$ between $l_i$ and $l_j$, subject to $d(l_i, l_j) \le \delta$. By incorporating this continuous distance-based weighting scheme, the hypergraph is able to capture varying strengths of geographical proximity among POIs, thereby modeling the spatial correlations in a more nuanced and realistic manner.
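The Haversine distance and a thresholded weighting can be sketched as follows. The linear weighting in `proximity_weight` is an assumption for illustration; the paper's continuous weighting scheme may differ.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (Haversine) distance between two points in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * r * math.asin(math.sqrt(a))

def proximity_weight(d_km, threshold_km):
    # Hypothetical continuous weighting: decays linearly with distance
    # and drops to 0 beyond the threshold.
    return max(0.0, 1.0 - d_km / threshold_km)
```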
4.2. Multi-Relational Hypergraph Convolutional Networks
To effectively capture multi-view and multi-relational POI representations from the three heterogeneous hypergraphs, we design customized improvements to the aggregation and propagation mechanisms of the hypergraph convolutional network. Prior to the encoding phase, the POI embedding matrix $\mathbf{E} \in \mathbb{R}^{L \times d}$ is first initialized, where $d$ denotes the embedding dimensionality. In the following, we elaborate on the three proposed hypergraph neural network models.
4.2.1. Interaction Frequency Hypergraph Convolutional Network
After constructing the interaction frequency hypergraph, we are able to better characterize the higher-order dependencies among nodes. To this end, we propose an interaction frequency-based hypergraph convolution approach to learn high-level node representations. Specifically, we first derive hyperedge embeddings by aggregating the features of neighboring nodes connected to each hyperedge and then employ these hyperedge embeddings to refine the node representations with higher-order information. Formally, the hyperedge embedding matrix $\mathbf{M}_f$ is obtained as follows:

$$\mathbf{M}_f = \mathbf{D}_e^{-1} \mathbf{H}_f^{\top} \mathbf{E}, \quad (5)$$

where $\mathbf{D}_e$ denotes the hyperedge degree matrix, $\mathbf{H}_f$ is the vertex–hyperedge incidence matrix of the frequency hypergraph, and $\mathbf{E}$ represents the initialized node embedding matrix. Subsequently, the updated node embedding matrix $\mathbf{E}'_f$ can be computed as follows:

$$\mathbf{E}'_f = \mathbf{D}_v^{-1} \mathbf{H}_f \mathbf{M}_f, \quad (6)$$

where $\mathbf{D}_v$ represents the node degree matrix. Specifically, Equation (5) aggregates features of all nodes connected to each hyperedge, capturing high-frequency co-occurrence patterns of POIs. Equation (6) then updates each node by integrating its incident hyperedges, allowing nodes to incorporate information from other frequently co-selected POIs. Based on these two steps, our hypergraph convolution process can effectively encode frequency-enhanced information underlying users' POI selection behavior.
To further enhance the capacity to model complex dependencies among nodes, we employ a multi-layer hypergraph convolution scheme, in which the information propagation from the $(\ell-1)$-th layer to the $\ell$-th layer for the node embedding matrix is formulated as

$$\mathbf{E}^{(\ell)} = \mathbf{D}_v^{-1} \mathbf{H}_f \mathbf{D}_e^{-1} \mathbf{H}_f^{\top} \mathbf{E}^{(\ell-1)} + \mathbf{E}^{(\ell-1)}, \quad (7)$$

where $\mathbf{E}^{(\ell)}$ denotes the node embedding matrix updated at the $\ell$-th layer of the hypergraph. The skip connections are strategically employed to enrich the semantic information of nodes and mitigate the over-smoothing phenomenon in hypergraph neural networks. Finally, the embeddings produced by all layers are averaged to compute the final node embeddings $\mathbf{E}_f$, which can be formulated as

$$\mathbf{E}_f = \frac{1}{K+1} \sum_{\ell=0}^{K} \mathbf{E}^{(\ell)}, \quad (8)$$

where $K$ denotes the number of layers in the hypergraph convolution network and $\mathbf{E}^{(0)}$ represents the initialized POI embedding matrix $\mathbf{E}$. In this manner, the updated node representations can effectively capture high-order relational features. Moreover, mean pooling is applied during the hypergraph convolution process to enhance both the efficiency and performance of hypergraphs across different views.
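A minimal NumPy sketch of the two-step convolution and the layer-averaging scheme, using a degree-normalized form consistent with the degree matrices described above (the paper's exact normalization may differ):

```python
import numpy as np

def hypergraph_conv(H, E):
    """One node -> hyperedge -> node round: hyperedge embeddings are
    degree-normalized sums of member-node features, then nodes are
    refreshed from their incident hyperedges."""
    De_inv = np.diag(1.0 / H.sum(axis=0))  # inverse hyperedge degrees
    Dv_inv = np.diag(1.0 / H.sum(axis=1))  # inverse node degrees
    M = De_inv @ H.T @ E                   # hyperedge embeddings
    return Dv_inv @ H @ M                  # updated node embeddings

def multilayer_conv(H, E0, K):
    """K stacked layers with skip connections, averaged at the end."""
    layers = [E0]
    for _ in range(K):
        layers.append(hypergraph_conv(H, layers[-1]) + layers[-1])
    return np.mean(layers, axis=0)

H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # 3 POIs, 2 hyperedges
E_final = multilayer_conv(H, np.eye(3), K=2)
```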
4.2.2. Time Decay Hypergraph Convolutional Network
Since frequency hypergraphs are limited in capturing directed relationships during convolution operations, we propose a time decay hypergraph convolutional network to model temporal transition strengths between POIs. This approach emphasizes recent interactions while preserving the global sequential structure. The network also adopts a two-step process in which hyperedges aggregate directionally associated nodes and then update node representations through hyperedges. Similar to the aggregation mechanism of the frequency hypergraph, the time decay hyperedge embedding matrix $\mathbf{M}_t$ and the updated node embedding matrix $\mathbf{E}'_t$ can be expressed as follows:

$$\mathbf{M}_t = \mathbf{D}_{e,t}^{-1} \mathbf{H}_t^{\top} \mathbf{E}, \qquad \mathbf{E}'_t = \mathbf{D}_{v,t}^{-1} \mathbf{H}_t \mathbf{M}_t, \quad (9)$$

where $\mathbf{D}_{e,t}$ and $\mathbf{D}_{v,t}$ denote the hyperedge degree matrix and the node degree matrix, respectively. From a computational perspective, since $\mathbf{H}_t$ is highly sparse, the hypergraph convolution operations in Equation (9) are executed via Sparse Matrix–Matrix Multiplication (SpMM). Consequently, the propagation process scales linearly with the number of non-zero interactions, with a time complexity of $O(N_{\mathrm{nz}} d)$. This is significantly more efficient than the $O(L^2 d)$ complexity of dense matrix multiplication and ensures that the runtime remains highly efficient even for datasets like Gowalla with tens of thousands of POIs. The proposed time decay mechanism balances global transitions and short-term preferences, mitigating outdated effects and enhancing adaptability to dynamic user behaviors.
By applying sequential convolution, the time decay hypergraph models the global POI–POI transition patterns. Following the K-layer propagation scheme of the frequency hypergraph, we derive the POI representations $\mathbf{E}_t$ for the time decay view.
4.2.3. Geographical Proximity Hypergraph Convolutional Network
In the proximity view $\mathcal{G}_g$, the spatial correlation between POIs gradually weakens as the distance increases. To capture this spatial decay effect, we introduce a geographical proximity hypergraph convolutional network. As illustrated in Figure 1, we still employ a node–hyperedge–node message-passing scheme to generate POI representations. Similar to the aggregation mechanisms of the aforementioned hypergraphs, the geographical proximity hyperedge embedding matrix $\mathbf{M}_g$ and the corresponding node embedding matrix $\mathbf{E}'_g$ can be expressed as follows:

$$\mathbf{M}_g = \mathbf{D}_{e,g}^{-1} \mathbf{H}_g^{\top} \mathbf{E}, \qquad \mathbf{E}'_g = \mathbf{D}_{v,g}^{-1} \mathbf{H}_g \mathbf{M}_g, \quad (10)$$

where $\mathbf{D}_{e,g}$ denotes the hyperedge degree matrix of the proximity hypergraph and $\mathbf{D}_{v,g}$ represents its node degree matrix. This design not only enhances the model's capability to capture spatial correlations but also effectively suppresses noise from distant POIs, thereby improving both interpretability and generalization performance.
Similar to the previous hypergraph convolutional process, we still adopt a K-layer network to enrich the semantic representations of nodes and incorporate a skip connection mechanism to alleviate the over-smoothing issue. Consequently, we obtain the node embedding matrix of the proximity hypergraph, denoted as $\mathbf{E}_g$.
By introducing three differentiated aggregation and propagation mechanisms, we can capture diverse POI representations from the views of interaction frequency, time decay, and geographical proximity.
4.3. Cascaded Enhancement Fusion for User Preferences
Based on the above procedure, we obtain the POI embedding matrices $\mathbf{E}_f$, $\mathbf{E}_t$, and $\mathbf{E}_g$ from the three distinct hypergraph views. Given a segmented user trajectory $S_u^i$, the user preference representations can be derived under different hypergraph views by looking up the corresponding POI embeddings in the matrices and summing them accordingly. Formally, the user preferences are expressed as

$$\mathbf{p}_f = \sum_{l \in S_u^i} \mathbf{E}_f(l), \qquad \mathbf{p}_t = \sum_{l \in S_u^i} \mathbf{E}_t(l), \qquad \mathbf{p}_g = \sum_{l \in S_u^i} \mathbf{E}_g(l), \quad (11)$$

where $\mathbf{p}_f$ indicates the user preference under the interaction frequency view, $\mathbf{p}_t$ represents the preference under the time decay view, and $\mathbf{p}_g$ corresponds to the preference under the geographical proximity view. These user representations learned from different views collaboratively work together to drive user behavior.
This subsection explores how to effectively fuse user representations from different views to determine the final preference. Conventional fusion strategies, such as linear fusion and adaptive fusion, either neglect the interactions among different views or fail to capture fine-grained cross-view relationships. To address these limitations, we propose a novel cascaded enhancement fusion strategy. Unlike conventional residual connections that perform homogeneous feature aggregation within a shared representation space, the proposed fusion operates across heterogeneous relational views, each encoding distinct behavioral semantics. This method performs sequential cross-view propagation, allowing the user preference information from the preceding view to progressively integrate into the preferences of subsequent views. Although the formulation appears as an additive operation, it should be interpreted as a cross-view semantic enrichment process rather than a simple combination. Specifically, the representation from a preceding view serves as complementary contextual information, which is injected into the subsequent view to enhance its semantic expressiveness. Formally, the process can be expressed as

$$\hat{\mathbf{p}}_t = \mathbf{p}_t + \mathbf{p}_f, \qquad \hat{\mathbf{p}}_g = \mathbf{p}_g + \hat{\mathbf{p}}_t, \quad (12)$$

where $\hat{\mathbf{p}}_t$ denotes the enhanced user preference under the time decay view and $\hat{\mathbf{p}}_g$ denotes the enhanced preference under the geographical proximity view. Through this sequential refinement process, information from earlier views is progressively accumulated and propagated, enabling later views to capture richer contextual dependencies and cross-view complementary patterns. In this way, the mechanism not only generates more discriminative and robust user representations but also strengthens the correlation and consistency among different view-specific representations.
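The cascaded enhancement can be sketched with toy vectors, assuming the additive reading of the fusion described above (the vectors stand in for the frequency, time decay, and proximity preferences):

```python
import numpy as np

# Toy stand-ins for the three view-specific user preferences.
p_f = np.array([1.0, 0.0, 0.0])   # frequency view
p_t = np.array([0.0, 1.0, 0.0])   # time decay view
p_g = np.array([0.0, 0.0, 1.0])   # proximity view

# Cascaded enhancement: each later view is enriched with the
# representation of the preceding view.
p_t_hat = p_t + p_f       # decay view enhanced with frequency context
p_g_hat = p_g + p_t_hat   # proximity view enhanced with the enriched decay view
```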
4.4. Multi-Relational Contrastive Learning
After obtaining the POI embeddings and user preference vectors, we design a multi-relational contrastive learning framework to enforce consistency and correlation across different views. This framework encourages the representations of the same user or POI from different views to be closer in the embedding space, thereby improving the effectiveness of multi-view information fusion. Specifically, the representations of the same user or POI across views, such as $\mathbf{p}_f^u$ and $\mathbf{p}_t^u$, are regarded as positive pairs, while those of different users or POIs are regarded as negative pairs. Formally, the contrastive loss function for user representations between the interaction frequency and time decay views can be defined as follows:

$$\mathcal{L}_{ft}^{user} = -\frac{1}{N} \sum_{u \in B} \log \frac{\exp\big(\mathrm{sim}(\mathbf{p}_f^u, \mathbf{p}_t^u)/\tau\big)}{\sum_{u' \in B} \exp\big(\mathrm{sim}(\mathbf{p}_f^u, \mathbf{p}_t^{u'})/\tau\big)}, \quad (13)$$

where $N$ denotes the total number of samples, $B$ represents the mini-batch, and $\tau$ is the temperature hyperparameter.
This formulation follows a standard contrastive learning objective. The numerator corresponds to the positive pair (i.e., representations of the same user–POI interaction across different views), while the denominator aggregates the similarities from both positive and negative pairs within the mini-batch. By optimizing the contrastive objective, the model increases the similarity of the positive pair while reducing the relative similarity of negative pairs (where $u' \neq u$), thereby effectively pushing dissimilar representations apart in the embedding space.
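A compact NumPy rendering of this cross-view objective, assuming cosine similarity for sim(·,·); `cross_view_infonce` is an illustrative helper name:

```python
import numpy as np

def cross_view_infonce(Z1, Z2, tau=0.2):
    """Row i of Z1 and Z2 hold the same user's representation under two
    views (positive pair); every other row in the batch is a negative."""
    Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
    Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
    logits = Z1 @ Z2.T / tau                     # cosine similarities / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # -log softmax of positives

# Aligned views give a lower loss than views with mismatched rows.
aligned = cross_view_infonce(np.eye(2), np.eye(2))
misaligned = cross_view_infonce(np.eye(2), np.eye(2)[::-1])
```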
Based on the above formulation, the contrastive loss between the frequency view and the proximity view for users can be expressed as $\mathcal{L}_{fg}^{user}$, while that between the decay view and the proximity view is denoted as $\mathcal{L}_{tg}^{user}$. We then aggregate the contrastive losses across all pairs of views, yielding the overall contrastive loss for modeling user preferences as

$$\mathcal{L}_{cl}^{user} = \mathcal{L}_{ft}^{user} + \mathcal{L}_{fg}^{user} + \mathcal{L}_{tg}^{user}. \quad (14)$$

The contrastive learning objective is symmetrically applied to POI representations across different relational views. Specifically, for each POI, its embeddings obtained from different views (i.e., frequency, time decay, and proximity) are treated as positive pairs, while embeddings of other POIs are treated as negative pairs. Following the same formulation as in Equations (13) and (14), the contrastive losses between different view pairs are computed and aggregated, yielding the final contrastive loss for POI embeddings, denoted as $\mathcal{L}_{cl}^{poi}$. By summing the contrastive losses of users and POIs, we derive the overall final contrastive loss as

$$\mathcal{L}_{cl} = \mathcal{L}_{cl}^{user} + \mathcal{L}_{cl}^{poi}. \quad (15)$$
4.5. Prediction and Optimization
As the geographical proximity pattern is sparser and less expressive compared with the decay and frequency patterns, we enhance the user geographical proximity representation $\hat{\mathbf{p}}_g$. To incorporate user preferences from multiple views into the geographical proximity view, we employ a concatenation operation denoted by ⊕. In particular, the geographical preference $\hat{\mathbf{p}}_g$ is first concatenated with the frequency preference $\mathbf{p}_f$ and the time decay preference $\hat{\mathbf{p}}_t$. The combined vector is then passed through a multi-layer perceptron (MLP) to obtain the refined geographical representation $\tilde{\mathbf{p}}_g = \mathrm{MLP}(\hat{\mathbf{p}}_g \oplus \mathbf{p}_f \oplus \hat{\mathbf{p}}_t)$.
By leveraging the user preference representations $\mathbf{p}_f$, $\hat{\mathbf{p}}_t$, and $\tilde{\mathbf{p}}_g$ obtained from different views together with the corresponding POI embeddings $\mathbf{E}_f$, $\mathbf{E}_t$, and $\mathbf{E}_g$, the interaction score between user $u$ in a specific trajectory $S_u^i$ and the POIs is defined as

$$\hat{\mathbf{y}} = \mathrm{softmax}\big(\mathbf{E}_f \mathbf{p}_f + \mathbf{E}_t \hat{\mathbf{p}}_t + \mathbf{E}_g \tilde{\mathbf{p}}_g\big), \quad (16)$$

where $\hat{\mathbf{y}} \in \mathbb{R}^{L}$; each value $\hat{y}_l$ represents the preference score of user $u$ for POI $l$.
To optimize the alignment between the predicted distribution and the actual user behavior, we adopt the cross-entropy loss function, which is formally defined as

$$\mathcal{L}_{ce} = -\sum_{l=1}^{L} y_l \log \hat{y}_l + \lambda \|\Theta\|_2^2, \quad (17)$$

where $y_l$ denotes the ground-truth label, with $k$ indicating the index of the POI that is actually visited (i.e., $y_k = 1$ and all other entries are 0). $\|\Theta\|_2^2$ represents the $L_2$ regularization over all model parameters to mitigate overfitting, and $\lambda$ is the weight coefficient for the $L_2$ regularization term.
The overall loss of the model can be formulated as a combination of the contrastive learning loss and the cross-entropy loss, expressed as follows:

$$\mathcal{L} = \mathcal{L}_{ce} + \beta \mathcal{L}_{cl}, \quad (18)$$

where the coefficient $\beta$ is introduced to balance the two types of losses.
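The scoring and loss computation can be sketched as below. The summed per-view dot products followed by a softmax are an assumed reading of the scoring step; `predict_distribution` and `cross_entropy` are illustrative helpers.

```python
import numpy as np

def predict_distribution(E_views, p_views):
    """Sum the user-POI dot products over the views, then softmax.
    E_views: list of (L, d) POI embedding matrices; p_views: matching
    list of (d,) user preference vectors."""
    scores = sum(E @ p for E, p in zip(E_views, p_views))
    scores = scores - scores.max()   # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

def cross_entropy(probs, target_idx):
    """Negative log-likelihood of the ground-truth POI."""
    return -np.log(probs[target_idx])

rng = np.random.default_rng(0)
E_views = [rng.normal(size=(5, 4)) for _ in range(3)]  # 5 POIs, dim 4
p_views = [rng.normal(size=4) for _ in range(3)]
y_hat = predict_distribution(E_views, p_views)
loss = cross_entropy(y_hat, target_idx=2)
```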
5. Experiments
In this section, we provide a systematic account of the experimental design and comprehensively assess the performance of MRHL from multiple perspectives. Specifically, our evaluation focuses on overall recommendation effectiveness, the contribution of different components, the impact of the cascaded fusion strategy, hyperparameter sensitivity, and performance in highly sparse scenarios.
5.1. Experimental Settings
5.1.1. Data Sets
We conduct experiments on three widely used real-world check-in datasets, namely, New York (NYC), Tokyo (TKY), and Gowalla. These datasets exhibit different levels of sparsity and distinct patterns of user mobility trajectories, as detailed below.
NYC and TKY (https://sites.google.com/site/yangdingqi/home/foursquare-dataset, accessed on 1 April 2026): These two datasets are derived from the Foursquare platform, comprising user check-in records collected in the metropolitan areas of Tokyo and New York City. The collection period spans from 12 April 2012 to 16 February 2013.
Gowalla (https://snap.stanford.edu/data/loc-gowalla.html, accessed on 1 April 2026): This dataset captures user check-in activities on the Gowalla platform worldwide, covering the period from February 2009 to October 2010, and contains rich spatial and temporal information.
Following prior work to ensure data quality [
12,
16], users and POIs with fewer than 5 interactions are filtered out from the NYC and TKY datasets. For the Gowalla dataset, users with fewer than 20 interactions and POIs visited fewer than 30 times are removed, and only users who interacted with at least 10 locations are retained.
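The filtering step above must be applied iteratively, since removing a sparse POI can push a user below the threshold and vice versa. A minimal sketch with the NYC/TKY thresholds (the function and variable names are illustrative):

```python
from collections import Counter

def filter_checkins(checkins, min_user=5, min_poi=5):
    """checkins: list of (user, poi) pairs. Repeatedly drop users and
    POIs below the interaction thresholds until the dataset is stable."""
    while True:
        users = Counter(u for u, _ in checkins)
        pois = Counter(p for _, p in checkins)
        kept = [(u, p) for u, p in checkins
                if users[u] >= min_user and pois[p] >= min_poi]
        if len(kept) == len(checkins):   # fixed point reached
            return kept
        checkins = kept

# toy data: user "a" has 6 check-ins, user "b" only 2 and is filtered out
data = [("a", "x")] * 6 + [("b", "x")] * 2
kept = filter_checkins(data)
```

The Gowalla setting would use the stricter cut-offs (20 interactions per user, 30 visits per POI) with the same loop.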
Table 2 summarizes the statistics of the three datasets. Each user’s check-in records are arranged in chronological order, with the first 80% used for training, the middle 10% for validation, and the remaining 10% for testing. This strict chronological partition is widely adopted as the standard evaluation protocol in sequence-based recommendation tasks. Unlike random cross-validation, it effectively prevents temporal data leakage, ensuring that future check-ins are not inappropriately utilized to predict past mobility behaviors.
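The chronological 80/10/10 split described above can be sketched as a small helper (illustrative, not the authors' code); because the sequence is already time-ordered, every training check-in precedes every validation and test check-in:

```python
def chrono_split(checkins, train=0.8, val=0.1):
    """Split one user's chronologically ordered check-ins into
    train/val/test without leaking future visits into training."""
    n = len(checkins)
    a, b = int(n * train), int(n * (train + val))
    return checkins[:a], checkins[a:b], checkins[b:]

seq = list(range(20))            # 20 check-ins, already time-ordered
tr, va, te = chrono_split(seq)
```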
It is worth noting that while these datasets were collected in earlier years, they remain the most widely adopted and standardized benchmarks in the POI recommendation community. Due to increasingly strict global privacy regulations in recent years, the public release of contemporary, large-scale, and fine-grained human mobility trajectories has been heavily restricted. Therefore, utilizing these established benchmark datasets is essential to ensure fair, rigorous, and reproducible comparisons with existing state-of-the-art baseline models.
5.1.2. Baselines
To evaluate the effectiveness of MRHL, we compare it with seven baseline methods, where two belong to general recommendation models (SAE-NAD and TEMN) and the remaining five correspond to sequential recommendation models (STGCN, PLSPL, LSTPM, PG2Net, and Diff-POI).
SAE-NAD [
12] employs a multi-dimensional attention mechanism to adaptively model the varying importance of user preferences across different dimensions.
TEMN [
3] proposes a deep architecture that nonlinearly integrates topic modeling with memory networks, thereby leveraging both the global structure of latent patterns and the advantages of local neighborhood-based features.
STGCN [
18] constructs a multi-graph representation of user records to integrate all contextual information and designs scoring functions to capture users’ periodic patterns for recommendation.
PLSPL [
8] jointly models users’ long- and short-term preferences via attention and parallel LSTMs, which are linearly combined to characterize the user preference.
LSTPM [
28] employs a nonlocal network to model long-term preferences and utilizes a geo-dilated LSTM to capture non-consecutive geographical correlations.
PG2Net [
46] captures users’ personalized preferences and group-level spatio-temporal preferences through Bi-LSTM-based sequential modeling and auxiliary representation learning.
Diff-POI [
16] employs two graph modules to model users’ visit sequences and spatial features and incorporates a diffusion-based sampling strategy to capture visit trends.
5.1.3. Evaluation Metrics
To evaluate the performance of each model, we adopt three widely used metrics in sequential recommendation, including Recall@K, Normalized Discounted Cumulative Gain (NDCG@K), and Mean Reciprocal Rank (MRR).
Recall@K measures the proportion of ground-truth POIs successfully captured within the top-K recommended results, and is defined as follows:
$$\mathrm{Recall@}K = \frac{1}{R} \sum_{r=1}^{R} \mathbb{1}\!\left(p_r \in S_K^r\right),$$
where $R$ denotes the total number of samples, $S_K^r$ represents the set of top-$K$ recommended POIs, and $p_r$ indicates the ground-truth POI visited by the user. In the next POI recommendation task, each sample has exactly one ground-truth POI, so Recall@K is equivalent to the hit ratio.
NDCG@K evaluates the ranking quality by assigning logarithmically decayed weights to positions within the top-$K$ recommendations, and its formulation for the next POI recommendation task is given as follows:
$$\mathrm{NDCG@}K = \frac{1}{R} \sum_{r=1}^{R} \frac{1}{\log_2(\mathrm{rank}_r + 1)},$$
where $\mathrm{rank}_r$ denotes the position of the ground-truth POI within the top-$K$ recommendation list. If the POI does not appear in the list, $\mathrm{rank}_r$ is set to $\infty$, so the corresponding term contributes zero.
MRR measures the reciprocal rank of the ground-truth position in the recommendation list, serving to quantify the overall ranking performance, and is defined as follows:
$$\mathrm{MRR} = \frac{1}{R} \sum_{r=1}^{R} \frac{1}{\mathrm{rank}_r}.$$
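For a single test sample with one ground-truth POI, all three metrics reduce to simple rank lookups; a minimal sketch (standard binary-relevance definitions, helper names ours):

```python
import math

def recall_at_k(ranked, target, k):
    # with one ground-truth POI, Recall@K reduces to a hit indicator
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k):
    # logarithmic position discount; 0 when the target misses the top-K list
    if target in ranked[:k]:
        rank = ranked.index(target) + 1
        return 1.0 / math.log2(rank + 1)
    return 0.0

def mrr(ranked, target):
    # reciprocal rank of the ground-truth POI in the full list
    return 1.0 / (ranked.index(target) + 1) if target in ranked else 0.0

ranked = ["p3", "p1", "p7", "p2"]   # toy top-4 recommendation list
```

Averaging these per-sample values over all $R$ test samples yields the reported dataset-level scores.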
5.1.4. Parameter Settings
All experiments are conducted on an NVIDIA 3080Ti GPU using the PyTorch (version 2.8.0) framework. For the baseline methods, we adopt the parameter settings reported in the original papers and perform hyperparameter tuning on the three datasets. For training the MRHL model, we employ the Adam optimizer with a learning rate of and weight decay of . The embedding dimension for users and POIs is set to , and the batch size is fixed at 1024. For data preprocessing, the order of the auto-regressive model is set to 100. For trajectories longer than this threshold, only the most recent 100 POIs are retained to preserve the latest user preferences. For shorter sequences, zero-padding is applied to ensure uniform input dimensions for efficient batch computation. Following prior empirical settings, the distance threshold is set to 1 km. The number of convolutional layers is tuned from , and the temperature parameter is tuned within . The regularization weight is set to , and the balance coefficient is fixed at 0.1.
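The truncation-and-padding step described above (keeping only the most recent 100 POIs and left-padding shorter sequences with zeros) can be sketched as follows; the function name and padding token are illustrative:

```python
def pad_or_truncate(seq, max_len=100, pad=0):
    """Keep only the most recent max_len POIs to preserve the latest
    preferences; left-pad shorter sequences so batches share one shape."""
    seq = seq[-max_len:]                        # most recent visits win
    return [pad] * (max_len - len(seq)) + seq

short = pad_or_truncate([5, 9, 2], max_len=6)          # padded on the left
long = pad_or_truncate(list(range(1, 200)), max_len=6) # truncated to last 6
```

Left-padding keeps the most recent check-in at a fixed position (the end of the sequence), which is convenient for auto-regressive models.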
5.2. Performance Comparison with Baselines
The overall performance of all baseline methods and the proposed MRHL model is summarized in
Table 3. Although earlier methods such as STAN and GETNext are relevant, they are not included in the empirical comparison, because more recent methods (e.g., PG2Net and Diff-POI) have been shown to achieve superior performance under similar evaluation settings. To verify the robustness of our model, all results are reported as the mean of five independent runs with different random initializations. The variance across runs is consistently negligible, with all standard deviations below 0.002. In addition, a paired t-test shows that the improvements of MRHL over the best-performing baselines are statistically significant. Based on these results, several key findings can be drawn. The proposed MRHL model consistently surpasses all baseline methods across the three benchmark datasets under various evaluation metrics. This superior performance primarily stems from two design aspects. First, MRHL constructs multi-relational hypergraphs guided by interaction frequency, time decay, and geographical proximity information. This design enables the model to capture user preferences from multiple views and enhances semantic expressiveness through an improved hypergraph convolution process. Second, a multi-view contrastive objective is introduced to encourage information exchange across different views. This mechanism enhances the representation capacity of each specific view and allows the model to leverage self-supervised signals to uncover richer and more comprehensive recommendation patterns.
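The paired t-test over the five matched runs can be computed with a few lines of standard-library Python. This is only a sketch of the procedure: the MRR values below are made-up placeholders, not the reported results.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic over matched runs (same seeds for both models);
    a positive t favors model a over baseline b."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

mrhl_runs = [0.512, 0.509, 0.514, 0.511, 0.513]   # placeholder per-seed MRRs
base_runs = [0.458, 0.460, 0.455, 0.459, 0.457]
t = paired_t(mrhl_runs, base_runs)
```

With five runs (4 degrees of freedom), the two-sided critical value at the 0.05 level is about 2.78, so any t above that indicates a significant improvement.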
In the baseline comparison, we first observe that sequence-based recommendation models consistently outperform general recommendation models. This advantage arises from the stronger influence of users’ historical trajectories on their subsequent behavioral patterns, enabling sequence models to more effectively capture latent preferences and deliver superior recommendation performance. Among the sequence-based baselines, PLSPL, LSTPM, and PG2Net are representative LSTM-based variants that explicitly model users’ sequential mobility patterns. These LSTM-based methods demonstrate competitive performance by effectively modeling sequential dependencies in user trajectories. However, their improvements remain limited by relying primarily on single-view sequential information, which restricts their ability to exploit high-order relations and complementary signals from multiple perspectives. At a more advanced modeling level, Diff-POI disentangles the effects of geographical constraints and sequential user interactions, thereby achieving substantial improvements over other baselines across three different evaluation metrics. Building upon this line of research, our proposed MRHL framework, which explicitly separates hypergraphs from three distinct views, further surpasses all baseline methods. Collectively, these results demonstrate the importance and necessity of constructing multi-view representations through explicit disentangled learning for effectively modeling user—POI relationships.
In the context of disentangled multi-view representation learning, MRHL exhibits a clear advantage over the second-best model Diff-POI. Particularly, it improves the MRR by 11.70% on the NYC dataset, highlighting its superior capability in capturing user preferences. This performance advantage can be attributed to three key factors. First, MRHL effectively captures non-consecutive POI information in user trajectories. For instance, the time decay hypergraph goes beyond adjacent transitions and models the complete transition patterns. Second, by leveraging the hypergraph structure, MRHL is more capable of capturing high-order signals than GNN-based Diff-POI. Hyperedges can simultaneously connect multiple nodes, thereby alleviating data sparsity and oversmoothing while enriching semantic representations. Finally, the proposed contrastive learning framework effectively enhances the complementary effects across different views in a self-supervised manner, thereby further strengthening the overall recommendation performance and improving the robustness of user preference modeling.
Most next POI recommendation models achieve relatively high performance on dense urban datasets such as NYC and TKY, but their effectiveness decreases significantly on sparse datasets like Gowalla, where user check-ins are distributed globally. This discrepancy arises because dense datasets provide continuous and localized trajectories that facilitate sequential transition modeling and neighborhood aggregation. In contrast, the sparsity and discontinuity in Gowalla make it difficult to capture such patterns. However, our MRHL model exhibits a more stable performance gain on the sparse Gowalla dataset. By constructing multiple relational hypergraphs from complementary perspectives, MRHL is able to alleviate trajectory sparsity by modeling high-order associations among POIs and users. Moreover, the proposed multi-relational contrastive learning strategy encourages consistency across different views, enabling the model to extract robust signals even when sequential information is limited. As a result, MRHL achieves consistently superior performance on Gowalla compared with representative trajectory modeling and relational learning approaches, demonstrating its effectiveness in sparse and globally distributed scenarios.
In summary, the outstanding performance of MRHL across diverse datasets demonstrates its effectiveness and broad applicability in next POI recommendation. By constructing multi-view hypergraphs and incorporating contrastive learning, the model can more precisely capture the complex structures and relational patterns within user-POI interaction networks. This enables the learning of more accurate and personalized user preferences and node representations, ultimately leading to significant improvements in recommendation performance. It is worth noting that the evaluated datasets exhibit substantially different characteristics in terms of spatial scale, data sparsity, and user mobility patterns. Despite these differences, MRHL consistently achieves superior performance across all datasets, indicating that the proposed framework is not tailored to a specific controlled setting but can effectively adapt to more complex and dynamic recommendation scenarios.
5.3. Ablation Study
5.3.1. Impact of Different Components
To evaluate the impact of each component in MRHL, we conduct ablation studies on three datasets to analyze their respective contributions. The complete MRHL is regarded as the base model, from which four different variants are derived by removing specific components.
w/o P-HG: This variant removes the geographical proximity hypergraph module.
w/o F-HG: This variant removes the interaction frequency hypergraph module.
w/o D-HG: This variant removes the time decay hypergraph module.
w/o CL: This variant removes contrastive learning between different views.
The results of the ablation study are presented in
Table 4, from which the following conclusions can be drawn. Throughout all datasets, the full MRHL model maintains consistently superior performance across all metrics, demonstrating its robustness and effectiveness. On both the high-density NYC and medium-density TKY datasets, removing the geographical proximity hypergraph leads to the largest performance drop, underscoring the dominant role of spatial correlations in improving recommendation accuracy. The exclusion of the interaction frequency hypergraph also results in noticeable degradation, particularly on TKY, indicating the importance of visit frequency for modeling in medium-density scenarios. In contrast, removing the time decay hypergraph or contrastive learning module results in only a minor performance drop. This indicates that although temporal dynamics and contrastive signals provide additional benefits, their importance is less pronounced than that of spatial and frequency information in relatively dense datasets.
The ablation study on the Gowalla dataset reveals that removing the time decay hypergraph leads to the most significant performance degradation. This highlights that in the lowest-density scenario, user behaviors exhibit strong sequential dependencies, and modeling sequential transitions plays a central role in capturing user interests. In contrast, eliminating the geographical proximity and interaction frequency hypergraphs also results in notable performance decreases, indicating that spatial constraints and interaction intensity remain effective auxiliary factors. The exclusion of the contrastive learning module produces only a relatively minor decline, suggesting that its primary contribution lies in enhancing representation robustness.
In summary, the ablation results indicate that the modules of MRHL exhibit complementary effects across different datasets. Spatial correlations play a dominant role in high-density scenarios, while time decay is crucial in sparse environments. Meanwhile, interaction frequency and contrastive learning further enhance the robustness of the model under diverse data conditions.
5.3.2. Impact of Different Fusion Strategies
To further validate the effectiveness of the proposed cascaded fusion strategy, we conduct a comparative study against two widely used approaches, namely, linear fusion and adaptive fusion.
Figure 2 presents the performance comparison of the three fusion strategies on the NYC, TKY, and Gowalla datasets.
On the NYC dataset with the highest density, all fusion methods achieve strong performance. Nevertheless, cascaded fusion consistently outperforms both linear and adaptive fusion across all evaluation metrics. This suggests that even simple fusion remains competitive in dense data, while the cascaded design offers additional advantages. For the TKY dataset with moderate density, adaptive fusion already surpasses linear fusion, reflecting the benefit of dynamically adjusting the contributions of different components. However, cascaded fusion further improves performance on all metrics, indicating that the cascaded strategy enhances robustness and more effectively leverages intermediate signals. On the sparser Gowalla dataset, the performance gaps among methods become most pronounced. Linear fusion performs the weakest, while adaptive fusion yields moderate improvements. In contrast, cascaded fusion achieves the highest scores across all evaluation metrics. These results highlight that the cascaded design is especially effective in sparse data, where higher-order collaborative signals are crucial.
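To make the three strategies concrete, the sketch below contrasts fixed-weight linear fusion, a softmax-gated adaptive fusion, and a generic sequential-refinement stand-in for cascaded fusion. These are hypothetical simplifications of the compared strategies, not the paper's exact operators; in practice the gate scores and mixing weights are learned.

```python
import numpy as np

def linear_fusion(views, w=None):
    # fixed-weight average of the per-view embeddings
    w = w or [1 / len(views)] * len(views)
    return sum(wi * v for wi, v in zip(w, views))

def adaptive_fusion(views, scores):
    # softmax gate over per-view importance scores (learned in practice)
    a = np.exp(scores - np.max(scores))
    a = a / a.sum()
    return sum(ai * v for ai, v in zip(a, views))

def cascaded_fusion(views, alpha=0.5):
    # illustrative cascade: fold views in one at a time, letting each
    # stage refine the running representation
    z = views[0]
    for v in views[1:]:
        z = (1 - alpha) * z + alpha * v
    return z

views = [np.ones(4), 2 * np.ones(4), 4 * np.ones(4)]  # toy view embeddings
```

The key structural difference is that the cascade conditions each fusion step on the intermediate result, rather than mixing all views in a single weighted sum.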
5.4. Hyperparameter Analysis
Theoretical analysis alone is insufficient to identify the optimal hyperparameter configuration. Therefore, we conduct sensitivity experiments to assess the impact of different hyperparameter settings on model performance, providing a basis for model optimization and tuning. Given the complex dependence of the MRHL model on the number of hypergraph convolution layers and the temperature in contrastive learning, this subsection focuses on the analysis of these two key hyperparameters.
5.4.1. Sensitivity to the Number of Layers
To explore the impact of the hypergraph convolutional layers, we conduct a hyperparameter study on the number of layers $L$.
Figure 3 depicts the performance trends with varying layer numbers on three datasets. As the depth increases from 1 to 3, all metrics (Recall@5/10 and NDCG@5/10) consistently improve, demonstrating the benefit of capturing higher-order dependencies through deeper propagation. However, beyond three layers, the curves either slightly decline or plateau, suggesting that excessive depth introduces redundant information and oversmoothing, which undermine discriminative capacity. Accordingly, $L = 3$ achieves a favorable balance between representational capacity and the avoidance of oversmoothing and noise accumulation; thus, we use $L = 3$ as the default depth in subsequent experiments.
5.4.2. Sensitivity to Temperature
To investigate the sensitivity of contrastive learning to the temperature $\tau$, we vary $\tau$ over a range of candidate values.
Figure 4 presents the Recall@5/10 and NDCG@5/10 curves across three datasets, illustrating the effect of $\tau$ on model performance. Overall, the curves remain relatively smooth, indicating that the model is not highly sensitive to $\tau$, but clear differences can still be observed. On TKY, performance consistently increases as $\tau$ grows from small values, reaching its peak at an intermediate temperature with the highest Recall@5/10 and NDCG@5/10 values, after which the curves gradually decline as $\tau$ further increases. A similar trend is observed on NYC, where the best results are also achieved at an intermediate $\tau$, followed by a slight decrease. On Gowalla, the curves fluctuate mildly, but the best performance is observed in the same moderate range, with larger $\tau$ values leading to degradation. These results demonstrate that a properly tuned $\tau$ is essential for effective contrastive learning: extremely small or large values tend to produce suboptimal embedding representations, whereas moderate values lead to more robust and discriminative performance.
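The role of the temperature can be seen directly in an InfoNCE-style objective, where $\tau$ rescales the similarity logits before the softmax. The sketch below uses a generic single-anchor form with made-up similarity values; it is not MRHL's exact contrastive loss.

```python
import math

def infonce(sim_pos, sim_negs, tau):
    """InfoNCE loss for one anchor: similarity to the positive view
    versus negatives, with logits scaled by the temperature tau."""
    num = math.exp(sim_pos / tau)
    den = num + sum(math.exp(s / tau) for s in sim_negs)
    return -math.log(num / den)

# the same similarities yield a sharper (lower-loss) distribution
# at small tau, and a softer one at large tau
loss_sharp = infonce(0.8, [0.3, 0.1], tau=0.1)
loss_soft = infonce(0.8, [0.3, 0.1], tau=1.0)
```

Very small $\tau$ over-penalizes hard negatives, while very large $\tau$ flattens the distribution and weakens the training signal, which is consistent with the intermediate optimum observed above.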
5.5. Sparsity Analysis
To investigate whether the proposed model can alleviate the data sparsity issue in next POI recommendation, we design two types of sparsity evaluation experiments. The first experiment is conducted at the user level. Users are divided into three groups based on the number of complete trajectories in the training set, with the top 30% considered active users, the bottom 30% considered inactive users, and the rest are normal users. Moreover, all samples belonging to the same user in the test set inherit that user’s activity category. The second experiment is sample-level, where groups are determined by the trajectory length of each test sample. Specifically, the top 30% of samples are classified as active samples, the bottom 30% as inactive samples, and the remaining as normal samples. Unlike the user-level setting, this allows samples from the same user to fall into different groups.
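The user-level grouping can be expressed as a simple percentile partition over per-user trajectory counts; a minimal sketch (helper name ours, same 30/40/30 scheme as above):

```python
def split_by_activity(counts, lo=0.3, hi=0.3):
    """Rank users by trajectory count; the top 30% are 'active', the
    bottom 30% 'inactive', and the rest 'normal'. Returns {user: group}."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    n = len(ranked)
    top, bottom = int(n * lo), int(n * hi)
    groups = {}
    for i, u in enumerate(ranked):
        if i < top:
            groups[u] = "active"
        elif i >= n - bottom:
            groups[u] = "inactive"
        else:
            groups[u] = "normal"
    return groups

counts = {f"u{i}": i for i in range(10)}   # u9 has the most trajectories
groups = split_by_activity(counts)
```

The sample-level variant applies the same percentile cut to per-sample trajectory lengths instead of per-user counts, which is why samples from one user can land in different groups.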
As shown in
Figure 5, the user-level results reveal that active users consistently achieve the best performance across all datasets, while inactive users perform the worst, highlighting the challenge of modeling long-term inactive users. Normal users are situated between the two extremes, reflecting the model’s ability to partially generalize to users with moderate interaction histories. Moreover, the performance degradation is more pronounced on the Gowalla dataset, underscoring the severe challenges posed by extreme sparsity. These results indicate that the proposed model can effectively leverage rich interaction histories, while performance degradation under extreme user-level sparsity remains a common challenge for next POI recommendation models. Nevertheless, MRHL still maintains stable relative performance across datasets, suggesting a certain degree of robustness even in highly sparse and irregular user behavior settings.
As shown in
Figure 6, the sample-level results exhibit a trend distinct from the user-level setting. Normal samples generally achieve comparable or even superior performance to active samples. For instance, in the NYC and TKY datasets, normal samples slightly outperform active ones across all metrics, suggesting that the trajectory length of individual samples plays a more direct role in model prediction. Moreover, inactive samples also demonstrate relatively stable performance in this scenario, indicating that the model can effectively capture sequential information even without rich historical records. This suggests that MRHL is less dependent on long sample trajectories and can generalize to dynamic scenarios where only short-term or partial behavioral contexts are available. Overall, these findings suggest that MRHL can effectively leverage short-term trajectory information to alleviate sample-level sparsity and support robust preference learning in dynamic scenarios.
In summary, both experiments consistently demonstrate that although the model continues to be constrained by sparsity, it nevertheless exhibits notable robustness and stability in handling sparse sequences. This advantage can be attributed to its effective exploitation of high-order association signals, such as complex inter-user similarities, which facilitate the transfer of knowledge from active to inactive users. Consequently, even users with sparse trajectories can leverage richer association signals, thus leading to improved recommendation quality under highly sparse conditions. More importantly, these sparsity analysis results provide empirical evidence that MRHL can generalize beyond idealized or densely observed settings. Instead of relying solely on dense historical interactions, the proposed framework leverages multi-view hypergraph structures and high-order relational modeling, enabling it to remain effective and robust to different trajectory patterns in complex and dynamic environments characterized by sparse, incomplete, or rapidly evolving user behaviors. It is worth noting that our sparsity analysis focuses on highly sparse scenarios, rather than strict cold-start cases involving entirely unseen users or POIs. Handling completely unseen entities would require incorporating additional side information or designing inductive learning mechanisms, which is beyond the current scope of this study. Extending the proposed framework to such inductive settings remains an important direction for future work.
5.6. Efficiency Analysis
To evaluate the computational efficiency and practical feasibility of the proposed MRHL model, we conduct experiments on the Gowalla dataset. Among the datasets utilized in our study, Gowalla features the largest scale and the highest data sparsity, representing the most challenging scenario for computational costs. This rigorous setting allows us to thoroughly assess the scalability of different models in a realistic and demanding recommendation environment. We compare MRHL with several representative baseline methods that span diverse modeling paradigms and exhibit distinct computational characteristics. Specifically, these baselines include the LSTM-based PG2Net, the spatio-temporal graph neural network STGCN, and the diffusion-based Diff-POI.
As shown in
Table 5, MRHL achieves a favorable trade-off between model complexity and efficiency. Specifically, MRHL contains 3.16 M parameters, which is significantly smaller than PG2Net (19.84 M) and comparable to STGCN (3.96 M), indicating a relatively compact model design. In terms of training efficiency, MRHL requires 42.96 min per epoch, which is notably faster than PG2Net and Diff-POI. This demonstrates that MRHL maintains highly competitive training efficiency despite the intricate modeling of multi-relational hypergraph structures. For inference, MRHL achieves a low latency of 3.26 ms. While marginally slower than the lightweight STGCN, this inference speed is highly efficient and easily satisfies the strict low-latency requirements of real-world recommendation systems, especially given MRHL’s superior capacity to capture high-order collaborative signals.
Regarding GPU Memory consumption, MRHL utilizes 9568 MB, which is higher than the baseline models. This increased memory footprint is an acceptable trade-off for constructing multi-relational hypergraphs, which naturally require storing extensive high-order incidence matrices. Nevertheless, this memory cost remains well within the capacity of mainstream commercial GPUs (e.g., 11 GB or 12 GB VRAM), ensuring that MRHL maintains a practical balance between sophisticated modeling capability and real-world deployability.
Overall, the results demonstrate that MRHL achieves a strong efficiency-performance balance under the most challenging settings. It remains computationally feasible while effectively capturing complex multi-relational dependencies, highlighting its scalability and suitability for real-world large-scale POI recommendation scenarios.
6. Discussion
Our empirical evaluations systematically demonstrate the superiority of the proposed MRHL framework. Our experimental results show that effectively disentangling multi-view behaviors plays a critical role in improving next POI recommendation. Specifically, the ablation studies confirm that our cascaded fusion strategy and cross-view contrastive learning module are fundamental drivers in capturing complementary behavioral signals and mitigating data sparsity.
Beyond the empirical performance, our findings offer important theoretical insights. They also contribute to the broader literature on graph-based recommendation systems. As discussed in
Section 2, many existing approaches, including unified graph models and homogeneous hypergraphs (e.g., [
18,
22]), are inherently limited due to their reliance on confounded user representations. Our work explicitly disentangles interaction frequency, time decay, and geographical proximity, thereby validating the hypothesis from recent multi-intent studies (e.g., [
24,
25]) that factorized modeling improves interpretability. Moreover, our methodology significantly advances this paradigm. Instead of relying on independent auxiliary signals or shallow fusion strategies [
24,
25], MRHL leverages cross-view synergistic reinforcement to address long-standing sparsity and integration issues in mobility forecasting. These findings further confirm and extend prior studies on multi-view learning, demonstrating that effective cross-view interaction is not only beneficial but essential for robust representation learning in complex recommendation scenarios.
From a practical perspective, the implications of this study extend beyond next POI recommendation to related fields. The ability to extract fine-grained user intents from mobility trajectories provides significant value for Location-Based Service (LBS) providers, allowing them to deliver more context-aware and personalized marketing strategies. Furthermore, the large-scale mobility patterns derived from our multi-relational hypergraphs can be effectively applied to smart city planning and urban computing. For instance, urban planners and transportation authorities can leverage these disentangled behavioral patterns to optimize infrastructure allocation, improve traffic management, and support the development of more sustainable urban environments.
7. Conclusions
In this paper, we propose a novel POI recommendation model named MRHL. The model leverages disentangled hypergraphs, namely, interaction frequency, time decay, and geographical proximity, to comprehensively capture the complex intrinsic correlations among POIs from multiple views. Specifically, the frequency hypergraph reflects users’ preference intensity toward different locations, the decay hypergraph characterizes the transitional relations in evolving user sequences, and the proximity hypergraph captures the influence of spatial correlations on user behaviors. By enhancing the hypergraph propagation process and designing a cascaded fusion strategy, MRHL enriches POI embeddings and integrates multi-view user–POI relations more effectively. In addition, we introduce cross-view contrastive learning to capture complementary effects and strengthen the discriminative power of representations, thereby improving robustness and generalization under sparse or noisy conditions.
In future work, we will focus on systematically uncovering the latent intentions embedded in user–POI interactions, aiming to better capture the implicit information of user decision-making in POI recommendation. By disentangling the different driving factors of user behaviors, such as functional needs, social influence, and contextual preferences, we aim to obtain a more precise and finer-grained representation of user decision-making. This direction is expected to not only improve the accuracy of next POI recommendation but also enhance the interpretability of the model, enabling clearer explanations for recommendation outcomes. We also plan to explore ways to leverage disentangled representations to enhance the system’s robustness in sparse or noisy environments, ultimately contributing to the development of more reliable and user-friendly recommendation systems. Despite the promising results, we acknowledge several limitations in our current study, which naturally pave the way for further extensions. First, while our hypergraphs effectively capture complex correlations, they inherently treat nodes within a hyperedge as an unordered set. To explicitly emphasize the strict sequential nature of trajectories, future studies could explore incorporating Temporal Position Encoding into the hypergraph nodes. Second, another structural consideration is that the current model assumes fixed relationships within the hypergraphs, which may limit its ability to handle dynamic sparsity. Incorporating an attention mechanism into hyperedge convolutions to dynamically weigh POIs under specific contexts (e.g., time of day) is a promising direction to improve intent accuracy. Lastly, we acknowledge that our current multi-relational hypergraph is constructed based on a fixed set of users under a transductive setting. In real-world scenarios with dynamic user populations, the system relies on periodic offline retraining. 
Therefore, another important direction for future work is to explore inductive hypergraph learning and dynamic graph update mechanisms. This allows the model to incrementally incorporate new users without full graph reconstruction, enabling adaptation to dynamic recommendation scenarios.