AST-PG: Attention-Based Spatial–Temporal Point-of-Interest-Group Model for Real-Time Point-of-Interest Recommendation

Abstract: Research on next-point-of-interest (POI) recommendation has become a new focus in the field of POI recommendation in recent years. The goal of POI recommendation tasks is to predict a user's future movement trajectory based on their current state and historical behavioral information. Recent studies have shown the effectiveness of neural network-based next-POI recommendation engines. However, most existing models only consider the correlation between consecutive visits, neglecting the complex dependencies among the POIs in terms of area and category features, as well as the processing of unstructured time series. This paper presents a new Attention-Based Spatial–Temporal Point-of-Interest-Group (AST-PG) model for POI recommendation. The model consists of a spatial module and a temporal module that are combined by a multi-head attention mechanism. The spatial module groups the POIs based on geographic and category features, while the temporal module builds a uniform-length time trajectory vector from the unstructured temporal features. Comprehensive experimental results on two real datasets demonstrate that the proposed model outperforms state-of-the-art POI recommendation models.


Introduction
Location-based social networks (LBSNs) [1], such as Foursquare, Gowalla, and Yelp, have experienced rapid growth in recent years due to the continuous improvement in mobile-device technologies [2][3][4]. These platforms are popular for users to share their experiences and explore their surroundings [5,6]. They have accumulated a large amount of user check-in data that reflect users' activity trajectories and preferences. These data provide an important source for the next-point-of-interest (POI) recommendation service [7,8].
It is important to note that users' behaviors are influenced not only by their historical preferences but also by their real-time preferences and time constraints. For example, Figure 1 illustrates a user who left home at 7:00 a.m., went shopping at 9:00 a.m., and then visited a cafe. When recommending the next POI to this user between 1:00 p.m. and 2:00 p.m., it is important to consider their real-time needs and preferences. Therefore, the user's trajectory should be used to mine their preferences in a specific time period for real-time POI recommendations. However, applying these trajectories to real-time POI recommendations is not simple due to two issues. Firstly, the time series in users' trajectories is unstructured. Users have irregular time intervals between visits to different POIs, or they visit different POIs for inconsistent lengths of time in different time periods. This situation may be caused by the user's personal habits, practical needs, or environment, and it makes it difficult for structured models to extract relevant temporal features. Secondly, the data harvested from LBSNs often contain information about the regions and categories to which the POIs belong [9]. Most commonly used POI recommendation engines make insufficient use of these features, which limits further improvements in their performance.
To address the above issues, this paper proposes a new real-time POI recommendation model, namely, the Attention-Based Spatial-Temporal Point-of-Interest-Group (AST-PG) model. The proposed model mines the transfer patterns of check-in behaviors in the sequence of user check-in records by a temporal module and a spatial module and then couples them together by a multi-head attention mechanism [10]. The temporal module converts the unstructured temporal trajectory information into a fixed-length vector.
Experiments on large public datasets show that the model developed in this study achieves state-of-the-art predictive performance. Moreover, the ablation experiments show that both the temporal and spatial modules developed in this study are effective.

Preliminaries
In this paper, let $U = \{u_1, u_2, \ldots, u_{|U|}\}$, $T = \{t_1, t_2, \ldots, t_{|T|}\}$, and $P = \{p_1, p_2, \ldots, p_{|P|}\}$ denote the sets of users, timestamps, and candidate POIs, where $|U|$, $|T|$, and $|P|$ denote the numbers of users, timestamps, and POIs, respectively.
Let $R = \{r_1, r_2, \ldots, r_\gamma\}$ and $C = \{c_1, c_2, \ldots, c_\varsigma\}$ denote the sets of regions and category levels of the POIs, where $\gamma$ and $\varsigma$ denote the number of regions and the number of category levels of the POIs, respectively.
Let $G = \{(r_i, c_j) \mid r_i \in R, c_j \in C\}$ denote the set of groups. Each POI ($p_k \in P$) is represented by a unique group $(r_i, c_j)$.
Based on the above symbols, the following concepts are introduced: Category levels of the POIs: an integer representing the proportion of the check-in frequency by the target user for different categories relative to the frequency of the total check-ins by the target user; Region of POIs: a two-tuple vector, consisting of the latitude and longitude of the POI; Group of POIs: a set of POIs with the same region and category level, where each POI is assigned to one and only one group; Check-in record: a check-in record is represented as $ck = (u, p, t) \in U \times P \times T$, which represents a check-in of the user ($u$) at the POI ($p$) at the timestamp ($t$). Since each POI is assigned to a group, the check-in can also be expressed with a tuple $(u, r, c, t)$. All check-in activities created by the user ($u \in U$) form a check-in sequence ($CK_u = \{ck_u^1, ck_u^2, ck_u^3, \ldots\}$), where $ck_u^l$ represents the $l$-th check-in record of the user ($u$).

POI Recommendation without Real-Time Performance
POI recommendation has emerged as a new research hot spot, attracting extensive attention from researchers in recent years. The early POI recommendation methods relied heavily on machine learning methods [8], especially stochastic models based on Markov chains. Additionally, inspired by the successful application of matrix factorization (MF) [11,12] methods in other domains, researchers began integrating MF methods into the modeling of POI recommendations. The factorizing personalized Markov chain (FPMC) model [13] combines matrix factorization with Markov chains to model users' long-term preferences and sequential behaviors, aiming to achieve more accurate recommendation results.
Currently, recurrent neural networks (RNNs) [14] have become the mainstream approach for modeling user check-in sequences. The spatial-temporal recurrent neural network (ST-RNN) model [15] extends the RNN model by introducing a transformation matrix of temporal and distance features, which effectively captures the cyclic effects of time and space. In addition, variants of RNNs, such as the LSTM [16], show good performances. The spatiotemporal gated network (STGN) model [17] modifies the basic LSTM by adding four new gates, including two long-term gates and two short-term gates. The spatiotemporal gated coupled network (STGCN) model [18] extends the STGN model by coupling the input and forget gates to reduce the computational complexity. In addition, several studies have explored the attention mechanism to better model spatial data and rich contextual information. The Personalized Long-Short-Term Preference Learning (PLSPL) model [19] combines the attention mechanism with the LSTM to recommend the next POI. In this model, the attention mechanism is used to learn the user's long-term preferences, while the LSTM focuses on capturing the user's short-term preferences. By employing personalized linear weights, the PLSPL model comprehensively takes into account the user's overall preferences, thereby providing more accurate results. The Spatiotemporal Attention Network (STAN) model [20] uses a self-attention layer to mine the relationship between non-adjacent check-in records in the user's check-in sequence. However, these POI recommendation models tend to perform poorly in real-time POI recommendation scenarios because they pay little attention to real-time user preferences and cannot effectively model how preferences change over time.

Real-Time POI Recommendation
Real-time POI recommendation differs from traditional POI recommendation in that it pays more attention to the user's possible real-time needs and preferences at a given time. The location recommendation model (LRT) [21] improves the performance of POI recommendation by learning user behavior from the temporal patterns of check-in records, and it introduces four temporal aggregation strategies to integrate the user's check-in preferences for different time states. The real-time preference-mining (RTPM) model [22] uses the LSTM to mine the real-time preferences from the user's long-term and short-term preferences. For long-term preferences, the RTPM analyzes the cyclical trend of the user behavior between weeks. For short-term preferences, trainable time-shift vectors are introduced to model the user's publicly influenced preferences at the current time. The Timestamp Cross-Attention Network (TSCAN) model [23] uses a two-layer cross-attention network to predict the most relevant next POIs. First, a timestamp cross-attention module focuses on the cross-over information between the timestamps; second, a cross-interval-aware module adjusts the POI sequences by using the time intervals to enhance the similarity of the neighboring POIs. The DSPR model [24], which explores users' preferences and real-time demands simultaneously, utilizes various types of contextual information, including the absolute time, POI-POI transition time, and POI type. This information is combined with user preferences and learned automatically by an attention-based recurrent neural network to support the final next-POI recommendation. However, these POI recommendation methods do not fully exploit the spatial correlation features and ignore the fact that users make different choices between different regions and different categories of POIs. Therefore, we propose a real-time POI recommendation model (AST-PG) that fully captures both the spatial and real-time information of the POIs, aiming to achieve better recommendation results.

POI Recommendation Based on Multi-Head Attention Mechanism
Multi-head attention is a prevalent attention mechanism in the domain of deep learning, particularly in the fields of natural language processing and computer vision. The fundamental concept is to compute attention over the input in parallel in multiple distinct projection spaces (or "heads") in order to capture a more comprehensive range of features and contextual information. In [25], a novel long- and short-term preference learning (LSPL) model is proposed that combines an attention mechanism and a long short-term memory (LSTM) network to capture and analyze users' long-term and short-term preferences, respectively. Specifically, the attention mechanism is used to extract the user's long-term stable interests, while the LSTM is responsible for capturing the user's dynamically changing short-term preferences. By combining these two preferences, the model is able to provide users with more accurate and personalized POI recommendations. DeepMove [26] introduces an attentional framework that effectively captures multi-level periodic patterns. To model the sequential transitions for the final recommendations, the framework uses a recurrent neural network, which allows it to incorporate temporal dependencies and dynamics. The authors of [27] presented a multitasking transformer model dubbed the TLR-M, which incorporates a multi-head attention mechanism. This model not only recommends the next POI to the target user but also forecasts the estimated waiting time for concurrent visits to these POIs. In [28], a spatiotemporal long short-term memory (ST-LSTM) network is proposed. By feeding spatiotemporal context information into the LSTM network at each step, the ST-LSTM can better model the spatiotemporal information. In addition, an attention-based spatiotemporal LSTM (ATST-LSTM) network is developed for the next-POI recommendation. By using the attention mechanism, the ATST-LSTM can use spatiotemporal contextual information to focus selectively on the relevant historical check-in records in the check-in sequence.

Methodology
This section describes the structure of the proposed AST-PG model. As shown in Figure 2, the model consists of four modules: the time-embedding module, POI-group-embedding module, encoding-decoding module, and prediction module.

Time-Embedding Module
This section describes the construction of the time-embedding module. To solve the problem of temporal disorder in the user trajectories, we convert the user's time trajectory data into fixed-length time trajectory vectors (TTVs). We utilize these vectors to predict whether a user made a POI check-in in a specific time interval. First, we determine the time of the user's first check-in and then compute the check-in time of each subsequent point. These time points are then mapped to the corresponding positions in the time trajectory vector. The position associated with a check-in timestamp is assigned a value of 1, indicating that a check-in exists for that period. Conversely, the remaining positions are assigned a value of 0, indicating that there is no check-in for that period. We assume that, if a position in the time trajectory vector is 0, the user does not visit a POI during that period.
To further improve the accuracy of the time embedding, we append two dimensions to the 48-dimensional time trajectory vector to indicate whether the trajectory occurs mainly on weekdays or weekends. First, we iterate through all check-in events and count the number of check-ins on weekdays and weekends separately. If the number of weekday check-ins is more than half of the total number of check-ins in the trajectory, the first extra dimension (weekday dimension) is set to 1; otherwise, it is set to 0. Similarly, if the number of weekend check-ins is more than half of the total number of check-ins in the trajectory, the second extra dimension (weekend dimension) is set to 1; otherwise, it is set to 0. Figure 3 illustrates an example of temporal-data mapping. With this approach, we construct a 50-dimensional final trajectory vector that contains information about the temporal distribution and global information about whether the trajectory mainly occurs on weekdays or weekends.
In order to capture the general patterns of the TTVs, we trained an embedding layer to map each TTV to a low-dimensional vector. The embedding for each TTV is learned from its historical check-in sequences. The embedding of the time trajectory vector ($t$), denoted as $e_t$, serves as the input data and is defined by Equation (1), as follows:
$$e_t \in \mathbb{R}^{\tau} \quad (1)$$
where $\tau$ is the time trajectory vector-embedding dimension.
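The construction of the 50-dimensional time trajectory vector can be illustrated with a minimal sketch. The function name `build_ttv` is hypothetical, and the sketch assumes that each of the 48 positions corresponds to one whole hour elapsed since the first check-in of the trajectory; the text does not fix the exact slot mapping.

```python
from datetime import datetime

def build_ttv(checkin_times, horizon_hours=48):
    """Map the check-in timestamps of one trajectory to a 0/1 vector
    of length horizon_hours + 2 (hour slots + weekday flag + weekend flag)."""
    if not checkin_times:
        raise ValueError("trajectory must contain at least one check-in")
    times = sorted(checkin_times)
    start = times[0]
    ttv = [0] * (horizon_hours + 2)
    for ts in times:
        # Assumption: slot index = whole hours elapsed since the first check-in.
        slot = int((ts - start).total_seconds() // 3600)
        if 0 <= slot < horizon_hours:
            ttv[slot] = 1
    # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
    weekend = sum(1 for ts in times if ts.weekday() >= 5)
    weekday = len(times) - weekend
    # Each flag is set only if its share is strictly more than half.
    ttv[horizon_hours] = 1 if weekday > len(times) / 2 else 0
    ttv[horizon_hours + 1] = 1 if weekend > len(times) / 2 else 0
    return ttv

# Two check-ins on a Friday and one on the following Saturday.
example = build_ttv([
    datetime(2012, 4, 6, 7, 0),
    datetime(2012, 4, 6, 9, 0),
    datetime(2012, 4, 7, 13, 0),
])
```

In a full model, this binary vector would then be passed through a trainable embedding layer to obtain the low-dimensional representation of dimension τ.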

Geo-Categorical-Based POI-Group-Embedding Module
This section presents a strategy for grouping points of interest (POIs) based on their geographic and categorical information. The POIs are first divided into different geographic regions based on their latitude and longitude. As noted in [29], in reality, users are more likely to revisit a region than a specific POI. When users do visit POIs, it is not necessarily because they are attracted to the POIs themselves, but rather because they are interested in the region.
To provide more insight into the importance of grouping POIs based on the region and category hierarchy, we refer to two charts. Figure 4a shows the distribution of the numbers of POIs across the 30 regions. It is evident from the chart that there are significant differences in the numbers of POIs in different regions. For example, the region with ID 20 tends to have many POI check-ins due to intensive user activity, while the region with ID 24 has relatively few. This distribution pattern reflects the degree of prosperity and user activity in different regions, which is essential for POI grouping. Figure 4b reveals the distribution of the numbers of POIs in different category levels within a given region. The category level is calculated based on the ratio of the frequency of user check-ins in that category to the total number of check-ins. We can see from the chart that even within the same region, POIs in different categories may have different category levels.
Accordingly, this study developed a geo-categorical-based POI-group-embedding module, which uses the embedding vectors of the regions and categories to which the target POI belongs to represent the spatial features. Figure 5 illustrates the grouping process of the POIs. The POIs are divided into different geographic regions and graded according to the characteristics of their categories in each region. This helps to capture the interest in and importance of the different categories of POIs in each region more effectively. The grading is based on the clear quantitative differences between the categories of POIs in each region. In short, with this grouping approach, we can better capture the attributes of these POIs and the access patterns of users to them. Even POIs that are accessed less frequently, as long as they belong to a certain region and category level, are still included in the corresponding POI group and have a chance to be recommended to the target users.
Our prediction task is to predict the probability of a target user visiting a specific group at the POI-group level in order to generate the final POI recommendation list. This method not only improves the prediction efficiency but also improves the recommendation accuracy by aggregating similar POIs. At the same time, infrequently visited POIs may have higher representation in some specific POI groups, and thus they also have a chance to be prioritized higher in the recommendation list. We model spatial features using POI groups that consist of region and category features.
The region division of the POIs based on latitude and longitude information is calculated by Equations (2)-(4), where $\alpha$ is a pre-determined parameter employed to regulate the precision of the decimal places.
The category-level division of the POIs based on check-in frequency information is calculated by Equation (5), where $\beta$ is a pre-determined leveling parameter employed to regulate the precision of the decimal places; $CatL_u^r(c)$ represents the category level of the user ($u$) in region $r$, where $c$ denotes the category; $Fre_u^r(c)$ indicates the frequency of check-ins for category $c$ by the user ($u$) in region $r$ (i.e., the number of times the user checks into the POIs of that specific category in the region); and $TotalFre_u^r$ denotes the total check-in frequency of the user ($u$) in region $r$ (i.e., the total number of check-ins made by the user in that region).
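As an illustration of the grouping idea, the following is a minimal sketch and not a reproduction of Equations (2)-(5). It assumes that a region is the grid cell obtained by truncating the latitude and longitude to α decimal places, and that the category level is the integer part of β times the per-region check-in ratio. Both rules, and the names `region_key` and `category_levels`, are assumptions made for this example.

```python
import math
from collections import Counter

def region_key(lat, lon, alpha=1):
    # Assumption: truncate both coordinates to `alpha` decimal places and
    # use the resulting integer grid cell as the region identifier.
    scale = 10 ** alpha
    return (math.floor(lat * scale), math.floor(lon * scale))

def category_levels(checkins, beta=10):
    """checkins: list of (region, category) pairs for ONE user.
    Returns {(region, category): integer category level}."""
    per_pair = Counter(checkins)                          # Fre_u^r(c)
    per_region = Counter(region for region, _ in checkins)  # TotalFre_u^r
    levels = {}
    for (region, category), freq in per_pair.items():
        ratio = freq / per_region[region]
        # Assumption: level = floor(beta * ratio).
        levels[(region, category)] = math.floor(beta * ratio)
    return levels

# One user with four check-ins in the same region.
r = region_key(40.7128, -74.0060, alpha=1)
levels = category_levels([(r, "cafe"), (r, "cafe"), (r, "cafe"), (r, "shop")])
# A POI group is then the pair (region, category level).
groups = {(region, level) for (region, _), level in levels.items()}
```

Under these assumptions, all POIs that fall in the same grid cell and share the same category level are merged into a single group, which is what allows rarely visited POIs to inherit the group's probability mass.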
Next, we fuse the region-index vector and category-level vector by a dense layer using Equation (6) below:
$$e_{r,CatL} = \mathrm{ReLU}(\mathrm{Cat}(e_r, e_{CatL})) \in \mathbb{R}^{\gamma+\varsigma} \quad (6)$$
where $e_r$ and $e_{CatL}$ are the embeddings of the region-index vector and category-level vector, respectively. The dimension of the output embedding ($e_{r,CatL}$) is the sum of the dimensions of the region and category-level embeddings.

Encoder-Decoder Module
The embedding of each check-in record is obtained using the methods described in Sections 3.1 and 3.2. The embedding vectors of the input are concatenated to generate the inputs for layer 0 in the encoder. The input is represented as $\xi^0 \in \mathbb{R}^{k \times d}$, where $d$ is the dimension of the check-in record embedding. The multi-head attention output in layer $h$ is computed by Equations (7)-(10), in which the attention heads are combined as follows:
$$\mathrm{MultiHead}(\xi^h) = \mathrm{Cat}(\mathrm{head}_1^h; \ldots; \mathrm{head}_n^h) \times w_O \quad (10)$$
where $w_Q^h$, $w_K^h$, and $w_V^h$ represent the learnable weight matrices corresponding to the query, key, and value of the $h$-th attention head, respectively; $H_{h'}$ represents the output of the $h'$-th attention head in the multi-head attention mechanism; and $w_O$ is the learnable weight of the multi-head attention output in the linear layer. The units of the decoder are essentially the same as those of the encoder.
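The following dependency-free sketch shows how the heads could be computed and combined. It assumes the standard scaled dot-product attention of [10] for each head, which may differ in detail from the per-head equations used in this work; the helper names (`matmul`, `attention_head`, `multi_head`) are hypothetical.

```python
import math

def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, w_q, w_k, w_v):
    # Assumption: scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head(x, heads, w_o):
    """heads: list of (w_q, w_k, w_v) triples, one per attention head."""
    outputs = [attention_head(x, *h) for h in heads]
    # Concatenate the head outputs row by row, then apply the output projection.
    concat = [sum((out[i] for out in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)

# Three check-in embeddings of dimension 2, two heads of width 1.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head = ([[1.0], [0.0]], [[0.0], [1.0]], [[1.0], [1.0]])
y = multi_head(x, [head, head], w_o=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

A practical implementation would use a tensor library and learned weights; the list-based version is only meant to make the concatenate-then-project step of Equation (10) explicit.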

Prediction Module
The prediction module outputs several candidate check-in locations for the target user at the next time slot, ranked by probability after the fully connected layers. This study takes the most probable location as the prediction result and provides POI recommendations accordingly. More details about the experiment results and analysis can be found in Section 4.

Experiments
This section introduces the comparison and ablation experiments conducted on two different datasets to validate the proposed model. It then analyzes the experimental results.

Problem Definition
We preprocess the check-in sequence ($CK_u$) of each user ($u$) by segmenting it into a set of consecutive trajectories, denoted as $CK_u = S_u^1 \oplus S_u^2 \oplus \ldots$, where $\oplus$ represents the concatenation. Each trajectory spans 48 h and contains a maximum of 10 and a minimum of 3 unique check-in locations that the user visits. To simplify the temporal analysis, we approximate the check-in times in the trajectories to hourly intervals. Specifically, for each hour within the 48 h, we retain only the earliest check-in location, as multiple check-ins within the same hour are rare in our dataset. This approximation helps maintain data conciseness while capturing the user's overall mobility patterns. If a user visits more than ten unique locations within 48 h, we segment the check-in sequence into multiple trajectories, each containing ten locations. In this way, a user can have multiple trajectories in the dataset, representing different periods.
In addition to the preprocessing steps mentioned above, we also introduce handling for weekdays and weekends in the trajectory data. When approximating the check-in times in the trajectory as hourly intervals, we record not only the check-in time but also whether each check-in point occurs on a weekday or weekend. Moreover, a label is added for each check-in point using binary coding (0 for weekdays and 1 for weekends). Suppose a user visits more than ten unique locations within 48 h, and these locations span both weekdays and weekends. In this case, the trajectories are segmented by weekday/weekend boundaries to ensure the accuracy of the data labels.
The task of our model is to predict the $n$-th check-in location of a trajectory containing $n$ check-ins based solely on the previous $n-1$ check-in locations within that trajectory. This prediction capability enables us to capture the sequential nature of the user mobility and infer future check-in locations based on past behavior.
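The preprocessing and task definition above can be sketched as follows. This is a simplified illustration under stated assumptions: a 48 h window is assumed to start at the first check-in that does not fit into the previous window, the length limits are applied to retained check-ins rather than to unique locations, and the weekday/weekend boundary rule is omitted. The function names are hypothetical.

```python
from datetime import datetime, timedelta

def segment_trajectories(checkins, window_hours=48, max_len=10, min_len=3):
    """checkins: list of (timestamp, poi_id) pairs for one user."""
    window = timedelta(hours=window_hours)
    trajectories, current, hours_seen = [], [], set()
    start = None
    for ts, poi in sorted(checkins):
        # Open a new trajectory when the window is exceeded or the cap is hit.
        if start is None or ts - start >= window or len(current) == max_len:
            if len(current) >= min_len:
                trajectories.append(current)
            current, hours_seen, start = [], set(), ts
        hour = int((ts - start).total_seconds() // 3600)
        if hour in hours_seen:
            continue  # keep only the earliest check-in within each hour
        hours_seen.add(hour)
        current.append((ts, poi))
    if len(current) >= min_len:
        trajectories.append(current)
    return trajectories

def split_input_target(trajectory):
    # The first n-1 check-ins are the input; the n-th location is the target.
    return [poi for _, poi in trajectory[:-1]], trajectory[-1][1]

base = datetime(2012, 4, 6, 7, 0)
raw = [
    (base, "home"),
    (base + timedelta(minutes=30), "kiosk"),  # same hour as "home", dropped
    (base + timedelta(hours=5), "shop"),
    (base + timedelta(hours=20), "cafe"),
    (base + timedelta(hours=50), "gym"),      # outside the first 48 h window
    (base + timedelta(hours=51), "office"),
    (base + timedelta(hours=52), "bar"),
]
trajs = segment_trajectories(raw)
inputs, target = split_input_target(trajs[0])
```

Trajectories with fewer than three retained check-ins are discarded in this sketch, which mirrors the minimum-length constraint stated above.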

Experiment Environment
The model was implemented in PyTorch 1.7.1, and all programs were run in a Python 3.8 environment, which is compatible with the libraries used in this study. To facilitate the training and evaluation of the model, we used several key software libraries: PyTorch for deep learning operations, NumPy for numerical calculations, Pandas for data manipulation, and Matplotlib for data visualization.

Datasets
We conducted experiments on two public datasets: Foursquare-NYC and Foursquare-TKY. The Foursquare-NYC dataset was gathered in New York City between April 2012 and February 2013, and the Foursquare-TKY dataset was collected during the same timeframe in Tokyo. For each dataset, we sorted the check-in records by time and divided them into training, validation, and test datasets in an 8:1:1 ratio. Basic dataset statistics are shown in Table 1. These datasets have been extensively validated and used as benchmark data in several studies. They therefore provide a reasonable basis for exploring and validating the effectiveness of the AST-PG model in the next-POI recommendation task, and for fairly evaluating and comparing different recommender system approaches.
We compared our proposed method with several baseline models to evaluate its performance and effectiveness in next-POI recommendation. The following eight models were used as the baseline models in this study: the MF [11], FPMC [13], ST-RNN [15], LSTM [16], ATST-LSTM [28], STGN [17], STGCN [18], and STAN [20] models. The underlying theories associated with these models are briefly described in the Related Work Section. Except for the necessary modifications to the hyper-parameters that ensure program operation, such as the input and output dimensions, the remaining hyper-parameters, such as the network composition, learning rate, and masks, use the optimal values provided in the open-source codes of the methods.
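The chronological 8:1:1 split described for the datasets can be sketched as below. The sketch assumes that the split is applied to time-sorted records with integer boundary rounding; whether the split in this study is made over individual records or over trajectories is not specified, and the function name is hypothetical.

```python
def chronological_split(records, ratios=(8, 1, 1), key=lambda r: r[0]):
    """Sort records by time (first field by default) and cut them into
    train/validation/test parts according to `ratios`."""
    ordered = sorted(records, key=key)
    total = sum(ratios)
    n = len(ordered)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]

# Twenty toy records given in reverse time order.
records = [(t, f"poi_{t}") for t in range(19, -1, -1)]
train, val, test = chronological_split(records)
```

Splitting by time rather than at random keeps every validation and test record later than the training records, which avoids leaking future check-ins into training.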

Evaluation Metrics
We use the Accuracy@K (denoted by Acc@K) and Mean Reciprocal Rank (denoted by MRR) as the evaluation metrics for assessing the performances of our proposed method and the above-mentioned baseline models. Acc@K measures the accuracy of the top K recommendations provided by the model, where K takes values of 1, 5, 10, and 20. The MRR evaluates the average reciprocal rank of the first correct next POI in the recommendation list provided by the model for a given user. Given a dataset with $Q$ samples, the two metrics are calculated by Equations (11) and (12):
$$\mathrm{Acc@}K = \frac{1}{Q}\sum_{i=1}^{Q} \mathbb{1}(rank_i \le K) \quad (11)$$
$$\mathrm{MRR} = \frac{1}{Q}\sum_{i=1}^{Q} \frac{1}{rank_i} \quad (12)$$
where $rank_i$ is the position of the ground-truth next POI in the recommendation list for the $i$-th sample, and $\mathbb{1}(\cdot)$ is the indicator function.
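A minimal sketch of how the two metrics can be computed from ranked recommendation lists is given below. It follows the standard definitions of Acc@K and MRR; the function names and the convention that a missing target contributes zero to the MRR are assumptions of this example.

```python
def acc_at_k(ranked_lists, targets, k):
    """Fraction of samples whose true next POI is within the top-k entries."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)

def mrr(ranked_lists, targets):
    """Mean of 1/rank of the true next POI over all samples."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            # list.index is 0-based, ranks are 1-based.
            total += 1.0 / (ranked.index(target) + 1)
        # Assumption: a target missing from the list contributes 0.
    return total / len(targets)

# Three samples whose true POI "a" is ranked 1st, 2nd, and 3rd.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truth = ["a", "a", "a"]
acc1 = acc_at_k(ranked, truth, 1)
acc2 = acc_at_k(ranked, truth, 2)
score = mrr(ranked, truth)
```

In this toy case, one of three targets is ranked first and two of three are in the top two, while the MRR averages the reciprocal ranks 1, 1/2, and 1/3.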

Experiment Results
As shown in Table 2, the model proposed in this study achieved the best results in all five evaluation indicators compared to the baseline models. The evaluation indicators improved by approximately 7-10%. Among the baseline models, self-attention models like the STAN demonstrated superior performances compared to RNN-based models, such as the ST-RNN. This is attributed to the self-attention mechanism's ability to compute attention weights for all positions in the input sequence simultaneously, facilitating the efficient utilization of computational resources, particularly when processing lengthy sequences. The traditional MF model is limited in terms of capturing the temporal patterns and temporal dynamics in the POI sequence, so it cannot effectively deal with the time-dependent relationships and spatial-temporal information in the sequence. While the FPMC model takes into account user preferences and contextual information in the sequence, its Markov assumption-based structure may not fully capture long-term time dependencies, resulting in an inadequate long-term predictive performance. ST-RNNs and LSTMs may face computational and storage challenges when dealing with large-scale spatial and temporal data, and their modeling of long-term dependencies may also be inadequate, resulting in degraded predictive performances over long time series.
The introduction of the multi-head attention mechanism in the ATST-LSTM model represents a significant advancement, as it enables the model to automatically assess the relevance of various inputs to the network at each step and adjust the attention weights of the inputs accordingly. This mechanism enhances the model's comprehension and utilization of spatiotemporal contextual information, thereby improving the accuracy of the subsequent POI prediction. However, in contrast to our model, the ATST-LSTM does not appear to fully consider the impact of additional information, such as categories, on the POI prediction accuracy. Category information plays a pivotal role in recommender systems, as it can provide valuable insights into user preferences and POI characteristics. Models such as the STGN, STGCN, and STAN may encounter computational and storage bottlenecks when dealing with large amounts of data and may be limited by the model complexity and training efficiency when considering complex spatial and temporal dependencies. Therefore, while these models have achieved some results in next-POI recommendation, there is still room for improvement. The comprehensive experiments conducted on two real datasets demonstrate that our approach surpasses the existing state-of-the-art POI recommendation models in terms of performance.
As shown in Table 3, the full model has the best performance. Among the individual components, the results show that the GCP plays a larger role in the overall performance than the other components. For example, without the GCP embedding, the Top-1 accuracy dropped from 19.92% to 18.29%. Other components also contributed to the final recommendation outcome. For example, with no time or category embeddings, the Top-1 accuracy dropped from 19.92% to 19.62%. The multi-head attention mechanism also plays an important role in the model, enhancing its representational capacity by performing multiple independent attention operations in parallel and then merging the outputs. Without the multi-head attention mechanism, the Top-1 accuracy dropped from 19.92% to 18.84%. To sum up, the ablation experiment demonstrates the effectiveness of the attention-based time-embedding module and the geo-categorical-based POI-group-embedding module developed in this study.
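To make the relative importance of the components easier to compare, the Top-1 figures quoted above can be converted into absolute and relative drops. The snippet below is a small arithmetic sketch that uses only the three accuracies reported in the text; the function name `accuracy_drops` is illustrative.

```python
from typing import Dict, Tuple

# Top-1 accuracies (in %) quoted in the text for the TKY dataset.
FULL_TOP1 = 19.92
ABLATED_TOP1: Dict[str, float] = {
    "w/o GCP": 18.29,
    "w/o Time & Cat": 19.62,
    "w/o multi-head attention": 18.84,
}


def accuracy_drops(
    full: float, ablated: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Map each variant to (absolute drop in points, relative drop in %)."""
    drops: Dict[str, Tuple[float, float]] = {}
    for name, acc in ablated.items():
        absolute = full - acc
        relative = 100.0 * absolute / full
        drops[name] = (round(absolute, 2), round(relative, 2))
    return drops


drops = accuracy_drops(FULL_TOP1, ABLATED_TOP1)
# Print variants from the largest to the smallest absolute drop.
for name, (absolute, relative) in sorted(
    drops.items(), key=lambda item: -item[1][0]
):
    print(f"{name}: -{absolute:.2f} points ({relative:.2f}% relative)")
```

Under this calculation, removing the GCP embedding costs 1.63 points (about 8.2% of the full model's Top-1 accuracy), removing multi-head attention costs 1.08 points (about 5.4%), and removing the time and category embeddings costs 0.30 points (about 1.5%).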

Conclusions and Future Work
In conclusion, the present study demonstrated a real-time POI (point-of-interest) recommendation system leveraging social-media check-in data. The AST-PG model integrates a spatial module that groups POIs by geography and category with a temporal module that creates a uniform-length time trajectory vector, and links the two by a multi-head attention mechanism. It achieved a significant improvement in the POI recommendation accuracy, surpassing the state-of-the-art models by over 10%. This achievement demonstrates the efficacy of the AST-PG model in POI prediction and recommendation. Accurate POI recommendations have the potential to be utilized in a multitude of applications, particularly in the transportation sector. They can facilitate in-depth analyses of the travel demand, transportation modes, and user behaviors, ultimately contributing to the rational planning and optimization of transportation networks.
As we look towards the future, we intend to explore the integration of large models into our POI recommendation system. To accomplish this, we will investigate techniques for efficiently integrating large models into our current framework while maintaining its computational efficiency and scalability. This will involve optimizing the model architecture, input representations, and training procedures to ensure that the integrated system can effectively handle large volumes of social-media data and provide real-time POI recommendations. Additionally, we plan to enhance the algorithmic efficiency of our POI recommendation system by exploring techniques such as parallel processing and distributed computing. Furthermore, we will continue to refine the input and output structures of our model to further improve the prediction speed and accuracy. Lastly, we aim to incorporate a dynamic adaptation mechanism into our system to respond to real-time changes in user preferences and spatial dynamics. This mechanism will leverage updates from social-media data to continually refine the POI recommendations, ultimately leading to an improved recommendation quality over time.

Figure 1. Example of time-series trajectories with varying lengths.


Figure 2. The architecture of the proposed AST-PG model.


Figure 3. Example of forming the time trajectory vector.


Figure 4. (a) Distribution of the numbers of POIs in different regions and (b) different category levels in the same region.

original latitude and longitude of the POI (p), respectively; $Lat'_p$ and $Lon'_p$ represent the rounded latitude and longitude of the POI (p), respectively; $r_p$ represents a specific region.
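One way to picture how rounded coordinates yield a region is the grid-style sketch below. It is only an illustration under stated assumptions: the rounding precision (`decimals=2`), the function names, and the integer region IDs are all hypothetical, and the paper's own region equation may differ.

```python
from typing import Dict, Tuple

Coordinate = Tuple[float, float]


def region_key(lat: float, lon: float, decimals: int = 2) -> Coordinate:
    """Round a POI's coordinates; POIs sharing a key fall in the same cell."""
    return (round(lat, decimals), round(lon, decimals))


def assign_regions(
    pois: Dict[str, Coordinate], decimals: int = 2
) -> Dict[str, int]:
    """Map each POI id to an integer region id based on rounded coordinates."""
    region_ids: Dict[Coordinate, int] = {}
    assignment: Dict[str, int] = {}
    for poi_id, (lat, lon) in pois.items():
        key = region_key(lat, lon, decimals)
        if key not in region_ids:
            # Region ids are handed out in order of first appearance.
            region_ids[key] = len(region_ids)
        assignment[poi_id] = region_ids[key]
    return assignment


# Toy POIs with made-up names; the first two are a few dozen meters apart.
pois = {
    "cafe": (40.7581, -73.9855),
    "shop": (40.7584, -73.9851),
    "park": (40.7829, -73.9654),
}
print(assign_regions(pois))
```

A coarser precision (fewer decimals) produces larger regions containing more POIs, so the precision controls the trade-off between region size and the number of regions.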

Ablation Study
We conducted ablation experiments on the TKY dataset to evaluate the individual impact of each component on the overall performance of the model. Specifically, five configurations were compared: (1) the model without the geo-categorical-based POI-group embedding (denoted by "w/o GCP"); (2) the model without the time and category information (denoted by "w/o Time & Cat"); (3) the model without the multi-head attention mechanism (denoted by "w/o multi-head attention"); (4) the model using only the fully connected layer (denoted by "Fully Connected"); and (5) the full model. The results are shown in Table 3.

Table 2. Acc@K and MRR performance comparison on the two datasets.

Table 3. Ablation study: comparing the full model with its ablated variants.