1. Introduction
Recommending a place to visit to a user of a recommendation system is not an easy task. Contributing to this difficulty are, among others, these factors:
Data incompleteness: Many approaches to recommending the next place to visit that are offered in the literature are based on data from location-based social networks (LBSNs). LBSN data are specific in that they depend on users marking their whereabouts (check-ins) in such networks, which they do only when they feel like doing so. For example, they often check in at fancy places, or places that are for some reason particularly interesting to them, usually to let their friends on the social network know of the extraordinary places they visit. As a result, only a fraction of the places that they visit are registered on LBSNs. From the perspective of using such a dataset to recommend a next place to visit, the available information mostly covers places that are special in some way and omits the vast majority of the places the user actually visited during the day. The data are thus incomplete as a basis for generating recommendations;
Weather: Weather can be a very important factor when deciding to choose one place to visit over another. For example, if we were looking for a recommendation of a place to visit in the afternoon, our choices would be potentially different if the weather was warm and dry, and they would be different still if it was raining (e.g., we would probably prefer to go to an open air place in the first case and an indoor place in the second);
Company: The choice of a place to visit can be influenced by the company we are in. If we are alone, the choice might be different than when we are in a group of people (e.g., family or friends). In the second case, other people will influence our choices, so these choices will be different than if we were choosing the place ourselves;
Varying interests over time: People’s habits and interests tend to change slightly over time. Due to these changes, not all the places that we visited, e.g., 2 years ago will still arouse our interest at the present moment;
Multi-level periodicity [1]: Some places people tend to visit quite regularly, e.g., on a daily or weekly basis (like going to work, to a gym, etc.). Some, however, are visited less often, but are still visited with some regularity as well. An example might be a visit to a dentist, to a music festival, etc.
Not taking into account the above circumstances as a whole significantly reduces the accuracy of location recommendation, even if the location recommendation algorithm itself is quite efficient. However, in this paper, we do not consider the influence of the abovementioned factors on the recommendation precision, as that would require dedicated methods that carefully incorporate these factors into the final predictions.
Our focus here is on examining how different data preprocessing strategies impact the accuracy of location recommendations generated using popular methods, namely neural networks and hidden Markov models. Specifically, we explore the effects of considering only users with shared features or common locations, varying the sequence length of visited places, altering the number of categories for these locations, and incorporating time-of-day information. The results of these examinations can be especially interesting from the business perspective, i.e., they can help improve the efficiency of location-based services, enhance user engagement through personalized recommendations, and support targeted marketing strategies by allowing for a better understanding of user behavior and preferences.
To achieve this, we implement and evaluate several transformations on LBSN data to assess their impact on the precision of location recommendations. The implemented transformations are as follows:
Narrowing the input dataset to users sharing some common features;
Utilizing user similarity in location preferences based on users’ history of visited locations;
Adjusting the length of the sequences considered;
Varying the number of location categories;
Incorporating time-of-day information in the recommendation computation.
The rest of this paper is organized in the following way.
Section 2 reviews related work in location-based recommendation systems, categorizing existing approaches such as collaborative filtering, Markov models, and neural networks.
Section 3 describes the preprocessing methods applied to the datasets, including the sequence generation, the restructuring of location categories, and the incorporation of temporal data.
Section 4 explains the implementation and configuration of the tested recommendation systems, focusing on recurrent neural networks (RNNs) and hidden Markov models (HMMs).
Section 5 gives data characteristics.
Section 6 discusses the computational costs and scalability of the proposed solutions.
Section 7 presents the experimental results, evaluating the impacts of the data preprocessing strategies on the accuracy of the resulting recommendations. It also provides a detailed comparison of the models based on their performance.
Section 8 concludes the paper, summarizing the findings and proposing directions for future research to further improve location-based recommendation systems. All abbreviations used in the paper are explained in section Abbreviations.
2. Related Work
In the literature, one can find multiple approaches to solving the task of next-location recommendation. These approaches can be categorized, subjectively, into several groups. We present an overview of the existing methods, divided into the following categories: collaborative filtering and matrix factorization, Markov models, neural networks, and other methods.
The first group of methods, using collaborative filtering and matrix factorization, uses an approach stemming from the one used in recommending regular items, adjusting it to the specificity of the spatial context. In [2], the authors propose a method of recommendation based on collaborative filtering that additionally incorporates Ebbinghaus’s memory theory [3,4,5], taking into consideration people’s natural tendency to forget things over time. In [6], the authors apply collaborative filtering to learn users’ transition patterns between location categories using other users’ similar transition patterns. Similar users are clustered based on their check-in frequency in different place categories. Matrix factorization is used for each cluster to predict preference transitions. In [7], the authors use an observation that individual visiting locations tend to cluster together. They reflect this observation in the proposed factorization model. In the proposed system, a weighted matrix factorization approach is used. In particular, users’ latent factors are augmented with activity area vectors and the latent factors of points of interest (POIs) are augmented with influence area vectors. Reference [8] considers rankings of factorizations. In the proposed solution, it is assumed that POIs with a higher number of check-ins are of higher interest to users. Users’ preference rankings for POIs are fitted to learn the latent factors of users and POIs. The solution also takes into account the contribution of unvisited POIs. Reference [9] proposes a factorization model using multiple feature spaces that is capable of using multiple context types in POI recommendation. In particular, the approach improves the method given in [8] by removing the interaction factor matrix and splitting the POI latent space into slices with different context information. Reference [10] aims to recommend new places to users, ones they have not previously visited. The proposed approach consists of two steps. Firstly, a set of potential locations is learned from three types of friends (social friends, location friends, and neighboring friends). Next, the learned potential locations of each user are incorporated into a matrix factorization model with different error loss functions. The generated recommendation strategies cover standard recommendation, location cold-start recommendation, and user cold-start recommendation.
The next group of methods of next-location recommendation uses Markov chains. This approach to recommendation characterizes, especially, early works in the field. Reference [11] uses a mobility Markov chain to predict the next place a user will visit. As opposed to other methods presented in this section, the tests conducted by the authors did not include LBSN data but instead used GPS traces of (mostly) researchers. The presented method consists of two steps. Firstly, POIs are identified via a clustering algorithm. Next, transitions and relative probabilities are computed, whereas their chronological order, obtained from mobility traces representing POIs, is preserved. Lastly, the transitions between states, taking into account the n last visited states, are computed. Reference [12] predicts the next locations for pedestrian movement. The locations are represented by their timestamp, longitude, and latitude. Firstly, the locations are clustered based on the temporal information, i.e., daytime, nighttime, and weekend events. Next, the created clusters are used to train different hidden Markov models that correspond to the different types of location histories. A new sequence of visited locations is first assigned to the appropriate cluster. Then, inference is conducted using the corresponding HMM to discover the most probable next location. Locations are approximated here by a decomposition of the Earth’s surface into triangular meshes of variable resolutions. Reference [13] extends an idea of the use of personalized Markov chain factorization for next-basket recommendation, which was proposed in [14] for the recommendation of next new POIs. The authors assume locality of users’ movements to their previous check-in history. The proposed matrix factorization approach incorporates the localized region constraint and the personalized Markov chain.
The next group of methods of location recommendation is based on the use of neural networks. In [15], the authors extend the architecture of an RNN. The authors introduce spatial and temporal elements to the network’s architecture. The proposed solution addresses the deficiencies of an RNN in location recommendation: RNNs cannot model local temporal contexts well and are not capable of modelling the continuous geographical distance between locations. An RNN is applied to location recommendation in the semantics-enriched recurrent model (SERM) approach offered in [16]. The SERM distinctively enriches GPS location information with contextual text data obtained from social platforms in order to offer better prediction results. Reference [17] further improves the recommendation ideas given in [15] through the use of long short-term memory (LSTM) architecture so as to simultaneously model user’s short-term and long-term interests. In reference [18], the authors modify the structures of an RNN and LSTM to incorporate different contexts (social, temporal, spatial) in the hidden and output layers to produce better recommendations. Reference [19] proposes a content-aware POI embedding model using text information about POIs (e.g., from Instagram). The proposed model consists of two context layers: check-in and text. The check-in context layer makes sure the POIs in a sequence are close enough, and the text layer is designed to capture the characteristics of POIs from the text describing them. Reference [1] focuses on the multi-level periodicity of human behavior. To incorporate this into the location recommendation model, a historical attention model is proposed and incorporated into a recurrent neural network. First, historical spatiotemporal features are extracted. They are then selected by the current mobility status to generate the most related context. By combining this context with the current mobility status, the model recommends locations based both on the sequential relation as well as on the historical regularity.
Recent approaches use attention-based models. Reference [20] uses a spatiotemporal dilated convolutional generative network for POI recommendation. The advantage of using such a methodology is the possibility of using parallel computation within a check-in sequence and thus reducing the training and evaluation times of the model. The model contains modules responsible for modeling user’s geographical distance preferences by adjusting the distance in the proposed recommendations to the user’s past behaviors and modeling the user’s time preferences related to different categories of visited places. Reference [21] proposes an attentional recurrent neural network (ARNN) framework to improve personalized next-location recommendations in location-based social networks (LBSNs), addressing the challenge of data sparsity. A knowledge graph incorporating geographical, semantic, and user preference factors was built, and meta-path-based random walks were used to discover similar locations (“neighbors”). The ARNN integrates neighbor relationships with sequential user behavior through an attention mechanism and LSTM network to predict next locations. Real-world datasets from Foursquare (New York, Tokyo) and Gowalla (San Francisco) were used in experiments. Reference [22] re-evaluates the use of pre-trained language models (PLMs) in sequential recommendation (SR), highlighting their underutilization and redundancy in behavior sequence modeling. It proposes a simplified yet effective framework that leverages behavior-tuned PLMs for item embedding initialization, combining them with lightweight sequence models like SASRec and BERT4Rec. Experiments on real-world datasets demonstrated that this approach improves the recommendation performance significantly without incurring additional inference costs. Reference [23] introduces MCN4Rec, a multi-level collaborative neural network for next-location recommendation. It addresses challenges like data sparsity, cold starts, and capturing complex correlations in location-based social networks. MCN4Rec integrates a multi-level view representation learning module with level-wise contrastive learning to model user–POI interactions, incorporating temporal and activity semantics. A causal encoder-decoder framework processes check-in sequences for next-location prediction. Experiments on four real-world mobility datasets showed improvements in recommendation accuracy compared to state-of-the-art baselines.
There are also approaches to recommending locations that use different techniques, other than the ones already mentioned, with interesting observations influencing the process of their use. For example, [24] considers the task of predicting the next location of a moving object. The author uses a data mining approach to predict user movement. The idea is based on association rules mining. Firstly, frequent trajectories are discovered and then transformed into movement rules. Next, the movement rules are matched to the trajectory of a moving object to determine its current location. Reference [25] analyzes human mobility patterns based on cellular carrier data and LBSNs, especially those related to friends. They find that places within a short distance that have been visited by friends may have an influence on our future location choices. They also claim that friends can influence our long-distance travels in that we are more likely to choose a distant destination because a friend of ours lives there. Reference [26] considers the problem of discovering geographical regions that are visited periodically by users of LBSNs. In particular, they aim to discover clusters of places in which a user regularly shows up, as well as the frequency related to each such cluster. The authors propose an approach to solving this task that is based on a Bayesian non-parametric model. Other research aims to first discover periodicity in the movement of objects that can subsequently be used to predict further locations. The Periodica method offered in [27] is one such example. The Periodica method first discovers periods in the movements of users as well as reference spots and subsequently attempts to find the periodic behaviors of objects within the reference spots.
The presented literature overview confirms that location recommendation is a topic that is important and interesting for researchers, yet has many sides to it. The discussed works propose approaches to location recommendation based on different paradigms. Some of them focus solely on algorithms that find the best next place to recommend, assuming that there is an existing LBSN dataset at hand. Others try to additionally consider system users’ contexts (are they alone, with friends, is text information on the POIs available, etc.), resulting in more available information that can, in effect, potentially lead to better recommendation results. The presented studies focus on creating efficient methods for next-location recommendation. Our goal is to see how input data preprocessing and transformation can influence the obtained recommendation results. For this purpose, we conduct a series of experiments with modified data using the most common approaches to next-POI recommendation found in the literature. The defining characteristic of the proposed method is that it is based mainly on the preprocessing stages of data. After the data is preprocessed, one can attempt to use various models for location recommendation (specifically the GRU networks and HMMs used in this work).
5. Dataset Characteristics
The location recommendations considered in this paper are created from data collected from Twitter and Foursquare between 22 June and 15 July 2018. They contain users’ check-ins from different parts of the world. The structure of a sample check-in is presented in Listing A1 in Appendix A.
The raw dataset of users’ check-ins contains 2,483,713 entries. The data are unprocessed and, in this form, cannot be used practically.
5.1. Check-In Sequences
The first stage of data preprocessing consisted of flattening the structure of the data and leaving only attributes useful in further processing. The new structure is noticeably simpler and contains only the most important information. It is shown in Listing A2 in Appendix A.
We removed entries containing empty fields, in particular empty category entries. The number of data entries in the dataset after this operation was reduced to 2,476,244. We also removed data related to (i) users who were rarely active, (ii) bots, and (iii) entries that do not form time sequences.
Rarely active users were those who checked in fewer than 10 times in the considered time frame, i.e., users who generated fewer than 10 data entries. Bot-generated data were data entries that were generated more often than twice within a 60 s period. In the last step, we assumed that data entries have to be located within a given time window, i.e., each entry has to be a neighbor of at least one other entry assigned to the same user. For the consecutive user-timestamp pairs (c_1, τ_1), …, (c_N, τ_N), data entry i was removed if it did not fulfill the conditions τ_i − τ_{i−1} < T and τ_{i+1} − τ_i < T, where T is a time window set to 8 h. These operations allowed us to eliminate data entries that were not involved in meaningful check-in sequences.
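The time-window condition can be illustrated with a minimal Python sketch. The function name `drop_isolated` and the sample timestamps are hypothetical, and the sketch assumes the reading suggested by the "neighbor of at least one other entry" wording: an entry is kept when the gap to either its predecessor or its successor is shorter than T.

```python
from datetime import datetime, timedelta

T = timedelta(hours=8)  # time window T taken from the text


def drop_isolated(timestamps, window=T):
    """Keep the entries of one user that have a neighbour closer than `window`.

    `timestamps` must be sorted in ascending order.
    Assumption: one close neighbour (previous OR next) is enough to keep an entry.
    """
    kept = []
    for i, t in enumerate(timestamps):
        near_prev = i > 0 and t - timestamps[i - 1] < window
        near_next = i + 1 < len(timestamps) and timestamps[i + 1] - t < window
        if near_prev or near_next:
            kept.append(t)
    return kept
```

With check-ins at 09:00 and 12:00 on one day, a single check-in the next morning, and two more a day later, only the single morning check-in is dropped, because both of its gaps exceed 8 h.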
The further data preprocessing that we conducted covers additional steps: (i) data preprocessing relative to check-in sequences, (ii) data preprocessing relative to location categories, (iii) adding time-of-day information.
According to [39], the ordering of visited locations has significant importance in recommending these locations. Thus, the recommended locations are generated based on sequences of recently visited locations by a given user. To this end, the data were sorted by users and the data entries assigned to each user were grouped into sequences. Subsequent entries in sequences are located in a time window, as previously explained. For sequential recommendations to be meaningful, sequences with fewer than three entries were removed. In this way, we obtained 89,617 sequences for 28,687 users. The longest sequence consisted of 105 data entries. The average sequence length was 4.8. After performing this operation, the amount of data decreased significantly.
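The grouping step can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: the function name `split_into_sequences` is hypothetical, and a new sequence is assumed to start whenever the gap between two consecutive check-ins of a user reaches the 8 h window.

```python
from datetime import datetime, timedelta


def split_into_sequences(checkins, window=timedelta(hours=8), min_len=3):
    """Group one user's (category, timestamp) pairs into category sequences.

    `checkins` must be sorted by timestamp. A gap of `window` or more starts
    a new sequence (assumption); sequences shorter than `min_len` are dropped.
    """
    sequences, current = [], []
    for category, ts in checkins:
        if current and ts - current[-1][1] >= window:
            sequences.append(current)
            current = []
        current.append((category, ts))
    if current:
        sequences.append(current)
    return [[c for c, _ in seq] for seq in sequences if len(seq) >= min_len]
```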
For place recommendations, we used sequences of consecutively visited locations where the input data are all entries, with the exception of the last one in the sequence, and the output data are the respective last entries in the sequences. We needed data containing sets of subsequences with the same constant number of records. Therefore, from the sets of sequences produced earlier, new subsequences of a given length were generated. For example, for sequences of subsequent locations [Shop & Service, Convenience Store, Park, Spiritual Center, Restaurant], one can generate three subsequences of length 3 [[Shop & Service, Convenience Store, Park], [Convenience Store, Park, Spiritual Center], [Park, Spiritual Center, Restaurant]], two subsequences of length 4: [[Shop & Service, Convenience Store, Park, Spiritual Center], [Convenience Store, Park, Spiritual Center, Restaurant]], or one subsequence of length 5: [[Shop & Service, Convenience Store, Park, Spiritual Center, Restaurant]]. Because the average length of a sequence from all the data is 4.8, we used three different datasets with subsequences containing three, four, and five entries, and verified the influence of their length on the results.
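The subsequence generation described above is a sliding window over each sequence. A minimal sketch, with hypothetical function names, that reproduces the example from the text:

```python
def subsequences(sequence, length):
    """Return all contiguous subsequences of the given length (sliding window)."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def to_input_and_target(subsequence):
    """Split a subsequence into model input (all but last) and target (last entry)."""
    return subsequence[:-1], subsequence[-1]


visits = ["Shop & Service", "Convenience Store", "Park", "Spiritual Center", "Restaurant"]
windows_of_three = subsequences(visits, 3)
```

For the five-entry example sequence, this yields three subsequences of length 3, two of length 4, and one of length 5; a sequence shorter than the requested length yields none.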
5.2. Analysing Location Categories and Adding Time-of-Day Attribute
The recommendation systems discussed in this paper do not generate propositions of specific places but rather place categories, which, according to [40], offers considerably better recommendation results. It is thus vital to prepare the categories correctly. Foursquare location data are assigned to about 930 categories that are stored in a tree-like structure containing at most five levels. Based on experiments done in [35,36], the adjusted categories were either too detailed or too general. What is more, the categories in these works were created manually. According to [41], higher-level categories yield better results, as users typically share only 10% of their location data, which results in difficulty in finding similarities in preferences among individual users when using low-level categories.
Based on the above-mentioned observations, we automated the restructuring of category fields in the dataset. All records were, first, sorted according to their category_id. If the number of records related to a given category was smaller than a given threshold, that category was replaced with the category of the higher level. Records not fulfilling the threshold assumption were completely deleted from the dataset. The threshold was computed as the number of records for the n-th most numerous category in the dataset, where n can be modified for test purposes.
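A simplified sketch of this restructuring is given below. It is an illustration only: the `parent` mapping stands in for the Foursquare category tree, the function name is hypothetical, and a single roll-up pass is assumed (the text does not state whether the replacement is repeated over several tree levels).

```python
from collections import Counter


def restructure_categories(categories, parent, n):
    """Roll rare categories up to their parent and drop what stays rare.

    categories: one category label per record
    parent:     hypothetical child -> parent mapping of the category tree
    n:          the threshold is the record count of the n-th most numerous category
    """
    counts = Counter(categories)
    n = min(n, len(counts))
    threshold = sorted(counts.values(), reverse=True)[n - 1]
    # Replace categories below the threshold with their higher-level category.
    rolled = [c if counts[c] >= threshold else parent.get(c, c) for c in categories]
    # Delete records whose category is still below the threshold.
    rolled_counts = Counter(rolled)
    return [c for c in rolled if rolled_counts[c] >= threshold]
```

Note that the number of categories that survive can differ from n, because rolled-up records may lift a parent category above the threshold.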
Using this approach, we generated three datasets to test the influence of the number of categories on the results. The first dataset contains all 834 different categories, while the second and the third one have 34 and 20 categories, respectively.
Table 1 presents the most numerous and the least numerous categories after restructuring in the case of limiting the number of categories to 34. Most of the generated categories consist of subcategories. For example, “Japanese Restaurant” includes records drawn from as many as 18 different subcategories, such as “Tonkatsu Restaurant”, “Sushi Restaurant”, and so on.
Some of the resultant categories can be uninteresting from the location recommendation perspective, e.g., “Train Station” or “Home”. At later stages of creating recommendation systems, such records were removed from the dataset. This can, however, lead to distortion and deterioration of the results. In particular, the category “Train Station”, being very numerous, can have a significant impact on the final results. Additional tests were thus conducted to verify the influence of this category on the recommendation quality.
Based on conclusions from [37] suggesting a significant improvement in recommendations when including the time-of-day information for check-ins, we added a new attribute with information on the time of day when a user visited a given place. The time of day was derived from the timestamp: the hour of the day was extracted and mapped to a time-of-day label using the intervals given in Table 2. The time zone was also taken into account, based on the longitude and latitude information of a given check-in.
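The derivation of the time-of-day attribute might look like the sketch below. Everything specific in it is an assumption made for illustration: the interval boundaries and labels are placeholders and do not reproduce Table 2, and the local time is approximated from the longitude alone (15 degrees per hour) instead of a proper time zone lookup.

```python
from datetime import datetime, timedelta

# Placeholder intervals (start hour inclusive, end hour exclusive, label).
# These are NOT the intervals of Table 2.
HYPOTHETICAL_INTERVALS = [
    (5, 12, "morning"),
    (12, 17, "afternoon"),
    (17, 22, "evening"),
]


def local_hour(utc_timestamp, longitude):
    """Approximate the local hour from UTC using a crude 15-degrees-per-hour offset."""
    offset = timedelta(hours=round(longitude / 15))
    return (utc_timestamp + offset).hour


def time_of_day(hour):
    """Map an hour (0-23) to a label; hours outside all intervals count as night."""
    for start, end, label in HYPOTHETICAL_INTERVALS:
        if start <= hour < end:
            return label
    return "night"
```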
5.3. Preparation of Datasets
In [35,36,37,42], the authors determined that recommendation systems using data personalized for every user enable the achievement of much better results. Using user sex and language information, we generated datasets characterized by these attributes. In this way, we generated two datasets of female and male users. As far as user languages are concerned, the dataset contained 38 languages, out of which we focused on the two most numerous: Japanese and English. Additionally, we created datasets that narrowed users down according to both sex and language. The data cardinality for each dataset is given in Figure 2. For each data division criterion, the most numerous dataset was selected for use in the tests. The selection encompasses the following: (i) a general dataset, (ii) a dataset of male users, (iii) a dataset of users speaking Japanese, and (iv) a dataset of male users speaking Japanese.
5.4. Individual Datasets
One of the observations made by the authors of [38] was that using models generated specifically for the user for whom a recommendation is created had a very positive influence on the recommendation. To that end, we used individual models.
Implicit profiles for each user were generated by counting the number of occurrences of each location category where the user checked in. Vectors created in this way have a length equal to the maximal number of unique, possible categories. The vectors store information on how often a given user stayed in a location with an assigned category. The larger the value of a given element of the vector, the more the user preferred locations of that category. The sequence of categories in each vector was constant and identical for all users. The vectors were then put in a matrix, where each row represented a user’s profile. Next, similarities among these vectors were computed. For this purpose, we used a cosine similarity measure. The threshold value for the similarity of users was set to 0.7. The personalized model for each user was determined by computing the cosine similarity of each user in relation to those of all other users and selecting only those exceeding the predetermined threshold.
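The profile construction and similarity filtering can be sketched in plain Python as follows. The function names and the toy profiles are hypothetical; the 0.7 threshold and the cosine measure are taken from the text.

```python
import math
from collections import Counter


def profile(visited_categories, all_categories):
    """Count check-ins per category, in a fixed category order shared by all users."""
    counts = Counter(visited_categories)
    return [counts[c] for c in all_categories]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_users(target, profiles, threshold=0.7):
    """Return users whose profile similarity to `target` exceeds the threshold."""
    return [
        user
        for user, vector in profiles.items()
        if user != target and cosine(profiles[target], vector) > threshold
    ]
```

Comparing every user with every other user in this way requires a number of similarity computations that grows quadratically with the number of users.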
We also generated individual models based on users’ location. The model for a given user was determined as all other users located in the longitude and latitude range of ⟨−1, 1⟩ degrees in relation to that user. This translates to other users who are within roughly 111 km of the given user.
5.5. Training and Testing Datasets
In the case of regular datasets, the division into training and testing parts was done randomly in the proportion of 7:3. For the individual datasets for a given user, we used sequences of all users with similar profiles. A test set consists of sequences assigned to the analyzed user.
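Both kinds of split can be sketched as follows. The function names are hypothetical, and the sketch assumes that the analyzed user's own sequences are used only for testing and are excluded from the training part of an individual dataset.

```python
import random


def random_split(sequences, train_share=0.7, seed=0):
    """Randomly divide sequences into training and testing parts (7:3 by default)."""
    shuffled = list(sequences)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def individual_split(sequences_by_user, target_user, similar):
    """Train on the sequences of similar users, test on the analyzed user's sequences."""
    train = [seq for user in similar for seq in sequences_by_user[user]]
    test = list(sequences_by_user[target_user])
    return train, test
```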
5.6. Data Preprocessing: A Summary
One of the main conclusions from the literature analysis was that correct data preprocessing has a significant influence on the outcomes of location recommendation. To that end, we utilized the experiences of the authors of [35,36] in this regard and introduced some improvements.
The generated general dataset for sequences of the minimal length of three and divided into 34 categories contains 430,473 records grouped into 89,617 sequences for 28,687 users. The number of data entries in the respective stages of data preprocessing is given in Figure 3.
The conducted data preprocessing steps are summarized in Figure 4.
6. Computational Cost and Scalability
Knowing the details of the proposed preprocessing stages, we would like to assess their computational cost, including also the two methods used for location prediction: GRUs and HMMs.
First, let us consider the cost of the preprocessing stages presented in Figure 4.
The initial data preprocessing stage can be performed in linear time. This initial stage consists of steps such as the removal of unused fields, the removal of records with missing data, and the removal of activity generated by bots. Generally, such steps can be performed by scanning the input data only once.
Subsequently, the data are preprocessed according to check-in sequences. First, one needs to apply a sorting algorithm to group the entries according to their user ID number. Subsequently, sorting is applied to the event sequences of each user. For sorting, the Quicksort algorithm, for example, can be applied, which typically achieves an O(N log N) time complexity, where N is the size of the dataset to be sorted. The final step of sequence preprocessing consists of the generation of subsequences with a given number of records. This can be achieved by scanning the event sequence of each user once.
In the next stage, data preprocessing according to location categories, the number of recommendation categories is reduced in the dataset. To this end, an automatic process is applied that replaces categories with a cardinality below a given threshold with their super-categories. Knowing how numerous each category is, this step can be achieved in linear time. Similarly, the stage of adding the time-of-day attribute to each record can be achieved by scanning the dataset only once.
For generating the training/testing data, non-individual datasets can be easily created by selecting data based on specific attributes, such as gender or language, which can typically be done in constant or linear time relative to the dataset size. The generation of training and testing datasets for individual (personalized) profiles is more computationally demanding. For the implicit profiles, first, a matrix containing the categories of locations visited by each user needs to be calculated. This can be achieved, again, by scanning the input dataset once. Subsequently, the cosine similarity is calculated for each pair of rows in the matrix, which generally requires a number of operations that is quadratic in the number of users (rows of the matrix). To obtain location-based personalized datasets, one needs to apply calculations that provide geographical distance between various visited locations of pairs of users. This can be achieved in polynomial time relative to the number of users and visited locations.
In the case of both GRUs and HMMs, the computational complexity depends on the length of the generated subsequences as well as other factors. Specifically, HMM’s hidden states are generated by combining all categories of locations with the times of day obtained in the preprocessing step. As described in the Experimental Evaluation section, this can have a significant impact on the efficiency of the proposed model when HMMs are used, practically limiting the number of categories used in experiments to up to 40.
The scalability of the proposed solution is limited due to the need to train a location recommendation model (being either a GRU network or an HMM). However, at least some of the preprocessing steps proposed in this work can be implemented directly on a mobile device that is used to collect location/review data. For example, preprocessing steps such as the initial removal of empty fields, adding the time of day, or generating subsequences of locations can be performed on the user side rather than on a computing server. Furthermore, nowadays, training a GRU neural network or conducting matrix computations (e.g., similarity between users in a location matrix) can be significantly sped up with the use of graphics cards (GPUs).
7. Experimental Evaluation
In this work, we compare two location recommendation systems using a common dataset and the same quality measures. In the following subsections, we discuss the implementation of these systems and provide the recommendation results generated by each of them. Finally, we compare the two systems.
7.1. Quality Measures
Prediction accuracy is the most commonly used quality metric. Recommendation systems themselves largely operate as predictive systems. The main premise of this metric is that a model is better if it can more accurately predict a user’s decision. This metric, unlike many others, can also be successfully applied in offline tests.
The recommendation systems created in this work are based on data collected in the past from an LBSN site. This means that only offline tests of the models will be performed. This fact alone limits the quality measures that can be used. In addition, recommendations will be issued based on the categories of locations that a given user visited in the past. In this case, prediction accuracy metrics that take into account the user’s decision will be used.
When evaluating the tested models, we use precision, defined as the ratio of the number of recommended categories that match those the user actually visited at a given moment in the past to the number of all recommendations [43]. Additionally, in order to extend the scope of recommended objects, we also report the precision for situations where the expected category was among the three most likely model proposals (Precision@N [43], referred to as Precision@3 or Precision top 3 in our case).
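Both measures can be sketched as follows. The sketch assumes one test record per row, with a vector of predicted category probabilities and the category actually visited, and reads Precision@3 as the fraction of records whose visited category is among the three top-ranked proposals, as described above. The function name and the toy values are hypothetical.

```python
import numpy as np

def precision_at_n(probabilities, visited_categories, n=1):
    """Fraction of records whose visited category is among the n top-ranked proposals."""
    probabilities = np.asarray(probabilities, dtype=float)
    visited_categories = np.asarray(visited_categories)
    # Indices of the n highest-probability categories for each record.
    top_n = np.argsort(-probabilities, axis=1)[:, :n]
    hits = (top_n == visited_categories[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical predictions for four test records over four categories.
probabilities = [
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.6, 0.2, 0.1],
]
visited = [0, 0, 2, 1]

precision = precision_at_n(probabilities, visited, n=1)
precision_top3 = precision_at_n(probabilities, visited, n=3)
print(precision, precision_top3)
```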
7.2. Recurrent Neural Networks Setup
The location recommendation system constructed herein using recurrent neural networks follows an approach similar to [37] and extends the model from [35]. It was created using the Keras library, which allows for easy and quick design and testing of neural networks.
To create a model, we first needed to build a three-dimensional tensor whose dimensions are (i) samples: the consecutive data sequences, (ii) time steps: the events of a single sample in a given time interval, and (iii) features: the data characterizing a single event.
The created recurrent neural network consists of three layers, each built of 40 GRU neurons. The input data tensor consists of consecutive sequences of visited locations. The sequentially visited location categories in a given sample are the time steps, and a single category is a feature. Features were encoded using the one-hot method, where a feature is represented as a vector of length equal to the number of categories that contains zeros everywhere except for a one at the position of the represented category. The output layer contains a number of neurons equal to the number of categories, in accordance with the one-hot encoding method, as the output value is a single predicted category. The activation function, softmax, normalizes the values of the output neurons so that each lies in the range [0, 1] and corresponds to the probability of visiting a place of a given category.
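The layout of the input tensor can be illustrated with a short NumPy sketch. It assumes that categories are mapped to integer identifiers and that the last element of each sequence serves as the prediction target; both are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def one_hot_sequences(sequences, n_categories):
    """Return (inputs, targets): inputs has shape (samples, time steps, features)."""
    sequences = np.asarray(sequences, dtype=int)
    # Assumption: all but the last category form the input, the last one is the target.
    inputs, targets = sequences[:, :-1], sequences[:, -1]
    identity = np.eye(n_categories, dtype=np.float32)
    return identity[inputs], identity[targets]

# Two hypothetical check-in sequences of length three over four categories.
sequences = [[0, 2, 1], [3, 3, 0]]
x, y = one_hot_sequences(sequences, n_categories=4)
print(x.shape, y.shape)
```

Here, `x[0, 1]` is the one-hot vector of category 2, i.e., the second visited place of the first sample.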
Adam is a recommended optimizer that returns good results for a wide range of problems. The loss function is set to categorical cross-entropy, which is used in classification problems with multiple categories. The parameter values of the recurrent neural network are given in Table 3. The tested aspects of the implemented location recommendation systems, along with possible values, are gathered in Table 4.
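For reference, the softmax activation and the categorical cross-entropy loss mentioned above can be written out in a few lines of NumPy. This is an illustrative sketch of the standard definitions, not the Keras implementation; the toy values are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Normalize each row so that its values lie in [0, 1] and sum to one."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exponentials = np.exp(shifted)
    return exponentials / exponentials.sum(axis=-1, keepdims=True)

def categorical_crossentropy(one_hot_targets, probabilities, eps=1e-12):
    """Mean negative log-probability assigned to the true category."""
    clipped = np.clip(probabilities, eps, 1.0)
    return float(-(one_hot_targets * np.log(clipped)).sum(axis=-1).mean())

# Hypothetical raw outputs for two samples over three categories.
logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

probabilities = softmax(logits)
loss = categorical_crossentropy(targets, probabilities)
print(np.round(probabilities, 3), round(loss, 3))
```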
As suggested by the conclusions from [35] and by previously conducted experiments with the recurrent neural network, we used constant values of the network parameters for further testing. These parameters turned out to have a limited impact on the achieved results, so we set them to values that return the best possible results given the short network training time.
7.3. Recurrent Neural Networks: Results
In Figure 5, we show the results for different datasets for the same network parameters. As expected, the global model returns the worst results. Additionally, filtering by users’ gender yields a barely visible improvement. On the other hand, better results were obtained for sets of users filtered by the language they use. These results suggest that, when choosing places to visit, cultural aspects (which can be inferred from the language used) are much more important than the gender of the users. For further testing, we use the dataset for which the best results were achieved, i.e., the Japanese language dataset.
Tests of the impact of the number of place categories on the results confirmed the results reported in [35]. Overall, we observed that the fewer categories the analyzed dataset contains, the better the recommendation results. However, reducing this number makes the recommendations less differentiated. Therefore, we decided to focus on 25 categories, which still allows for a much better differentiation than in [35,36]. A comparison of the results for different numbers of categories is presented in Figure 6.
The results obtained by comparing the datasets with different sequence lengths show that the longer the sequence, the better the precision, as shown in Figure 7. However, this tendency may only apply to short sequences of events. With more places visited over time, those from the distant past may completely lose their relevance to a recommendation for a given moment. This cannot be verified, however, because in the data from LBSN services, as pointed out in [41], a very small percentage of users report their check-ins regularly, which results in many time gaps and in the obtained sequences typically being short. For sequences with a length of four, the number of records obtained for analysis drops by more than 35% compared to the number of records for sequences with a length of three. For most tests, we used sequences of visited places with a minimum length of three.
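The effect of the sequence length on the number of available records can be illustrated with a simple sliding-window sketch. It assumes that each user history has already been split at time gaps into uninterrupted runs of check-ins; the windowing scheme, the function name, and the toy data are hypothetical and only illustrate why longer sequences leave fewer records.

```python
def sliding_windows(categories, length):
    """All contiguous subsequences of the given length from one uninterrupted run."""
    return [
        tuple(categories[start:start + length])
        for start in range(len(categories) - length + 1)
    ]

# Hypothetical uninterrupted runs of visited categories; most runs are short.
runs = [
    ["Train Station", "Cafe", "Office"],
    ["Home", "Train Station", "Office", "Restaurant"],
    ["Park", "Cafe", "Mall", "Train Station", "Home"],
]

records_length_3 = [w for run in runs for w in sliding_windows(run, 3)]
records_length_4 = [w for run in runs for w in sliding_windows(run, 4)]
print(len(records_length_3), len(records_length_4))
```

Runs shorter than the requested length contribute no records at all, which is why the number of records decreases quickly as the required sequence length grows.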
We also verified what precision values can be obtained when the records with the most frequent category (“Train Station”) are deleted completely or only when they appear as the last element in the sequence. Figure 8 shows that both of these attempts produced rather weak results.
All the models obtained for standard datasets still give unsatisfactory results. The solution that improved the results most significantly turned out to be the use of personalized models. Individual models based on the similarity of preferences between users allowed us to improve the precision by as much as 14% relative to the global model using the same dataset. However, these tests were performed on specially selected users who had many check-ins and a sufficiently large group of similar users (training set). Despite the high average precision, the results varied considerably from one user to the next, as shown in Figure 9. The location-based individual models also improved the results. In this case, the increase in precision was 10%, which is a good result, but still weaker than in the case of models based on the similarity of preferences between users.
Figure 10 shows the results obtained when the tested model also took into account the time of day of each check-in. There was a slight, 1.8% improvement in the precision for the global model and a 1.3% improvement for the individual model based on the similarity of locations. However, this method allowed us to achieve very good results for the model personalized in terms of preferences. The obtained precision value was 53% and the precision value for the three best categories was 72%. These are the best results so far. The improvement in the results of individual models personalized for subsequent users, taking into account the time of day, can be seen in Figure 9.
We attempted to improve the best-performing individual model, which is based on the similarity of preferences and takes into account the time of day, by increasing the length of the sequence. Processing the data so that they had a minimum sequence length of five resulted, as already mentioned above, in a significant decrease in the amount of data to be analyzed. The individual models generated from this type of data were therefore much less numerous than before, and the results obtained were based on a smaller number of tests. However, the obtained precision value, which was 55.5%, and the precision value for the three best categories, amounting to 74.2%, confirmed our earlier assumptions and turned out to be the best configuration in the case of recurrent neural networks. The results for this model are shown and compared in Figure 14.
7.4. Hidden Markov Models Setup
The location recommendation system based on hidden Markov models (HMMs) was created using the Python hmmlearn library. Using the same environment as for the model based on recurrent neural networks allows for easier sharing of the test input sets and easier comparison of the obtained results.
In the created model, the hidden states are the categories of visited places together with the time of day at which the event took place. The observations are the categories that were logged by the user last in a given sequence. Such a structure of the model allows for the use of the check-in sequence together with the time-of-day information. A fragment of the structure of the discussed hidden Markov model is presented in Figure 11.
In order to compute the next visited place, the evaluation problem must be solved. For this, the initial probability, transition, and emission matrices are required. These matrices were computed as follows:
Initial probabilities matrix: the records were grouped by hidden state (attributes: category, time of day), and the size of each group was divided by the total number of records;
Transition matrix: the records were grouped by hidden state (attributes: category, time of day) together with the hidden state preceding them, and the size of each group was divided by the total number of records in the groups sharing the same preceding hidden state (category and time of day);
Emission matrix: the records were grouped by observation (category) together with the hidden state preceding them (attributes: category, time of day), and the size of each group was divided by the total number of records in the groups sharing the same preceding hidden state (category and time of day).
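The three counting steps listed above can be sketched as follows. The sketch assumes that every hidden state (a category combined with a time of day) is mapped to an integer identifier and that each training sequence is given as a list of hidden states followed by the observed category; this data layout and all names are assumptions made for illustration.

```python
import numpy as np

def normalize_rows(counts):
    """Divide each row by its sum; rows without any records stay all-zero."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def estimate_matrices(sequences, n_states, n_categories):
    """Estimate initial, transition, and emission matrices by relative frequencies."""
    initial = np.zeros(n_states)
    transitions = np.zeros((n_states, n_states))
    emissions = np.zeros((n_states, n_categories))
    for states, observed_category in sequences:
        for state in states:
            initial[state] += 1.0  # share of all records per hidden state
        for previous, current in zip(states, states[1:]):
            transitions[previous, current] += 1.0
        # Assumption: the observation is paired with the hidden state preceding it.
        emissions[states[-1], observed_category] += 1.0
    return initial / initial.sum(), normalize_rows(transitions), normalize_rows(emissions)

# Hypothetical toy data: (hidden state ids, observed category id).
sequences = [([0, 1], 1), ([0, 2], 0), ([0, 1], 1), ([2, 1], 0)]
initial, transitions, emissions = estimate_matrices(sequences, n_states=3, n_categories=2)
print(initial, transitions, emissions, sep="\n")
```

Hidden states that never precede another record produce all-zero rows in this sketch; a practical implementation would need to smooth or drop such rows before passing the matrices to an HMM library.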
The evaluation problem is solved using the forward-backward algorithm. This algorithm is available in the hmmlearn library and can be invoked with a method that returns the posterior probability matrix for each hidden state. From this matrix, probabilities assigned only to the categories corresponding to the time of day for the searched recommendation are selected.
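The final selection step can be sketched as follows. The sketch assumes a vector of posterior probabilities over hidden states, with the hidden state index encoded as category index times the number of times of day plus the time-of-day index; this encoding, the renormalization, and the function name are assumptions made for illustration.

```python
import numpy as np

def recommend_for_time_of_day(state_posteriors, n_times, time_slot, top_n=3):
    """Keep only the states of the requested time of day and rank their categories."""
    per_state = np.asarray(state_posteriors, dtype=float).reshape(-1, n_times)
    scores = per_state[:, time_slot]  # one score per category
    total = scores.sum()
    if total > 0:
        scores = scores / total  # renormalize over the selected states
    ranked = np.argsort(-scores)[:top_n]
    return [(int(category), float(scores[category])) for category in ranked]

# Hypothetical posteriors for 3 categories x 2 times of day.
posteriors = [0.10, 0.05, 0.30, 0.15, 0.20, 0.20]
recommendations = recommend_for_time_of_day(posteriors, n_times=2, time_slot=1)
print(recommendations)
```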
The tests performed using hidden Markov models reflect the testing procedure executed with RNNs, so that comparison of the results is possible. Taking into account the conclusions obtained when checking the impact of the number of categories, we found that the results for the models taking into account all possible categories are unsatisfactory. In addition, with around 800 different categories and five times of day, more than 4,000 hidden states can be obtained. This causes a very high complexity of such a model, so we limited the tests to 40 and 25 different categories.
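The growth of the hidden state space can be made concrete with a short calculation. The sketch treats the number of categories as exactly 800 for illustration and uses hypothetical time-of-day labels; the number of transition matrix entries grows with the square of the number of hidden states.

```python
from itertools import product

def hidden_states(n_categories, times_of_day):
    """Every (category, time of day) combination is one hidden state."""
    return list(product(range(n_categories), times_of_day))

# Hypothetical labels for the five times of day.
times_of_day = ["t0", "t1", "t2", "t3", "t4"]

for n_categories in (800, 40, 25):
    n_states = len(hidden_states(n_categories, times_of_day))
    print(n_categories, "categories:", n_states, "hidden states,",
          n_states ** 2, "transition matrix entries")
```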
7.5. Hidden Markov Models Results
The first test performed for the recommendation system based on hidden Markov models was to determine the influence of the dataset type on the obtained results. The comparison of the obtained results is shown in Figure 12. As in the case of the recurrent neural network, the precision values for the global sets and for the male users are practically identical and a few percent worse than for the set of Japanese speakers. One can also notice here a slight deterioration of the model that combines the sets “Men” and “Japanese language” compared to the model using only the set “Japanese language”. Based on these conclusions, our further testing mainly used the dataset for Japanese-speaking users.
As with the model based on recurrent neural networks, reducing the number of location categories resulted in a slight increase in precision. This is shown in Figure 13. In the same plot, one can also see the influence of the sequence length on the results. Here, too, increasing the number of elements in the sequence of events from three to four gave slightly better results. However, for sequences with a length of five, the precision value decreased slightly. The differences between the results for the individual models with different sequence lengths are less than a percent and, unlike for the neural network models, are not that significant.
In the next stage, we tested individual models. The results of these studies are presented in Figure 14. A very large increase in precision was obtained here, both for the models based on the similarity of preferences and for those using location similarity. In the first case, the precision value was over 51% and the precision value for the three best categories was over 70%. In both cases, this is an increase of approximately 20% against the global model that uses the same dataset. In the case of the models based on the distance between users, this increase was slightly smaller, but also allowed us to obtain satisfactory results.
The differences in the results between global and individual models are much more visible for hidden Markov models than for recurrent neural networks. Moreover, in both of these cases, one can observe the advantage of preference similarity over location similarity. It should also be noted that all tests for hidden Markov models were performed taking into account the time of day of the check-in.
To further improve the results, we generated an individual model based on the similarity of preferences between users for data with sequences of a minimum length of four. The obtained values of precision and Precision@3 are presented in Figure 15 and amount to 51.8% and 70.8%, respectively. These results are almost the same as in the case of data with sequences of a length of three. This means that, in the case of the recommendation system based on hidden Markov models, the effect of the sequence length is of very little importance, while the creation of personalized models allows us to achieve satisfactory recommendations.
7.6. Comparison of Recurrent Neural Networks and Hidden Markov Models
Figure 15 summarizes the results of the location recommendation systems based on recurrent neural networks and of those using hidden Markov models. For almost all types of models, the results were better by several percent in the case of the neural networks approach. The exception is the individual model based on location similarity. In both cases, only the creation and application of models personalized in terms of user preferences allowed us to exceed the threshold of 50% for the precision value and the threshold of 70% for the precision for the three best categories. Recommendation systems using global datasets, regardless of how they are processed, do not give satisfactory results and cannot be used in practice.
A slight improvement in the results, even for individual models, can be achieved by manipulating the length of the check-in sequence. For the RNN models, the use of a sequence with a length of five allowed us to improve the precision by a few percent. Longer sequences are impossible to test on the collected dataset for the reasons mentioned previously. In the case of HMM models, of the tested sequence lengths, the length of four turned out to be the best.
For both types of location recommendation systems, the supposition was confirmed that better results can be obtained using a smaller number of location categories. However, these differences are not as significant as in the case of using personalized models. Therefore, in order to achieve a greater diversification of recommendations at the cost of a slight decrease in precision, it is possible to use lower-order categories and restructure the hierarchy of location categories differently.
The comparison of both systems showed that both the initial data processing and the type of recommendation system used are much less important than the use of an individual model. Manipulating data processing allows us to improve the precision by several percent. The same is the case with changing the type of recommendation system. Only the use of individual models resulted in an increase in precision by over a dozen percent, and, in the case of taking into account the time of day for RNN systems, even 20%.
8. Conclusions and Outlook
In this paper, we looked for solutions that would allow us to improve the quality of location recommendation systems based on data collected from LBSN sites. As part of the work, two location recommendation systems were created. The first one used recurrent neural networks, and the second one used hidden Markov models. The structure of both of these models was based on the conclusions drawn from the works [35,36] and other scientific articles dealing with the topic of recommendations based on data from social network sites.
Among the applied solutions aimed at improving the precision of the recommendations of the created systems, the most effective turned out to be the use of personalized models based on user preferences. This solution allowed us to obtain satisfactory results for both tested systems when combined with recommendations based on categories rather than specific places, with taking into account the time of day of the check-in, and with appropriate data processing, including the restructuring of the hierarchy of location categories, the selection of an appropriate dataset for the appropriate group of users, and the use of check-in sequences of an appropriate length.
The comparison of the two types of recommendation systems showed that slightly better results can be achieved with the use of recurrent neural networks. However, the difference is not that significant. Much more important than the type of system used is the personalization of the model in relation to the user data.
The recommendation systems that were created, despite the fact that they allowed us to obtain precision values that were better than before, are still not good enough for practical applications. Only the use of the top three categories as a recommendation, where the highest precision value for the RNN model was 74% and that for the HMM model was 70%, could enable the use of these systems by real users. However, to achieve these precision values, it would be necessary to change the range of analyzed categories, as the solutions used in this work did not exclude categories that, from a practical point of view, do not make much sense as recommendations when determining suggested locations. These include categories such as “Train Station” or “Home,” which are among the most numerous, and omitting them from the recommendations significantly reduces the precision.
The conclusions from the conducted research indicate that the greatest improvement in results can be achieved through the personalization of models for individual users. This study utilized only two types of such models. Therefore, one possible direction for further development is creating other types of individualized datasets for specific users. These models could then be combined in appropriate ways to achieve the best possible results.
The further refinement of personalized models could focus on distinguishing user behavior between workdays and weekends. On workdays, activity patterns often include distinct morning and afternoon peaks, corresponding to commuting and lunchtime, with the highest activity typically being observed in the evening. In contrast, weekends exhibit different dynamics, such as the absence of a morning peak, higher activity during midday, and increased late-night engagement. Special consideration could also be given to hybrid days like Fridays, where user behavior blends characteristics of both working and non-working days. By leveraging such nuanced temporal patterns, future models could better anticipate user needs, leading to more contextually relevant and accurate recommendations.
Another development opportunity lies in the observation that, for neural networks, longer check-in sequences over time lead to improved precision. In the dataset used in this study, the average sequence length was only 4.8. It might therefore be worthwhile to collect datasets where these sequences are significantly longer for individual users and then conduct tests on them.
Additionally, by analyzing the results from studies [35,36], which were obtained using the manual restructuring of location categories, and comparing them to the results from the automated method applied in this work, it is evident that appropriately manipulating the hierarchy of categories can slightly improve the model. Further development would thus involve attempting to find a structure of location categories that allows for achieving the highest possible precision.