Article

Personalized Tour Recommendation via Analyzing User Tastes for Travel Distance, Diversity and Popularity

1
Department of Computer and Software, Hanyang University, Seoul 04763, Korea
2
The Data Science Institute, Columbia University, New York, NY 10027, USA
3
Division of Nanotechnology, Daegu Gyeongbuk Institute of Science & Technology (DGIST), Daegu 42988, Korea
*
Author to whom correspondence should be addressed.
Electronics 2022, 11(7), 1120; https://doi.org/10.3390/electronics11071120
Submission received: 27 February 2022 / Revised: 23 March 2022 / Accepted: 30 March 2022 / Published: 1 April 2022
(This article belongs to the Special Issue Recommender Systems: Approaches, Challenges and Applications)

Abstract

The goal of tour recommendation is to recommend the best destinations according to the preferences of each tourist. The task is challenging because it must consider not only ratings, as traditional recommendation problems do, but also each user's personal weighting of destination characteristics such as diversity, travel distance and popularity, which previous studies have failed to take into account. In this paper, we propose, for the first time, aspect personalization: we estimate how important each user considers the diversity, distance and popularity of a travel destination when choosing where to visit. We then recommend tourist attractions by combining the personalized score for each factor with the predicted rating. For the evaluation, we gathered user ratings and metadata of POIs from TripAdvisor and Naver. Experimental results showed that, relative to a popularity-based baseline, the proposed method improved precision by up to 82%, 24% and 20% and recall by up to 129%, 35% and 22% for top-1, top-2 and top-3 recommendations, respectively.

1. Introduction

Tourism is one of the largest leisure industries. Nearly 1.45 billion people travel and spend USD 1.48 trillion every year [1]. As both private and public transportation improve, the number of travelers and of possible travel destinations has increased. As a result, travelers must spend considerable time deciding where to go. To review the growing set of destination options efficiently, the large volume of user-generated point-of-interest (POI) data on the Internet can be used [2,3]. Given such data, recommender systems can support a user's decision making according to personal preference [4,5]. For tour recommendations, tourism-related data collected from social media are expected to be valuable for suggesting personalized POIs. The aim of this study is to develop a recommender system that provides suitable POIs to users.
The factors that must be considered when choosing a destination vary from person to person. Some travelers prefer popular destinations that are close to their accommodation, while other travelers may prefer to visit not-well-known, hidden spots. Some travelers may want to visit a variety of venues, such as parks, museums and shopping malls, while others may prefer to visit only certain types of locations. There are thus various factors that go into the POI selection, and it is very important to personalize the recommendations for each user, with those factors taken into consideration.
Existing studies have focused on the nonpersonalized popularity of POIs and on the user's time budget, including the total travel time, which accounts for the distance traveled between destinations and the time spent at each destination [6,7]. In these approaches, a well-known place receives a high popularity score regardless of the user's personal preference. They therefore do not account for users who do not consider the popularity of a destination an important factor. Moreover, the personalized importance of other factors, such as tour distance or diversity, is rarely considered [8].
In order to solve the problem, this study focused on the personalization of each element for each user. Each user’s personal preference for popularity, distance and diversity was inferred based on the ratings that each user had given to travel destinations in the past. The personalized scores for each element were calculated as follows:
  • Personalized diversity score (p-Div): This score reflects a user's preference for each category and is obtained by counting the categories of the travel destinations the user has visited in the past. If the user repeatedly visits only a specific category, the score of that category is relatively high; if the user visits various categories, the score distribution is even.
  • Personalized popularity score (p-Pop): This score indicates how much importance a user places on the popularity of a travel destination. It is obtained from the average popularity of the travel destinations that the user has visited in the past. The score is higher if the user prefers famous tourist destinations, and lower otherwise. It controls the impact of a destination's popularity on the final recommendation.
  • Personalized distance score (p-Dis): This score indicates how much the user considers travel distance when selecting a POI. It is determined from the average distance between destinations that the user has visited in the past. The score is high if the average distance is short and lower if it is long. It thus controls how strongly the distance between the user's estimated location and a destination affects the recommendation.
We trained an autoencoder on the ratings left by users on POIs to predict ratings for POIs that users have not yet visited. More recent ratings were given greater weight in the model's training. Then, the top N POIs were recommended after deriving the final score by summing the personalized scores for each aspect, as described above. The proposed model was evaluated on data from Jeju Island, one of the popular travel destinations in South Korea. Specifically, we collected 156 POIs located on Jeju Island from TripAdvisor with 29,020 ratings left by 7718 users, and 109 POIs located on Jeju Island from Naver with 270,806 ratings left by 109,754 users. In experiments based on these data, the highest recommendation accuracy was observed when each factor was personalized and reflected in the recommendation as we suggested.
The main contributions of this paper can be summarized as follows:
  • We investigated a way to quantify a user’s personal preference for each travel-related aspect (diversity, popularity and distance) based on their tour history.
  • We proposed a novel tour recommendation method that is able to consider each user’s personalized taste for various aspects of POIs as well as their predicted ratings on them.
  • We conducted extensive experiments to evaluate the proposed method. The results show that considering user tastes for various aspects is effective in recommending potential POIs, and that our method outperforms several baseline methods.
The remainder of the paper is structured as follows: Section 2 describes related studies. In Section 3, we introduce our data and define some notations. Section 4 describes the proposed method in detail. Section 5 details the experimental environment, and Section 6 reports the experimental results. Finally, Section 7 summarizes the conclusions and introduces future research topics.

2. Related Work

As recommendation systems have become popular, tour recommendation has also become an important research area [6,7,8,9,10,11,12]. Most of the studies produced recommendations based on social media data, such as Flickr, Yelp and Foursquare [13]. In the case of Flickr, information about the images, locations and times of geotagged photos was mainly used [6,7,11,14], and for Yelp and Foursquare, the ratings on POIs were mainly used [8,12]. In one study [11], the users' preferences were identified by measuring the time spent at each POI based on the location and time information included in the users' photographs. In studies [6,7], the places visited were identified using the locations where the photos were taken, and users' preferences for each travel destination category were identified based on the category distribution of those places. Ref. [8] used the ratings given by users to POIs to understand users' preferences.
For a successful travel destination recommendation, it is important to understand how important popularity, diversity and distance are for each user. However, most existing studies miss one or more of these factors. Table 1 summarizes the factors that related studies do and do not personalize. For popularity, most studies give high scores to famous travel destinations regardless of whether the user values popularity as an important factor [6,7,9,11,12]. In the case of the distance to the destination, most studies only used it for calculating the time budget. For example, in [7,11], the user's movement speed was predefined (4 km/h or 5 km/h) and the destination was selected so that the time taken to reach the location did not exceed the user's time budget. In [6], the authors set a distance budget and only recommended places for which the distance between POIs did not exceed that budget. However, to the best of our knowledge, no study has reflected users' preferences regarding the distance factor, that is, whether or not they prefer POIs that are close by.
Of the existing research, only Aurigo [12] personalized both distance and diversity aspects and reflected both aspects in their recommendations. However, since Aurigo received direct input about the distance criterion and preference of category from users, personalization was possible. This method can therefore only be used in an environment in which direct interaction with the user is possible. We propose a method to personalize all of the popularity, distance and diversity metrics, and reflect them in recommendations without requiring an interactive system.
In the context of collaborative filtering, autoencoder-based rating prediction and recommendation has been studied [15,16,17], which is also related to this work. An autoencoder is a deep neural architecture trained to produce an output as similar to its input as possible. In the recommender systems area, an autoencoder is fed a sparse vector and reconstructs a dense vector that contains predictions for the missing entries [15,16]. The early work [15] proposed a user-specific autoencoder for recommendation, while AutoRec [16] used just one autoencoder trained on the entire dataset. There are two variants of AutoRec depending on the input type: user-based AutoRec uses user rating vectors as inputs and item-based AutoRec uses item rating vectors as inputs. HybridAE [17] suggested a hybrid recommender system based on an autoencoder, which incorporated the user-item rating matrix and content information to mitigate the lack of information.

3. Preliminaries

3.1. Data Description

We collected data from Jeju Island, one of the most famous tourist cities in Korea. Information and reviews on travel destinations in Jeju were collected from TripAdvisor [18] and Naver. We crawled 156 registered destinations and collected 7718 users’ 29,020 reviews from TripAdvisor. Based on 156 TripAdvisor destinations, we collected 109,754 users’ 270,806 reviews posted on Naver Map. The dataset includes the anonymous user ID, rating and date. Ratings were scored on scales of 1 to 5 and 0.5 to 5, respectively, for TripAdvisor and Naver. Table 2 shows the statistics of the dataset we collected.

3.2. Preliminaries and Notations

The notations used in our work are summarized in Table 3. We define a list of (user, POI, category), tour history, rating matrix, and diversity matrix for each user as follows:
  • (User, POI, category) list: We have a set of $m$ users, a set of $n$ POIs and a set of $i$ categories, defined respectively as $U = \{u_1, u_2, \ldots, u_m\}$, $P = \{p_1, p_2, \ldots, p_n\}$ and $C = \{c_1, c_2, \ldots, c_i\}$. Each POI $p_n$ is associated with just one category.
  • Tour history: The tour history of each user is defined as $H_u = ((p_1, r_{u p_1}, t^{u}_{p_1}, n_{p_1}, e_{p_1}, c_{p_1}), \ldots, (p_n, r_{u p_n}, t^{u}_{p_n}, n_{p_n}, e_{p_n}, c_{p_n}))$. Here, the components of each tuple are the POI $p_n$ visited by user $u$, the rating $r_{u p_n}$ left on $p_n$, the date $t^{u}_{p_n}$ on which the rating was given, the latitude $n_{p_n}$ and longitude $e_{p_n}$ of the POI, and the POI category $c_{p_n}$.
  • Rating matrix: We aggregate the rating data included in the tour history of all users and combine the ratings left by all users on the POIs into a single sparse matrix $R = (r_{up})_{m \times n}$, with one row per user and one column per POI. Element $r_{up}$ holds a rating if user $u$ left a rating on POI $p$.
  • Diversity matrix: We aggregate the category information included in each user's tour history by counting the number of visits to each category and creating a sparse matrix $D = (d_{uc})_{m \times i}$, with one row per user and one column per category. The number of times user $u$ visited category $c$ is represented as $d_{uc}$, computed by:
    $$d_{uc} = \sum_{u_x \in U} \sum_{p \in H_u} \delta(u = u_x) \cdot \delta(c = c_p) \tag{1}$$
    where $\delta(u = u_x)$ is an indicator function that returns 1 when $u_x$ and $u$ are equal and 0 when they are not; $\delta(c = c_p)$ is defined in the same way for categories. If $u$ has never visited $c$, a value of 0, meaning empty, is assigned to $d_{uc}$.
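The counting in Equation (1) can be sketched in a few lines of Python. This is only an illustration on made-up data: the function name `diversity_rows`, the dictionary layout of the tour history and the category labels are assumptions, not part of the original system.

```python
from collections import Counter

def diversity_rows(category_history, categories):
    """Build one row of the diversity matrix D per user.

    category_history: dict mapping a user id to the list of categories of the
        POIs in that user's tour history (hypothetical layout).
    categories: ordered list of all category ids (the columns of D).
    """
    rows = {}
    for user, visited_categories in category_history.items():
        counts = Counter(visited_categories)
        # A Counter returns 0 for categories the user never visited,
        # which matches the "0, meaning empty" convention in the text.
        rows[user] = [counts[c] for c in categories]
    return rows

# Toy data, invented for illustration only.
category_history = {
    "u1": ["park", "park", "museum"],
    "u2": ["mall"],
}
D = diversity_rows(category_history, ["park", "museum", "mall"])
print(D["u1"])  # [2, 1, 0]
print(D["u2"])  # [0, 0, 1]
```

A user who concentrates on one category gets a peaked row, while a user with varied visits gets a flatter one, which is the signal the diversity autoencoder is later trained on.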

4. Method

4.1. Overview

Our main aim was to identify each user's personalized preferences for popularity, diversity and distance, and to recommend travel destinations based on them. The personalized scores are denoted p-Div, p-Pop and p-Dis, and the way in which they were calculated is explained in detail in the following sections. We then calculated the final score $S(u, p)$ for POI $p$ and user $u$ as a normalized weighted sum of (1) the predicted rating of $u$ on $p$, (2) the popularity of $p$ and (3) the distance from $u$ to $p$, where the personalized scores p-Div, p-Pop and p-Dis were used as weights. Finally, we suggested to each user the top $N$ destinations with the highest $S(u, p)$. The conceptual overview of our recommendation system is illustrated in Figure 1.

4.2. Diversity Personalization

This section describes how a user's personal preference for the diversity of POIs is incorporated into the POI recommendation. We defined a value $\text{p-Div}(u, p)$: it is high for a POI $p$ from a specific category if $u$ only visited a certain type of destination; conversely, if $u$ visited diverse types of destinations, the value $\text{p-Div}(u, p)$ is evenly high across categories. To reflect this characteristic in the prediction, we also predicted each user's rating for each POI and multiplied the predicted rating by $\text{p-Div}(u, p)$ as a weight.
We used an autoencoder to predict the user's rating and the value of p-Div. Each column vector of the sparse matrix $D$, composed of the number of visits by category, was used as both the input and the target output to train the autoencoder, and the dense matrix $\hat{D}$ was derived by aggregating the reconstructed outputs of the trained autoencoder. The loss function of the autoencoder was calculated as follows:
$$L(D) = \frac{1}{m} \sum_{u=1}^{m} (d_u - \hat{d}_u)^2 \tag{2}$$
where $d_u$ is $u$'s (sparse) category preference vector and $\hat{d}_u$ is the predicted (dense) category preference vector. We defined each element $\hat{d}_{uc}$ of $\hat{D}$ as the value of $\text{p-Div}(u, p)$ for user $u$ and every POI $p$ in category $c$. We consider that $u$ views diversity as an important factor in POI selection if $u$'s $\text{p-Div}(u, p)$ values are evenly distributed across all categories. If $\text{p-Div}(u, p)$ is high only for certain categories, the user can be regarded as preferring a specific category rather than a diversity of categories.
Next, we predicted the ratings for unvisited POIs. We trained the autoencoder using each column vector of $R$. Then, using the trained autoencoder, we derived a dense matrix $\hat{R}$ containing predicted ratings. The loss function used for training was as follows:
$$L(R) = \frac{1}{\sum_{u=1}^{m} \delta(r_u \neq 0)} \sum_{u=1}^{m} (r_u - \hat{r}_u)^2 \cdot \delta(r_u \neq 0) \cdot t_u \tag{3}$$
where $\delta(r_{up} \neq 0)$ is an indicator function that returns 1 if $u$ left a rating on $p$ and 0 otherwise. In this way, we limited the loss to the observed elements of the vectors. Note that, in Equation (2), we did not use such an indicator term, so that the zeros included in $D$ are also considered in the loss function. This is because missing entries in $D$ are meaningful: they convey the information that a specific category has not been visited. However, we chose to use the indicator function for learning $R$ since it is an extremely sparse matrix whose many zeros might dominate the small number of observed ratings [19].
In addition, we defined a weight vector $t_u$ whose elements are larger for the ratings that a user left more recently. This was done so that more recent ratings would have a greater influence on model training. Each element $t_{u,p}$ of this vector was computed as follows:
$$t_{u,p} = 0.5 + \frac{t^{u}_{p} - \min(t_u)}{\max(t_u) - \min(t_u)} \times 0.5 \tag{4}$$
where $\max(t_u)$ and $\min(t_u)$ are the maximum and minimum values of the times at which $u$ left a rating and are used for normalization. The weight therefore ranges from 0.5 for the user's oldest rating to 1 for the most recent one.
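To make the recency weighting and the masked loss concrete, the following Python sketch evaluates Equation (4) and the per-user part of Equation (3) on a toy vector. It is a simplified illustration, not the training code: the function names are hypothetical, the loss is normalized over a single user's observed ratings rather than over all users, and the handling of a user whose ratings all share one timestamp (weight 1.0) is an assumption the paper does not specify.

```python
def recency_weights(times):
    """Map each rating time to a weight in [0.5, 1.0], following Equation (4)."""
    oldest, newest = min(times), max(times)
    if newest == oldest:
        # Assumption: with a single timestamp, every rating gets full weight.
        return [1.0 for _ in times]
    span = newest - oldest
    return [0.5 + (t - oldest) / span * 0.5 for t in times]

def masked_weighted_loss(ratings, predictions, weights):
    """Weighted squared error over observed ratings only (0 marks 'no rating')."""
    observed = [(r, p, w) for r, p, w in zip(ratings, predictions, weights) if r != 0]
    if not observed:
        return 0.0
    return sum(w * (r - p) ** 2 for r, p, w in observed) / len(observed)

# Three ratings left at times 0, 5 and 10 (arbitrary units).
print(recency_weights([0, 5, 10]))  # [0.5, 0.75, 1.0]

# One user's vector over three POIs; the second POI is unrated and is ignored.
loss = masked_weighted_loss(
    ratings=[4.0, 0.0, 5.0],
    predictions=[3.0, 2.0, 4.0],
    weights=[0.5, 0.0, 1.0],
)
print(loss)  # 0.75
```

The prediction error on the unrated POI does not contribute to the loss, and the error on the most recent rating counts twice as much as the same error on the oldest one.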
Finally, for the final recommendation, we used the value obtained by multiplying each element $\hat{r}_{up}$ of $\hat{R}$ by $u$'s $\text{p-Div}(u, p)$, taken from the $(u, c)$ element of $\hat{D}$.

4.3. Popularity Personalization

In this section, we explain how the popularity of a POI is personalized and considered in the recommendations. First, the absolute popularity score of each POI was obtained, and then the p-Pop score, which indicates how important each user considers the popularity aspect to be, was calculated. Then, the two values were simply multiplied and reflected in the final recommendation.
The nonpersonalized popularity score of a POI $p$, denoted as $\text{Pop}(p)$, was calculated using the number of nonduplicate visitors to $p$ and the average of the ratings left by them. The specific formula was as follows:
$$\text{Pop}(p) = \frac{r_p}{\text{Cnt}(p)} + \frac{\text{Cnt}(p)}{\sum_{p_y \in P} \text{Cnt}(p_y)} \times 5 \tag{5}$$
where $r_p$ and $\text{Cnt}(p)$ were computed by:
$$r_p = \sum_{u \in U} \sum_{p_y \in H_u} \delta(p = p_y) \cdot r_{up} \tag{6}$$
$$\text{Cnt}(p) = \sum_{u \in U} \sum_{p_y \in H_u} \delta(p = p_y) \tag{7}$$
Equation (6) calculates the sum of the ratings left by users for $p$. Equation (7) counts the number of ratings that users have left on $p$. Since the average of the ratings, which is the first term of Equation (5), can have a value between 1 and 5, the second term is multiplied by 5 to match the ranges of the two terms.
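As a worked illustration of Equations (5) to (7), the sketch below computes the nonpersonalized popularity score from a toy set of ratings. The function name `popularity_scores` and the input layout are hypothetical, and the example assumes each visitor left exactly one rating per POI, so that counting ratings is the same as counting nonduplicate visitors.

```python
def popularity_scores(ratings_by_poi):
    """Return Pop(p) for every POI.

    ratings_by_poi: dict mapping a POI id to the list of ratings left on it
        (hypothetical layout; one rating per visitor is assumed).
    """
    total_count = sum(len(r) for r in ratings_by_poi.values())
    scores = {}
    for poi, ratings in ratings_by_poi.items():
        count = len(ratings)              # Cnt(p), Equation (7)
        rating_sum = sum(ratings)         # r_p, Equation (6)
        average = rating_sum / count      # first term of Equation (5)
        share = count / total_count * 5   # second term, rescaled to at most 5
        scores[poi] = average + share
    return scores

# Toy data: POI "A" has four ratings, POI "B" has one.
pop = popularity_scores({"A": [5, 4, 5, 4], "B": [5]})
print(round(pop["A"], 2))  # 8.5  (average 4.5 + share 4/5 * 5)
print(round(pop["B"], 2))  # 6.0  (average 5.0 + share 1/5 * 5)
```

POI "B" has the higher average rating, but "A" scores higher overall because it attracts most of the visits, which is the intended effect of the second term.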
Next, we calculated the p-Pop score, which indicates how much the user considers popularity when selecting a POI. We first plotted the average popularity distribution of POIs visited by each user. The results are shown in Figure 2, in which the left graph displays the average popularity distribution of POIs visited by TripAdvisor users, and the graph on the right is that of Naver users. Using these graphs, we can see that some users only visit famous POIs, whereas others do not. To capture this aspect, we defined the personalized popularity weight p-Pop for user $u$ as the average popularity of POIs visited by $u$ as follows:
$$\text{p-Pop}(u) = \frac{\text{Pop}(u)}{\text{Cnt}(u)} \tag{8}$$
where $\text{Pop}(u)$ and $\text{Cnt}(u)$ were computed by:
$$\text{Pop}(u) = \sum_{u_x \in U} \sum_{p_y \in H_u} \delta(u = u_x) \cdot \text{Pop}(p_y) \tag{9}$$
$$\text{Cnt}(u) = \sum_{u_x \in U} \sum_{p_y \in H_u} \delta(u = u_x) \tag{10}$$
Here, Equation (9) is the sum of the popularity of POIs visited by $u$, and Equation (10) counts the number of POIs visited by $u$. The higher the value of $\text{p-Pop}(u)$, the more importance $u$ places on popularity in POI selection. Finally, we reflected the value obtained by multiplying the popularity value $\text{Pop}(p)$ of a POI by $\text{p-Pop}(u)$ in the final recommendation.
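Equations (8) to (10) amount to averaging the popularity of the POIs in a user's history. A minimal Python sketch, with a hypothetical function name and made-up popularity values, could look like this:

```python
def personalized_popularity(visited_pois, pop):
    """Average popularity of the POIs a user visited, as in Equation (8).

    visited_pois: list of POI ids from the user's tour history.
    pop: dict mapping a POI id to its nonpersonalized popularity Pop(p).
    """
    pop_sum = sum(pop[p] for p in visited_pois)  # Pop(u), Equation (9)
    count = len(visited_pois)                    # Cnt(u), Equation (10)
    return pop_sum / count

# Made-up popularity values for three POIs.
pop_values = {"A": 8.5, "B": 6.0, "C": 5.0}

fan_of_famous_spots = personalized_popularity(["A", "B"], pop_values)
hidden_spot_seeker = personalized_popularity(["C"], pop_values)
print(fan_of_famous_spots)  # 7.25
print(hidden_spot_seeker)   # 5.0
```

Because this weight multiplies $\text{Pop}(p)$ in the final score, popularity moves the ranking more for the first user than for the second.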

4.4. Distance Personalization

When providing recommendations, we took into account whether or not each user preferred travel destinations that were a short distance away. We first assumed that the center of the coordinates of the POIs visited by user $u$ in the past was the starting point for $u$. Then, the nonpersonalized distance score between this starting point and each POI not visited by $u$ was calculated. Next, we derived the p-Dis weight, which indicates how important each user considers the distance aspect to be. The result of multiplying the two values was reflected in the final recommendation.
First, the nonpersonalized distance score, denoted as $\text{Dis}(u, p)$, from the starting point of $u$ to each nonvisited POI $p$ was calculated as follows:
$$\text{Dis}(u, p) = \max_{p_y \in P} H(u, p_y) - H(u, p) \tag{11}$$
where $H(u, p)$ is a function that calculates the distance from $u$ to $p$ using the Haversine formula (the Haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes [20]). In order to give a higher score for a closer distance, we calculated “the distance to the farthest POI that $u$ could go to” minus “the distance to each $p$”. As a result, the closer the distance between $u$ and $p$, the higher the value of $\text{Dis}(u, p)$.
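The distance score of Equation (11) can be illustrated with a small Python sketch. Several details here are assumptions rather than facts from the paper: the starting point is taken as the arithmetic mean of the visited latitudes and longitudes, the Earth radius is the commonly used mean value of 6371 km, and all function names and coordinates are invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def starting_point(visited_coords):
    """Assumed 'center' of visited POIs: the mean latitude and longitude."""
    lats = [lat for lat, _ in visited_coords]
    lons = [lon for _, lon in visited_coords]
    return sum(lats) / len(lats), sum(lons) / len(lons)

def distance_scores(start, candidates):
    """Dis(u, p): distance to the farthest candidate minus distance to p."""
    dists = {p: haversine_km(start[0], start[1], lat, lon)
             for p, (lat, lon) in candidates.items()}
    farthest = max(dists.values())
    return {p: farthest - d for p, d in dists.items()}

# Made-up coordinates in the vicinity of Jeju Island.
start = starting_point([(33.45, 126.55), (33.25, 126.55)])
dis = distance_scores(start, {"near": (33.36, 126.56), "far": (33.50, 126.90)})
print(dis["far"])             # 0.0, the farthest candidate always scores zero
print(dis["near"] > dis["far"])  # True
```

Subtracting from the maximum turns a distance, where smaller is better, into a score where larger is better, so it can be added to the other score terms.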
Next, we computed p-Dis. We first plotted the average distance distribution of POIs visited by each user. The result is shown in Figure 3, in which the graph on the left shows the average distance distribution of POIs visited by TripAdvisor users and the graph on the right shows that of Naver users. From the graphs, we can confirm that some users only visit nearby POIs while others do not. To capture this aspect, we calculated the personalized distance weight p-Dis of $u$ from the average distance between the POIs visited by $u$ as follows:
$$\text{p-Dis}(u) = \max_{u_x \in U} \text{avg}(u_x) - \text{avg}(u) \tag{12}$$
where $\text{avg}(u)$ indicates the average distance between POIs visited by $u$. The higher the value of $\text{p-Dis}(u)$, the more important distance is as a factor in $u$'s POI selection. Finally, the value obtained by multiplying the distance value $\text{Dis}(u, p)$ of the POI by $\text{p-Dis}(u)$ was reflected in the final recommendation.
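Equation (12) applies the same "maximum minus value" inversion at the user level. The short Python sketch below takes the per-user average distances as given, since the paper does not spell out exactly how $\text{avg}(u)$ is computed; the function name and the numbers are made up.

```python
def personalized_distance_weights(avg_distance_by_user):
    """p-Dis(u): largest average distance over all users minus the user's own.

    avg_distance_by_user: dict mapping a user id to avg(u) in kilometres
        (assumed to be precomputed from the tour histories).
    """
    longest = max(avg_distance_by_user.values())
    return {u: longest - avg for u, avg in avg_distance_by_user.items()}

# Made-up averages: u1 stays close by, u2 travels far.
p_dis = personalized_distance_weights({"u1": 5.0, "u2": 20.0, "u3": 12.5})
print(p_dis)  # {'u1': 15.0, 'u2': 0.0, 'u3': 7.5}
```

The user with the longest average distance gets a weight of zero, so the distance term drops out of that user's final score, while users who stay close by get the largest weights.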

4.5. Final Score Computation

By aggregating the scores obtained in the previous sections, we calculated the final score $S(u, p)$ of POI $p$ for user $u$. We first normalized the previously acquired nonpersonalized values $\text{Pop}(p)$ and $\text{Dis}(u, p)$ to a scale from 0 to 5, in line with the rating scale. Then, the final score $S(u, p)$ was calculated as the weighted sum of the scores for each aspect as follows:
$$S(u, p) = \alpha \cdot \hat{r}_{up} \cdot \text{p-Div}(u, p) + \beta \cdot \text{Pop}(p) \cdot \text{p-Pop}(u) + \gamma \cdot \text{Dis}(u, p) \cdot \text{p-Dis}(u) \tag{13}$$
where $\alpha$, $\beta$ and $\gamma$ are hyperparameters that control the importance of the diversity, popularity and distance aspects, respectively. We found the optimal values for $\alpha$, $\beta$ and $\gamma$ through a grid search.
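Putting the pieces together, Equation (13) and the top-N selection can be sketched as follows. This is an illustration under stated assumptions: min-max scaling is used for the 0 to 5 normalization (the paper does not name the method), the values of $\alpha$, $\beta$, $\gamma$ and all input scores are made up rather than taken from the grid search, and the function names are hypothetical.

```python
def scale_0_5(values):
    """Min-max scale a dict of scores to the range [0, 5] (assumed method)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: identical scores carry no ranking signal, so map to 0.
        return {k: 0.0 for k in values}
    return {k: 5.0 * (v - lo) / (hi - lo) for k, v in values.items()}

def final_scores(pred, p_div, pop, dis, p_pop_u, p_dis_u, alpha, beta, gamma):
    """S(u, p) for one user over candidate POIs, following Equation (13)."""
    pop5, dis5 = scale_0_5(pop), scale_0_5(dis)
    return {
        p: alpha * pred[p] * p_div[p]
        + beta * pop5[p] * p_pop_u
        + gamma * dis5[p] * p_dis_u
        for p in pred
    }

def top_n(scores, n):
    """Return the n POIs with the highest final score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Made-up inputs for one user and three unvisited POIs.
scores = final_scores(
    pred={"A": 4.0, "B": 3.0, "C": 5.0},   # predicted ratings
    p_div={"A": 0.5, "B": 1.0, "C": 0.2},  # p-Div(u, p)
    pop={"A": 8.5, "B": 6.0, "C": 5.0},    # Pop(p)
    dis={"A": 0.0, "B": 10.0, "C": 4.0},   # Dis(u, p)
    p_pop_u=0.5, p_dis_u=0.5,
    alpha=1.0, beta=1.0, gamma=1.0,
)
print(top_n(scores, 2))  # ['B', 'A']
```

In this toy case POI "C" has the highest predicted rating but ranks last, because the user rarely visits its category and gains little from its popularity or proximity.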

5. Experimental Settings

5.1. Dataset

As mentioned in Section 3.1, we conducted experiments using data from TripAdvisor and Naver. Of the 29,020 reviews collected from TripAdvisor and the 270,806 reviews collected from Naver, we removed the following:
  • Reviews with missing ratings or user IDs.
  • Reviews left by users with fewer than five entries in their tour history; such users were also excluded from the user list.
  • Reviews on a POI with fewer than 10 reviews; such POIs were also excluded from the POI list.
We sorted the remaining reviews by the time they were written and used the oldest 80% as training data and the most recent 20% as test data. Since the sentiment of the user was already expressed in the rating attached to the review, we did not analyze the review text itself.
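The filtering rules and the chronological split described above can be sketched in Python as follows. The record layout (dictionaries with `user`, `poi`, `rating` and `time` keys) and the function names are hypothetical. The sketch also makes two assumptions the text leaves open: the filters are applied once, in the order listed, and the 80/20 split is made over all reviews together rather than per user.

```python
from collections import Counter

def filter_reviews(reviews, min_user_reviews=5, min_poi_reviews=10):
    """Apply the three removal rules in the order they are listed."""
    kept = [r for r in reviews
            if r.get("user") is not None and r.get("rating") is not None]
    per_user = Counter(r["user"] for r in kept)
    kept = [r for r in kept if per_user[r["user"]] >= min_user_reviews]
    per_poi = Counter(r["poi"] for r in kept)
    return [r for r in kept if per_poi[r["poi"]] >= min_poi_reviews]

def chronological_split(reviews, train_fraction=0.8):
    """Oldest reviews go to training, the most recent ones to testing."""
    ordered = sorted(reviews, key=lambda r: r["time"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Toy data with small thresholds so that the effect is visible.
raw = [
    {"user": "u1", "poi": "A", "rating": 5, "time": 1},
    {"user": "u1", "poi": "B", "rating": 4, "time": 2},
    {"user": "u2", "poi": "A", "rating": 3, "time": 3},
    {"user": "u2", "poi": "B", "rating": 4, "time": 4},
    {"user": "u3", "poi": "A", "rating": None, "time": 5},  # missing rating
    {"user": "u4", "poi": "C", "rating": 5, "time": 6},     # user with one review
]
clean = filter_reviews(raw, min_user_reviews=2, min_poi_reviews=2)
train, test = chronological_split(clean)
print(len(clean), len(train), len(test))  # 4 3 1
```

Splitting by time rather than at random keeps the evaluation realistic, because the model is only ever asked to predict visits that happened after the ones it was trained on.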
We plotted the distribution of the number of visitors by POI and category. The results are shown in Figure 4 and Figure 5. The left graph shows the data from TripAdvisor and the right shows the distribution from the Naver data. In all cases, we could observe a typical power law distribution. Details on each category ID are shown in Table 4.

5.2. Evaluation Metrics

The accuracy of a recommender system refers to the extent to which the recommended items appear in the ground truth. In the literature, various metrics have been developed to measure recommendation accuracy. Among many others, we adopted precision, recall, normalized discounted cumulative gain (NDCG) and mean reciprocal rank (MRR) [21]. Precision and recall quantify how many times a model “hits” the ground truth items. Unlike these two metrics, NDCG and MRR consider the rank of the correct item in the recommended list provided to users.
Formally, for a user $u$, let $Rec_u$ denote a ranked list of $N$ items recommended to $u$ by an algorithm and $Test_u$ be the set of ground truth items in the test data. In order to evaluate each $Rec_u$, the four metrics are computed as:
$$Precision_u@N = \frac{|Test_u \cap Rec_u|}{|Rec_u|} \tag{14}$$
$$Recall_u@N = \frac{|Test_u \cap Rec_u|}{|Test_u|} \tag{15}$$
$$DCG_u@N = \sum_{k=1}^{N} \frac{2^{y_k} - 1}{\log_2(k + 1)} \tag{16}$$
$$MRR_u@N = \frac{1}{rank_{first}(u)} \tag{17}$$
where $DCG_u@N$ in Equation (16) denotes the discounted cumulative gain for each user and $y_k$ stands for the relevance score of the $k$th ranked item in $Rec_u$ to user $u$ ($y_k = 1$ if the item is correct and 0 otherwise). Then, the normalized DCG ($NDCG_u@N$) is computed by dividing $DCG_u@N$ by the DCG obtained by an ideal ranking algorithm. $rank_{first}(u)$ in Equation (17) indicates the ranked position of the first correct item among those in $Rec_u$ [22].
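The four per-user metrics in Equations (14) to (17) can be written directly in Python. The sketch below is illustrative: the function names are hypothetical, the ideal DCG is taken as the DCG of a list that places all reachable correct items first, and a reciprocal rank of 0 is returned when no recommended item is correct, which is a common convention that the paper does not state explicitly.

```python
import math

def precision_at_n(rec, test):
    """Fraction of recommended items that appear in the ground truth."""
    return len(set(rec) & set(test)) / len(rec)

def recall_at_n(rec, test):
    """Fraction of ground truth items that were recommended."""
    return len(set(rec) & set(test)) / len(test)

def ndcg_at_n(rec, test):
    """DCG of the ranked list divided by the DCG of an ideal ranking."""
    dcg = sum((2 ** (1 if item in test else 0) - 1) / math.log2(k + 1)
              for k, item in enumerate(rec, start=1))
    ideal_hits = min(len(test), len(rec))
    idcg = sum(1 / math.log2(k + 1) for k in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

def mrr_at_n(rec, test):
    """Reciprocal rank of the first correct item (0 if there is none)."""
    for k, item in enumerate(rec, start=1):
        if item in test:
            return 1 / k
    return 0.0

# Toy example: three recommendations, two ground truth items, one hit at rank 2.
rec_u = ["A", "B", "C"]
test_u = {"B", "D"}
print(precision_at_n(rec_u, test_u))       # 0.3333333333333333
print(recall_at_n(rec_u, test_u))          # 0.5
print(round(ndcg_at_n(rec_u, test_u), 4))  # 0.3869
print(mrr_at_n(rec_u, test_u))             # 0.5
```

Precision and recall give the same value wherever the hit appears in the list, whereas NDCG and MRR would both rise if "B" were moved to the first position.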

5.3. Implementation Details

For the autoencoders in our framework, we used a SELU function and an identity function as the activation functions of the hidden layers and the output layer, respectively. We used Xavier's network initialization approach [23]. We set the minibatch size to 64, the learning rate to 0.001 and the dropout rate to 0.5. Our autoencoder for the rating prediction had the structure $\text{input} \rightarrow 90 \rightarrow 70 \rightarrow 90 \rightarrow \text{output}$. The autoencoder for our category diversity prediction had a relatively simple structure of $\text{input} \rightarrow 5 \rightarrow \text{output}$.
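To make the layer structure tangible, the following dependency-free Python sketch builds the two architectures and runs a single forward pass. It is not the authors' implementation: training, minibatching and dropout are omitted, the uniform variant of Xavier initialization is assumed, biases start at zero, and the input size of 12 is an arbitrary placeholder. The SELU constants are the standard published values.

```python
import math
import random

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit, used on the hidden layers."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1))

def xavier_layer(n_in, n_out, rng):
    """Weights drawn uniformly from [-limit, limit]; biases set to zero."""
    limit = math.sqrt(6 / (n_in + n_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_autoencoder(n_input, hidden_sizes, rng):
    """Stack layers as input -> hidden sizes -> output (same size as input)."""
    sizes = [n_input] + list(hidden_sizes) + [n_input]
    return [xavier_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def forward(x, layers):
    """SELU on hidden layers, identity on the output layer."""
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        x = z if is_output else [selu(v) for v in z]
    return x

rng = random.Random(0)
rating_ae = build_autoencoder(12, (90, 70, 90), rng)  # input -> 90 -> 70 -> 90 -> output
diversity_ae = build_autoencoder(12, (5,), rng)       # input -> 5 -> output

sparse_input = [4.0, 0.0, 5.0] + [0.0] * 9
reconstruction = forward(sparse_input, rating_ae)
print(len(reconstruction))  # 12, one predicted value per input position
```

The output has the same length as the input, so after training every zero (unobserved) position in the sparse input is filled with a predicted value.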

6. Results and Analysis

We first plotted a loss curve to check whether the autoencoder was properly trained on the diversity matrix $D$ and the rating matrix $R$. The results are shown in Figure 6 and Figure 7. Figure 6 is the loss curve from the autoencoder trained on $R$, and Figure 7 is the loss curve from the autoencoder trained on $D$. We observed that all curves converged at appropriate epochs.
Next, we measured the accuracy of the proposed method. The experimental results are summarized in Figure 8 and Table 5, Table 6 and Table 7. Each method listed in the Algorithm column of each table is a baseline for comparison with the proposed method. The first four baselines are random recommendation and algorithms that consider only one of the aspects. For example, the random method randomly recommends N of the unvisited POIs. The popularity baseline recommends the top N POIs based on personalized popularity, and the rating method recommends the top N POIs based on the rating predictions of the autoencoder. The distance baseline recommends the top N POIs based on the personalized distance score. The next seven baselines are methods that consider more than one aspect. For example, Popularity + Rating adds up only the popularity and rating scores. Popularity + Rating + Diversity is a recommendation method that considers popularity, rating and diversity. In the case of diversity, since POIs of the same category have the same value, it is not appropriate to consider the diversity score alone, so we ensured that diversity was considered along with the rating. The proposed method considers all aspects.
We confirmed that the proposed method showed higher accuracy than all the baseline algorithms for both datasets and for all cases of top-1, top-2 and top-3 recommendations. Among the baselines, all methods considering popularity performed relatively well, whereas rating, distance and rating + diversity showed relatively low accuracy when used alone. However, since the proposed method, which takes all of these factors into account, performed best, it can be interpreted that the factors combine to produce a positive synergistic effect. In terms of precision, the proposed method improved on the popularity baseline by 82%, 24% and 10% for top-1, top-2 and top-3 recommendations in the Naver data, and by 34%, 17% and 20% in the TripAdvisor data, respectively. In terms of recall, the improvement was 129%, 35% and 10% in the Naver data, and 39%, 17% and 22% in the TripAdvisor data.

7. Conclusions and Future Work

In this paper, we proposed a tour recommendation method whose main idea is to personalize the user's preference for each aspect. We first built a rating matrix and a diversity matrix for training an autoencoder. Then, a personalized diversity score was derived from the reconstructed outputs of the autoencoder. Next, we obtained the popularity and distance scores of each POI, together with the degree to which each user considered popularity and distance to be important factors in POI selection. These degrees were multiplied with the scores as weights and reflected in the final recommendation. The proposed method was evaluated using TripAdvisor and Naver data and showed higher recommendation accuracy than the baselines in all cases.
We believe that many other aspects could be considered in travel destination recommendation aside from the diversity, popularity and distance factors we considered. In our future work, the weather conditions and the peak season period of each POI will also be considered. Building on the individual POI recommendations, we will work on travel route recommendation. We also plan to research travel recommendations based on user composition (couples, families, friends and individuals) to provide more precise recommendations than the current general ones.

Author Contributions

Conceptualization, J.A.S. and D.-K.C.; methodology, J.L. and D.-K.C.; software, J.A.S.; validation, J.L. and D.-K.C.; formal analysis, D.-K.C.; investigation, J.L.; resources, J.A.S.; data curation, J.L.; writing—original draft preparation, J.L. and D.-K.C.; writing—review and editing, S.-C.L. and J.A.S.; visualization, J.L. and D.-K.C.; supervision, S.-C.L.; project administration, S.-C.L.; funding acquisition, S.-C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by (1) the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2020-0-01373, Artificial Intelligence Graduate School Program (Hanyang University)), (2) the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (no. 2021R1A2C1094863), and (3) the DGIST R&D program of the Ministry of Science and ICT of Korea (22-IT-10-03).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. UNWTO Tourism Highlights (2020), Edition 2020. Available online: https://www.e-unwto.org/doi/pdf/10.18111/9789284421152 (accessed on 9 February 2022).
  2. Ge, Y.; Liu, Q.; Xiong, H.; Tuzhilin, A.; Chen, J. Cost-aware travel tour recommendation. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 983–991.
  3. Xiang, Z.; Wang, D.; O’Leary, J.T.; Fesenmaier, D.R. Adapting to the internet: Trends in travelers’ use of the web for trip planning. J. Travel Res. 2015, 54, 511–527.
  4. Resnick, P.; Iacovou, N.; Suchak, M.; Bergstrom, P.; Riedl, J. Grouplens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, Chapel Hill, NC, USA, 22–26 October 1994; pp. 175–186.
  5. Ricci, F.; Rokach, L.; Shapira, B. Introduction to Recommender Systems Handbook, 3rd ed.; Springer: Boston, MA, USA, 2011; pp. 1–35.
  6. Lim, K.H. Recommending tours and places-of-interest based on user interests from geo-tagged photos. In Proceedings of the 2015 ACM SIGMOD on PhD Symposium, Melbourne, Australia, 31 May 2015; pp. 33–38.
  7. Lim, K.H.; Chan, J.; Karunasekera, S.; Leckie, C. Personalized itinerary recommendation with queuing time awareness. In Proceedings of the 40th international ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Japan, 7–11 August 2017; pp. 325–334.
  8. Zhang, C.; Liang, H.; Wang, K.; Sun, J.D. Personalized trip recommendation with poi availability and uncertain traveling time. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Melbourne, Australia, 18–23 October 2015; pp. 911–920.
  9. Chen, D.; Ong, C.S.; Xie, L. Learning points and routes to recommend trajectories. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016; pp. 2227–2232.
  10. He, J.; Qi, J.; Ramamohanarao, K. A joint context-aware embedding for trip recommendations. In Proceedings of the 2019 IEEE 35th International Conference on Data Engineering, Macao, China, 8–11 April 2019; pp. 292–303.
  11. Lim, K.H.; Chan, J.; Leckie, C.; Karunasekera, S. Personalized tour recommendation based on user interests and points of interest visit durations. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 1778–1784.
  12. Yahi, A.; Chassang, A.; Raynaud, L.; Duthil, H.; Chau, D.H. Aurigo: An interactive tour planner for personalized itineraries. In Proceedings of the 20th International Conference on Intelligent User Interfaces, Atlanta, GA, USA, 29 March–1 April 2015; pp. 275–285.
  13. Xiang, Z.; Gretzel, U. Role of social media in online travel information search. Tour. Manag. 2010, 31, 179–188.
  14. Kurashima, T.; Iwata, T.; Irie, G.; Fujimura, K. Travel route recommendation using geotags in photo sharing sites. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010; pp. 579–588.
  15. Ouyang, Y.; Liu, W.; Rong, W.; Xiong, Z. Autoencoder-based collaborative filtering. In Proceedings of the International Conference on Neural Information Processing, Montreal, QC, Canada, 6 June 2014; pp. 284–291.
  16. Sedhain, S.; Menon, A.K.; Sanner, S.; Xie, L. Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web, New York, NY, USA, 18–22 May 2015; pp. 111–112.
  17. Strub, F.; Gaudel, R.; Mary, J. Hybrid recommender system based on autoencoders. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 11–16.
  18. Okazaki, S.; Andreu, L.; Campo, S. Knowledge sharing among tourists via social media: A comparison between Facebook and TripAdvisor. Int. J. Tour. Res. 2017, 19, 107–119.
  19. Chae, D.K.; Kang, J.S.; Lee, J.T.; Kim, S.W. CFGAN: A generic collaborative filtering framework based on generative adversarial networks. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018; pp. 38–40.
  20. Robusto, C.C. The cosine-haversine formula. Am. Math. Mon. 1957, 64, 38–40.
  21. Beel, J.; Langer, S.; Genzmehr, M.; Gipp, B.; Breitinger, C.; Nürnberger, A. Research paper recommender system evaluation: A quantitative literature survey. In Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation, Hong Kong, China, 12 October 2013; pp. 15–22.
  22. Chae, D.K.; Kim, S.W.; Lee, J.T. Autoencoder-based personalized ranking framework unifying explicit and implicit feedback for accurate top-N recommendation. Knowl.-Based Syst. 2019, 176, 110–121.
  23. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256.
Figure 1. System overview.
Figure 2. Average of popularity values for each user.
Figure 3. Average of distance (km) to POIs visited for each user.
Figure 4. The number of visitors for each POI.
Figure 5. The number of visitors for each category.
Figure 6. Loss curve (rating prediction).
Figure 7. Loss curve (diversity prediction).
Figure 8. Accuracy comparisons.
Table 1. Comparisons of related work and our method. X: does not reflect that aspect in recommendations; △: reflects the aspect, but does not consider personalization of that aspect; O: personalizes and reflects that aspect into recommendations.
Methods | Popularity Personalization | Diversity Personalization | Distance Personalization
Based on user interests and visit durations [11]
POI availability and uncertain traveling time [8]XX
User interests from geotagged photos [6]X
Based on queuing time [7]
Learning points and routes [9]
Aurigo [12]X
Ours
Table 2. Data statistics.

| Dataset | #Users | #POIs | #Ratings | #Categories |
|---|---|---|---|---|
| TripAdvisor | 7718 | 156 | 29,020 | 10 |
| Naver | 109,754 | 109 | 270,806 | 8 |

Table 3. Notations.

| Symbol | Description |
|---|---|
| $U$ | a set of users |
| $u$ | a user |
| $P$ | a set of POIs |
| $p$ | a POI |
| $C$ | a set of categories |
| $c$ | a category |
| $H_u$ | tour history of user $u$ |
| $n_p$ | latitude of POI $p$ |
| $e_p$ | longitude of POI $p$ |
| $c_p$ | category of POI $p$ |
| $r_{up}$ | rating of POI $p$ by user $u$ |
| $t_{pu}$ | date of rating left by user $u$ at POI $p$ |
| $d_{uc}$ | diversity score of category $c$ by user $u$ |
| $R$ | rating matrix |
| $D$ | diversity matrix |

Table 4. Categories of POIs.

| Category Id | Naver | TripAdvisor |
|---|---|---|
| 0 | theme park | landscape |
| 1 | museum | theme park |
| 2 | landscape | eco park |
| 3 | eco park | museum |
| 4 | art museum | shopping |
| 5 | shopping | scenic drive |
| 6 | activities | activities |
| 7 | structure | structure |
| 8 |  | art museum |
| 9 |  | place of worship |

Table 5. Top 1 recommendation accuracy comparison.

| Algorithm | Naver Precision | Naver Recall | Naver nDCG | Naver MRR | TripAdvisor Precision | TripAdvisor Recall | TripAdvisor nDCG | TripAdvisor MRR |
|---|---|---|---|---|---|---|---|---|
| Random | 0.0174 | 0.012 | 0.0174 | 0.0174 | 0.0067 | 0.004 | 0.0067 | 0.0067 |
| Popularity | 0.1021 | 0.0462 | 0.1021 | 0.1021 | 0.1336 | 0.1038 | 0.1336 | 0.1336 |
| Rating | 0.0195 | 0.0095 | 0.0195 | 0.0195 | 0.0092 | 0.0044 | 0.0092 | 0.0092 |
| Diversity | 0.0327 | 0.0159 | 0.0327 | 0.0327 | 0.01 | 0.0048 | 0.01 | 0.01 |
| Distance | 0.0176 | 0.0125 | 0.0176 | 0.0176 | 0.0167 | 0.0131 | 0.0167 | 0.0167 |
| Popularity+Rating | 0.1487 | 0.0815 | 0.1487 | 0.1487 | 0.1336 | 0.1038 | 0.1336 | 0.1336 |
| Popularity+Distance | 0.1183 | 0.0564 | 0.1183 | 0.1183 | 0.1611 | 0.1323 | 0.1611 | 0.1611 |
| Rating+Distance | 0.019 | 0.0096 | 0.019 | 0.019 | 0.0159 | 0.0091 | 0.0159 | 0.0159 |
| Popularity+Diversity | 0.1567 | 0.0890 | 0.1567 | 0.1567 | 0.1352 | 0.1047 | 0.1352 | 0.1352 |
| Distance+Diversity | 0.0196 | 0.0104 | 0.0196 | 0.0196 | 0.0234 | 0.0143 | 0.0234 | 0.0234 |
| Popularity+Distance+Rating | 0.1544 | 0.0862 | 0.1544 | 0.1544 | 0.1669 | 0.1367 | 0.1669 | 0.1669 |
| Proposed | 0.1858 | 0.1056 | 0.1858 | 0.1858 | 0.1786 | 0.1442 | 0.1786 | 0.1786 |

Table 6. Top 2 recommendation accuracy comparison.

| Algorithm | Naver Precision | Naver Recall | Naver nDCG | Naver MRR | TripAdvisor Precision | TripAdvisor Recall | TripAdvisor nDCG | TripAdvisor MRR |
|---|---|---|---|---|---|---|---|---|
| Random | 0.019 | 0.0225 | 0.0233 | 0.0281 | 0.0067 | 0.0089 | 0.0096 | 0.0117 |
| Popularity | 0.1172 | 0.1198 | 0.134 | 0.1542 | 0.1189 | 0.192 | 0.1724 | 0.1811 |
| Rating | 0.0208 | 0.0212 | 0.0245 | 0.0297 | 0.0092 | 0.0092 | 0.011 | 0.0134 |
| Diversity | 0.0327 | 0.0347 | 0.0394 | 0.0474 | 0.0125 | 0.0144 | 0.0148 | 0.0171 |
| Distance | 0.0193 | 0.0247 | 0.0246 | 0.028 | 0.0188 | 0.0269 | 0.0253 | 0.0267 |
| Popularity+Rating | 0.1355 | 0.1519 | 0.1691 | 0.1968 | 0.124 | 0.1982 | 0.1773 | 0.1857 |
| Popularity+Distance | 0.1264 | 0.1344 | 0.1486 | 0.1724 | 0.1223 | 0.2002 | 0.1884 | 0.207 |
| Rating+Distance | 0.0195 | 0.0191 | 0.023 | 0.0284 | 0.0184 | 0.0222 | 0.0223 | 0.0263 |
| Popularity+Diversity | 0.1267 | 0.1496 | 0.1659 | 0.1948 | 0.1244 | 0.1978 | 0.1779 | 0.1866 |
| Distance+Diversity | 0.0201 | 0.0226 | 0.0248 | 0.0291 | 0.0267 | 0.0357 | 0.0339 | 0.0376 |
| Popularity+Distance+Rating | 0.1224 | 0.1388 | 0.1589 | 0.1893 | 0.1315 | 0.2148 | 0.1992 | 0.2116 |
| Proposed | 0.1457 | 0.1620 | 0.1868 | 0.2177 | 0.1386 | 0.225 | 0.2099 | 0.2229 |

Table 7. Top 3 recommendation accuracy comparison.

| Algorithm | Naver Precision | Naver Recall | Naver nDCG | Naver MRR | TripAdvisor Precision | TripAdvisor Recall | TripAdvisor nDCG | TripAdvisor MRR |
|---|---|---|---|---|---|---|---|---|
| Random | 0.0194 | 0.0341 | 0.0296 | 0.0357 | 0.0128 | 0.0244 | 0.0209 | 0.0243 |
| Popularity | 0.1183 | 0.1954 | 0.1645 | 0.1837 | 0.0957 | 0.2273 | 0.1887 | 0.1953 |
| Rating | 0.0218 | 0.0344 | 0.0302 | 0.0368 | 0.0089 | 0.0146 | 0.0127 | 0.0161 |
| Diversity | 0.0316 | 0.0515 | 0.0459 | 0.0562 | 0.0131 | 0.0241 | 0.019 | 0.0218 |
| Distance | 0.0209 | 0.0393 | 0.0319 | 0.0356 | 0.0175 | 0.0358 | 0.0296 | 0.0314 |
| Popularity+Rating | 0.1217 | 0.207 | 0.1879 | 0.2179 | 0.0996 | 0.2371 | 0.1948 | 0.2008 |
| Popularity+Distance | 0.1168 | 0.189 | 0.1679 | 0.1931 | 0.1041 | 0.253 | 0.2132 | 0.2205 |
| Rating+Distance | 0.0199 | 0.0315 | 0.0279 | 0.0348 | 0.017 | 0.0333 | 0.0266 | 0.031 |
| Popularity+Diversity | 0.1068 | 0.1898 | 0.1770 | 0.2110 | 0.1007 | 0.2403 | 0.1968 | 0.2016 |
| Distance+Diversity | 0.0219 | 0.0365 | 0.0313 | 0.0366 | 0.0253 | 0.0527 | 0.0413 | 0.0451 |
| Popularity+Distance+Rating | 0.1074 | 0.1848 | 0.1734 | 0.2078 | 0.1132 | 0.2765 | 0.2288 | 0.2341 |
| Proposed | 0.1296 | 0.2144 | 0.2036 | 0.2383 | 0.1152 | 0.2779 | 0.2348 | 0.2429 |

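
For readers who want to reproduce numbers of the kind reported in Tables 5–7, the following is a minimal sketch of the four top-k metrics for a single user. It assumes the common binary-relevance definitions (a recommended POI counts as a hit if the user actually visited it); the exact definitions used in the experiments are not restated in this part of the article, so the function names, the POI identifiers, and the nDCG normalization are illustrative assumptions. Per-user values would then be averaged over all test users.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended POIs that are relevant."""
    hits = sum(1 for poi in ranked[:k] if poi in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant POIs that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for poi in ranked[:k] if poi in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, poi in enumerate(ranked[:k])
        if poi in relevant
    )
    # Assumption: the ideal list places min(k, |relevant|) hits at the top.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Reciprocal rank of the first relevant POI within the top-k, else 0."""
    for rank, poi in enumerate(ranked[:k]):
        if poi in relevant:
            return 1.0 / (rank + 1)
    return 0.0


if __name__ == "__main__":
    # Hypothetical POI ids, for illustration only.
    recommended = ["p3", "p1", "p7"]
    visited = {"p1", "p9"}
    for k in (1, 2, 3):
        print(
            k,
            precision_at_k(recommended, visited, k),
            recall_at_k(recommended, visited, k),
            ndcg_at_k(recommended, visited, k),
            mrr_at_k(recommended, visited, k),
        )
```

Under these definitions, precision, nDCG, and MRR all reduce to the same hit indicator when k = 1, which is consistent with those three columns being identical in the top-1 results.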
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Lee, J.; Shin, J.A.; Chae, D.-K.; Lee, S.-C. Personalized Tour Recommendation via Analyzing User Tastes for Travel Distance, Diversity and Popularity. Electronics 2022, 11, 1120. https://doi.org/10.3390/electronics11071120

AMA Style

Lee J, Shin JA, Chae D-K, Lee S-C. Personalized Tour Recommendation via Analyzing User Tastes for Travel Distance, Diversity and Popularity. Electronics. 2022; 11(7):1120. https://doi.org/10.3390/electronics11071120

Chicago/Turabian Style

Lee, Jongsoo, Jung Ah Shin, Dong-Kyu Chae, and Sang-Chul Lee. 2022. "Personalized Tour Recommendation via Analyzing User Tastes for Travel Distance, Diversity and Popularity" Electronics 11, no. 7: 1120. https://doi.org/10.3390/electronics11071120

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.