Cultural heritage, such as architecture, handicrafts, and traditions, has been increasingly catalogued in many regions around the world. However, to date, this rich data has been accessed almost exclusively by experts (e.g., historians, geographers, anthropologists). In this paper, we propose an approach to leverage this data to promote tourism in a specific region. We use the French Pyrenees as a case study, but our proposals are intended to be applicable to any other region of the world.
During the 18th and 19th centuries, dreams of nature and new experiences gave rise to a thriving tourism industry in the Pyrenees. Many tourists came for leisure and enjoyment, while others sought healing and therapies. These mountains suddenly experienced a surge in public interest and therefore attracted an ever-growing number of Western visitors. In the wake of this growth, hotels, resorts, and other buildings were erected to accommodate these visitors amid the exotic nature of the Pyrenees, known for their hot springs, mild weather, and interesting local culture and, in the 20th century, also for their snow and white-water rivers. These activities greatly enriched both the tangible cultural heritage of the Pyrenees (architecture, furniture, etc.) and its intangible heritage (traditions, social events, cuisine, etc.).
Today, an important part of this heritage has been catalogued and registered in several different databases across south-western France, but not all of it. We take part in a multidisciplinary European project (FEDER TCVPyr) that aims to create a comprehensive catalogue of the cultural heritage related to tourism in the French Pyrenees. Although this catalogue is not yet finished, the project should ultimately deliver a database containing all of the most important georeferenced points of interest (POIs) for tourists in the region. Examples of POIs include natural hot springs, casinos, hotels, villas, winter sports resorts, other natural wonders, and even cultural events and traditions related to specific locations. This data has enormous potential to be used not only by tourists, but also by experts (e.g., historians and anthropologists), tourism providers, and the general public (tourists, local population, etc.).
In this context, we present two main contributions related to leveraging this cultural heritage data: (i) an open-source framework capable of exporting cultural heritage from our database to well-known open data providers such as Wikipedia and OpenStreetMap; and (ii) an open-source algorithm and framework capable of recommending a sequence of cultural heritage POIs to be visited by tourists according to their preferences, as well as the context and strong links between POIs.
Currently, the French government employs at least two different software tools (RenablLP2 and Gertrude), each with its own data model (format), to manage and store cultural heritage data. In addition, various (largely unknown) systems are used to make this information available to the general public and experts. In our first main contribution, we aim to reduce this heterogeneity, especially for the general public, by unifying this data into a single model that can be published on established, widely accessed open data providers. We present an application that adopts this model and automatically publishes cultural heritage POIs on Wikipedia according to the principles for the publication and reuse of machine-readable data proposed in [1].
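To make the unification step more concrete, the following Python sketch maps records from two heterogeneous sources onto one common structure. It is a minimal illustration under stated assumptions, not the unified data model proposed in this paper: the class `HeritagePOI`, the functions `from_source_a`, `from_source_b` and `unify`, and all source field names (`ref`, `title`, `coords`, etc.) are hypothetical and do not reproduce the real RenablLP2 or Gertrude schemas.

```python
from dataclasses import dataclass


@dataclass
class HeritagePOI:
    """Hypothetical unified record for a cultural heritage POI."""
    identifier: str
    name: str
    latitude: float
    longitude: float
    category: str
    description: str = ""
    source: str = ""


def from_source_a(record: dict) -> HeritagePOI:
    # Hypothetical source A: separate latitude/longitude fields.
    return HeritagePOI(
        identifier=f"A-{record['ref']}",
        name=record["title"],
        latitude=float(record["lat"]),
        longitude=float(record["lon"]),
        category=record["kind"].strip().lower(),
        description=record.get("history", ""),
        source="A",
    )


def from_source_b(record: dict) -> HeritagePOI:
    # Hypothetical source B: coordinates stored as a single "lat,lon" string.
    lat, lon = (float(v) for v in record["coords"].split(","))
    return HeritagePOI(
        identifier=f"B-{record['id']}",
        name=record["label"],
        latitude=lat,
        longitude=lon,
        category=record["category"].strip().lower(),
        description=record.get("notes", ""),
        source="B",
    )


def unify(records_a: list[dict], records_b: list[dict]) -> list[HeritagePOI]:
    """Convert both heterogeneous inputs into a single list of unified records."""
    return [from_source_a(r) for r in records_a] + [from_source_b(r) for r in records_b]
```

Once all records share one structure, a single exporter (for instance, one that renders each `HeritagePOI` into a wiki article template) can serve every source instead of one exporter per source format.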
The second main contribution of this paper is the contextualised recommendation of itineraries (i.e., sequences of POIs to be visited). This recommendation takes into account the user profile, his location, and his device features, as well as other parameters such as how much time the tourist has available to visit the POIs and what his means of transportation is. A recommendation could also be enriched with information extracted from open data. For example, if an exhibition is taking place close to the user’s location, the user’s itinerary may be enriched with a link to a ticketing service.
In the literature, this process of itinerary recommendation (or generation) is usually split into two steps. First, a score representing the pertinence of each POI is estimated. Then, an itinerary is built according to these scores and other contextual elements (e.g., distance) [2].
We propose a hybrid approach for the recommendation of itineraries. We use a content-based approach to calculate the score of each POI according to the profile of the user (modelling his preferences) and his context. We then refine and adjust the score of each POI using a collaborative filtering approach that takes into account its popularity with other tourists and its relevance when it follows the previous POI in the itinerary.
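The content-based part of such a score can be illustrated with a common technique: cosine similarity between a user’s thematic preferences and a POI’s themes, attenuated by distance. The Python sketch below is only an illustration under assumptions; the theme weights, the linear proximity factor, and the names `cosine` and `content_score` are hypothetical and are not the scoring formulas defined in this paper.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse theme-weight vectors."""
    dot = sum(w * v.get(theme, 0.0) for theme, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def content_score(preferences: dict[str, float], poi_themes: dict[str, float],
                  distance_km: float, max_distance_km: float) -> float:
    """Hypothetical content-based score: theme match scaled by proximity.

    POIs farther than max_distance_km (an assumed reachability limit) score 0.
    """
    if max_distance_km <= 0 or distance_km > max_distance_km:
        return 0.0
    proximity = 1.0 - distance_km / max_distance_km  # 1 at the user, 0 at the limit
    return cosine(preferences, poi_themes) * proximity
```

For example, a user interested mainly in thermalism and secondarily in architecture obtains a high score for a nearby thermal spa and a score of zero for any POI beyond the assumed reachability limit.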
Collaborative filtering is achieved by incorporating a pheromone deposition and evaporation mechanism into our approach. This principle is inspired by the ant colony optimisation (ACO) algorithm. Unlike traditional ACO, our objective is not to find an optimal path to a given destination, but to implement co-influence communication [4] between tourists based on pheromones (i.e., the frequency of passage between POIs). This is one of the original aspects of our proposal: ACO algorithms have been widely used in e-commerce recommendation [5], but they are uncommon in POI recommendation.
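The pheromone deposition and evaporation principle can be sketched as follows. This is a generic ACO-style bookkeeping structure written under stated assumptions (a constant deposit per visit, a single evaporation rate, and the hypothetical class name `PheromoneTrails`); it is not the exact update rule of the approach described here. Trails are kept both on individual POIs (popularity) and on ordered pairs of POIs (sequence relevance).

```python
from collections import defaultdict


class PheromoneTrails:
    """ACO-style pheromone bookkeeping between POIs (hypothetical parameters)."""

    def __init__(self, evaporation: float = 0.1, deposit: float = 1.0):
        if not 0.0 <= evaporation <= 1.0:
            raise ValueError("evaporation must be in [0, 1]")
        self.evaporation = evaporation
        self.deposit = deposit
        self.tau = defaultdict(float)         # tau[(a, b)]: trail on transition a -> b
        self.popularity = defaultdict(float)  # popularity[p]: trail on POI p itself

    def evaporate(self) -> None:
        """Decay every trail by the evaporation rate so that old trends fade."""
        keep = 1.0 - self.evaporation
        for key in self.tau:
            self.tau[key] *= keep
        for key in self.popularity:
            self.popularity[key] *= keep

    def record_itinerary(self, pois: list[str]) -> None:
        """Deposit pheromone on each visited POI and each consecutive transition."""
        for poi in pois:
            self.popularity[poi] += self.deposit
        for a, b in zip(pois, pois[1:]):
            self.tau[(a, b)] += self.deposit
```

Each completed tourist itinerary reinforces the transitions it contains, while periodic evaporation keeps the trails responsive to recent behaviour rather than accumulating indefinitely.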
In the end, our approach generates a list of POIs that are not only pertinent to the preferences of the user (tourist) but also spatially, temporally, and sequentially relevant. Moreover, adjusting each POI’s score according to its popularity and sequence relevance counters overspecialisation [8] and increases the novelty and serendipity [9] of our recommendations.
In Section 2, we discuss related work in the areas of open data publication, recommender systems, and itinerary generation. In Section 3, we detail the architecture of the platform we designed to support both the publication of our data as open data and the recommendation of POIs. Section 4 presents the unified data model we propose to gather cultural heritage information from heterogeneous sources. In this section, we also propose a framework to publish cultural heritage information as open data. Next, we detail our approach to generating and recommending itineraries. Finally, in Section 5, we conclude the paper and sketch future work.
2. Related Work
The first contribution of our work consists of an approach to publish cultural heritage data as open data. Much has been done in the area of open data publication. For example, recent work tries to overcome the challenges of open data publication and reuse [10]. Ref. [10] focuses on open data quality measurement. Ref. [11] details categories of open data producers (e.g., public administrations) and re-users (e.g., citizens). Ref. [1] points out 24 challenges related to the publication of open government data (OGD). Some of these challenges are particularly relevant to our work, namely: (i) too many OGD initiatives, (ii) not enough [accessible] resources, (iii) no standard process or policy for OGD publication, (iv) the lack, in some cases, of a centralised OGD portal, and (v) a lack of suitable software tools for OGD publication. Ref. [12] describes a first experiment in publishing cultural heritage data on Wikipedia, in which three hundred cultural heritage POIs were converted into Wikipedia articles. However, the export process to Wikipedia was entirely manual, and the texts originally written by experts were rewritten to present the scientific information to the general public. Focusing on open data and cultural heritage, some recent work aims to link cultural heritage data with linked open data (LOD) [13]. The main purpose of these projects is to enrich cultural heritage corpora with LOD content, and vice versa.
The second contribution of our work focuses on recommending touristic itineraries to the user. As part of itinerary generation, it is essential to model data such as users, context, and itineraries. In what follows, we consider that the user/tourist model and user/tourist profile are synonymous.
The user model defines the user profile. This profile generally includes the information that characterises her/him (e.g., gender, age category, socio-professional activities), her/his preferences (e.g., thematic and historical preferences), and/or age group (e.g., child, adult, elderly person). Some of this information is provided directly by the user (e.g., gender, age category, preferences), while other information can be deduced from her/his interactions (e.g., her/his “Likes”, her/his age group). Preferences can be derived from the ratings the user assigns to POIs or from the comments she/he publishes. Depending on the POIs that the user has visited, it is also possible to deduce which group of users she/he belongs to. In [15], for example, the user profile is characterised by her/his name, office location, and social networks (e.g., friends on Facebook), and her/his preferences are deduced from POIs that she/he, or her/his friends, have visited and shared on social networks (e.g., Facebook). The approaches proposed in [16] consider only the preferences of the user, which are derived from the visitor’s review comments. The works mentioned here make use of comments shared by members of social networks (Yelp (https://www.yelp.com/writeareview/), TripAdvisor (https://www.tripadvisor.com/), and Foursquare (https://foursquare.com/)). They use a learning-based system to determine from her/his review comments whether or not a user enjoyed a visit, which makes it possible to determine the user’s preferences and fill out her/his profile.
An itinerary is formed by an ordered list of POIs. The itinerary indicates the order in which the POIs are visited, the estimated visit duration for each POI, and the route between pairs of consecutive POIs. An itinerary generation system helps the user prepare her/his trip. The majority of itinerary generation systems are based on solving the Orienteering Problem (OP), a score-based problem described in the operational research literature that was first implemented by Tsiligirides [18]. In our case, OP consists of collecting the points (scores) assigned to POIs. The objective is to maximise the number of points collected while respecting constraints (e.g., the duration of the whole route). According to Ayala et al. [3], OP is a combination of the travelling salesman problem and the knapsack problem, both classic operational research problems. Several extensions of OP have been proposed, for example TOP, TDOP, and TDTOPTW, where T stands for Team (i.e., taking into account a team, in which each POI may be visited by at most one team member), TD stands for Time-Dependent (i.e., taking into account travel times between POIs that depend on the departure time), and TW stands for Time-Window (i.e., taking into account the opening and closing hours of POIs). The itinerary generation process is often composed of two phases: POI scoring and itinerary construction. Some approaches incorporate a third phase, the adaptation of the generated itinerary. In this case, the user can intervene to modify the itinerary (e.g., add or remove a POI, change the order of POIs to be visited), and the system must be able to adapt to this modification with respect to the user context (e.g., the visit duration) [2]. In most cases, itinerary generation consists of evaluating the relevance score of a set of POIs and then constructing an itinerary composed of the POIs with the highest scores.
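As an illustration of the itinerary construction phase, the following Python sketch applies a simple greedy heuristic to an Orienteering Problem instance: starting from the tourist’s location, it repeatedly adds the POI with the best ratio of score to time spent (travel plus visit) until the time budget runs out. It is neither Tsiligirides’ algorithm nor the method proposed in this paper; planar coordinates in kilometres, a constant travel speed, and the names `POI`, `travel_h` and `greedy_itinerary` are assumptions for the example, and the return trip and time windows are ignored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class POI:
    name: str
    x: float        # planar coordinates in km (simplifying assumption)
    y: float
    score: float    # relevance score produced by the scoring phase
    visit_h: float  # estimated visit duration in hours


def travel_h(a: tuple[float, float], b: tuple[float, float], speed_kmh: float) -> float:
    """Straight-line travel time in hours at a constant speed."""
    return math.dist(a, b) / speed_kmh


def greedy_itinerary(start: tuple[float, float], pois: list[POI],
                     budget_h: float, speed_kmh: float = 4.0) -> list[POI]:
    """Greedy OP heuristic: best score per hour spent, within the time budget."""
    route, here, remaining = [], start, budget_h
    candidates = list(pois)
    while True:
        best, best_ratio, best_cost = None, 0.0, 0.0
        for p in candidates:
            cost = travel_h(here, (p.x, p.y), speed_kmh) + p.visit_h
            if 0 < cost <= remaining and p.score / cost > best_ratio:
                best, best_ratio, best_cost = p, p.score / cost, cost
        if best is None:  # nothing else fits in the remaining time
            return route
        route.append(best)
        candidates.remove(best)
        remaining -= best_cost
        here = (best.x, best.y)
```

Ranking candidates by score per hour rather than by raw score is a common greedy choice for knapsack-like problems, because a distant, high-scoring POI could otherwise consume the entire time budget.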
Differences between itinerary generation approaches lie mainly in the POI scoring stage. POI recommendation approaches can be divided into three categories according to how scoring is performed: content-based approaches, collaborative filtering approaches, and hybrid approaches.
Content-based approaches use POI and/or user information to match POI characteristics to the user’s preferences and context. Several techniques can be used to establish this correspondence: activation-propagation techniques [19], agent-based systems [20], probabilistic approaches [20], biology-inspired approaches (e.g., based on neural networks [21]), etc. One advantage of content-based approaches is that they do not suffer from cold start problems when a new POI is added; that is, they can recommend a new item even if no one has interacted with it so far. Ref. [22] considers that recommender systems face different types of cold start problems: problems with new users (who do not have a history and/or profile), problems related to new POIs (which have not yet received any evaluation), and finally, problems related to specific users (whose preferences differ from those of others).
Collaborative filtering approaches use information about other users with similar characteristics (e.g., age, socio-professional activities, visit evaluation feedback) to recommend POIs to a given user. These approaches rely on techniques such as clustering, which makes it possible, for example, to group users according to their profiles [23] or their friendship links on social networks, or ant colony algorithms that compute the pheromone trails left by users [24]. In the latter case, the recommendation considers only the popularity of a POI. The advantage of collaborative filtering approaches is that they can recommend POIs without requiring specific information about them. However, cold start is a real challenge in this category, since information about other users is required and new items tend not to be recommended.
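A minimal user-based collaborative filtering predictor, written in Python for illustration, shows both the principle and the cold start issue mentioned above: a POI that no similar user has rated receives no prediction at all. The cosine similarity (treating unrated POIs as zero), the rating scale, and the names `similarity` and `predict` are assumptions for this sketch and are not taken from the cited works.

```python
import math


def similarity(r1: dict[str, float], r2: dict[str, float]) -> float:
    """Cosine similarity of two users' rating vectors (unrated POIs count as 0)."""
    dot = sum(r1[p] * r2[p] for p in set(r1) & set(r2))
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def predict(user: str, poi: str, ratings: dict[str, dict[str, float]]):
    """Similarity-weighted average of the ratings other users gave to `poi`.

    Returns None when no similar user has rated `poi` (e.g., a brand-new POI:
    the new-item cold start problem).
    """
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or poi not in theirs:
            continue
        s = similarity(ratings[user], theirs)
        num += s * theirs[poi]
        den += abs(s)
    return num / den if den else None
```

Because the prediction only draws on other users’ ratings, nothing about the POI itself is needed, which is the advantage noted above; the same dependence is why unrated POIs cannot be scored.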
Hybrid approaches combine content-based and collaborative filtering approaches. Different forms of combination have been considered [25]: (i) separate implementation of content-based and collaborative filtering approaches, followed by the combination of their recommendation results, (ii) integration of some content-based processing into a collaborative filtering approach, (iii) integration of some collaborative filtering-based processing into a content-based approach, and (iv) a general approach that integrates both content-based and collaborative filtering. In the first category, we can mention the approaches of [15], which differ in the techniques employed for content processing and collaborative filtering. Refs. [15] use a linear function, while [26] uses a prediction function to combine the recommendation results. The approach we propose belongs to the fourth category of hybrid approaches (i.e., category (iv)). As part of itinerary generation, our POI recommendation strategy is conducted iteratively: at each iteration, we recalculate the potential of each candidate POI, i.e., its global relevance score, taking into account the POIs already integrated into the itinerary being constructed.
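The iterative recalculation described above can be sketched in Python as follows. At every step, the potential of each remaining candidate is recomputed from three parts: its content-based score, its popularity, and the pheromone on the transition from the POI added last. The linear weighting, the weights `w`, the fixed itinerary length `k` (used here in place of a time budget), the assumption that all inputs are normalised to [0, 1], and the function name `build_itinerary` are hypothetical; the actual relevance formulas and coefficients of the proposed approach are not reproduced here.

```python
def build_itinerary(start: str, content: dict[str, float], popularity: dict[str, float],
                    transitions: dict[tuple[str, str], float], k: int,
                    w: tuple[float, float, float] = (0.6, 0.2, 0.2)) -> list[str]:
    """Greedily extend the itinerary, recomputing each candidate's potential
    relative to the POI added last. Scores are assumed normalised to [0, 1]."""
    w_content, w_pop, w_seq = w
    itinerary = [start]
    candidates = [p for p in content if p != start]
    while candidates and len(itinerary) < k:
        last = itinerary[-1]
        # Potential of candidate c given the current end of the itinerary.
        best = max(
            candidates,
            key=lambda c: (w_content * content[c]
                           + w_pop * popularity.get(c, 0.0)
                           + w_seq * transitions.get((last, c), 0.0)),
        )
        itinerary.append(best)
        candidates.remove(best)
    return itinerary
```

With a strong trail on the transition from one POI to another, the second POI can be chosen next even when another candidate has a higher content-based score; this is how sequence relevance can reorder an itinerary.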
POI recommendation and itinerary generation systems may or may not be context-sensitive. Context-sensitive systems are called context-aware recommender systems (CARS). According to Schmidt et al. [27], contexts can be classified into three categories: the user context (e.g., location, time, budget, social affinity, social status), the computing or resource context (e.g., network connection, communication cost, available bandwidth, type of processor), and the physical context (e.g., traffic conditions, temperature, weather, brightness).
Two recently published surveys of recommender systems applied to tourism [28] deserve special attention. These surveys classify recommender systems in terms of (i) interface, (ii) data source types, (iii) formalisation and algorithms, and (iv) evaluation methods. According to these surveys, there are two categories of recommender systems in terms of interface: web applications and mobile applications. Mobile applications have an advantage over web applications in that they allow the system to be accessed from any place with an internet connection. Moreover, a mobile application has access to the user’s current location and can therefore suggest POIs around him [28]. Regarding data source types, the surveys identify geo-tagged photographs or social media, location-based social networks, and GPS trajectory traces [29]. In terms of formalisation and algorithms, as indicated above, there are three main categories of recommender systems: content-based, collaborative, and hybrid systems. Half of the works presented in [28] implement hybrid systems. Several recommender systems integrate artificial intelligence techniques such as multi-agent systems [30], optimisation techniques (ant colony optimisation [33], genetic algorithms [34], iterated local search [2], greedy randomised adaptive search procedures [36]), automatic clustering (k-nearest neighbours [37], k-means [39], fuzzy c-means [41]), uncertainty management (Bayesian networks, fuzzy logic [42]), rule-based approaches [44], and knowledge representation (ontologies [45]). For the evaluation of these systems, four methods are proposed in [29]: (1) real-life evaluations (based on precision, recall, and their harmonic mean, the F1 score), (2) heuristic-based evaluations (based on the total number of POIs recommended, POI popularity, or tourist interest), (3) crowd-based evaluations (using qualitative measures that focus on user experience), and (4) online controlled experiments (design-based and algorithm-based variants).
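The metrics of the real-life evaluation method can be computed as in this short Python sketch, which compares the set of recommended POIs with the set of POIs a user actually found relevant (e.g., visited). Treating recommendations as unordered sets and the function name `precision_recall_f1` are assumptions for the example.

```python
def precision_recall_f1(recommended: list[str], relevant: list[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 (harmonic mean of precision and recall)."""
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    # If there is at least one hit, precision + recall > 0, so the division is safe.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

For instance, recommending four POIs of which two are among the three the user found relevant gives a precision of 0.5, a recall of 2/3, and an F1 score of 4/7.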
To our knowledge, there is little research dedicated to itinerary generation/recommendation in the domain of cultural heritage data. The authors of [48] explain that existing works generally apply the approaches presented above to cultural POI recommendation in the context of multimedia systems. They propose an approach based on the analysis of multimedia objects, notably images taken from the Web, to recommend visit itineraries. In another example, the authors of [49] use a social graph (i.e., affinities between users) in their system for recommending works of art exhibited in museums.
To highlight Pyrenean cultural heritage POIs while avoiding cold start problems and getting as close as possible to user preferences, we plan to set up a hybrid approach that integrates both content-based and collaborative filtering approaches. This approach compiles a list of pertinent POIs while sparingly suggesting one or more original POIs to complement those that closely match the user’s preferences. To do so, we integrate an ant colony algorithm into our solution. This algorithm is quite common in e-commerce product recommendation but far from popular in POI itinerary recommendation. Unlike Wang et al. [50], who use an ant colony optimisation algorithm to calculate a crowd-aware trip (i.e., adapting the POI visiting order according to crowdedness knowledge), we propose to use ant pheromones to compute the social dimension of a POI’s relevance score.
In the following section, we present the two main contributions of this paper: (i) an open-source framework to export cultural heritage to well-known open data providers; and (ii) an open-source algorithm and framework to recommend a sequence (itinerary) of cultural heritage POIs to be visited by tourists.
In this paper, we presented our research aimed at leveraging cultural heritage data to promote tourism in the French Pyrenees. To do so, we worked on two fronts: (i) the ability to export this data from its original databases and data models to well-known open data platforms, and (ii) an open-source algorithm and framework to recommend a sequence of cultural heritage points of interest (POIs) to be visited by a tourist. The idea is to help open up cultural heritage data (often catalogued but largely unknown or difficult to find) to the general public as well as to experts.
We first showed how we created a unified cultural heritage database from heterogeneous sources. To do so, we proposed a unified data model that follows the standards of the cultural heritage research community. This unified data model also enabled us to propose and develop an application to publish information from our database to well-known open data providers such as Wikipedia.
The second important part of our work, and the one we gave more attention to, consisted of recommending a sequence of cultural heritage POIs. We proposed a hybrid approach that benefits from the strengths of both content-based and collaborative filtering. The content-based aspect takes into consideration the profile of a user (e.g., his location, means of transportation, available time window, preferences, etc.) to recommend pertinent POIs. Collaborative filtering is implemented using an ant colony algorithm that adjusts or refines the recommendation, taking into account the popularity of POIs among other users and also the relevance of a POI in a specific sequence (i.e., how relevant a POI is after another given POI). To the best of our knowledge, ant colony algorithms have been commonly used in some recommender systems, but rarely for recommending an itinerary composed of a sequence of POIs. This hybrid approach reduces some well-known types of cold start problems and increases the novelty and serendipity of recommendations. Although we used the Pyrenees region as a case study, this recommendation approach could be adopted in any other region, whether the concentration of POIs is low (e.g., somewhere with little cultural heritage) or very high (e.g., the Paris region). The scoring function of the recommendation process always makes it possible to propose a number of relevant POIs whose visit duration respects the tourist’s time constraints.
In future work, we plan to test this first version of our recommendation application on a larger and more varied panel of users. This experiment will make it possible to define initial values for the coefficients of the different relevance measurement formulas. We will also compare the results of our system with those of route generation based solely on geographical and temporal criteria. In addition, we are working on a new version of the recommendation prototype that incorporates both physical and resource context elements and that has not yet been tested experimentally. This version is intended to dynamically adapt an itinerary according to the context. This new prototype should also be able to enrich our recommendations with open data resources (e.g., Datatourisme) in order to offer users cultural events close to their location and matching their preferences. Finally, we plan to generalise our approach by feeding the recommender system with heritage POIs (e.g., Pyrenean POIs exported by us or by others) published directly on Wikipedia (i.e., without going through our local databases).