Knowledge Graphs for Online Marketing and Sales of Touristic Services

Abstract: Direct online marketing and sales are nowadays an essential part of almost any business that addresses an end consumer, such as in tourism. On the downside, the data and content required for such marketing and sales are typically distributed, and difficult to identify and use, especially for small and medium enterprises. Further, a combination of content management and semantics for automated online marketing and sales is now becoming practically feasible, especially with the global adoption of knowledge graphs. A design and feasibility pilot of a solution implementing a semantic content and data value chain for online direct marketing and sales, based on knowledge graphs and efficiently addressing multiple channels and stakeholders, is provided and evaluated with the end-users. The implementation is shown to be suitable for use on the Web, social media and mobile channels. The proof of concept addresses the tourism sector, exploring, in particular, the case of touristic service packaging, and is applicable globally. The typically encountered challenges, particularly those related to data quality, are identified, and ways to overcome them are discussed. The paper advances the knowledge of the employment of knowledge graphs in online marketing and sales, and showcases a related innovative practical application, co-created by the industry providing marketing and sales solutions for Austria, one of the world’s leading touristic regions.


Introduction
As end-consumers tend to spend more time online, in social networks, mobile apps, etc., and at the same time, the costs for using online booking and marketing platforms increase, online direct marketing and sales are becoming increasingly important, and even indispensable, for addressing potential customers. On the downside, the challenge is that the production of high-quality, attractive and engaging content, such as images, video clips, texts, for online channels requires creativity, working time, skills and equipment, which most businesses, especially smaller ones, do not possess. Furthermore, typical businesses do not have the content and data available for supporting their marketing; and they typically need to research: • which contents have already been published by other similar businesses online (e.g., because they want to be unique in their communication and avoid posting the same kind of contents many others have already posted), the "closed" data, the touristic service package production system is to be able to create the most optimal travel experience for the traveler. Further, the service packages are to be efficiently published and made bookable to the end consumers via intelligently selected most suitable communication and booking channels: especially the online channels with rapidly growing user audiences, such as the social media and the mobile apps. The TourPack project has delivered the settings for our work, and in this paper we demonstrate its outcomes, specifically, addressing the data management and marketing aspect of this system.
This paper is structured as follows. The addressed problem and typical tourism domain user scenario examples are in Section 2. In Section 3, the current state-of-the-art is presented. Our approach to the solution and architecture is found in Section 4. Section 5 contains implementation details, and Section 6 describes the evaluation. Section 7 concludes and summarizes the paper.

Description of Problem and User Scenario
The internet, web-based communication and booking channels are becoming increasingly important in today's competitive world. Organizations of all sizes, commercial and not-for-profit, regularly face the challenge of communicating with their stakeholders using a multitude of contents, as well as of channels, e.g., websites, videos, PR activities, events, email, forums, online presentations, social media, mobile applications, and (more recently) structured data. The need to be present on different communication channels directly, and the impact the channels have on marketing and organizational processes, have been investigated in practice [9,10]. Hence, managing the marketing- and booking-related data, combining it on the fly and spreading the message over various channels have become basic requirements for a marketer. The importance of marketing over multiple channels is also recognized in research, and approaches that involve multiple stakeholders of the marketing process are being investigated [11]. Naturally, touristic service marketing would extend to heterogeneous types of services (e.g., in food and drink, wellness, shopping, culture), and in the same personalized manner would deliver to the end-users packaged offers for experiences matching availability and expectations.
For tourists, the linked data-based service system for direct marketing would be crucial when finding and consuming the most relevant services on the fly. A typical end-user scenario, reflecting a hotel's and a hotel guest's every-day business, is as follows: In the afternoon, a guest is up for some outdoor activities. He/she remembers seeing a "Sports" section in the app. Quickly browsing through the outdoor and sport activities in the app he/she finds a place he/she wants to go to. Since it is the guest's first visit in this town he/she has no idea how to find this place. So he/she clicks the "Details" button in the description of the place and finds a direct link to OpenStreetMap which directs him/her to the desired place.

State of the Art and Current Knowledge
Addressing our showcase domain of tourism, data-centric information channels for Austria currently provide machine-processable information such as mountain bike routes or public transport schedules, and restaurants with specific food preferences. Online services allow the booking of hotels (solutions are provided by companies like Kognitiv, Easybooking), ski passes (e.g., provided by SkiData), concert tickets (e.g., provided by oeticket), etc. Making touristic services easy to publish for the service providers and easy to find and book for the tourists are the key challenges for the production of a complete online service tourist offer package. Considering the abundance and variety of travel services and the restricted time travelers typically possess on vacations or on business trips, the touristic service search, selection and combination require a lot of effort from the service consumer. The methods to access and process the applicable structured Web data are currently evolving, with most of them focusing on scenarios of consuming linked data from semantic repositories [12] and not yet investigating the use of all available distributed data to the full extent. In addition to addressing the holistic solution of the research question set in Section 1, our solution advances the state of the art in the individual areas described in the paragraphs below.
Overall within the touristic domain, service composition is a typically addressed problem, requiring ongoing adaptation and co-development of new technology and approaches e.g., for touristic planning [13]. The creation of travel information services with spatial, social and temporal dimensions can be built already on Web 2.0 data, namely, as travel mashups [14] or a broker platform [15]. With the appearance of Linked Data and the global use of schema.org, the input coming from data processing components e.g., making decisions on yield management would be more and more present in service composition approaches.
The availability of the data is a basic enabler for sufficiently high data quality, in particular of the customer data, and in addition, the sharing of the data is essential for modern marketing [16]. There is also too much marketing data and information available for the consumer to choose from [17]. Our approach aims to leverage all the richness of the available data, and at the same time employ the data, information and content sources most relevant to the consumer. The data is either available explicitly, or can be derived with the available algorithms, e.g., for mining user profiles from social media, with the marketing content then recommended based on this profile data [18]. The possibility to generate high-quality user profile data from online interactions has been demonstrated, in particular via a touristic website [19]. Another advantage of our approach is the ability to analyze the data and content usage across multiple channels, which is essential for tracking and better understanding the customer journey [20]. Our design also aims to achieve the best choice outcome while minimizing the cost of the decision-making process, which is seen as essential for customer satisfaction [21].
Furthermore, consumers are more and more interested in communication via different (and multiple) channels. The ability to answer customer demands wherever they are, and using the channel and device of their choice, will have a huge impact on their experience and consequently on the business. The fact that customers want access to all the services [22] creates the necessity of an integrated strategy. Mobile services must be integrated in the business process, not seen as a separate endeavor. Mobile services, in particular, are a valuable source of personalized access, customer and business data, which should also be employed in a connected manner [23].
To demonstrate the importance of the mobile experience, Google [24] took a deeper look at users' expectations and reactions towards their site experiences on mobile devices. Most interestingly, 61% of people said that they would quickly move onto another site if they did not find what they were looking for right away on a mobile site. The bottom line is that without a mobile-friendly site (that could be extended to mobile access to services) one will be driving users to the competitors. Having a great mobile site is no longer just about making a few more sales. It has become a critical component of building strong brands, nurturing lasting customer relationships, and making mobile work.
Regarding the mobile experience, many customers prefer interactions via online channels rather than face to face, a fact that is currently supported by the increased number of mobile devices within the customer's reach. A combination of various online platforms and mobile applications including social networks would increase the opportunity for consumers to purchase goods and services online [25]. An appropriate mobile strategy integrated into the online multichannel world will also benefit the management and customer service for the tourism business. Further, the technology would implement the known good practices of communications with the customers via the mobile phone channels [26]. Given the requirements described above, our solution is integrating both mobile and social media interaction modes, in a holistic manner.

Approach and Enabling Technology
Our approach to the service marketing and integration aims to close the gaps of the approaches introduced above and create solutions for the dynamic integration of touristic web services on the fly that will enable the creation of enhanced integrated services. Consequently, based on linked data, the service offers will be distributed via the most appropriate channels. We have defined a system architecture (Figure 1), and data management requirements regarding the components of the system.
As visible in Figure 1, the solution takes in the closed data from touristic service providers as well as linked (open and closed) data, and forms valid touristic service packages. The information processing and representation are facilitated by schema.org-supporting plug-ins that implement Knowledge Graphs in content management systems such as WordPress and Drupal. After this, these packages are disseminated either individually (tailored to a user profile) or via social media, pre-selected to suit the typical audience of the touristic service provider or the kind of audience this service provider wants to attract. There are two main components involved in the definition of the relevant packages. First, the SPARQL Query Builder is responsible for constructing a query to be executed on the knowledge graph stored in a triplestore through a SPARQL endpoint. The query is constructed based on the provided profile. For an individual package, a user profile is collected through a mobile application, while for a group package, a profile is collected through a CMS (e.g., WordPress) plugin. Technically, a profile consists of a list of types of services, names of service providers, or keywords which are relevant to users. The second main component is the Package Builder, which is responsible for providing the relevant package to users. A list of services is obtained from the knowledge graph through the SPARQL Query Builder component. The services are then ranked according to the defined profile. For example, if a user requested restaurants close to his/her hotel, then the list of obtained restaurants will be ordered by the distance from the hotel to every restaurant. If the user requested outdoor activities that are open until late at night, then the providers will be listed by closing time.
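The interplay of the two components can be illustrated with a minimal sketch. It is not the TourPack implementation: the profile layout, the function names `build_query` and `rank_by_distance`, and the assumption that services carry schema.org `geo` coordinates are all hypothetical choices made for illustration.

```python
import math


def _escape(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(profile, limit=50):
    """Assemble a SPARQL SELECT query from a profile (hypothetical layout:
    a non-empty list of schema.org type names plus optional keywords)."""
    types = " ".join(f"schema:{t}" for t in profile["types"])
    lines = [
        "PREFIX schema: <http://schema.org/>",
        "SELECT ?service ?name ?lat ?lon WHERE {",
        f"  VALUES ?type {{ {types} }}",
        "  ?service a ?type ;",
        "           schema:name ?name ;",
        "           schema:geo ?geo .",
        "  ?geo schema:latitude ?lat ;",
        "       schema:longitude ?lon .",
    ]
    keywords = profile.get("keywords", [])
    if keywords:
        # Keep a service if its name contains any of the profile keywords.
        tests = " || ".join(
            f'CONTAINS(LCASE(?name), "{_escape(k.lower())}")' for k in keywords
        )
        lines.append(f"  FILTER({tests})")
    lines += ["}", f"LIMIT {limit}"]
    return "\n".join(lines)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def rank_by_distance(services, origin):
    """Order query results (dicts with 'lat' and 'lon') by distance to origin."""
    return sorted(
        services,
        key=lambda s: haversine_km(origin[0], origin[1], s["lat"], s["lon"]),
    )
```

Ranking by closing time would follow the same pattern, with a different sort key.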
We support automatic generation, clustering and packaging of semantically annotated touristic service offers from a variety of sources. The Application Programming Interfaces (APIs) and components perform information extraction, clustering and publishing in order to:
• obtain the extracted data in a linked data format, (semi-)automatically associating metadata;
• generate service representations in a linked data format according to ontological models;
• interlink, cluster, package and disseminate services in an automatic way;
• provide a semantic service and an online interface for easy publishing and access to the above-mentioned functionalities.
The effort has already run pilots, such as with the Touristic Association of Innsbruck, where semantic dissemination support was implemented by adding schema.org support to their website (Website of Touristic Association of Innsbruck: http://www.innsbruck.info) and publishing the touristic data of the Innsbruck region as linked open data [27]. In cooperation with the province of Salzburg (SalzburgerLand), the touristic data of Salzburg are published in a linked open data format with schema.org, and are usable (SalzburgerLand Data Hub: http://data.salzburgerland.com) in particular as the corresponding Knowledge Graph (Press release of the region of Salzburg on the Knowledge Graph: https://newsroom.salzburgerland.com/daten/knowledge-graph/), and the appropriate mobile app support exists. We are also deploying our solution with direct touristic service providers: starting with hotels, and extending to further touristic services. To include relevant touristic services in the platform, we have made a classification of all types of touristic service providers. These are extracted from the websites of Austrian regional touristic associations. The complete list of these types, accompanied by real-life examples of Innsbruck's touristic service providers, is presented in Table 1. Herewith, the service providers of all these types have a straightforward way of provisioning marketing contents via our system. As schema.org has become a de facto standard supported by Google, Bing, Yahoo and Yandex, our solution can be applied globally with increasing ease.

Implementation
For the technical format for information collection, processing and modeling, we heavily rely on linked data and the de facto standard schema.org, and are involved in co-defining its extensions, particularly at W3C. We use schema.org for modeling and communication of touristic service packages, including its actionable components for the booking part (see Figure 2 for our implementation).
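To illustrate what a schema.org annotation with an actionable booking component can look like, the following minimal sketch builds a JSON-LD document in Python. The function name, the example values and the `example.org` URL template are hypothetical; only the schema.org terms (`Hotel`, `Offer`, `ReserveAction`, `EntryPoint`, `urlTemplate`) are taken from the vocabulary, and the actual TourPack annotations may be structured differently.

```python
import json


def hotel_offer_jsonld(name, price, currency, booking_url_template):
    """Build a schema.org description of a hotel offer with a booking action."""
    return {
        "@context": "https://schema.org",
        "@type": "Hotel",
        "name": name,
        "makesOffer": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
        # The actionable part: where and how a client can trigger a booking.
        "potentialAction": {
            "@type": "ReserveAction",
            "target": {
                "@type": "EntryPoint",
                "urlTemplate": booking_url_template,
                "httpMethod": "GET",
            },
        },
    }


# Hypothetical example values, for illustration only.
doc = hotel_offer_jsonld(
    "Example Hotel", "120.00", "EUR", "https://example.org/book?hotel=42"
)
print(json.dumps(doc, indent=2))
```

Serialized this way, the annotation can be embedded in a web page inside a `script` element of type `application/ld+json`.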
For the knowledge repository construction, the datasets from Innsbruck Tourism, Kognitiv Hotels and Offers and Salzburg Tourism were put into a semantic format, which is used for linking data in the tourism area. Through the Redlink platform (Redlink Semantic Platform: https://redlink.co/semantic-platform), we provide these data in a cloud-based, linked-data-empowered repository for storing the semantically structured data. Redlink's Content Analysis solution offers fact extraction, topic classification, and fact linking from textual and media documents in different languages. Users are able to send text and get back facts and entities that have been identified automatically. In addition, the service provides references to existing datasets (public datasets like DBpedia and Freebase, as well as custom user-created vocabularies like a product database) about the entities identified in the content. Users can create their own custom configurations with some simple steps, e.g., for adding advanced natural language processing features or commercial datasets.
Based on the open source technology Apache Marmotta (Apache Marmotta: http://marmotta.apache.org), we have extended the querying capabilities to meet additional requirements (e.g., implementing the GeoSPARQL extension [28]). The data can be accessed by the partners via interfaces using REST/Web Service access and via SPARQL. Through the Redlink platform, it can be enriched via annotation engines and thus linked to a greater set of entities available in the LOD world. In addition to this, several datasets from "Open Government Data" in Austria for the respective regions have been added. The resulting dataset amounts to 2200 lodgings, 9834 locations, and 2114 offers, all available for the construction of packages.
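A GeoSPARQL-enabled endpoint makes distance-based selection possible directly in the query. The sketch below shows how such a request could be assembled; it assumes that places are modeled with `geo:hasGeometry` and `geo:asWKT`, which may not match the actual TourPack data model, and the endpoint URL is a placeholder. The request is only constructed, not sent.

```python
from urllib.parse import urlencode


def nearby_query(lat, lon, radius_m):
    """GeoSPARQL query for places within radius_m metres of a point.
    WKT points are written as POINT(longitude latitude)."""
    return f"""PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>
SELECT ?place ?dist WHERE {{
  ?place geo:hasGeometry/geo:asWKT ?wkt .
  BIND(geof:distance(?wkt, "POINT({lon} {lat})"^^geo:wktLiteral, uom:metre) AS ?dist)
  FILTER(?dist <= {radius_m})
}}
ORDER BY ?dist"""


def sparql_get_url(endpoint, query):
    """URL for a SPARQL 1.1 Protocol 'query via GET' request."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Placeholder endpoint; coordinates roughly correspond to Innsbruck.
url = sparql_get_url("https://example.org/sparql", nearby_query(47.2692, 11.4041, 1000))
```

A client would fetch this URL with an `Accept: application/sparql-results+json` header to receive the result bindings as JSON.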
Rules for automatic information dissemination, particularly on social media, in the context of Online Communication and Marketing Tool (ONLIM) (http://onlim.com) component, have been modeled, and the ontology to automate the online communication has been developed and published online [29]. ONLIM is an online tool based on semantic technologies that aim to facilitate managing various information dissemination channels (such as chatbot systems, social media platforms) by means of publishing posts and tracking the feedback given by other users. ONLIM supports several social media platforms such as Facebook, Twitter, YouTube, LinkedIn, Xing and Flickr. It also allows users to schedule their posts to enable more effective social media management and marketing. In contrast to other similar tools, ONLIM also has an automatic post generation feature that creates posts for publication in social media from external sources. Content, data and services that are annotated with schema.org can be converted to the posts by ONLIM automatically. In addition, these posts can be automatically forwarded to all appropriate channels to disseminate the information about products and services, as well as their packages. The modeled rules for information dissemination perform content adaptation, e.g., addressing that the content is expected in various modes at various channels, as well as communication adaptation e.g., posts should be more frequent on some platforms rather than on the others.
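The distinction between content adaptation and communication adaptation can be sketched as a small rule table applied to a schema.org annotation. This is not ONLIM's rule model or API: the `ChannelRule` fields, the rule values for Facebook and the `make_post` function are hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelRule:
    max_chars: int       # content adaptation: length limit of a post
    posts_per_week: int  # communication adaptation: posting frequency
    with_image: bool     # content adaptation: attach an image if available


# Illustrative values only, not the rules used by ONLIM.
RULES = {
    "twitter": ChannelRule(max_chars=280, posts_per_week=7, with_image=False),
    "facebook": ChannelRule(max_chars=2000, posts_per_week=3, with_image=True),
}


def make_post(annotation, channel):
    """Turn a schema.org annotation (a dict) into a channel-specific post."""
    rule = RULES[channel]
    name = annotation["name"]
    description = annotation.get("description", "")
    text = f"{name}: {description}" if description else name
    url = annotation.get("url", "")
    # Reserve room for the link, then shorten the text if necessary.
    budget = rule.max_chars - (len(url) + 1 if url else 0)
    if len(text) > budget:
        text = text[: budget - 3].rstrip() + "..."
    post = {"channel": channel, "text": f"{text} {url}".strip()}
    if rule.with_image and "image" in annotation:
        post["image"] = annotation["image"]
    return post
```

The `posts_per_week` field is not used by `make_post`; a scheduler would consult it when placing the generated posts in the publication calendar.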
We furthermore have compiled a list of third-party touristic services and their possible uses within the TourPack platform. Additionally, various APIs available for TourPack have also been mutually introduced to identify possible connection points for the TourPack platform implementation.
Further, we have analyzed and visualized the data collected from various machine and human sources, and formed the packages out of this data. The packages have been created in a way that takes into account the different types of touristic services provided (e.g., a user would likely need one dinner and one hotel room and not two or more, etc., see the detailed split in Table 1), the time-based availability, the user profile and further relevant criteria.
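The per-type constraint (one dinner, one hotel room) can be expressed as a simple greedy selection over an already ranked list of services. The sketch below is an assumption about how such a rule could be applied, not the TourPack packaging algorithm; the field names `type` and `available` are hypothetical.

```python
def build_package(ranked_services, max_per_type, default_max=1):
    """Pick services in ranking order, respecting a per-type quota
    and skipping services that are not available."""
    counts = {}
    package = []
    for service in ranked_services:
        if not service.get("available", True):
            continue
        kind = service["type"]
        if counts.get(kind, 0) >= max_per_type.get(kind, default_max):
            continue
        package.append(service)
        counts[kind] = counts.get(kind, 0) + 1
    return package


# Hypothetical example: two hotel candidates, only the better-ranked one is kept.
candidates = [
    {"name": "Hotel A", "type": "Hotel"},
    {"name": "Hotel B", "type": "Hotel"},
    {"name": "Dinner at C", "type": "Restaurant"},
    {"name": "Ski pass", "type": "Activity", "available": False},
    {"name": "Museum", "type": "Activity"},
    {"name": "City tour", "type": "Activity"},
]
package = build_package(candidates, {"Hotel": 1, "Restaurant": 1, "Activity": 2})
```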
Research on data quality, and particularly on data in the schema.org format, has been another major focus of our work. The schema.org data provides a very large source of open semantically-structured data, which is not always of a sufficiently high quality to generate packages directly applicable for bookings, e.g., it can state that there are more hotels in Tyrol, an Austrian province, than in Austria [30].
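An inconsistency of the kind mentioned above can be detected with a simple containment check: a region should never report more entities than the region that contains it. The following sketch is illustrative only; the function name is hypothetical and the counts in the usage example are invented, not measured values.

```python
def containment_violations(counts, parent_of):
    """Return (region, parent, region_count, parent_count) tuples for every
    region whose entity count exceeds the count of its containing region."""
    violations = []
    for region, parent in parent_of.items():
        if region in counts and parent in counts and counts[region] > counts[parent]:
            violations.append((region, parent, counts[region], counts[parent]))
    return violations


# Invented numbers, for illustration only.
hotel_counts = {"Austria": 1000, "Tyrol": 1500, "Salzburg": 400}
parent_of = {"Tyrol": "Austria", "Salzburg": "Austria"}
issues = containment_violations(hotel_counts, parent_of)
```

Regions flagged this way would be candidates for re-crawling or manual verification before their data is used for package generation.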
For the presentation layer, the user interfaces are being rendered in two forms: social media platforms postings and mobile app postings. In order to define meaningful pilot applications, several possible use case scenarios have been defined to fulfill the requirements of the project consortium and focus on the target user groups. Along with the definition, several application mockups have been created iteratively using state of the art technology. These interfaces include web-based, as well as mobile, use-cases and were carefully evaluated by stakeholders and project partners.
The social media postings are automatically generated from the data using the ONLIM component, and are content-wise addressed to audiences which the touristic service provider already has on its social media platforms e.g., as subscribers, or as the ones it wants to attract. An example of such automatically generated touristic services packages is displayed in Figure 3: the offered social posts are displayed on the right-hand side, and can be scheduled in a calendar. On the other hand, mobile apps can generate personalized messages, and technically can be implemented with varying mobile platforms and varying user interfaces. The key possibility here is to deliver a personalized user experience to the user. The implemented iOS-based example of the mobile app client, as described up to now, is depicted in Figure 4. It allows us to search by date and select various services, and group them into individualized packages.

Evaluation
We have conducted field trial user studies with the set up and procedure described as follows. The goals of the evaluation study were:
• Assess the technical and business feasibility of the approach and implementation,
• Identify the key challenges observed by the human evaluators w.r.t. the linked data usage, as well as the system in general.
The evaluation set up has been, correspondingly, technical and empirical. From the technical standpoint, the system is implementable, as it has been implemented with the technology applied. The end-consumer related business use cases are also available and are implemented by the companies Kognitiv and m-Pulso with their pilot customers. The use cases are as follows:
• During the course of the project, Kognitiv analyzed the possibility to package accommodation-related services during the accommodation booking process. While this became feasible from a technology perspective, the main blocker to offering a true package is legal and business-related: packaging requires a travel agency license in order to combine the different offerings into a single offer for the booking guest. However, to offer clients added value, the technology developed as part of TourPack was incorporated into the Kognitiv extranet and its booking engine in a more loosely coupled way. Namely, with the TourPack technology, the hotelier can select available context-relevant information and offerings that he or she wants to promote during the booking process, which increases sales and revenues. Figure 5 displays the visualization of the services that can be packaged with the booking of the accommodation for the user.

• As m-Pulso already has a number of customers in the touristic sector (hotels, spas, ski resorts, and others), the delivered contributions are directly relevant for its product portfolio. The main goal of most customers in this area is to generate more bookings and to strengthen their customer loyalty. While the existing product achieved very good results in the second area, the generation of more bookings is still a critical issue. The developed packaging service fits this need and will be rolled out to several pilot customers to be evaluated within larger user groups in a production environment.
The human evaluators were Master-level students at the University of Innsbruck who were taking the "Semantic Web" course and had no prior experience with the TourPack project. The TourPack system was presented and explained to them during the course, and the related presentations and papers were provided for study at home. At the end of the course, the following question was posed to the students: "Consider a touristic service packaging application, creating bookable touristic service packages basing on schema.org data and other (linked) open data. Name at least six main challenges that you expect to appear in building and deployment of such an application". There were 33 responding students: 32 male and 1 female, mostly Austrian, but also some from abroad. The students provided their replies in writing, indicating up to six challenges each, in free form, with their own ideas and descriptions.
In Table 2, we present the types of challenges most frequently identified by the students, summarized from their texts with manual post-processing. We indicate how many mentions each of them has had, as well as the corresponding detailed remarks. This table provides an overview of the most important issues to address in future research and development, in order to make the usage of distributed linked data and schema.org data broader.

| Group of Challenges | Mentions | Remarks, Examples |
|---|---|---|
| Handling heterogeneity and quality of external data | 35 | The mentions contain the issues of incomplete data, incorrect data, inconsistent data, and challenges with combining different data sources. Needs for data cleaning, verification and interlinking are emphasized. |
| Generating correct semantic annotations | 27 | The mentions indicate the difficulties connected with the need to generate semantic annotations: finding the vocabularies and terms to use (linked data vocabularies and schema.org), producing correct annotations, choosing the right language dialects. |
| Service-related challenges | 20 | The mentions list the issues associated with the development of service technology applications: usable service annotations, frameworks and deployment choices, coordination of communication, selection and discovery of services for the composition, quality of service annotation and service interoperability. |
| Ontology and data versioning | 11 | The mentions refer to the dynamics of the schema and data, and the necessity to cope with change: updating the data according to the newest ontology and real-world developments, synchronizing different versions of the datasets, dealing with outdated data. |
| Data management issues | 10 | The mentions refer to the challenges of set-up and management of data infrastructures: for data acquisition, storage, querying. |
| Take up issues | 8 | Take up challenges include the need for the service to gain a critical user mass, as well as the possible hindering factors due to the early stage of semantic technology use in the sector (annotations and query interfaces may be lacking or not widespread enough). |
| Missing items in vocabularies and schema.org | 6 | These mentions point out that some concepts and properties needed for the semantic annotation may not be available in the existing ontologies. |
| Security and misuse | 6 | Aspects of transfer of the correct data, safe payments, authentication, authorization. |
| User interface issues | 6 | The mentions refer to the complexity of designing an efficient user interface, and connecting it correctly to APIs, data, contents, and services that change over time. |
| Pricing and payment issues | 4 | The aspects of correct setting of the price and distribution of the payments, under the complex conditions of international settings and multiple stakeholders. |
| Internationalization | 3 | Support of different natural languages is a necessity. |
| Legal factors, governance | 2 | Any other law-related aspects of the execution of the TourPack application. |
| Hardware and mobile issues | 2 | The requirements for efficient lower layer technical infrastructures are mentioned, such as computational power, mobile communication/network availability. |

Conclusions
We have presented an approach to disseminating touristic marketing content, particularly service packages, based on Knowledge Graphs as an enabler for tourism businesses and tourists. The businesses can be productive by providing new experiences and finding new direct dissemination and booking channels, while leveraging their own existing touristic contents.
The approach has been implemented and evaluated. The results indicate the general feasibility of the solution, and a need for further improvement of the state of the art for enabling such applications. Three types of challenges have been most frequently mentioned in the feedback (see Table 2), and deserve further development, namely: (1) handling heterogeneity and quality of external data, (2) generating correct semantic annotations, (3) service-related challenges. A possible approach to resolving these challenges is described below.

Handling Heterogeneity and Quality of External Data
An approach to handling heterogeneity challenges is to represent data from different sources uniformly. First, information from those sources is analyzed and modeled into a uniform representation, namely a vocabulary, which contains classes (including their properties) and how they are related to each other. Then, the data format and structure of every source are inspected and aligned with the vocabulary to produce a data mapping. This mapping is then consumed by a wrapper that is responsible for generating the relevant annotations. This approach has been applied to annotate a variety of tourism-related information in the regions of Mayrhofen, Seefeld, and Fügen [31]. A set of 1.6 million triples was generated by March 2017, covering a range of topical information: accommodations and offers, events, infrastructures, organizations, press releases including blog postings, information related to the region, and ski areas. The generated annotations have been successfully identified by search engines, namely Google, as indicated by the rich snippets produced in its search results.
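The mapping-and-wrapper idea can be illustrated with a minimal sketch in which two sources with different field names are mapped onto the same schema.org properties. The source names, field names and the `wrap` function are hypothetical and do not reproduce the wrapper described in [31].

```python
# One mapping per source: source field name -> schema.org property.
# Source and field names are invented for illustration.
SOURCE_MAPPINGS = {
    "region_a": {"title": "name", "begin": "startDate", "text": "description"},
    "region_b": {"event_name": "name", "date_from": "startDate", "summary": "description"},
}


def wrap(record, source, schema_type):
    """Generate a schema.org annotation from a source record via its mapping."""
    annotation = {"@context": "https://schema.org", "@type": schema_type}
    for field, prop in SOURCE_MAPPINGS[source].items():
        value = record.get(field)
        if value not in (None, ""):
            annotation[prop] = value
    return annotation


a = wrap({"title": "Jazz Night", "begin": "2017-03-10", "text": "Live music"},
         "region_a", "Event")
b = wrap({"event_name": "Jazz Night", "date_from": "2017-03-10", "summary": "Live music"},
         "region_b", "Event")
```

Both records, although structured differently at the source, end up as the same annotation, which is what makes them comparable and linkable downstream.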
One way to ensure high-quality annotations is to provide a guideline on how to enter data into a system correctly, such that complete annotations can be generated. First, the essential properties of the vocabulary are categorized as "required" or "recommended". Every required property needs to be associated with a correct value and therefore must be provided through the relevant data entry field. For example, an event must have a postal address value for its location, and the guideline dictates that this address cannot be empty whenever a new event is entered into the system. Recommended properties are handled similarly: the guideline recommends possible values to ensure that the generated annotations will not raise errors or warning messages.
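Such a guideline can be turned into an automated check at data entry time. The sketch below reports missing required properties as errors and missing recommended properties as warnings; the rule set for `Event` is an illustrative assumption, not the guideline actually used in the project.

```python
# Illustrative rule set; property paths are tuples so nested values can be checked.
DOMAIN_RULES = {
    "Event": {
        "required": [("name",), ("startDate",), ("location", "address")],
        "recommended": [("description",), ("image",)],
    }
}


def _lookup(node, path):
    """Follow a property path through nested dicts; None if anything is missing."""
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check(annotation):
    """Return (errors, warnings): missing required and recommended properties."""
    rules = DOMAIN_RULES.get(annotation.get("@type"), {})

    def missing(paths):
        return [".".join(p) for p in paths
                if _lookup(annotation, p) in (None, "", [], {})]

    return missing(rules.get("required", [])), missing(rules.get("recommended", []))


event = {
    "@type": "Event",
    "name": "Jazz Night",
    "startDate": "2017-03-10",
    "location": {"@type": "Place", "address": ""},
}
errors, warnings = check(event)
```

In a data entry form, a non-empty `errors` list would block saving the record, while `warnings` would only be shown as hints.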
On the level of linked data and vocabulary selection, solutions for automated assistance in the selection of the most "appropriate" vocabulary terms to annotate are being developed [32].

Generating Correct Semantic Annotations
While schema.org became a very practically important development around linked data on the Web, the practitioners still have difficulties with producing correct schema.org annotations. Hence, the data quality of structured content on the web suffers. With the work on semantify.it [33], a platform that supports users in the schema.org annotation creation and publication process is provided. Based on domain-related template files (Domain Specifications) semantify.it provides form-like annotation editors and saves, if the user wants, the resulting annotation files in a database to then be published to websites over a JavaScript snippet or through a plugin for content management systems. Besides that, semantify.it offers a validation functionality to check for semantic validity against predefined Domain Validation files as described in [34].

Service-Related Challenges
Automatic service packaging is, as mentioned before, highly dependent on available (web) services. To use a service in an automated way, the semantics of the service, of its endpoint, parameters and return values, has to be known. Schema.org offers a means to annotate web services called schema.org actions. The vocabulary features properties for the status of the action, the executing agent, the target endpoint, containing properties for input parameters, the result of the action and many more. If annotated web services can be found, the determination of their relevance and the combination of several web services becomes much easier, or feasible in the first place. In contrast to the wide use of schema.org for static data, web services are not yet sufficiently annotated with schema.org.
Further, with content and data, there are a few particularities hampering their potential (re-)use in the marketing data value chain. Namely, selection and reuse of the content and data is possible only manually, unless the license information is published in a machine-readable form. Realization of machine-readable, semantic representations of data licensing, in particular, is still not fully solved, and semantic formats for licensing data, as well as the tools processing data license annotations, are under-defined or non-existent. Semantic standards for licenses are being developed right now, and include efforts such as ODRL (Open Digital Rights Language (ODRL): https://www.w3.org/ns/odrl/2/), and a project that is deriving extended semantic models and tools for data licensing is DALICC (Data Licenses Clearance Center (DALICC): https://dalicc.net) [35]. Future work would also include models for content crowdsourcing, customer engagement, as well as quality evaluation metrics. Scalability deployment aspects related to hardware and networks are to be researched.
Funding: This work has been partially funded by the Austrian Research Promotion Agency (FFG) in the project TourPack within the program "Future ICT", as well as in the project WordLiftNG within the Eureka, Eurostars Programme.