DTCRSKG: A Deep Travel Conversational Recommender System Incorporating Knowledge Graph

Abstract: In the era of information explosion, it is difficult for people to obtain their desired information effectively. In tourism, travel recommender systems based on big travel data have developed rapidly over the last decade. However, most work focuses on click logs, visit history, or ratings, and dynamic prediction is absent. As a result, there are significant gaps in both datasets and recommender models. To address these gaps, in the first step of this study, we constructed two human-annotated datasets for the travel conversational recommender system. We provide two linked datasets, namely, an interaction sequence dataset and a dialogue dataset. The former is used to fully explore the static preference characteristics of users, while the latter identifies the dynamic changes in user preferences. We then proposed and evaluated BERT-based baseline models for the travel conversational recommender system and compared them with several representative non-conversational and conversational recommender system models. Extensive experiments demonstrated the effectiveness and robustness of our approach on conversational recommendation tasks. Our work can extend the scope of the travel conversational recommender system, and our annotated data can also facilitate related research.


Introduction
With the information explosion, it is difficult for users to find travel information that matches their interests and suits their travel plans. This drives an urgent need for personalized travel recommender systems that can provide more ingenious travel suggestions and contribute to the success of service providers. Traditional travel recommendation methods are mainly divided into two categories: collaborative filtering (CF) recommendations [1] and content-based recommendations [2]. CF methods mostly use travel interaction records, and data sparsity limits their performance. Content-based recommendation methods, on the other hand, can alleviate the data sparsity problem by using richer auxiliary information, such as textual descriptions, content tags, and social and geographical information. Recently, more attention has been paid to deep neural networks, such as DeepMF [3], NCF [4], WideDeep [5], and DeepFM [6].
However, existing travel recommender systems are mostly based on static recommendation models, which primarily predict a user's preference toward a travel service from historical records. They cannot elicit the current and detailed preferences of the user, respond to the user's feedback on a suggestion, or provide explanations for the recommended item. Unfortunately, there has been little work on travel conversational systems.
Our research aimed to propose a deep conversational recommender system incorporating a travel knowledge graph (TKG) that can complete a dynamic, context-based travel recommendation task. Dialogue is introduced as a supplement to the interaction sequence: it alleviates data sparsity and captures, or even uncovers, current user preferences. Our deep model represents user preferences by encoding historical conversations and historical interaction sequences. In addition, we incorporated knowledge to make the conversational recommendation process more fluid and to fit the travel scenario with its spatiotemporal constraints. Meanwhile, we built a travel conversational recommendation dataset in Chinese to facilitate our study, since the rich semantic content of Chinese dialogues can provide many clues for feature extraction in a deep approach. Overall, we provide resources and baseline models for applying NLP technology to travel recommendation. To our knowledge, our work is the first to produce a deep travel conversational recommender system.
In summary, our work makes the following contributions:
- We constructed two human-annotated Chinese datasets for the travel conversational recommender system: a linked interaction sequence dataset and dialogue dataset. The former is used to fully explore the static preference characteristics of users, while the latter identifies the dynamic changes in user preferences. Both datasets were collected in Chinese, since the rich semantic content of Chinese dialogues can provide many clues for feature extraction in a deep approach. The datasets will be released to the public for research purposes.
- We fill the current research gap by incorporating a travel knowledge graph into a deep conversational recommender system. Ours is one of the first models in this area to model user preferences by encoding historical conversations and historical interaction sequences.
- We conducted a detailed comparison and analysis between our model and the current SOTA models in terms of the various challenges travel recommender systems face. The findings of this study can promote the development of travel recommendation.
In a system-driven CRS, the system mainly asks questions or offers options about user preferences in order to recommend. This requires the user to point out appropriate query candidates and to be familiar with each item they want. A mixed-driven CRS allows both the system and the user to lead the conversation by asking questions or via chit-chat. The system constantly interacts with the user in a multi-turn conversation, discussing different topics (e.g., greetings or philosophy) that lead to the final recommendation. The natural-language-based response requires the generated language to be proper, correct (even fluent), and meaningful, involving helpful information about the recommended target. Unlike the previous two, a user-driven CRS focuses on scenarios that require query understanding, where the user has explicit claims or query objectives.

Personalized Travel Recommender System
A personalized travel recommender system needs to capture individual travel preferences accurately. However, the data involved in personal travel preferences are sparse and vulnerable to change in the current context. Recently, in addition to traditional machine learning methods [19][20][21], personalized travel recommender systems have gradually moved to a deep learning approach [22][23][24][25]. However, these deep learning approaches are mostly static models that extract highly abstracted features from historical interaction data. Therefore, these approaches are still unable to solve the problems associated with data scarcity, cold starts, and dynamic preferences in travel recommendations.
Along with the breakthrough and rapid development of NLP technology, there is a trend in tourism away from the earlier question-and-answer systems [26] toward conversational systems [27,28]. CRS has the potential to become the new desired travel recommendation framework due to its ability to dynamically elicit or discover current user preferences, as was demonstrated early on by [29]. Unfortunately, they could not take full advantage of advanced deep learning methods for travel conversational recommendations at the time.

Knowledge-Graph-Based Recommender System
In recent years, knowledge-graph-based recommender systems have attracted considerable interest since they can alleviate data sparsity, mitigate cold starts, and provide a better understanding of recommendations with the knowledge graph as side information. Existing methods can be roughly divided into path-based methods [30][31][32], embedding-based methods [33][34][35][36][37], and hybrid methods [38,39]. Path-based methods utilize the connectivity similarity defined in heterogeneous information networks to enrich user or item representation. Embedding-based methods obtain more precise entity representations by leveraging the information in a graph structure via KG embedding. Finally, hybrid methods integrate the semantic representation of entities and relations and the connectivity information learned by the graph neural network framework [38]. However, KG-based recommender systems are unable to infer real-time interests due to their static property. Therefore, integrating knowledge into the conversational recommender system is a straightforward solution.

The Proposed Approach
In this section, we first formulate the personalized travel conversational recommendation task. Then we introduce our solution to this task.

Problem Formulation
Given a user u, we assume that they have a historical interaction sequence P u = {poi 1 , poi 2 , . . . , poi m }, a chronologically ordered sequence of points of interest (POIs) that u has interacted with. Each POI may be a hotel, a restaurant, or an attraction. Each conversation consists of a list of utterances, denoted by C = {ut 1 , ut 2 , . . . , ut N }, where ut n is the utterance at the nth turn. Based on these basic concepts and notations, the task of personalized travel conversational recommendation is defined as follows: given a user u, the user's historical interaction sequence P u , historical utterances {ut 1 , . . . , ut k−1 }, and associated entities from the TKG, the target of the task is to predict the poi k that satisfies user u given the conversation context and their past interaction records.
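As a minimal sketch, the inputs and output of this task can be represented with simple data structures. The container types and field names below are purely illustrative, not part of the paper's formulation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ConversationTurn:
    speaker: str          # "user" or "system"
    utterance: str        # ut_n, the utterance at the n-th turn
    entities: List[str] = field(default_factory=list)  # linked TKG entities

@dataclass
class RecommendationInstance:
    user_id: str
    interaction_sequence: List[str]   # P_u: chronologically ordered POI ids
    history: List[ConversationTurn]   # ut_1 ... ut_{k-1}
    target_poi: str                   # poi_k, the POI to be predicted

# A toy instance: the model should rank "poi_museum" highest given the context.
example = RecommendationInstance(
    user_id="u42",
    interaction_sequence=["poi_park", "poi_garden"],
    history=[ConversationTurn("user", "Any quiet place with history nearby?")],
    target_poi="poi_museum",
)
```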

Model Architecture
We proposed a Deep Travel Conversational Recommender System, abbreviated as DTCRSKG, to complete the aforementioned task. The architecture of the proposed model is illustrated in Figure 1. Inspired by TGCRS [15], we used two BERT-based modules to encode historical utterances and historical interaction sequences, respectively. In addition, we integrated knowledge into BERT to make our model deeply understand the underlying semantic information about user preferences contained in interaction sequences and conversations in the travel domain. In the following sections, we introduce the details of our model.

Figure 1. The architecture of the proposed DTCRSKG model. It contains two major parts, i.e., the TK-BERT4Rec part that learns user interaction representation and the TK-BERT part that learns dialogue representation.


Dialog Encoding
We utilized a travel-knowledge-infused BERT (TK-BERT) to encode historical utterances in the dialog encoding module to capture more information about user preferences in the interactive dialogue. TK-BERT was derived from K-BERT [40] and pre-trains BERT with the TKG. As shown in Figure 1, TK-BERT consists of a knowledge layer, an embedding layer, a matrixing layer, a mask transformer encoder, and a TKG. First, unlike K-BERT, only key entities in the input utterance sentence ut = {w 1 , w 2 , w 3 , . . . , w n } are selected to query their corresponding triples from the travel knowledge graph in the knowledge layer. Here, key entities are the ones corresponding to keywords in the sentence. A knowledge query can be formulated as E = KQuery(ut, TKG), where KQuery is an abbreviation for the knowledge query operation and E is the collection of queried triples. Then, these knowledge triples are injected into the input sentence by placing them in their corresponding positions, and a sentence tree is generated. The sentence tree can have multiple branches and its depth is set to 1. The structure of a sentence tree is illustrated in Figure 2.
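The knowledge query step can be sketched in a few lines. This is an illustrative sketch only: the TKG is stored here as a plain dictionary from entity to triples, and the function name mirrors the KQuery operation rather than any real API.

```python
def kquery(tokens, tkg, key_entities):
    """Sketch of the knowledge layer's query step: for each key entity
    appearing in the utterance, collect its (head, relation, tail) triples
    from the travel knowledge graph (here a simple dict)."""
    triples = []
    for tok in tokens:
        if tok in key_entities:
            triples.extend(tkg.get(tok, []))
    return triples

# Toy TKG with a single triple; entries are illustrative.
tkg = {"West Lake": [("West Lake", "locate", "Hangzhou")]}
found = kquery(["visit", "West Lake", "tomorrow"], tkg,
               key_entities={"West Lake"})
# found == [("West Lake", "locate", "Hangzhou")]
```

The returned triples are then grafted onto the sentence as branches to form the sentence tree described above.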
Next, the embedding layer was designed to convert the sentence tree into an embedding representation, which is the sum of the token embedding, position embedding, and segment embedding. Among them, the token embedding is obtained via a trainable lookup table. Furthermore, to maintain the sentence's structural information (i.e., the order of tokens), hard-positioning is replaced by soft-positioning [40] in the position embedding. Taking the sentence tree in Figure 2 as an example, r 21 and w 21 are inserted between w 2 and w 3 , and r 51 and w 51 with r 52 and w 52 are inserted between w 5 and w 6 . If the original hard-position sequential encoding scheme of BERT is followed, the input sentence is changed to {w 1 , w 2 , r 21 , w 21 , w 3 , w 4 , w 5 , r 51 , w 51 , r 52 , w 52 , w 6 }, resulting in a wrong semantic situation where w 51 is the subject of r 52 .
With the soft-position sequential encoding scheme, the position ordinal number of r 52 is replaced by six instead of ten and the position ordinal number of w 3 is replaced by three instead of five. In this way, the original semantic structure of the sentence information can be maintained. Finally, similar to BERT, segmentation embedding is used to identify different sentences when multiple sentences are included.
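The soft-position numbering above can be reproduced with a short sketch. The tree representation (a token list plus a dict of branches keyed by anchor index) is an assumption made for illustration; K-BERT's actual implementation differs in detail.

```python
def soft_positions(main_tokens, branches):
    """Assign K-BERT-style soft positions (1-indexed).

    main_tokens: tokens of the original sentence, in order.
    branches: dict mapping main-token index -> list of branches, where each
              branch is a list of relation/entity tokens grafted after that
              anchor token.
    Returns a flat list of (token, soft_position) in hard-position order.
    """
    out = []
    for i, tok in enumerate(main_tokens):
        pos = i + 1                      # main tokens keep their original order
        out.append((tok, pos))
        for branch in branches.get(i, []):
            for j, btok in enumerate(branch):
                # each branch restarts numbering right after its anchor token
                out.append((btok, pos + 1 + j))
    return out

# Figure 2 example: {r21, w21} after w2; {r51, w51} and {r52, w52} after w5.
tokens = ["w1", "w2", "w3", "w4", "w5", "w6"]
branches = {1: [["r21", "w21"]], 4: [["r51", "w51"], ["r52", "w52"]]}
seq = soft_positions(tokens, branches)
# r52 gets soft position 6 and w3 keeps position 3, matching the text.
```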
After soft positioning, the position numbers of both r 51 and r 52 are six, which makes them close in the calculation of self-attention, but in reality they may be unrelated. Similarly, w 21 is only related to w 2 and not to w 51 or w 52 ; therefore, the representation of w 21 should not be influenced by w 51 or w 52 . On the other hand, the [CLS] label used for classification should not bypass w 2 to get the information of w 21 , because this would bring the risk of semantic change. To prevent such false semantic changes, a matrixing layer (borrowing from the seeing layer [40]) is constructed from the sentence tree to calculate a visible matrix (VM) that indicates whether there is a direct semantic association between each pair of hard-position-encoded symbols. The visible matrix is shown in Figure 3, where, for example, w 2 is visible to w 21 while w 3 and w 4 are not. Last, the VM is used to control the visible area of each token when implementing the mask self-attention mechanism in the transformer encoder. After the mask transformer encoder, the system produces the final embedded representation of the input utterance sentence ut.
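The visibility rule can be sketched as follows: main-sentence tokens all see each other, while each branch sees only itself and its anchor token. The hard-position bookkeeping below is an illustrative simplification of the seeing layer.

```python
import numpy as np

def visible_matrix(main_positions, branches):
    """Build a K-BERT-style visible matrix over hard positions (0-indexed).

    main_positions: hard positions of the main-sentence tokens.
    branches: list of (anchor_hard_pos, branch_hard_positions) pairs.
    Returns an N x N 0/1 matrix; vm[i, j] == 1 means token i can attend to j.
    """
    n = len(main_positions) + sum(len(b) for _, b in branches)
    vm = np.zeros((n, n), dtype=int)
    # main-sentence tokens are mutually visible
    for i in main_positions:
        for j in main_positions:
            vm[i, j] = 1
    # a branch sees itself and its anchor (and vice versa)
    for anchor, branch in branches:
        group = [anchor] + list(branch)
        for i in group:
            for j in group:
                vm[i, j] = 1
    return vm

# Figure 2/3 example in hard-position order:
# [w1, w2, r21, w21, w3, w4, w5, r51, w51, r52, w52, w6]
main = [0, 1, 4, 5, 6, 11]
branch_list = [(1, [2, 3]), (6, [7, 8]), (6, [9, 10])]
vm = visible_matrix(main, branch_list)
# w2 (pos 1) sees w21 (pos 3); w3 (pos 4) does not; r51 does not see r52.
```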


Sequence Encoding
To fully exploit the representation of a user's preferences from a limited sequence of historical interactions, in the sequence encoding module, a sequential recommendation model named TK-BERT4Rec was adapted to encode the user interaction sequence. In detail, we introduced BERT4Rec [41] as a base model that is essentially a sequential recommendation model based on BERT. Hidden representations in a sequence can be fully explored without the strict constraints of sequence order due to its bidirectional encoding representation capability. However, the model uses only behavioral information, not information about items (e.g., category information for attractions), and the potential learned information remains relatively limited. Similar to TK-BERT, we added a new knowledge layer and modified the embedding layer of the traditional BERT4Rec to improve the knowledge representation ability further. The role of the knowledge layer is to select the triples corresponding to the item entities in the interaction sequence from the knowledge graph. Similar to the dialogue sentence tree, an extended interaction sequence tree is generated by populating these knowledge triples with the corresponding positions of the interaction sequence. Since there is no sentence-like semantic structure relationship between the entity referents of the interaction sequence, but only an order relationship between the words, each token of the interaction sequence tree is numbered sequentially.
As shown in Figure 1, TK-BERT4Rec includes four parts: a knowledge layer, an embedding layer, transformer layers, and a TKG. Similar to the dialogue encoding module, given a historical sequence P u and a TKG, the knowledge layer outputs a sequence tree. To make use of the sequential information of the input sequence, we summed the corresponding item embedding and positional embedding as the output of the embedding layer. In the transformer layer, we stacked hidden representations in L layers together into a representation matrix to simultaneously compute the attention function in all positions. Each transformer layer consists of a multi-head self-attention sub-layer and a position-wise feed-forward network. The former linearly projects the representation matrix into subspaces and then applies the attention function in parallel to produce the output representation. The latter handles non-linear projections through two affine transformations with a Gaussian error linear unit activation in between. After L layers that hierarchically exchange information across all positions in the previous layers, the system produces the final embedded representations for all POIs in P u .
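The embedding-layer step (item embedding plus positional embedding) can be sketched with lookup tables. The table shapes and random initialization below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sequence_embedding(item_ids, item_table, pos_table):
    """Sum a trainable item embedding and a learned positional embedding for
    each position, as in the TK-BERT4Rec input layer (simplified: segment
    and knowledge-specific handling omitted)."""
    positions = np.arange(len(item_ids))
    return item_table[item_ids] + pos_table[positions]

item_table = rng.normal(size=(1000, 64))   # one row per POI id
pos_table = rng.normal(size=(50, 64))      # one row per sequence position
out = sequence_embedding([3, 17, 42], item_table, pos_table)
# out.shape == (3, 64); row 0 is item_table[3] + pos_table[0]
```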

Prediction
The representation e u of user u is e u = [e sem u ; e dem u ], the combination of e sem u , the embedding representing the historical interaction sequence produced by the sequence encoding module, and e dem u , the embedding representing the historical utterances produced by the dialogue encoding module. Given the user representation, the probability that an item poi is recommended to user u is P(poi|u) = softmax(e u · e poi ), where e poi is the item embedding for poi obtained through the embedding layer. All POIs are ranked according to their softmax values, and the poi with the largest probability value is selected for recommendation.
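A minimal sketch of this prediction step, assuming dot-product scoring and concatenation of the two user embeddings (the exact fusion operator is an assumption made for illustration):

```python
import numpy as np

def recommend(e_u, poi_embeddings):
    """Rank all POIs by softmax over dot-product scores between the user
    representation e_u and each item embedding e_poi."""
    scores = poi_embeddings @ e_u
    scores = scores - scores.max()               # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    ranking = np.argsort(-probs)                 # highest probability first
    return probs, ranking

e_sem_u = np.array([1.0, 0.0])                   # sequence-encoding embedding
e_dem_u = np.array([0.0, 1.0])                   # dialogue-encoding embedding
e_u = np.concatenate([e_sem_u, e_dem_u])         # assumed fusion for the sketch
pois = np.array([[1.0, 0.0, 0.0, 1.0],           # aligns well with e_u
                 [-1.0, 0.0, 0.0, -1.0]])        # opposes e_u
probs, ranking = recommend(e_u, pois)
# ranking[0] == 0: the first POI is selected for recommendation
```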

Data Curation
Several datasets have been released to facilitate the study of conversational recommender systems in recent years. Among them, ReDial [12], GoReDial [42], DuRecDial [14], and TG-ReDial [15] were created by human annotation with pre-defined recommendation targets. Unfortunately, none of them are to do with tourism. It is worth noting that Multi-WOZ [43], CrossWOZ [44], and KdConv [45] are dialogue datasets that are related to the field of tourism. However, they all lack well-labeled user interaction sequences.
To fill the current research gap, we developed two conversational recommendation datasets for tourism named CwConvRec and KdConvRec. In our datasets, each conversation belongs only to the tourism domain. To generate the conversations, we obtained all single-domain conversations from CrossWOZ and KdConv. To simulate the recommendation scenario, we also extracted POIs (e.g., attractions) from CrossWOZ and KdConv to form a visiting record. The entire visiting record was split into several coherent visited subsequences, where each POI was ensured to share at least one common feature (e.g., category) with another. Since the original datasets lack feature information for POIs, categories were introduced as features in this study. The categories included ancient ruins, historical buildings, museums, art galleries, parks and gardens, wildlife parks, theme parks, and natural landscapes. Each visited subsequence corresponds to a unique conversation, and each user participates in several conversations. To build knowledge-graph-driven datasets, CwConvRec and KdConvRec needed to provide turn-level knowledge annotations. Although CrossWOZ does not contain any ready-made knowledge annotations, it has a domain database with POI attribute fields and their attribute values. Therefore, we first constructed an attribute-graph-based knowledge graph for CwConvRec from the travel database in CrossWOZ, by combining each poi, attribute, and attribute value into a triplet. Then, we matched entity mentions in a given conversation through a mention dictionary that can be represented as a two-tuple (M, E), where M = {m 1 , m 2 , . . . , m k } is the set of all mentions in the knowledge graph obtained above and E = {E m 1 , E m 2 , . . . , E m k } is the set of entities corresponding to the mentions in M. If a term obtained by splitting a conversation utterance precisely matches a mention in the dictionary, we take it as a mention candidate.
Each identified mention and its associated entities form a set of mention-entity pairs. Last, we evaluated the probability of each link from a mention to an entity by computing the weighted sum of features of each possible mention-entity pair. The features include the entity's name length, the link's a priori probability, and the entity relatedness. Since the knowledge form in KdConv contained both unstructured text (e.g., information about the attraction) and structured graphs (e.g., Forbidden City-Surrounding Attraction-South Luogu Lane), the tourism knowledge graph in KdConvRec was directly inherited from KdConv. In the quality control process for the human-annotated data, each utterance was assigned an annotator and an inspector. We developed a unified annotation specification before annotation to ensure the consistency of the data. Every annotator must perform a real-time inspection and every inspector must complete a full sample inspection and sampling inspection. The detailed statistics of CwConvRec and KdConvRec are shown in Table 1, and examples for the two datasets are illustrated in Figures 4 and 5.
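The dictionary-matching step above can be sketched as follows. This is a simplified illustration: real linking additionally scores each mention-entity pair by the weighted sum of features described in the text, and the dictionary entries here are invented, not taken from the datasets.

```python
def match_mentions(utterance_terms, mention_dict):
    """Exact dictionary matching of mention candidates: a term becomes a
    candidate only if it precisely matches a mention in the dictionary,
    and it pairs with all of that mention's candidate entities."""
    candidates = []
    for term in utterance_terms:
        if term in mention_dict:
            candidates.append((term, mention_dict[term]))
    return candidates

# Toy mention dictionary (M -> E_m); entries are purely illustrative.
mention_dict = {
    "West Lake": ["West_Lake_Hangzhou", "West_Lake_Fuzhou"],
    "Forbidden City": ["Forbidden_City_Beijing"],
}
pairs = match_mentions(["I", "love", "West Lake"], mention_dict)
# pairs == [("West Lake", ["West_Lake_Hangzhou", "West_Lake_Fuzhou"])]
```

An ambiguous mention such as "West Lake" yields multiple candidate entities, which is exactly why the subsequent weighted feature scoring is needed.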

Baselines
To evaluate the effectiveness of the proposed approach, we compare it with the following state-of-the-art baselines.
- Popularity: It ranks items according to popularity measured by the number of interactions.
- timeSVD [46]: This model encodes both users and items with low-rank vectors using matrix decomposition and considers that user preferences may change over time as well. It is a dynamic matrix-factorization-based recommendation model.
- SASRec [47]: This model adopts the transformer architecture to encode the user interaction history without using conversation data. It is a transformer-based sequential recommendation model.
- BERT4Rec [41]: It adopts the deep bi-directional transformer architecture to encode the user interaction history without using conversation data. It is a BERT-based sequential recommendation model.
- TGCRS (SASRec+BERT) [15]: This model adopts SASRec to encode the user interaction history and BERT to encode conversation data. This is currently the state-of-the-art conversational recommender system model.

Evaluation Metrics
In this study, we adopted NDCG and hit rate as evaluation metrics for recommendation performance.
- Normalized discounted cumulative gain (NDCG) [48]: This metric is commonly used as an evaluation indicator of ranking results to assess ranking accuracy; the NDCG score has also been widely used in evaluating recommender systems. A recommender system usually returns a list of items for a user; assuming the list length is K, the gap between the sorted list and the user's real interaction list can be evaluated with NDCG@K, where a higher score denotes better performance.
- Hit rate: Like [47], we calculated the hit rate, which represents the fraction of times that the ground-truth next item was among the top item list. The proportion of test cases that have the correctly recommended item in a top-K position in the ranking list can be evaluated with HR@K. A higher score denotes better performance.
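Both metrics can be sketched for the common single-target next-item setting (each test case has exactly one ground-truth item, so the ideal DCG is 1); this simplification is an assumption for illustration.

```python
import numpy as np

def hr_at_k(ranked_items, target, k):
    """Hit-rate contribution of one test case: 1 if the ground-truth item
    appears in the top-k list, else 0."""
    return int(target in ranked_items[:k])

def ndcg_at_k(ranked_items, target, k):
    """NDCG@K for a single relevant item: 1 / log2(rank + 1) if the target
    is ranked within the top k (1-indexed rank), else 0."""
    if target in ranked_items[:k]:
        rank = ranked_items.index(target) + 1
        return 1.0 / np.log2(rank + 1)
    return 0.0

ranking = ["poi_b", "poi_a", "poi_c"]
# target "poi_a" is ranked second: HR@5 = 1, NDCG@5 = 1/log2(3) ≈ 0.63,
# but HR@1 = 0 since it is not the top item.
```

Averaging these per-case values over the test set gives the HR@K and NDCG@K figures reported in the tables.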


Performance Comparison and Analysis
In this study, we chose K = 1, 5, and 10 to illustrate the different metric results at K. The results of the evaluation on the CwConvRec dataset and the KdConvRec dataset are presented in Tables 2 and 3, respectively. For each method, the results are obtained with the best model. As shown in Tables 2 and 3, the conversational recommender system models significantly outperformed all non-conversational models, especially regarding the NDCG and hit rate of the recommendation task. This was thanks to the ability of these models to take full advantage of both the historical interaction sequence and historical utterances by combining the merits of the BERT part and the sequential recommendation part. Our proposed DTCRSKG model achieved better performance than the TGCRS model on almost all measurements. On the one hand, we used the BERT4Rec model, which performs better on sequential recommendation, to mine deeper behavioral relationships; on the other hand, we incorporated the tourism knowledge graph into the BERT model so that the encoding of the conversation contained more knowledge in the travel domain, improving the understanding and representation of the conversation.

However, the degree of improvement of our model was lower than expected, and there was even a slight deterioration in performance in NDCG@10. This may be due to the relatively short length of the interaction sequences in the tourism dataset, so that the implicit information contained in the historical interaction sequence was minimal; it is not easy to guarantee the correctness of long-sequence ordering. Meanwhile, the knowledge graph may still introduce noise that interferes with the user features represented by the conversation encoding. Take the sentence tree in Figure 6 as an example: the triple for the [西湖 (West Lake)] entity is {西湖 (West Lake), 位于 (locate), 杭州 (Hangzhou)}. However, if the West Lake mentioned in the conversation is located in Fuzhou instead of Hangzhou, then the introduced knowledge becomes noise.

Figure 6. A sentence tree example. The triple with "West Lake" in this sentence tree is: "West Lake"-"locate"-"Hangzhou". However, in fact, we are talking about the West Lake in Fuzhou.

Ablation Experiments
To better understand the proposed DTCRSKG model, namely, TK-BERT+TK-BERT4Rec, we conducted several sets of ablation experiments on our datasets using the metrics mentioned above. The results are shown in Tables 4 and 5. We found that the BERT+BERT4Rec model improved whenever either encoding module incorporated the knowledge. Our proposed TK-BERT+BERT4Rec enhanced performance to a greater extent than BERT+TK-BERT4Rec, owing to there being more noise in the latter. Among the tested models, DTCRSKG produced the greatest improvement.
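The four ablation variants differ only in where the knowledge graph is injected; a minimal configuration sketch makes the grid explicit (the field names are ours, not the paper's):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AblationConfig:
    # Toggles mirroring the ablation grid; names are illustrative.
    kg_in_dialogue_encoder: bool   # TK-BERT vs. plain BERT
    kg_in_sequence_encoder: bool   # TK-BERT4Rec vs. plain BERT4Rec

VARIANTS = {
    "BERT+BERT4Rec":     AblationConfig(False, False),
    "TK-BERT+BERT4Rec":  AblationConfig(True,  False),
    "BERT+TK-BERT4Rec":  AblationConfig(False, True),
    "DTCRSKG (full)":    AblationConfig(True,  True),
}

for name, cfg in VARIANTS.items():
    print(f"{name}: dialogue KG={cfg.kg_in_dialogue_encoder}, "
          f"sequence KG={cfg.kg_in_sequence_encoder}")
```

Reading Tables 4 and 5 against this grid, each toggle helps individually, the dialogue-side toggle helps more, and the full model combines both.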

Case Analysis
We used the RASA chatbot as a carrier for the case analysis. The chatbot receives input sent by the user and displays a recommendation in the form of text. The system is executed in several steps: (1) the user sends text to the chatbot; (2) after receiving the text, the NLU (natural language understanding) component identifies the user's intention and passes the data on for processing; (3) if the user requests an attraction recommendation, the DM (dialog management) component runs the recommendation model and the NLG (natural language generation) component displays the recommendation result; (4) if the user does not ask for an attraction but only asks for, say, a time or location, the NLG component generates an informational result for an attraction; (5) if the result satisfies the user or they have no other needs, the conversation is over. The system interface is shown in Figure 7.
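The five steps above amount to an intent-based dispatch loop; a minimal sketch follows, in which `nlu`, `recommend`, and `lookup` stand in for the RASA NLU/DM/NLG components and their interfaces are illustrative only:

```python
def handle_turn(text, nlu, recommend, lookup):
    """Simplified dispatch loop mirroring steps (1)-(5) above."""
    intent, entities = nlu(text)                 # step (2): intent recognition
    if intent == "ask_recommendation":
        return recommend(entities)               # step (3): DM runs the model, NLG renders it
    if intent in ("ask_time", "ask_location"):
        return lookup(intent, entities)          # step (4): informational reply
    if intent == "goodbye":
        return None                              # step (5): conversation is over
    return "Sorry, could you rephrase that?"

# Toy stand-in components for a single turn:
nlu = lambda t: ("ask_recommendation", {}) if "recommend" in t else ("goodbye", {})
recommend = lambda e: "How about Longqing Gorge?"
lookup = lambda i, e: "Open 9:00-17:00"
print(handle_turn("please recommend an attraction", nlu, recommend, lookup))
```

In the real system this routing is handled by RASA's dialog-management policies rather than hand-written branches, but the control flow per turn is the same.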
In Figure 8, we present a sample to illustrate how our model and TGCRS work in practice. With both models, the user ends up with satisfactory recommended attractions during the conversation, given the user interaction sequence, dialogue history, and related knowledge. The attractions recommended last, "Longqing Gorge (龙庆峡)" and "Yudu Mountain (玉渡山)" in S7, share the category label "natural landscapes" not only with the attraction "Haituo Mountain (海坨山)" mentioned in S5 but also with attractions in the user interaction sequence, such as "Hundred Flowers Mountain (百花山)" and "Horn Gorge Primeval Forest Park (喇叭沟森林公园)". Both knowledge-aware conversational recommendation models could thus use correct knowledge to meet the requirements of travel recommendation in a spatio-temporally constrained environment, covering categories, opening times, tour times, and addresses. However, since it does not make good use of the category information of the attractions in the interaction sequence, the TGCRS model easily misidentifies Horn Gorge Primeval Forest Park as a park-like attraction and therefore fails to quickly predict the user's preference for natural scenery when the current dialogue history contains no preference information. It was not until the user explicitly expressed this preference in S6 that the first correct recommendation in the natural-scenery category was given. In contrast, our model incorporated category knowledge into the encoding of the interaction sequence to better understand the user's preferences; it predicted that the user may prefer natural attractions, such as Haituo Mountain in S5, before they were explicitly expressed. However, there are still some unsatisfactory points.
First, since the limited user interaction sequence tended to provide little information about user preferences, the conversational recommender system did not initially predict the user's preference for natural landscapes in S1-S3. The system did not start giving correct recommendations until S6, when the user directly stated that they liked the attraction mentioned in S5 and needed additional recommendations. This invariably lengthens the conversational recommendation process, and the sample also indicates that the accuracy of our proposed system still needs to be improved. Second, when the user mentioned an entity or relation unknown to the system, the system was unable to reply correctly, as in S12 and S13. Even though the proposed conversational recommender system could still produce a knowledge-grounded recommendation, the knowledge used was limited and sometimes inappropriate. How to complement or update the knowledge incorporated in the conversational recommendation system remains to be resolved.
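The category-matching intuition behind the case above can be sketched as a majority vote over the categories of previously visited attractions; the mapping data and function below are illustrative, not the model's actual encoding:

```python
from collections import Counter

def preferred_category(interaction_seq, category_of):
    """Infer the user's dominant category from the interaction sequence.
    `category_of` maps attraction -> category label (illustrative data)."""
    counts = Counter(category_of[a] for a in interaction_seq if a in category_of)
    return counts.most_common(1)[0][0] if counts else None

category_of = {
    "Hundred Flowers Mountain": "natural landscape",
    # The category label, not the word "park" in the name, is what matters:
    "Horn Gorge Primeval Forest Park": "natural landscape",
}
seq = ["Hundred Flowers Mountain", "Horn Gorge Primeval Forest Park"]
print(preferred_category(seq, category_of))  # natural landscape
```

This also shows why a name-based heuristic fails where a category-aware one succeeds: "Horn Gorge Primeval Forest Park" reads like a city park, but its knowledge-graph category is natural landscape.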

Conclusions
This study constructed two Chinese conversational recommendation datasets, CwConvRec and KdConvRec, for the travel recommender system. We also proposed a deep travel conversational recommender system model, DTCRSKG, as a benchmark for model comparisons. Since both historical dialogues and interaction sequences were well encoded with tourism-domain knowledge, the learned user preference representations were more in-depth and the model performed best among those tested, which was also verified by our case analysis. In addition, we found that dialogue data, as a complement to sequence data, could alleviate data sparsity and provide a reason for each recommendation. The dialogue mode is effective for travel recommender systems, especially regarding cold starts and dynamic preferences. Our work can expand and promote the development of travel recommender systems. In the future, we will explore semi-supervised methods to enlarge the annotated datasets and complement the TKG or other knowledge graphs to better support the proposed model. Meanwhile, we will investigate semi-supervised learning and GNN-related technologies for a unified model, hopefully solving the noise and sequence-length problems.