Personalized Chinese Tourism Recommendation Algorithm Based on Knowledge Graph

Su, Xueping; He, Jiao; Ren, Jie; Peng, Jinye

doi:10.3390/app122010226


by Xueping Su ^1,2,†, Jiao He ^2,†, Jie Ren ^2,† and Jinye Peng ^1,*,†

¹ College of Information and Technology, Northwest University of China, Xi’an 710127, China

² School of Electronics and Information, Xi’an Polytechnic University, Xi’an 710048, China

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Appl. Sci. 2022, 12(20), 10226; https://doi.org/10.3390/app122010226

Submission received: 1 September 2022 / Revised: 30 September 2022 / Accepted: 30 September 2022 / Published: 11 October 2022

(This article belongs to the Special Issue Application of Artificial Intelligence in Visual Signal Processing)


Abstract: Facing massive tourism data, a recommendation system mines the user’s interests to provide a personalized information service. Introducing a Knowledge Graph into a recommendation system as auxiliary information can effectively address the problems of data sparsity and cold start. Therefore, this paper proposes a new personalized Chinese tourism recommendation algorithm based on the Knowledge Graph. First, because no public Chinese tourism Knowledge Graph is available, a complete Chinese tourism Knowledge Graph is built. Second, a new B-TransD (Bernoulli-TransD) knowledge representation model is proposed to reduce the probability of false negative triples. Finally, a user interest modeling method based on the attribute information of users and tourist attractions is proposed to improve the performance of the recommendation system. Experiments are conducted on a data set containing 9100 tourist attractions. The experimental results demonstrate that the proposed algorithm achieves significant improvement over the existing algorithms.

Keywords:

Chinese tourism knowledge graph; knowledge representation; personalized recommendation; collaborative filtering

1. Introduction

To address the problems of data sparsity and cold start in recommendation systems, traditional methods introduce relevant information to complete the missing information, mainly user attributes, item attributes, user social networks, context and so on [1,2,3,4]. User-based recommendation algorithms and collaborative filtering algorithms build the user’s interest model from the user’s behavior information. Although these algorithms have low dependence on expert knowledge, the number of users and items in a commercial recommendation system is very large and each user accesses only a limited number of items, which leads to sparse user behavior information on items and reduces the performance of the algorithm. A Knowledge Graph is a structured semantic knowledge base that contains abundant entity information and the relationships among entities. On the one hand, the semantic information of the Knowledge Graph provides auxiliary information describing users and tourist attractions, which increases the density of the data and improves the accuracy of the recommendation algorithm. On the other hand, the semantic associations of the Knowledge Graph provide different relation connections; knowledge reasoning over these connections mines the user’s deep interests according to context information and spreads across the different semantic relations of the items, which improves the diversity of the recommended results [5]. To sum up, this paper proposes a new personalized tourism recommendation algorithm based on the Knowledge Graph.

The main contributions of this paper are summarized as follows:

In response to the lack of the public Chinese tourism Knowledge Graph, this paper introduces a Knowledge Graph into the tourism field, analyzes knowledge characteristics of tourist attractions, realizes the division of the entities and the relationship among tourist attractions, and constructs a complete Chinese tourism Knowledge Graph.
Aiming at the problem of a high error rate in constructing negative triples, this paper introduces Bernoulli sampling strategy and proposes a new knowledge representation model of B-TransD (Bernoulli-TransD) to reduce the probability of false negative triples.
Because traditional recommendation algorithms rely only on the user’s historical behavior and do not consider the attribute information of users or tourist attractions, this paper proposes a new user interest modeling method based on the attributes of tourist attractions, which mines user preferences for tourist attractions and builds a user interest model to improve the performance of the recommendation system.

The rest of the paper is organized as follows: a related literature survey is summarized in Section 2, the details of the proposed framework and Knowledge Graph are introduced in Section 3, experimental results are presented in Section 4, and the study is concluded in Section 5.

2. Related Work

The Knowledge Graph was proposed by Google in 2012 to enhance the function of the search engine and improve the quality of search results. As a new type of auxiliary information, the Knowledge Graph can effectively address the data sparsity and cold-start problems of traditional recommendation systems [6,7]. The main methods of applying the Knowledge Graph in recommendation systems include recommendation generation based on ontology, recommendation generation based on open linked data, and recommendation based on knowledge representation learning.

Recommendation generation based on ontology. The advantages of this method are that it combines the user and the environment, provides user-centered route planning based on the hierarchical relationship of concepts in the ontology, overcomes the shortcomings of standard modeling, and improves the accuracy of recommendation. It can also refine the context of relations, enhance the relevance of data, and provide a more fine-grained analysis of user preferences. However, its disadvantages are the high construction cost and the need for expert participation, the limited number of general large-scale ontologies, and the need to update the ontology knowledge base through a complex update process [8].
Recommendation generation based on open linked data. This method has the advantages of sufficient data, wide coverage, strong data association, an accurate form of expression, and logical reasoning ability, and it can mine and analyze effective information to improve the recommendation performance. However, its disadvantage is that the quality of the open linked data affects the quality of the recommended results [9,10].
Recommendation based on knowledge representation learning. The advantages of this method are that users and tourist attractions are represented as low-dimensional dense vectors, which effectively reduces the computational complexity and improves both the computational efficiency and the recommendation performance. However, it relies on knowledge representation technology, which has low efficiency in the input module and in the perception of a dynamic structure [11,12,13].

To sum up, although the recommendation system fused with the Knowledge Graph has developed rapidly in recent years, it still faces many challenges. The following scientific problems still need to be further explored: (1) accurate modeling of user preferences; (2) how to mine the relevant knowledge to improve the performance of the recommendation. In view of this, this paper designs a Knowledge Graph that integrates the correlation between users and tourist attractions, excavates the users’ preferences for the attribute of a tourist attraction, constructs the user interest model, and proposes a new algorithm for personalized travel recommendation based on a Knowledge Graph to improve the accuracy and diversity of a personalized travel recommendation.

In summary, this paper proposes a new algorithm of a personalized tourism recommendation based on the Knowledge Graph.

3. Methodology

3.1. Tourism Knowledge Graph

In the personalized travel recommendation algorithm based on the Knowledge Graph, the construction of the tourism Knowledge Graph is the foundation. The more complete the knowledge in the tourism Knowledge Graph, the higher the accuracy of the recommendation. In this paper, the general construction method of the domain Knowledge Graph is adopted, and knowledge from multiple data sources is fused to construct the tourism Knowledge Graph from top to bottom.

3.1.1. Collecting Tourist Data

At present, there is a lack of public data sets for the Chinese tourism Knowledge Graph, but a large amount of information about tourist attractions can be obtained from the Internet. Therefore, this paper uses Scrapy, a Python crawling framework, for data crawling. Table 1 shows an example of the collected samples, which contain the following information: geographical location, type, dynasty, ticket price, official website, average rating, opening time, etc.

3.1.2. Building Tourism Knowledge Graph

Ontology is a description of the nature and laws of things, and is an abstract model of the real world established by describing the concepts in the Knowledge Graph. There are generally two ways to build a domain ontology: top-down and bottom-up. The top-down construction method first determines the data model of the Knowledge Graph, then fills in the concrete data according to the model, and finally forms the Knowledge Graph. This method is suitable for an industry Knowledge Graph, because for a single industry the data content and data organization modes are easier to determine [14]. Because the tourism Knowledge Graph involves a large amount of data and rich information exists among scenic spots, building the graph from top to bottom can improve the accuracy and diversity of the travel recommendation system. The bottom-up construction method collects concrete data in the form of triples and then refines the data model according to the data content. Because the public domain involves huge amounts of data and covers all aspects of knowledge, general public-domain Knowledge Graphs use this method, and the result is large and complete. In the early stage of such a construction it is difficult to fix the overall structure of the data, so the data model is formed by summarizing and refining features according to the content of the data. In summary, this paper adopts the top-down method, and the specific steps are as follows:

(1) Defining the entities and related concepts in the field of tourism. The attributes of tourist attractions include location, type, dynasty, ticket price, official website, average score, opening time and so on. Among them, type, score and location affect the recommendation result. For example (Figure 1), a tourist attraction may belong to different types. Therefore, this paper defines tourist attractions, the type of tourist attractions, the average score of tourist attractions, and the location of tourist attractions as entities.
(2) Named entity recognition. Named entity recognition originally used rule-based and dictionary-based methods, in which domain experts write rules to identify entities in the text; this approach is time-consuming, labor-intensive, and poorly transferable. Deep learning can automatically learn features that are effective for the target, and can achieve good results without relying on feature engineering and domain knowledge. After experimental comparison, this paper uses a bidirectional long short-term memory network with a conditional random field model (Bi-LSTM+CRF) for entity recognition.
(3) Defining entity attributes and value ranges. The attributes of an entity in the ontology can be divided into two categories: object attributes and data attributes. An object attribute describes the relationship between objects, and its value domain is an entity object. A data attribute describes an inherent attribute of the entity and is transitive, that is, a data attribute owned by an upper entity is inherited by its lower entities; its value range is generally String, as shown in Table 2.
(4) Creating instances of the ontology. Steps (1), (2), and (3) complete the construction of the ontology database of tourist attractions; the ontology is then instantiated to fill the data layer of the Knowledge Graph, which completes the construction of the tourism Knowledge Graph.
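As a rough illustration of the instantiation step, the sketch below turns one crawled attraction record into (head, relation, tail) triples. The record, its field names, and the helper `record_to_triples` are hypothetical and are not taken from the paper's data set; the point is only to show how a multi-valued attribute such as type yields several triples for the same attraction.

```python
# Hypothetical attraction record; field names and values are illustrative only.
record = {
    "name": "Example Attraction",
    "type": ["historic site", "museum"],
    "location": "Xi'an",
    "average_score": "4.5",
}


def record_to_triples(rec):
    """Turn one attraction record into (head, relation, tail) triples."""
    head = rec["name"]
    triples = []
    for relation, value in rec.items():
        if relation == "name":
            continue  # the name is the head entity, not an attribute
        # An attraction may belong to several types, so allow list values.
        values = value if isinstance(value, list) else [value]
        for tail in values:
            triples.append((head, relation, tail))
    return triples


triples = record_to_triples(record)
```

Each resulting triple corresponds to one node pair and one edge when the data layer is loaded into a graph store.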

3.1.3. Construction Method of Graph Based on Neo4j

Neo4j is a non-relational database based on graph storage, which visually displays structured data in graphs. When processing data, the speed is much faster than that of relational databases, and data can be retrieved efficiently [15]. Therefore, this paper uses Neo4j to construct a tourism Knowledge Graph (as shown in Figure 2 and Figure 3). Neo4j contains two basic data types: nodes and edges. Nodes represent entities in the Knowledge Graph, and edges represent the attribute relationships in the Knowledge Graph.

3.2. B-TransD Knowledge Representation Model

Knowledge representation is a distributed representation method that is trained with machine learning, takes into account the relationships between objects, transforms the objects into low-dimensional vectors, and calculates the Euclidean distance or cosine distance between the objects to obtain their semantic similarity. Bordes et al. [13] proposed the TransE knowledge representation model, which uses distance to measure the semantic relationship between entities: the closer two entities are in the Knowledge Graph, the closer they are in semantics. The TransE model is simple and can directly establish the semantic relationship between entities, and it maintains good performance on a large-scale Knowledge Graph. However, the TransE model [16] can only represent one-to-one relationships, and cannot express one-to-many, many-to-one, and many-to-many complex relationships. To solve this problem, many scholars have extended the model, for example with TransR [17], TransH [18], and TransD [19].

In the TransD model, $h$ and $t$ are the vector representations of the head and tail entities in the entity space, $h_{r}$ and $t_{r}$ are the vectors of the head and tail entities in the hyperplane, and $M_{r}$ is the mapping matrix from the entity space to the hyperplane of relationship $r$. Their relationship is as follows:

$h_{r} = h M_{r}, \quad t_{r} = t M_{r}$

(1)

In the relationship space, through constant adjustment, the head entity vector $h_{r}$ is shifted along the relationship vector $r$ to obtain the tail entity vector $t_{r}$, so that $h_{r} + r$ and $t_{r}$ are as equal as possible. Euclidean distance is used to measure the distance between vectors. The loss function of the TransD model is defined as follows:

$f_{r}(h, t) = \| h_{r} + r - t_{r} \|_{2}$

(2)

where $\| \cdot \|_{2}$ is the 2-norm of the vector. In actual training, all vectors in the model are normalized, so that $\| h \|_{2} \leq 1$, $\| t \|_{2} \leq 1$, $\| r \|_{2} \leq 1$, $\| h M_{r} \|_{2} \leq 1$, and $\| t M_{r} \|_{2} \leq 1$.
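Formulas (1) and (2) can be checked on a toy example. The following is a minimal sketch in plain Python, assuming row vectors multiplied on the left of $M_{r}$ as the formulas are written; the helper names `project` and `score` and all numeric values are illustrative and not part of the paper.

```python
import math


def project(vec, matrix):
    """Row vector times matrix: returns vec * matrix, as in Formula (1)."""
    cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(cols)]


def score(h, r, t, m_r):
    """f_r(h, t) = || h M_r + r - t M_r ||_2, as in Formula (2)."""
    h_r = project(h, m_r)
    t_r = project(t, m_r)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r)))


# Toy 2-d example with an identity mapping matrix (illustrative values).
m_r = [[1.0, 0.0], [0.0, 1.0]]
h, r = [0.1, 0.2], [0.3, 0.1]
good_t = [0.4, 0.3]   # close to h + r, so the score is near zero
bad_t = [-0.5, 0.9]   # far from h + r, so the score is larger
```

A triple that fits the relation gets a score close to zero, which is why candidates are later sorted in ascending order of this value.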

In the Knowledge Graph, the triples based on objective facts are positive triples $(h, r, t)$. The common sampling method for negative triples is to replace the head or tail entity of a correct triple with another entity at random, which yields a negative (wrong) triple $(h', r, t')$ [20,21]. The triples in the Knowledge Graph of tourist attractions are multi-relational, and this traditional sampling method may introduce wrong entities. For example, suppose the Knowledge Graph contains two triples $(h, r, t_{1})$ and $(h, r, t_{2})$. For the triple $(h, r, t_{1})$, when $t_{1}$ is randomly replaced by $t_{2}$, the triple $(h, r, t_{2})$ is generated and marked as wrong. In fact, $(h, r, t_{2})$ is a correct triple, so a false negative triple is produced [22,23]. Random replacement can also produce uninformative negatives: given the triple (Yao Ming, born in, Shanghai), negative examples should be generated by replacing Shanghai with a randomly selected entity of a similar type, such as Beijing or Xi’an, rather than by producing a type-mismatched negative such as (Yao Ming, born in, male). To solve these problems, this paper introduces the Bernoulli sampling strategy and proposes a new knowledge representation model, B-TransD (Bernoulli-TransD). In the process of constructing negative triples, two statistics $R$ and $N$ are obtained: $N$ is the mean number of tail entities connected to each head entity, and $R$ is the mean number of head entities connected to each tail entity. They are used to calculate the probability $P$:

$P = \frac{R}{R + N}$

(3)

When constructing a negative triple, the probability of replacing the head entity is $P$, and the probability of replacing the tail entity is $1 - P$. According to the mapping property of the relationship, different replacement probabilities are set for the head and tail entities:

(1): When the relationship type is one-to-many, $R < 1.5$ and $N \geq 1.5$, more head entities are replaced, which reduces the probability of wrongly marking as negative a positive triple formed by replacing the tail entity.
(2): When the relationship type is many-to-one, $R \geq 1.5$ and $N < 1.5$, more tail entities are replaced, which reduces the probability of wrongly marking as negative a positive triple formed by replacing the head entity.
(3): When the relationship type is many-to-many, $R \geq 1.5$ and $N \geq 1.5$, the numbers of head entities and tail entities connected by the relationship are considered, and the entity with the relatively lower error rate is chosen for replacement.

In model training, positive and negative triples are separated in the vector space by a maximum margin. In this paper, the distance-based ranking error function is used as the objective function for model training:

$L = \sum_{(h, r, t) \in T} \sum_{(h', r', t') \in T'} [f_{r}(h, t) + \Upsilon - f_{r'}(h', t')]_{+}$

(4)

In the objective function, $[\cdot]_{+}$ is the hinge loss function, which ensures that each term of the sum is non-negative. $T$ is the set of positive triples in the Knowledge Graph, $T'$ is the set of negative triples, and $\Upsilon$ is the margin separating positive and negative triples, generally set to $\Upsilon = 1$. The loss of a correct triple is close to 0, while the loss of a wrong triple tends to infinity. The stochastic gradient descent algorithm is used to minimize the objective function and update the model parameters iteratively. The specific training steps are as follows: (1) randomly sampling a positive triple from the Knowledge Graph and constructing the corresponding negative triple; (2) normalizing the embedding vectors of the entities in the positive and negative triples, setting the learning rate $ε = 1$, computing the gradient direction for each sample, and updating the model parameters; (3) repeating steps (1) and (2) until the maximum number of iterations is reached, which yields the minimized loss function.
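The sampling and loss steps can be sketched as below. This is a minimal illustration, not the authors' implementation: it uses the commonly cited Bernoulli rule in which the head is replaced with probability tph / (tph + hpt), where tph is the mean number of tails per head and hpt the mean number of heads per tail. How these two names map onto $R$ and $N$ in Formula (3) is an assumption, as are the function names, the toy triples, and the extra check that skips candidates already known to be positive.

```python
import random
from collections import defaultdict


def relation_stats(triples):
    """Per relation: (mean tails per head, mean heads per tail)."""
    tails = defaultdict(lambda: defaultdict(set))
    heads = defaultdict(lambda: defaultdict(set))
    for h, r, t in triples:
        tails[r][h].add(t)
        heads[r][t].add(h)
    stats = {}
    for r in tails:
        tph = sum(len(s) for s in tails[r].values()) / len(tails[r])
        hpt = sum(len(s) for s in heads[r].values()) / len(heads[r])
        stats[r] = (tph, hpt)
    return stats


def corrupt(triple, entities, stats, known, rng=random):
    """Build one negative triple; the head is replaced with prob. tph / (tph + hpt)."""
    h, r, t = triple
    tph, hpt = stats[r]
    p_head = tph / (tph + hpt)
    while True:  # assumes at least one unknown candidate exists
        e = rng.choice(entities)
        cand = (e, r, t) if rng.random() < p_head else (h, r, e)
        if cand not in known:  # assumption: skip known positives
            return cand


def margin_loss(pos_score, neg_score, margin=1.0):
    """One hinge term [f(pos) + margin - f(neg)]_+ of Formula (4)."""
    return max(0.0, pos_score + margin - neg_score)


# Toy one-to-many relation: one city contains three attractions.
kg = [("Xi'an", "contains", "A"), ("Xi'an", "contains", "B"), ("Xi'an", "contains", "C")]
stats = relation_stats(kg)
negative = corrupt(kg[0], ["Xi'an", "A", "B", "C"], stats, set(kg), random.Random(0))
```

For this one-to-many relation the head is replaced three times out of four on average, so the sampler rarely swaps one valid tail for another valid tail.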

3.3. Personalized Travel Recommendation Algorithm Based on Knowledge Graph

The traditional recommendation algorithm is simple and easy to understand and implement, but it suffers from the sparseness of the scoring matrix and from cold start. Traditional recommendation algorithms also ignore text semantic information in the similarity calculation, give little or no consideration to the attributes of users or tourist attractions, and rely only on users’ historical behavior. To address these issues, this paper proposes the KG-CF (Knowledge Graph Collaborative Filtering) recommendation algorithm [24,25,26]. Firstly, the algorithm uses the rich semantic information of the Knowledge Graph to alleviate the sparseness of the data, embeds the tourist attraction information into a low-dimensional vector space, and expresses it with distributed vectors to obtain a tourist attraction similarity matrix. Secondly, the algorithm constructs a user interest model and the user-attraction rating matrix. Then, the algorithm combines the similarity weight with the attraction–attraction similarity matrix to obtain the predicted attraction scores and generate a recommendation list.

The specific recommendation steps are as follows (as shown in Figure 3). First, analyze the user’s behavior data, construct the user-attraction scoring matrix, map the attractions in the matrix to the entities in the Knowledge Graph, and use the B-TransD model to learn representations of the entities and attributes in the Knowledge Graph, obtaining a set of low-dimensional dense real-valued vectors $I_{i} = (e_{1i}, e_{2i}, \ldots, e_{ni})^{T}$ that represent the entities and their similarity under different relationships. The Euclidean distance formula is used to calculate the distance between entities, and the value is mapped to the range (0, 1]. The similarity between entities is calculated as follows:

$sim(I_{i}, I_{j}) = \frac{1}{1 + \sqrt{\sum_{k = 1}^{n} (e_{ki} - e_{kj})^{2}}}$

(5)

where $sim(I_{i}, I_{j})$ represents the similarity between entity $I_{i}$ and entity $I_{j}$, and $e_{ki}$ and $e_{kj}$ are the $k$-th components of the vector representations of entity $I_{i}$ and entity $I_{j}$. Second, generate the attraction–attraction semantic similarity matrix. Finally, construct the user interest model based on user behavior data, fuse the user interest model and the user-attraction scoring matrix to calculate the weight of the attraction similarity, obtain the final attraction similarity matrix by combining the weight to predict the user’s scores, and select the high-scoring attractions as a list of recommended attractions to push to the user. Its flow chart is shown in Figure 4.
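The similarity of Formula (5), one over one plus the Euclidean distance between two entity vectors, can be sketched as follows. The function names and the toy embeddings are illustrative assumptions; real embeddings would come from the trained B-TransD model.

```python
import math


def sim(e_i, e_j):
    """sim(I_i, I_j) = 1 / (1 + Euclidean distance); the value lies in (0, 1]."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(e_i, e_j)))
    return 1.0 / (1.0 + dist)


def similarity_matrix(embeddings):
    """Attraction-attraction similarity as a nested dict keyed by attraction name."""
    names = list(embeddings)
    return {a: {b: sim(embeddings[a], embeddings[b]) for b in names} for a in names}


# Toy embeddings (illustrative values, not learned vectors).
toy = {"A": [0.0, 0.0], "B": [3.0, 4.0], "C": [0.0, 1.0]}
matrix = similarity_matrix(toy)
```

Identical vectors give a similarity of exactly 1, and the value decreases toward 0 as the distance grows, which is the (0, 1] range stated in the text.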

3.3.1. Modeling User Interests

Personalized recommendation finds, among a vast amount of information, the information that matches the user’s interests according to the user’s preferences, and recommends it to the user. Because each user has different interests, the recommended objects also differ. Therefore, constructing a user interest model before making a personalized recommendation and applying it to the recommendation process can improve the quality of the recommendation system [27]. In recommendation based on the Knowledge Graph, the entities in the Knowledge Graph that correspond to the items to be recommended are usually integrated into the recommendation system. These recommendation methods describe the user’s interest only at the entity level; they do not make use of the attribute information of the item and cannot dig deeply into the user’s preference for the attributes of the item [28,29]. In the Knowledge Graph of tourist attractions, there are many attributes of tourist attractions besides the attraction entities themselves. Making full use of this attribute information allows the user’s interest to be expressed in a more fine-grained manner.

To address the problem that entity attributes are not considered in recommendation based on the Knowledge Graph, this paper proposes a new method of user interest modeling based on the entity attributes in the Knowledge Graph. Using the Knowledge Graph of tourist attractions, we mine users’ interest in the attributes of tourist attractions at the attribute level, construct a user interest model that expresses user interest in a more fine-grained manner, and integrate it into the recommendation system to further improve the recommendation performance. The user’s attention to tourist attractions usually focuses on the attributes of the attractions, so the user’s degree of interest is calculated as the vector representation of the user under the attributes of the tourist attractions. The user interest vector is calculated from the B-TransD model. First, define a set of triples for user attributes:

$T_{u} = \{(h, r, t) \mid h \in V_{u}^{+}\}$

(6)

Among them, $V_{u}^{+}$ is the set of historically visited tourist attractions of the user, and $h$ is the corresponding tourist attraction entity in the Knowledge Graph.

To obtain a more fine-grained user interest, it is necessary to understand the user’s preference for the attributes of tourist attractions. Preference indicates that users pay different degrees of attention to different attributes. To obtain the user preference, the user’s degree of attention to each attribute is calculated first, that is, the weight of each attribute in the historically visited tourist attractions. Given the vectors of the tourist attractions, attributes, and attribute values, the weight of the attribute $r_{a}$ in tourist attraction $h$ is as follows:

$w_{r_{a}} = \frac{\exp((h M_{r_{a}})^{T} r_{a})}{\sum_{(h, r, t) \in T_{u}} \exp((h M_{r})^{T} r)}$

(7)

Among them, $h$ and $r_{a}$ are the vector forms of the tourist attraction and the attribute obtained by B-TransD training, and $M_{r_{a}}$ is the mapping matrix in B-TransD. After the weight of each attribute is calculated for the user, the weighted sum of all attribute values in the set of historically visited attractions represents the user interest vector. The expression is as follows:

$c_{u} = \sum_{(h_{a}, r_{a}, t_{a}) \in T_{u}} w_{r_{a}} t_{a}$

(8)

Among them, $t_{a}$ is the vector of the attribute value obtained by B-TransD training. User interest is represented by vectors of related items rather than by independent feature words, and the fixed vector dimension reduces the number of parameters.
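Formulas (7) and (8) amount to a softmax over the triples in $T_{u}$ followed by a weighted sum of attribute-value vectors. The sketch below assumes each triple is supplied as a tuple of already trained vectors and its mapping matrix; the tuple layout, the function names, and the toy values are hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the weights.

```python
import math


def project(vec, matrix):
    """Row vector times matrix (h M_r)."""
    cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(cols)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def interest_vector(user_triples):
    """user_triples: list of (h, M_r, r, t) for the triples in T_u.

    Returns the attribute weights of Formula (7) and the interest
    vector c_u of Formula (8).
    """
    logits = [dot(project(h, m), r) for h, m, r, _ in user_triples]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(user_triples[0][3])
    c_u = [sum(w * tr[3][k] for w, tr in zip(weights, user_triples)) for k in range(dim)]
    return weights, c_u


identity = [[1.0, 0.0], [0.0, 1.0]]
toy_triples = [
    ([1.0, 0.0], identity, [0.0, 1.0], [1.0, 0.0]),
    ([1.0, 0.0], identity, [0.0, 1.0], [0.0, 1.0]),
]
weights, c_u = interest_vector(toy_triples)
```

In this toy case both triples have the same logit, so the weights are equal and the interest vector is the plain average of the two attribute-value vectors.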

3.3.2. Weight Calculation of User-Attraction Scoring Matrix

The user-attraction scoring matrix is constructed from the scores given by all users to the attractions, and a user’s scores reflect the degree of their preference for the attractions. Through the scoring matrix, the degree of the user’s preference for each attraction is expressed as the weight of that attraction among all attractions rated by the user. Assume that in the recommendation algorithm the user set is represented as $U = \{U_{1}, U_{2}, \ldots, U_{m}\}$, the set of tourist attractions is represented as $I = \{I_{1}, I_{2}, \ldots, I_{n}\}$, and the constructed user-attraction scoring matrix is represented as the matrix $R_{m \times n}$. $wp_{uj}$ represents the weight of the rated attraction $j$ among all attractions rated by user $u$, $R_{uj}$ represents the rating of attraction $j$ by user $u$, and $N_{u}$ represents the set of attractions that user $u$ has rated. The weight is the rating of the attraction divided by the sum of the ratings of all attractions rated by the user. The calculation formula is as follows:

$wp_{uj} = \frac{R_{uj}}{\sum_{j \in N_{u}} R_{uj}}$

(9)
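As a small worked example of Formula (9), the sketch below normalizes one user's ratings so that they sum to one. The function name and the ratings are made up for illustration.

```python
def rating_weights(user_ratings):
    """wp_uj = R_uj divided by the sum of the user's ratings (Formula (9))."""
    total = sum(user_ratings.values())
    return {j: r / total for j, r in user_ratings.items()}


# One user who rated two attractions (illustrative values).
wp = rating_weights({"attraction_a": 4, "attraction_b": 1})
```

Here the attraction rated 4 receives weight 0.8 and the one rated 1 receives weight 0.2.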

3.3.3. Fusion Weight Calculation

In the proposed KG-CF recommendation algorithm, the user’s interest in tourist attractions is contained in the weight. The similarity weight comes from two parts: (1) calculating, according to the user-attraction rating matrix, the weight of each attraction rated by the user among all tourist attractions rated by that user, and recording it as $wp$; (2) using the Knowledge Graph of tourist attractions to mine the user’s interest in the attributes of tourist attractions, constructing the user interest model, and recording it as $ws$. Finally, the ratio $η \in (0, 1)$ is used to fuse the weights $wp$ and $ws$ to obtain the corresponding weight value, whose range is (0, 1). The fusion weight formula is as follows:

$W = η \cdot wp + (1 - η) \cdot ws$

(10)
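Formula (10) is a convex combination of the two weights. A minimal sketch, assuming both weights are scalars in (0, 1) and using hypothetical values:

```python
def fuse_weights(wp, ws, eta):
    """W = eta * wp + (1 - eta) * ws, with eta in (0, 1) (Formula (10))."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    return eta * wp + (1.0 - eta) * ws


# With eta = 0.5 the two weights contribute equally (illustrative values).
fused = fuse_weights(0.8, 0.4, 0.5)
```

Because the result is a convex combination, it always lies between the two input weights, which keeps it inside (0, 1) whenever both inputs are.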

3.3.4. List of Attractions Recommended

A recommendation list is formed by predicting the user’s rating of each attraction. Combining the B-TransD model with the similarity weight calculation yields the final attraction–attraction similarity matrix, which is essentially constructed from the nearest neighbors of each attraction within the set of attractions scored by users. Attractions already rated by the user are excluded from the candidates, and the similarity between each attraction to be predicted and each attraction in the set of user-rated attractions is then calculated. The calculation formula of the prediction score is as follows:

$P_{ui} = \frac{\sum_{j \in N(u) \cap S(i, k)} sim(I_{i}, I_{j}) \times S_{u,j}}{\sum_{j \in N(u) \cap S(i, k)} sim(I_{i}, I_{j})}$

(11)

Among them, $sim(I_{i}, I_{j})$ is the similarity of attraction $I_{i}$ and attraction $I_{j}$, $S_{u,j}$ is the user’s rating of attraction $I_{j}$, $N(u)$ is the collection of attractions rated by user $u$, and $S(i, k)$ is the set of the top $k$ attractions most similar to $I_{i}$. The higher the user’s rating of attraction $I_{j}$ and the higher the similarity between attraction $I_{i}$ and attraction $I_{j}$, the greater the value of $P_{ui}$.
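The prediction of Formula (11) can be sketched as a similarity-weighted average of the user's ratings over the rated attractions that fall among the target's top-k neighbors. The function name, the nested-dict similarity layout, and the choice to return `None` when no rated neighbor exists are assumptions made for this illustration.

```python
def predict_score(target, user_ratings, sim_matrix, k):
    """P_ui: similarity-weighted mean rating over rated top-k neighbors of target."""
    neighbours = sorted(
        (j for j in sim_matrix[target] if j != target),
        key=lambda j: sim_matrix[target][j],
        reverse=True,
    )[:k]
    used = [j for j in neighbours if j in user_ratings]
    denom = sum(sim_matrix[target][j] for j in used)
    if denom == 0:
        return None  # no rated neighbor: the prediction is undefined
    return sum(sim_matrix[target][j] * user_ratings[j] for j in used) / denom


# Toy similarities of attraction "x" to three others (illustrative values).
toy_sim = {"x": {"x": 1.0, "a": 0.5, "b": 0.25, "c": 0.1}}
p = predict_score("x", {"a": 4, "b": 2}, toy_sim, k=2)
```

With these values the prediction is (0.5 * 4 + 0.25 * 2) / 0.75, so the more similar attraction pulls the estimate toward its rating of 4.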

4. Experiment

4.1. Experimental Data Set

The data of the Chinese tourism knowledge graph constructed in this paper come from 235,975 user comments on Ctrip.com. The entities and attributes in the knowledge graph exist as independent nodes, with a total of 46,600 nodes, which are divided into a training set, a validation set and a test set. The experimental settings are shown in Table 3.

4.2. Evaluation Metrics

In this paper, the performance of the Bi-LSTM+CRF algorithm is evaluated mainly by the precision, the recall and the comprehensive index F1 value. Precision and recall are defined as follows:

$Precision = \frac{TP}{TP + FP}$

(12)

$Recall = \frac{TP}{TP + FN}$

(13)

In the formulas, $TP$ is the number of samples that are positive and predicted positive, $FP$ is the number of samples that are negative but predicted positive, and $FN$ is the number of samples that are positive but predicted negative [30,31].

The F1 value is the harmonic mean of recall and precision. Recall and precision may conflict in some application scenarios, and the F1 value combines the two indicators to reflect the overall situation. The formula is as follows:

$F1 = 2 \cdot \frac{Recall \times Precision}{Recall + Precision}$

(14)
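Formulas (12) to (14) can be computed directly from the three counts. The sketch below uses hypothetical counts and returns zero where a denominator would be zero, which is a convention chosen here rather than something stated in the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 entities found correctly, 2 spurious, 8 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

With a precision of 0.8 and a recall of 0.5, F1 is about 0.615, closer to the lower of the two values than their arithmetic mean would be.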

In this paper, link prediction is used to evaluate the training effect of the knowledge representation model. Link prediction predicts the missing head entity or tail entity of a correct triple. For each missing entity, we retrieve its candidate entities from the knowledge graph of tourist attractions, sort the candidates, and judge the training effect by the rank of the missing entity among the candidates. For a triple $(h, r, t)$, given $(h, r)$ to predict the tail entity $t$, or given $(r, t)$ to predict the head entity $h$, we use Formula (2) to calculate the score of each candidate triple and sort the scores in ascending order. Two indicators are used to measure the quality of model training: the Mean Rank and the top-10 hit rate (Hits@10). The Mean Rank is the average rank of all correct entities in the test set; the lower the value, the higher the correct entities rank and the better the training effect. Hits@10 is the ratio of the number of correct entities ranked in the top 10 to the total number of entities; the higher the Hits@10, the better the training performance of the model.
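The two link-prediction indicators can be sketched as follows, assuming each test case provides a dictionary from candidate entity to its Formula (2) score (lower is better). The function names and the toy numbers are illustrative only.

```python
def rank_of(correct, candidate_scores):
    """1-based rank of the correct entity when candidates are sorted ascending by score."""
    ordered = sorted(candidate_scores, key=candidate_scores.get)
    return ordered.index(correct) + 1


def mean_rank_and_hits(ranks, k=10):
    """Mean Rank and Hits@k over a list of ranks of the correct entities."""
    mean_rank = sum(ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mean_rank, hits


# One toy query: entity "b" has the lowest score, so it ranks first.
first = rank_of("b", {"a": 0.9, "b": 0.1, "c": 0.5})
# Ranks collected from four hypothetical test triples.
mean_rank, hits_at_10 = mean_rank_and_hits([1, 3, 20, 12])
```

In the four-triple example two of the correct entities rank within the top 10, giving a Hits@10 of 0.5 and a Mean Rank of 9.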

In practical applications, the attraction a user chooses is likely to be among the first few items of the recommended list. Therefore, this paper uses two evaluation indicators commonly used for recommendation systems: the Mean Absolute Error (MAE), computed against the user’s true rating of the attraction, to measure the accuracy of the predicted attraction rating, and the hit rate, to judge the recommendation effect.

The mean absolute error (MAE) measures the error between two values [32]. The user’s rating and the attraction rating predicted by the recommendation algorithm are substituted into the formula, and the errors are averaged over all attractions with a predicted score. The smaller the MAE value, the more accurate the prediction. The formula is as follows:

E_{MAE} = \frac{1}{n} \sum_{i \in U, j \in I} | p_{ij} - r_{ij} |

(15)

In the formula, the sets U and I are the user set and the set of attractions, respectively, p_{ij} is the score of attraction j predicted for user i, r_{ij} is the true score of attraction j given by user i, and n is the number of predicted scores p_{ij}.
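Formula (15) can be sketched in a few lines. The dictionaries keyed by (user, attraction) pairs, the identifiers and the ratings are illustrative assumptions; the computation itself is the plain average of absolute differences.

```python
def mean_absolute_error(predicted, actual):
    """MAE over the (user, attraction) pairs that have both a predicted and a true score."""
    pairs = predicted.keys() & actual.keys()
    if not pairs:
        raise ValueError("no overlapping (user, attraction) pairs")
    return sum(abs(predicted[k] - actual[k]) for k in pairs) / len(pairs)


# Hypothetical predicted scores p_ij and true scores r_ij.
predicted = {("u1", "a1"): 4.5, ("u1", "a2"): 3.0, ("u2", "a1"): 4.0}
actual = {("u1", "a1"): 4.0, ("u1", "a2"): 3.5, ("u2", "a1"): 4.0}
mae = mean_absolute_error(predicted, actual)
```

Here the three absolute errors are 0.5, 0.5 and 0, so the MAE is 1/3.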

precision@K represents the mean, over users, of the proportion of the first K items in each user’s recommendation list that appear among the positive samples of the test set. The greater the value of precision@K, the higher the accuracy and the better the recommendation effect.

\mathrm{precision}@K = \frac{1}{K} \sum_{i = 1}^{K} \frac{| L_{i} \cap R_{i} |}{| L_{i} |}

(16)

where L_{i} represents the recommended result list of user i, and R_{i} is the set of favorite attractions of user i.
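The sketch below follows the verbal description of precision@K: for each user, the fraction of the first K recommended items that belong to that user's set of favorite attractions, averaged over users. Treating the average as running over users is an assumption about how Formula (16) is meant to be read, and the user and item identifiers are hypothetical.

```python
def precision_at_k(recommended, favorites, k):
    """Mean over users of |top-K list ∩ favorite set| / |top-K list|."""
    per_user = []
    for user, items in recommended.items():
        top_k = items[:k]
        if not top_k:
            continue  # Assumption: users without recommendations are skipped.
        hits = len(set(top_k) & favorites.get(user, set()))
        per_user.append(hits / len(top_k))
    return sum(per_user) / len(per_user) if per_user else 0.0


# Hypothetical ranked recommendation lists L_i and favorite sets R_i.
recommended = {"u1": ["a", "b", "c", "d"], "u2": ["c", "a", "e", "f"]}
favorites = {"u1": {"a", "c"}, "u2": {"e"}}
p_at_2 = precision_at_k(recommended, favorites, 2)
p_at_4 = precision_at_k(recommended, favorites, 4)
```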

4.3. Analysis of Experimental Results

The entity recognition experiment compares three models: the Hidden Markov Model (HMM), the Bi-directional Long Short-Term Memory network (Bi-LSTM), and the Bi-directional Long Short-Term Memory network with a Conditional Random Field layer (Bi-LSTM+CRF). The Wikipedia Chinese corpus is used as the data set, and the labeling method uses the BIOES tagging scheme. In the experiment, the data are divided into training, test, and validation sets of sizes 32,620, 9320, and 4660, respectively (7:2:1). The training set is used to train the Chinese named entity recognition model, and the test set is used to measure the final effect of the model. The results of the different models are shown in Table 4.
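To illustrate the BIOES tagging scheme mentioned above, the sketch below converts entity spans into character-level tags (B = begin, I = inside, E = end, S = single-character entity, O = outside). The function name, the span format and the example sentence are assumptions made for illustration and are not taken from the paper's data.

```python
def bioes_tags(length, spans):
    """Build BIOES tags for a sequence of `length` characters.

    `spans` is a list of (start, end_exclusive, entity_type) tuples.
    """
    tags = ["O"] * length
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"  # single-character entity
        else:
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"
            tags[end - 1] = f"E-{etype}"
    return tags


# Hypothetical sentence: "华清池在西安" (Huaqing Pool is in Xi'an).
chars = list("华清池在西安")
tags = bioes_tags(len(chars), [(0, 3, "LOC"), (4, 6, "LOC")])
```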

As shown in Table 4, compared with using only the Bi-LSTM model for Chinese NER, the Bi-LSTM+CRF model effectively improves performance on the Wikipedia Chinese corpus: the average precision increases by 0.74 percentage points (0.9682 to 0.9756), the average recall by 0.73 percentage points (0.9689 to 0.9762), and the average F1-score by 0.78 percentage points (0.9680 to 0.9758).

The experimental results of the HMM model show that this model, as a traditional statistical method, performs poorly on Chinese named entity recognition. Compared with the model based on Bi-LSTM alone, the model based on Bi-LSTM+CRF effectively improves overall Chinese named entity recognition.

This paper uses the TransD model and the improved B-TransD model to train the knowledge graph of Chinese tourist attractions, and compares the two trained models experimentally. To obtain better performance, the model parameters are tuned and the values with the best training effect are selected. The learning rate ε of gradient descent is chosen from {0.1, 0.01, 0.001}, the distance γ between positive and negative triples from {1, 2, 4}, and the dimension d of the vector space from {50, 100, 150, 200, 250, 300}, as in Figure 5 and Figure 6.

Figure 5 shows the Hits@10 results under different embedding dimensions, and Figure 6 shows the Mean Rank results under different embedding dimensions. It can be observed from the figures that the optimal embedding dimension is 50.

This paper carries out experiments on combinations of the above parameter values, with a maximum of 500 iterations per experiment, and finally determines the optimal configuration of the model parameters as ε = 0.001, γ = 1, d = 50. The training results are shown in Table 5.
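The parameter search described above amounts to enumerating 3 × 3 × 6 = 54 combinations and keeping the best one. The sketch below shows one way to organize such a search; the `evaluate` callback is a hypothetical stand-in for training a model and returning its Mean Rank, and the placeholder objective returns arbitrary values that are not results from the paper.

```python
from itertools import product

LEARNING_RATES = [0.1, 0.01, 0.001]
MARGINS = [1, 2, 4]
DIMENSIONS = [50, 100, 150, 200, 250, 300]


def grid_search(evaluate):
    """Return the (lr, margin, dim) with the lowest value of evaluate()."""
    best_score, best_config = None, None
    for lr, margin, dim in product(LEARNING_RATES, MARGINS, DIMENSIONS):
        score = evaluate(lr, margin, dim)  # e.g. Mean Rank, lower is better
        if best_score is None or score < best_score:
            best_score, best_config = score, (lr, margin, dim)
    return best_config, best_score


def placeholder_mean_rank(lr, margin, dim):
    """Arbitrary toy objective; a real run would train and evaluate a model."""
    return abs(dim - 100) + margin + lr


n_configs = len(list(product(LEARNING_RATES, MARGINS, DIMENSIONS)))
best_config, best_score = grid_search(placeholder_mean_rank)
```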

It can be observed from Table 5 that the B-TransD model achieves a better (lower) Mean Rank than the TransD model (58 versus 65) and a higher Hits@10 (87 versus 85.7). This indicates that the B-TransD model has an advantage on data with one-to-many or many-to-one relations: it better retains the semantic information in the knowledge graph and yields more accurate entity and relation vectors. In this paper, the results of the B-TransD training are used to represent the knowledge graph of tourist attractions. The user interest model constructed on this basis can also better retain the structural relationships in the knowledge graph, laying the foundation for subsequent attraction recommendation.
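The advantage on one-to-many and many-to-one relations comes from how negative triples are sampled. The sketch below shows Bernoulli negative sampling as it is commonly described for TransH [17]: for each relation, the head is replaced with probability tph / (tph + hpt), where tph is the average number of tails per head and hpt the average number of heads per tail. Whether B-TransD uses exactly this formulation is an assumption, and the relation name `located_in` and the entity list are hypothetical.

```python
import random
from collections import defaultdict


def head_replace_probability(triples):
    """Per relation, the probability of corrupting the head: tph / (tph + hpt)."""
    tails_of = defaultdict(set)  # (relation, head) -> set of tails
    heads_of = defaultdict(set)  # (relation, tail) -> set of heads
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)
    probs = {}
    for relation in {r for _, r, _ in triples}:
        tail_counts = [len(v) for (rel, _), v in tails_of.items() if rel == relation]
        head_counts = [len(v) for (rel, _), v in heads_of.items() if rel == relation]
        tph = sum(tail_counts) / len(tail_counts)  # average tails per head
        hpt = sum(head_counts) / len(head_counts)  # average heads per tail
        probs[relation] = tph / (tph + hpt)
    return probs


def corrupt(triple, entities, probs, known, rng, max_tries=100):
    """Build one negative triple that is not among the known positive triples."""
    h, r, t = triple
    replace_head = rng.random() < probs[r]
    for _ in range(max_tries):
        e = rng.choice(entities)
        candidate = (e, r, t) if replace_head else (h, r, e)
        if candidate not in known:
            return candidate
    raise RuntimeError("no negative triple found")


triples = [
    ("Terracotta Warriors", "located_in", "Xi'an"),
    ("Huaqing Pool", "located_in", "Xi'an"),
    ("Big Wild Goose Pagoda", "located_in", "Xi'an"),
]
entities = ["Terracotta Warriors", "Huaqing Pool", "Big Wild Goose Pagoda", "Xi'an", "Beijing"]
probs = head_replace_probability(triples)
negative = corrupt(triples[0], entities, probs, set(triples), random.Random(0))
```

In this many-to-one toy relation, tph is 1 and hpt is 3, so the head is replaced only a quarter of the time. Replacing the head with another attraction in the same city would often produce a triple that is actually true, which is the false negative case the Bernoulli strategy tries to avoid.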

The KG-CF recommendation algorithm computes its weight by fusing the weights of two parts in a certain proportion, and the fusion proportion affects recommendation performance. To determine the best fusion proportion, this paper conducts several experiments. The experimental results (as shown in Figure 7) demonstrate that performance is optimal when the recommendation number is 30 and the fusion ratio is 0.7. The best parameters of the knowledge representation are as follows: learning rate ε = 0.001, distance between positive and negative triples γ = 1, and dimension of the vector space d = 50.
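A proportional fusion of two weights can be sketched as a convex combination. The sketch assumes that the two parts are a knowledge-graph-based weight and a rating-matrix-based weight, and that the fusion ratio multiplies the knowledge-graph part; neither detail is stated in this section, and all names and numbers below are hypothetical.

```python
def fuse_weights(kg_weights, cf_weights, ratio=0.7):
    """Combine two weight dictionaries as ratio * kg + (1 - ratio) * cf."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    items = kg_weights.keys() | cf_weights.keys()
    return {
        item: ratio * kg_weights.get(item, 0.0) + (1 - ratio) * cf_weights.get(item, 0.0)
        for item in items
    }


def top_n(fused, n):
    """Return the n attractions with the highest fused weight."""
    return sorted(fused, key=fused.get, reverse=True)[:n]


# Hypothetical weights of candidate attractions for one user.
kg_weights = {"Huaqing Pool": 0.9, "Tang Paradise": 0.4}
cf_weights = {"Huaqing Pool": 0.5, "Tang Paradise": 0.8, "Epang Palace": 0.6}
fused = fuse_weights(kg_weights, cf_weights, ratio=0.7)
recommended = top_n(fused, 2)
```

An attraction that is missing from one of the two sources simply contributes 0 for that part, which is one possible convention rather than the paper's stated behavior.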

Collaborative filtering is currently the most mainstream recommendation algorithm and has been widely used in industry. Its biggest advantage is that it is easy to implement in engineering and to apply to products. In addition, recommendation algorithms based on the TransE model are widely used. Therefore, with the fusion ratio of KG-CF set to 0.7, we compare the proposed method with the collaborative filtering algorithm (item-CF) and the recommendation algorithm based on TransE (TransE-CF) under different neighbor numbers. The precision@K results are shown in Table 6.

As shown in Table 6, the recommendation algorithm in this paper is generally better than the comparison algorithms in both precision and average running time, because the rich semantic information of the knowledge graph can effectively alleviate data sparsity. At the same time, the knowledge representation model embeds the attractions into a suitable low-dimensional vector space and represents them by distributed vectors, which reduces the computation time. Overall, the KG-CF recommendation algorithm improves the precision of the recommendation and provides interpretability for the recommendation results, thereby improving the performance of the recommendation algorithm and increasing the diversity of the recommendation results (as shown in Figure 8). The recommendation result is a sequence of places that includes the queried scenic spot and related scenic spots. For example, if users search for the Terracotta Warriors, the recommendation system will recommend related scenic spots in the same region (Tsui Wah Shan, Huaqing Pool and Lintong Museum), from the same period (Mausoleum of Emperor Qinshihuang, Epang Palace and Xianyang Palace) and with the same score (Bell Tower and Drum Tower, Big Wild Goose Pagoda and Tang Paradise).

5. Conclusions

This paper proposes a new personalized tourism recommendation algorithm based on the Knowledge Graph, which addresses the sparsity and cold-start problems of traditional recommendation systems. The algorithm constructs a knowledge graph of Chinese tourist attractions, vectorizes the entity attributes in the knowledge graph, designs a user interest model that integrates these entity attributes, and introduces a Bernoulli sampling strategy into the construction of negative triples to reduce the probability of generating false negative triples. The experimental results demonstrate that the proposed algorithm improves on the precision of the traditional recommendation algorithm and reduces the average running time. In addition, the experiments verify that the proposed algorithm can effectively address the sparsity and cold-start problems of traditional recommendation systems, and the approach is also applicable to music, books, and many other fields. The recommendation algorithm is also applicable to question answering systems: a tourism question answering system can accurately answer users’ questions and present the answers in concise language, reflecting the system’s high precision, and such a system is applicable to tourism scenes in multiple regions. However, different product domains differ in relevance, so transferring the approach requires extracting the labels of users and items, obtaining information such as user-specific knowledge or characteristics, and then migrating it to the target task.

Author Contributions

Conceptualization, J.H.; Data curation, X.S.; Methodology, J.P.; Visualization, J.R.; Writing—original draft, X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 61902301, The Natural Science Basic Research Key Program funded by Shaanxi Provincial Science and Technology Department (2022JZ-35), Shaanxi Natural Science Basic Research Project under Grant 2022JM-394 and 2022JQ-711, and Xi’an Science and Technology Bureau Science and Technology Innovation Leading Project under Grant 21XJZZ0020 and 21XJZZ0022.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

1. Wang, H.; Zhang, F.; Wang, J.; Zhao, M.; Li, W.; Xie, X.; Guo, M. Exploring High-Order User Preference on the Knowledge Graph for Recommender Systems. ACM Trans. Inf. Syst. 2019, 37, 1–26.
2. Noia, T.D.; Ostuni, V.C.; Tomeo, P.; Sciascio, E.D. SPrank: Semantic Path-Based Ranking for Top-N Recommendations Using Linked Open Data. ACM Trans. Intell. Syst. Technol. 2016, 8, 93–105.
3. Feng, J.; Xia, Z.; Feng, X.; Peng, J. RBPR: A hybrid model for the new user cold start problem in recommender systems. Knowl.-Based Syst. 2021, 214, 106732.
4. Kolahkaj, M.; Harounabadi, A.; Nikravanshalmani, A.; Chinipardaz, R. Incorporating multidimensional information into dynamic recommendation process to cope with cold start and data sparsity problems. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 9535–9554.
5. Wang, D.; Cui, P.; Zhu, W. Structural Deep Network Embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1225–1234.
6. He, X.; Ke, X. Research Summary of Recommendation System Based on Knowledge Graph. In Proceedings of the 2021 3rd International Conference on Big Data Engineering, Shanghai, China, 29–31 May 2021; pp. 104–109.
7. Sun, Z.; Yang, J.; Zhang, J.; Bozzon, A.; Huang, L.-K.; Xu, C. Recurrent knowledge graph embedding for effective recommendation. In Proceedings of the Twelfth ACM Conference on Recommender Systems, Vancouver, BC, Canada, 2 October 2018; pp. 297–305.
8. Cao, Y.; Wang, X.; He, X.; Hu, Z.; Chua, T.-S. Unifying knowledge graph learning and recommendation: Towards a better understanding of user preferences. In Proceedings of the WWW ’19: The Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 151–161.
9. Wang, Y.; Deng, J.-Z. User similarity collaborative filtering algorithm based on KI divergence. J. Beijing Univ. Posts Telecommun. 2017, 40, 110–114.
10. Chen, W.; Wen, Y.; Zhang, X. An Improved TransE-Based Method for Knowledge Graph Representation. Comput. Eng. 2020, 46, 63–69.
11. Song, H.-J.; Kim, A.-Y.; Park, S.-B. Learning Translation-Based Knowledge Graph Embeddings by N-Pair Translation Loss. Appl. Eng. 2020, 10, 3964.
12. Wang, H.; Zhang, F.; Wang, J.; Zhao, M.; Li, W.; Xie, X.; Guo, M. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. In Proceedings of the CIKM ’18: The 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 1024–1035.
13. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the Advances in Neural Information Processing Systems 26 (NIPS 2013), Lake Tahoe, NV, USA, 5–10 December 2013; pp. 2787–2795.
14. Chen, J.; Li, B.; Wang, J.; Zhao, Y.; Yao, L.; Xiong, Y. Knowledge Graph Enhanced Third-party Library Recommendation for Mobile Application Development. IEEE Access 2020, 8, 42436–42446.
15. Do, P.; Phan, T.H.V.; Gupta, B.B. Developing a Vietnamese tourism question answering system using knowledge graph and deep learning. Trans. Asian Low-Resour. Lang. Inf. Process. 2021, 20, 1–18.
16. Dai, Y.; Wang, S.; Xiong, N.N.; Guo, W. A survey on knowledge graph embedding: Approaches, applications and benchmarks. Electronics 2020, 9, 750.
17. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 1112–1119.
18. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 99–106.
19. Yang, B.; Lei, Y.; Liu, J.; Li, W. Social collaborative filtering by trust. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1633–1647.
20. Xiao, H.; Huang, M.; Hao, Y.; Zhu, X. TransG: A Generative Mixture Model for Knowledge Graph Embedding. Comput. Sci. 2016, 963–972.
21. Li, C.; He, K. CBMR: An optimized MapReduce for items based collaborative filtering recommendation algorithm with empirical analysis. Concurr. Comput. Pract. Exp. 2017, 29, e2181.
22. Tang, X.; Wang, T.; Yang, H.; Song, H. Akupm: Attention-enhanced knowledge-aware user preference model for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1891–1899.
23. Bommisetty, R.M.; Prakash, O.; Khare, A. Keyframe extraction using Pearson correlation coefficient and color moments. Multimed. Syst. 2020, 26, 267–299.
24. Qi, G.L.; Gao, H.; Wu, T.X. The research advances of knowledge graph. Technol. Inf. Eng. 2017, 3, 4–25.
25. Zhang, F.; Yuan, N.J.; Lian, D.; Xie, X.; Ma, W.Y. Collaborative Knowledge Base Embedding for Recommender Systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 353–362.
26. Zhu, S.; He, X.; Yu, H. Overview of cross-language retrieval technology based on knowledge graph. In Proceedings of the 2019 International Conference on Aviation Safety and Information Technology (ICASIT 2019), Kunming, China, 17–19 December 2019; pp. 148–151.
27. Nickel, M.; Murphy, K.; Tresp, V.; Gabrilovich, E. A Review of Relational Machine Learning for Knowledge Graphs. Proc. IEEE 2016, 104, 11–33.
28. Kethavarapu, U.P.K.; Saraswathi, S. Concept based dynamic ontology creation for job recommendation system. Procedia Comput. Sci. 2016, 85, 915–921.
29. Oramas, S.; Ostuni, V.C.; Noia, T.D.; Serra, X.; Sciascio, E.D. Sound and music recommendation with knowledge graphs. ACM Trans. Intell. Syst. Technol. 2017, 8, 21.
30. Wang, X.; Wang, D.; Xu, C.; He, X.; Cao, Y.; Chua, T.S. Explainable reasoning over knowledge graphs for recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 5329–5336.
31. Wang, H.; Zhang, F.; Xie, X.; Guo, M. DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the WWW ’18: The Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 1835–1844.
32. Wang, H.; Zhang, F.; Hou, M.; Xie, X.; Guo, M.; Liu, Q. SHINE: Signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of the WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 5–9 February 2018; pp. 592–600.

Figure 1. Examples of classification of tourist attractions types.

Figure 2. Part of Chinese tourism Knowledge Graph.

Figure 3. Shaanxi tourism knowledge graph as an example.

Figure 4. Recommended flowchart.

Figure 5. Hits@10 under different embedding dimensions.

Figure 6. Mean Rank under different embedding dimensions.

Figure 7. MAE Values at different fusion ratios.

Figure 8. Examples of recommended results’ diversity.

Table 1. Example of data collection.

Attraction Name	Geographical Location	Type	Dynasty	Ticket	Official Website	Rating	Opening Time
Terra Cotta Warriors	Lintong District, Xi’an City, Shaanxi Province	Historical site	Qin	150 yuan	http://bmy.com.cn/	4.6	8:30–17:30
Huaqing Pool	Lintong District, Xi’an City, Shaanxi Province	Mountain site	Republic of China	110 yuan	http://www.hqc.cn	4.7	9:00–17:00
Bell Tower and Drum Tower	Beilin District, Xi’an City, Shaanxi Province	Historical site	Ming	50 yuan	hhtp://www.xazgl.com	4.5	8:30–20:30
Giant Wild Goose Pagoda	Yanta District, Xi’an City, Shaanxi Province	Historical site	Tang	50 yuan	http://www.xadayanta.com	4.5	9:00–17:00
Tsui Wah Shan	Chang’an District, Xi’an City, Shaanxi Province	Mountain landscape	Modern Times	45 yuan	http://www.cuihuashan.com	4.3	8:00–19:00

Table 2. Entity attributes of tourist attractions’ ontology.

Entity Class	Name	Attribute Class	Range	Description
Tourist attractions	Ticket price	Data	String	Price of admission to tourist attractions
	Official site	Data	String	Official sites of tourist attractions
	Opening time	Data	String	Opening hours of tourist attractions
	Subordinate to the dynasty	Data	String	Establishment Time of tourist attractions
	Location	Object	Location of attractions	The location of the attraction
	Types	Object	Types of tourist attractions	Natural landscape or cultural landscape
	Average score	Object	Attractions rating	Visitors rate the site
Types	Type name	Data	String	Natural landscape or cultural landscape
Score	Average score	Data	String	Visitors rate the site
Location	Regional ascription	Data	String	The location of the attraction

Table 3. The data statistics of knowledge graph.

Entity				Attribute
Attraction	Natural landscape	Cultural landscape	Location of attractions	Scoring	In the news
9100	6200	2900	1100	9100	27,300

Table 4. The experimental results of Chinese NER.

Entity	Precision			Recall			F1-Score
	HMM	Bi-LSTM	Bi-LSTM+CRF	HMM	Bi-LSTM	Bi-LSTM+CRF	HMM	Bi-LSTM	Bi-LSTM+CRF
I-ORG	0.0477	0.8997	0.9013	0.0464	0.7332	0.8168	0.0470	0.8080	0.8570
B-PER	0.0076	0.9247	0.9339	0.0075	0.8230	0.8723	0.0076	0.8708	0.9021
O	0.8678	0.9809	0.9861	0.8708	0.9921	0.9935	0.8693	0.9865	0.9898
I-LOC	0.0242	0.8025	0.8588	0.0251	0.8302	0.8595	0.0246	0.8161	0.8592
B-ORG	0.0092	0.8568	0.8500	0.0073	0.6627	0.7574	0.0081	0.7474	0.8011
B-LOC	0.0190	0.8069	0.8827	0.0172	0.8704	0.8767	0.0181	0.8375	0.8797
I-PER	0.0174	0.9309	0.9429	0.0183	0.8420	0.8809	0.0179	0.8842	0.9108
Avg	0.7720	0.9682	0.9756	0.7746	0.9689	0.9762	0.7753	0.9680	0.9758

Table 5. Training results.

	Mean Rank	Hits@10
TransD	65	85.7
B-TransD	58	87

Table 6. precision@K results.

	precision@20	precision@30	precision@50	Average Running Time (s)
TransE-CF	0.58	0.68	0.55	181.9
TransE-C	0.62	0.72	0.65	176.2
KG-CF	0.65	0.86	0.77	160.8

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

