Analysis of Customer Satisfaction in Tourism Services Based on the Kano Model

Abstract: Understanding customer needs is of great significance for enhancing service quality and competitive advantage. However, for the tourism industry, it is still unclear how to mine service improvement strategies from tourist-generated online reviews. This paper aims to develop a data-driven approach to conduct a fine-grained dimension analysis of customer satisfaction with tourism services. First, this paper uses Latent Dirichlet Allocation to explore the key dimensions of tourist satisfaction from online reviews. Next, based on a Chinese sentiment dictionary, tourists' emotional attitudes towards each service dimension are identified. Then, a backpropagation neural network is used to measure the complex relationship between tourists' sentiment orientations towards different dimensions and their satisfaction. Finally, according to the improved Kano model, multi-dimensional attribute classification is realized to support the strategic analysis of tourism service quality improvement. The proposed method is empirically verified on a real tourism review dataset. The results demonstrate the theoretical and practical implications of our method.


Introduction
Customer satisfaction is a psychological state achieved by customers based on their subjective judgment of the degree to which products or services fulfill their requirements [1]. The inherent characteristics of tourism services make customer satisfaction an important reference for evaluating their quality. Firstly, tourism services are experience-oriented, and service quality largely depends on the experience and feelings of the tourists themselves, which are difficult to evaluate by consistent testing standards [2]. Secondly, tourists participate in and experience the whole process of the service, and have a tangible perception of its advantages and disadvantages. Thirdly, customer satisfaction has a strong correlation with recommendation intention [3], and improving customer satisfaction is an important driving force for the economic development of tourism enterprises. Some existing studies have shown that effective product improvement or research and development analysis can be conducted by modeling or measuring customer satisfaction, supplemented by other relevant models [4,5]. Consumer surveys or experiments are a typical way of measuring customer satisfaction [6]. However, this approach requires rigorous design and appropriate procedures to ensure the quality of participants' responses. This is costly in money and time, and the data can quickly become outdated. For the tourism service industry, tourists' consumption is one-off, mobile, and hedonic, which makes it more difficult to carry out customer satisfaction surveys. Therefore, how to use the accumulated online tourism data resources to measure customer satisfaction with tourism services is worth further research.
User-generated content (UGC) is the public content spontaneously generated by users in the Web 2.0 environment, which includes various forms of data such as audio, video, text, and pictures. UGC contains a wealth of people's attitudes and opinions, and research on mining it is steadily deepening. Among these forms, text data have become one of the most important elements reflecting market sentiment, as well as one of the main forms of tourism Big Data [7]. Online travel reviews reflect tourists' real perception of the attractions and services of a tourist destination. They are one of the most important channels of online word-of-mouth communication and an authentic portrayal of tourists' views, feelings, and sentiments, including their attitudes towards the tourism service elements of the destination. Therefore, online travel reviews contain rich information, which is of great value for managers and researchers seeking to understand customer satisfaction [8]. In addition, compared with survey or experimental research, online review data are publicly available, low in cost, spontaneously generated, insightful, and contributed by a large number of participants [9], which makes them more suitable for constructing comprehensive customer satisfaction models. With these advantages in mind, online travel reviews are a potentially powerful data resource for understanding customer satisfaction with tourism services.
Some studies in the field of tourism have confirmed that customers' travel sentiments have a significant impact on their satisfaction. For example, a survey on customer satisfaction with diving services shows that high customer satisfaction is closely related to sentiments such as excitement, pleasure, awe, and surprise. All themes in the theoretical framework of diver satisfaction determined in that study are regulated by sentiments [10]. Geetha et al. [11] studied the relationship between customer sentiment and customer ratings in online reviews of the hotel industry. They found consistency between customer ratings and customer feelings for premium and budget hotels, and found that customer sentiment explained significant differences in customer ratings for the two hotel categories. In fact, the overall satisfaction of customers is a comprehensive consideration of the various tourism services provided. Similar to the research of Bi et al. [7], this paper regards customer satisfaction with tourism services as the result of the combined effect of customers' perception (emotion) in the multiple dimensions of tourism services they experience, which is consistent with our intuition. However, it is still unclear how to design service improvement strategies according to the association between customer satisfaction and multiple dimensions of service quality perception. This paper focuses on developing a data-driven approach to conduct a fine-grained dimension analysis of customer satisfaction with tourism services. We will achieve this aim by answering three sub-issues:
(1) How can the key quality factors affecting customer satisfaction be explored from the online travel review data of tourist attractions?
(2) How can the complex impact of tourists' emotional attitude towards each key factor on overall customer satisfaction be measured?
(3) Based on the influence obtained for each key factor, how can their attribute types be classified to gain insight into how these key factors improve customer satisfaction, especially when faced with severe challenges such as the COVID-19 pandemic?
To solve these problems, this paper establishes a framework for constructing a customer satisfaction model from online travel reviews. In this framework, firstly, the Latent Dirichlet Allocation (LDA) model is used to extract important tourism service features that customers care about from online reviews of tourist attractions. Then, based on a Chinese sentiment dictionary, emotional attitudes (positive or negative) about these important features are identified in the review data. Next, to measure how customer sentiment towards each feature affects customer satisfaction, a measurement method based on a backpropagation neural network (BPNN) is proposed. On this basis, combined with the Kano model, the dimensions of customer satisfaction are classified. Finally, this paper uses the review data of tourist attractions in Guizhou province since the outbreak of COVID-19 to verify the proposed method and analyze customer satisfaction with tourism services in Guizhou. Notably, Guizhou is a leading tourism province in China, and rich tourist review resources have accumulated on online platforms, which motivated its choice as a typical example for this paper. Tourism services in other regions can be analyzed in the same way by following our method, which indicates its generalizability. The rest of this paper is organized as follows. Section 2 reviews the relevant literature. Section 3 presents our proposed research method. Section 4 explains our method through empirical research. Section 5 demonstrates the theoretical and practical implications, and presents the limitations of our study with directions for future research.

Literature Review
In this section, we will review the research on customer satisfaction measurement, then introduce the Kano model.

Research on Customer Satisfaction Measurement
With the accumulation of massive online review data, many studies are emerging to analyze customer satisfaction directly or indirectly from online reviews. These studies can be divided into three main streams: (1) exploring the key attributes affecting customer satisfaction from online reviews and conducting sentiment analysis; (2) research on the relationship between product/service attribute performances and customer satisfaction; (3) research on customer satisfaction models based on online reviews.

Attribute Extraction and Sentiment Analysis Based on Online Reviews
With the rapid development of the Internet, massive UGC has accumulated continuously. UGC contains rich and real customer perspective information and is regarded as a data resource with a strong potential for understanding and managing customer demands [12]. However, online review data exist in the form of free text. As a kind of unstructured data, text data cannot be directly analyzed. Therefore, the conversion of text review data into structured data that can be directly utilized is the basis for subsequent analysis. Customer opinion mining from online reviews is one of the most pursued areas of research, and is mainly carried out based on attribute extraction and sentiment analysis [13].
Attribute extraction refers to extracting the topics that users frequently pay attention to from online reviews, as well as the keywords related to those topics. There are two main categories of attribute extraction methods: (1) Statistical model-based methods, such as association rule mining [14], the Hidden Markov Model [15], and the LDA model [8]. Among them, the LDA model has become one of the most widely used. For example, the study of Tirunillai and Tellis used an improved LDA model to propose a unified framework for extracting tourism service attributes from online reviews [16]. Another study utilized the LDA model to identify the key dimensions of customer satisfaction from 266,554 online reviews [8]. (2) Rule-based methods, which formulate extraction rules according to the characteristics of the review text and the research goal. For example, Kang and Zhou [17] proposed an unsupervised rule-based approach to identify subjective and objective characteristics from online reviews. Rana and Cheah [18] defined rule-based sequential patterns for online review mining and proposed a rule-based two-stage extraction model for dimension extraction.
Sentiment analysis can mine the sentiment information hidden in online reviews to help understand customers' emotional attitudes toward product and service attributes. Sentiment analysis methods are mainly divided into lexical sentiment analysis, such as dictionary-based and corpus-based sentiment analysis [19], and sentiment analysis based on machine learning, such as support vector machines (SVM), Naive Bayes, and unsupervised machine learning [20].

Research on the Influence of Attribute Performance on Customer Satisfaction
On the basis of attribute and sentiment mining, subsequent research mainly focuses on analyzing the impact of tourism service attribute performances on customer satisfaction in online reviews. It is worth noting that since online reviews express customers' views and feelings, attribute performances in related studies refer to customers' feelings about product/service attributes, which are usually expressed by customers' emotional attitudes towards the attributes. In the field of tourism, the existing studies mainly focus on the hotel industry. For example, a recent study examined how cultural traits affect the role of attribute-level experience in tourist satisfaction, and used a deep learning algorithm to build an attribute-level sentiment analysis model that extracts tourists' attribute-level experience from online reviews. An empirical study based on nearly 50,000 online reviews collected from TripAdvisor found that positive/negative attribute experience has different impacts on customers with different cultural traits [21]. In addition, to understand the demands of customers of five-star hotels, Bi et al. [7] proposed an online review mining method from the perspective of attribute importance-performance analysis (IPA). In this method, LDA is first used to find several useful hotel attributes from online reviews, then SVM is used to analyze customers' performance feelings about these attributes in the reviews, and then an integrated neural network model is used to calculate the importance of the attributes. Finally, an IPA diagram is constructed from the results to analyze customer demand. From the perspective of service improvement, Zhang et al. [22], building on existing research on the relationship between service performance and customer satisfaction and considering the influence of consumer expectations and the subjective opinions of management, proposed an online review-driven method to determine the priority of hotel service resource allocation. In this method, the LDA model is used to extract service attributes, and a recursive neural tensor network is used to divide the attribute sentiments involved in reviews into five categories. Then, the traditional PRCA model is improved to analyze the asymmetric relationship between attribute performances and customer satisfaction. On this basis, the customer mention frequency is calculated, the customer satisfaction function is constructed, and finally, the improvement strategy analysis is realized under the framework of the Kano model.
These studies have made outstanding contributions to gaining insight into customer satisfaction from online reviews in the tourism field. However, these research methods still have limitations. First, some research that extracts service attributes from online reviews for performance analysis provides no quantitative measurement of the impact of these attributes on customer satisfaction, as in [7]. Second, even where the influence of service attributes on customer satisfaction is measured, the studies usually assume that customer satisfaction (online ratings) follows a Gaussian distribution and that the relationship between satisfaction and all attribute tendencies is additive and independent, as in [22]. In fact, in many real situations, customer satisfaction follows a positively skewed, asymmetric, bimodal (or J-shaped) distribution [23,24]. At the same time, because the attributes automatically mined from reviews are not as rigorous as those in a well-designed questionnaire, there may be more complex multilinear or nonlinear relationships between different attributes and customer satisfaction. Third, some studies do not pay attention to the categories of service attributes, as in [21]. However, some studies have confirmed that service attributes can be divided into different categories, and that attributes of different categories affect customer satisfaction in different ways [25,26]. For example, performance attributes cause dissatisfaction when they are not implemented and satisfaction when they are implemented, whereas a higher degree of realization of reverse attributes leads to an increase in tourist dissatisfaction. Recently, the empirical study of Xu et al. [27] also showed that the attribute type moderates the impact of perceived attribute experience on overall satisfaction. Therefore, identifying the category of tourism service attributes helps to provide a clearer improvement direction for promoting tourist satisfaction and to realize a more effective allocation of tourism service resources, and it warrants further research.
In summary, the extant related literature does not provide an effective scheme to design tourism service improvement strategies from abundant UGC.We will develop a comprehensive approach including multiple techniques to model the complex relationship between customer satisfaction and multiple dimensions of service quality perception, and to classify the types of service attributes.

Kano Model
The Kano model is a two-dimensional demand analysis model that classifies product/service attributes into different categories. According to the realization degree of attributes and their impact on customer satisfaction, Kano divides attributes into five types, as shown in Figure 1 [25]:
(1) Performance attributes: The realization of such attributes is positively related to tourist satisfaction. When such attributes are not implemented, tourists will be dissatisfied; when implemented, they will bring satisfaction to tourists;
(2) Excitement attributes: This kind of attribute will make tourists satisfied when it is realized, but will not lead to dissatisfaction when it is not realized;
(3) Must-be attributes: The realization of such attributes is taken for granted by tourists, and will not improve satisfaction; however, not being realized will lead to dissatisfaction of tourists;
(4) Reverse attributes: The higher the realization of such attributes, the more dissatisfied tourists will be;
(5) Indifferent attributes: The realization degree of such attributes has nothing to do with tourist satisfaction, or has only a very small impact.
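The five attribute types can be read as a decision rule over two quantities: the change in satisfaction when an attribute is realized and the change when it is not. The following Python sketch illustrates that reading only. The function name `classify_kano`, the tolerance `eps`, and the order of the checks are assumptions made for illustration and are not the classification rules of the improved Kano model used later in the paper.

```python
def classify_kano(effect_fulfilled, effect_unfulfilled, eps=0.05):
    """Map two satisfaction effects to one of the five Kano categories.

    effect_fulfilled:   change in satisfaction when the attribute is realized
    effect_unfulfilled: change in satisfaction when it is not realized
    eps:                hypothetical tolerance for "no noticeable effect"
    """
    if effect_fulfilled < -eps:
        return "reverse"        # more realization, more dissatisfaction
    if effect_fulfilled > eps and effect_unfulfilled < -eps:
        return "performance"    # satisfies when realized, dissatisfies when not
    if effect_fulfilled > eps:
        return "excitement"     # satisfies when realized, no penalty otherwise
    if effect_unfulfilled < -eps:
        return "must-be"        # taken for granted, dissatisfies when missing
    return "indifferent"        # little or no effect either way


examples = {
    "scenery": (0.40, -0.30),
    "guide": (0.40, 0.00),
    "hygiene": (0.00, -0.30),
    "crowding": (-0.20, 0.10),
    "souvenirs": (0.01, -0.02),
}
categories = {name: classify_kano(*effects) for name, effects in examples.items()}
```

The attribute names and effect values in `examples` are made up; in practice the two effects would come from a fitted satisfaction model.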

An Approach to Modeling Tourist Satisfaction in Online Travel Reviews
In this section, we propose our methodological framework, and then detail each part in the following.

Methodological Framework
This section introduces a methodological framework for modeling tourist satisfaction from online travel reviews, as shown in Figure 2. For a clearer introduction to the framework, we first define some basic concepts used in this article.
1. Online travel reviews: An online travel review is text generated by a tourist, including the tourist's opinions on the tourism services he/she experienced;
2. Tourist satisfaction: Tourist satisfaction is a kind of psychological state, resulting from tourists' overall subjective evaluation of tourism services based on their expectations and actual performance [7]. In previous studies, online ratings of customers are usually used to represent customers' satisfaction with products and services [8,16,28,29]. Following these studies, this paper will use online ratings of tourists to express their satisfaction levels with the tourism services they experienced;
3. Tourist satisfaction dimension (TSD): Tourists usually evaluate tourism services based on their perception of some important attributes of the tourism services they experienced. Similar to the research of Guo et al. [8], this paper defines these important attributes of tourism services as TSDs;
4. Classification of TSDs: In this paper, under the classification framework of the Kano model, TSDs are divided into five categories, namely, performance TSD, excitement TSD, must-be TSD, reverse TSD, and indifferent TSD, whose meanings correspond to the five basic attributes of the Kano model.
Based on Figure 2 and an explanation of the basic concepts, let's look at the three main parts of the framework.

Mining Tourists' Sentiments toward TSDs from Online Reviews
The collected online review data exist in the form of free text, which cannot be directly analyzed. In this part, through various text processing technologies, tourists' positive or negative sentiments on different TSDs are mined from online text reviews, so as to transform unstructured online text reviews into a structured data matrix for modeling tourist satisfaction. Specifically, this part consists of two stages: (1) mine TSDs from online reviews using the LDA model; (2) identify the sentiment orientations of the review data for each TSD based on the sentiment dictionary.

Measuring the Influence of Each TSD Sentiment on Tourism Service Satisfaction
Tourist satisfaction is expressed by the online rating given by tourists. In giving the overall rating, tourists make a comprehensive evaluation of the performance of the tourism services in various aspects. Thus, tourist satisfaction can be viewed as a complex combination of tourist sentiments regarding the multiple TSDs covered in their reviews. Based on the structured review data obtained in the last stage, as well as overall tourist satisfaction (online ratings), this paper uses a BPNN to depict the influence of tourists' positive and negative sentiments towards TSDs on their satisfaction, and then puts forward a method for calculating the effect from the weight parameters obtained by model training.

Identifying the Category of Each TSD
Identifying the attribute category of each TSD under the Kano framework is conducive to improving tourism services more effectively and thus improving tourist satisfaction. According to the importance calculated in the previous step, we can use the effect-based Kano model (EKM) to classify the extracted TSDs into five categories (performance TSD, excitement TSD, must-be TSD, reverse TSD, and indifferent TSD). Finally, the identified TSD categories play an important role in the improvement strategy of tourism services.
Let's go through each of these sections in detail based on the overall framework in Figure 2.

Extracting TSDs Based on LDA
Previous studies have proved that LDA is an effective topic extraction method for online reviews [8,30]. LDA is a three-level Bayesian model, which assumes that each item in the set contains a finite number of topics. In the LDA model, words, topics, and documents are three important concepts. The word is the basic unit. Each document consists of multiple words and contains one or more topics. In this context, each online review can be viewed as a document, so the words in the document are the words in the review. According to the frequency of each word in each review, we can obtain the topic distribution of each review and the word distribution of each topic by training the LDA model.
Extracting TSDs from travel reviews using LDA mainly includes two steps: preprocessing text reviews and extracting TSDs from review data:

1. Preprocessing text review data
In the Chinese travel text review data, there are not only words related to the required TSDs, but also a large number of noise words and irrelevant words, which will not become the target TSDs and will aggravate the data sparsity problem. Therefore, it is necessary to improve the effect of the LDA model through preprocessing. First, we segment the Chinese text review data into words. Then, we filter the corresponding words in the sentences according to the stop word dictionary (HIT Stop Word List, Chinese Stop Word List), negation dictionary, degree adverb dictionary, and sentiment dictionary. Let $R = \{r_1, r_2, \ldots, r_M\}$ denote the online review dataset, where $r_m$ ($m = 1, 2, \ldots, M$) denotes the $m$th text review, and $M$ is the number of reviews in the dataset. By counting the occurrence frequency of each word in each preprocessed text review, we can obtain the review-word matrix $X_{M \times N}$, where $N$ denotes the number of words appearing in all the preprocessed review data;
2. TSD extraction based on LDA
By using the obtained matrix $X_{M \times N}$ as input, the LDA model can be trained. The output of the trained LDA model has three parts: the review-topic matrix, the topic-word matrix, and the topic list. Because there are noise words in the obtained topics, and some topic words may have similar meanings, we manually filter the noise words and merge topic words with similar meanings to obtain more reasonable results. Then, we select the appropriate topics from the results and assign a tag to each topic. As in some existing studies, each topic can be regarded as a TSD [7,16]. Let $I$ denote the number of TSDs (i.e., topics), each consisting of multiple frequent words, and let $C_i$ denote the number of frequent words under the $i$th topic, so that the $i$th TSD can be denoted as $t_i = \{word_{i1}, word_{i2}, \ldots, word_{iC_i}\}$, where $word_{ij}$ ($i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, C_i$) denotes the $j$th frequent word in the $i$th TSD.
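The construction of the review-word matrix $X_{M \times N}$ in the preprocessing step can be sketched as follows. This is a minimal illustration in plain Python, assuming the reviews have already been segmented into words by a Chinese word segmenter; the helper name `build_review_word_matrix`, the toy English tokens, and the tiny stop-word set are hypothetical stand-ins and not part of the original study.

```python
from collections import Counter


def build_review_word_matrix(tokenized_reviews, stop_words):
    """Return (vocabulary, X) where X[m][n] counts word n in review m."""
    # Drop stop words before counting, as in the preprocessing step.
    filtered = [[w for w in review if w not in stop_words]
                for review in tokenized_reviews]
    # The vocabulary holds the N distinct words left after filtering.
    vocabulary = sorted({w for review in filtered for w in review})
    index = {w: n for n, w in enumerate(vocabulary)}
    X = [[0] * len(vocabulary) for _ in filtered]
    for m, review in enumerate(filtered):
        for word, count in Counter(review).items():
            X[m][index[word]] = count
    return vocabulary, X


# Toy input: M = 2 reviews that are already split into words.
reviews = [
    ["the", "waterfall", "is", "spectacular"],
    ["the", "ticket", "is", "cheap", "and", "the", "waterfall", "is", "huge"],
]
stop_words = {"the", "is", "and"}
vocabulary, X = build_review_word_matrix(reviews, stop_words)
```

A count matrix of this shape (one row per review, one column per word) is the usual input for LDA implementations, which then return the review-topic and topic-word distributions described above.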

Dictionary-Based TSD Granularity Sentiment Recognition
1. Review text decomposition
Typically, a single online review may contain several sentences related to different TSDs. Table 1 shows several examples of online travel review text. At the same time, the TSDs mentioned in different reviews may be different. To identify tourists' emotional attitudes towards different attributes, it is necessary to extract the sentences containing each attribute from the original reviews, that is, decompose the original reviews into components under different TSDs. First, we divide the online reviews in $R$ into clauses according to punctuation marks.

Table 1. Examples of online travel review text.
| No. | Review Content |
| --- | --- |
| 1 | Huangguoshu Waterfall lives up to its name and is spectacular. |
| 2 | This is a very huge water cave, beautiful scenery, can see very high, the cave is very cool, different sky, lights a dozen colorful, especially good looking, the ticket is not expensive. Welcome everyone here. |
| 3 | Although the service is not great, I have to say that the scenery is good, especially the waterfalls and the river valley are very commendable, but it is recommended to visit during the off-season. |

Then, according to the obtained $t_i = \{word_{i1}, word_{i2}, \ldots, word_{iC_i}\}$, we extract the sentences from $R$ that contain $word_{ij}$ ($i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, C_i$), and obtain the review set $R_i$ related to the topic $t_i$ (the $i$th TSD). In particular, if more than one sentence in a review is related to a TSD, the sentences are merged into one. If a review contains no sentence related to a TSD, the corresponding component is null.
Let $R_i = \{r_{i1}, \ldots, r_{im}, \ldots, r_{iM}\}$ ($i = 1, 2, \ldots, I$) denote the review set related to the $i$th TSD in the total review data $R$, where $r_{im}$ denotes the $m$th review in $R_i$ and $M$ denotes the number of reviews in the dataset. To show this process more clearly, we take the reviews in Table 1 as an example. Through the above processing, we can extract the clauses related to TSDs from the three reviews. The specific results are shown in Table 2.
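The decomposition step can be sketched in a few lines of Python. The sketch splits a review into clauses at punctuation marks, keeps the clauses that contain a frequent word of each TSD, merges multiple hits into one component, and uses `None` for a null component. The TSD names and keyword lists below are hypothetical examples rather than the topics extracted in the paper, and the review is the English rendering of the third review in Table 1.

```python
import re

# Split on common Western and Chinese punctuation marks.
CLAUSE_SPLIT = re.compile(r"[,.;!?，。；！？]+")


def decompose_review(review, tsd_keywords):
    """Return {tsd: merged clauses mentioning it, or None if absent}."""
    clauses = [c.strip() for c in CLAUSE_SPLIT.split(review) if c.strip()]
    components = {}
    for tsd, keywords in tsd_keywords.items():
        hits = [c for c in clauses if any(k in c.lower() for k in keywords)]
        # Several related clauses are merged into one component.
        components[tsd] = " ".join(hits) if hits else None
    return components


tsd_keywords = {
    "service": ["service"],
    "scenery": ["scenery", "waterfall", "valley"],
    "ticket": ["ticket", "price"],
}
review = ("Although the service is not great, I have to say that the scenery "
          "is good, especially the waterfalls and the river valley are very "
          "commendable, but it is recommended to visit during the off-season.")
components = decompose_review(review, tsd_keywords)
```

Here the "scenery" component merges two clauses, while "ticket" is `None` because no clause of this review mentions it.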

2. Sentiment orientation identification
In order to identify the sentiment orientation of each review in $R_i$, this paper uses a Chinese sentiment dictionary for sentiment analysis. The Dalian University of Technology Chinese Emotion Vocabulary Ontology Database (called EMO_DIC) covers commonly used sentiment words in the Chinese context and is a widely used sentiment dictionary in the field of Chinese sentiment analysis [31,32]. Referring to these studies, this paper uses EMO_DIC as the basic sentiment dictionary, which contains a total of 27,466 Chinese sentiment words. It divides Chinese sentiment into 7 categories and 21 sub-categories. The 7 categories are "joy", "good", "anger", "sorrow", "fear", "evil", and "shock". As shown in Table 3, the dictionary classifies each sentiment word into a specific sentiment category and provides the sentiment polarity of the word (1 for positive, 2 for negative). The parts of speech include adjectives, nouns, etc. Based on the sentiment orientation of each review in $R_i$, we can obtain the sentiment orientation of each review $r_m$ about $t_i$ (that is, the $i$th TSD), $i = 1, 2, \ldots, I$, $m = 1, 2, \ldots, M$, and convert the results into nominal encoded data, as shown in Table 4. The missing values in Table 4 indicate that the review $r_m$ contains no content about $t_i$, or that the content about $t_i$ shows no sentiment orientation. Let $F_{im} \in \{Pos, Neg\}$ ($i = 1, 2, \ldots, I$; $m = 1, 2, \ldots, M$) denote the sentiment orientation of review $r_m$ about $t_i$. Through Formula (1), we can convert the nominal encoded data into structured data, as shown in Table 5. As can be seen from Table 5, if $r_m$'s sentiment orientation about $t_i$ is positive, ($E^{Pos}_{im}$,
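A rough sketch of dictionary-based orientation identification and of the conversion into structured data is given below. It is an illustration under explicit assumptions: the toy `polarity` dictionary stands in for EMO_DIC (using its 1 = positive, 2 = negative coding), negation and degree adverbs are ignored, and the encoding of each TSD as a pair of indicators (positive, negative) with (0, 0) for missing values is assumed here rather than taken from Formula (1).

```python
def clause_orientation(tokens, polarity):
    """Return 'Pos', 'Neg', or None from dictionary polarities (1=pos, 2=neg)."""
    pos = sum(1 for t in tokens if polarity.get(t) == 1)
    neg = sum(1 for t in tokens if polarity.get(t) == 2)
    if pos == neg:
        return None  # no sentiment word found, or a tie
    return "Pos" if pos > neg else "Neg"


def encode(orientation):
    """Assumed encoding: (E_pos, E_neg) = (1, 0), (0, 1), or (0, 0) if missing."""
    return {"Pos": (1, 0), "Neg": (0, 1)}.get(orientation, (0, 0))


def encode_review(orientations, tsds):
    """Flatten the per-TSD indicator pairs of one review into a single row."""
    return [value for tsd in tsds for value in encode(orientations.get(tsd))]


# Hypothetical miniature dictionary standing in for EMO_DIC.
polarity = {"spectacular": 1, "good": 1, "great": 1, "expensive": 2, "crowded": 2}
tsds = ["scenery", "service", "ticket"]
orientations = {
    "scenery": clause_orientation(["scenery", "is", "good"], polarity),
    "ticket": clause_orientation(["ticket", "expensive", "crowded"], polarity),
}
row = encode_review(orientations, tsds)
```

Each review then contributes one row with two columns per TSD, which is the kind of structured matrix that a satisfaction model can take as input.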

Evaluate the Impact of Each TSD's Sentiment Orientation on Tourist Satisfaction
In this section, we propose a method based on backpropagation neural network (BPNN) to measure how tourists' sentiments towards TSDs affect their satisfaction.
Most current studies modeling tourist satisfaction from online review data make the following assumptions [28,29]: (1) tourist satisfaction (i.e., online rating) follows a Gaussian distribution; (2) tourist satisfaction is a linear combination of tourists' emotional attitudes towards TSDs; (3) the multicollinearity between different TSDs is low. However, in many practical problems, these assumptions cannot be satisfied. In practice, the TSDs mined from online reviews and the online ratings of tourists usually have the following characteristics: (1) tourist satisfaction (online ratings) usually follows a positively skewed, asymmetric, bimodal (or J-shaped) distribution [23]; (2) tourist satisfaction may be a nonlinear combination of sentiments toward TSDs; (3) there may be multicollinearity between the different attributes automatically mined from the reviews, and there may be a complex nonlinear relationship between the attributes and tourist satisfaction. In fact, tourist satisfaction is a complex union of tourists' emotional attitudes towards the full range of TSDs involved in the reviews. Therefore, considering the above characteristics, this section proposes a new method to measure the impact of tourists' TSD sentiments on tourist satisfaction.
The neural network (NN) is an effective prediction method. In some complex data environments (such as non-normal data, nonlinear relationships, and multicollinearity), an NN is significantly better than a multiple regression model because it is not affected by collinear independent variables and does not require a linear relationship between the input variables and the dependent variable [7,33]. Although NNs were proposed for prediction tasks, some studies have shown that they can also be used to determine the weight information of input variables [34]. For example, artificial NNs were utilized in the study [35] to evaluate the relative importance of the factors influencing consumer acceptance of behavioral-targeted advertising services. Therefore, the NN is a competitive alternative method for measuring the influence of tourists' positive and negative TSD sentiments on their satisfaction. BPNN is one of the most popular NN models and is used as the importance measurement technique in this paper. Specifically, this paper builds a BPNN model with a three-layer network structure, including an input layer, a hidden layer, and an output layer, as shown in Figure 3. Generally, BPNN training includes two processes: forward information propagation and reverse error propagation. In the forward process, the input nodes transmit the tourists' sentiment information about each TSD to the hidden nodes, and the hidden nodes then transmit the corresponding information to the output layer through the activation function. In the reverse process, according to the error between the model's output and the real result, the gradient descent method is used to minimize the error, and the model parameters (namely the weights) are updated [36].
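The forward and reverse processes described above can be illustrated with a very small three-layer network written in plain Python. This is a sketch under stated assumptions rather than the configuration used in the study: the sigmoid hidden activation, linear output node, squared-error loss, learning rate, number of hidden nodes, and the toy samples (two TSDs, each encoded as a positive and a negative indicator) are all chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bpnn(samples, n_hidden=3, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer network with per-sample gradient descent.

    samples: list of (x, y), x a list of sentiment indicators, y a rating.
    Returns (w_in, w_out, history): w_in[h][i] links input i to hidden node h,
    w_out[h] links hidden node h to the output, and history holds the mean
    squared error observed in each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
            for _ in range(n_hidden)]
    b_in = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    history = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, y in samples:
            # Forward pass: sigmoid hidden layer, linear output node.
            hidden = [sigmoid(sum(w * v for w, v in zip(w_in[h], x)) + b_in[h])
                      for h in range(n_hidden)]
            pred = sum(w * a for w, a in zip(w_out, hidden)) + b_out
            err = pred - y
            squared_error += err * err
            # Reverse pass: propagate the error and update the weights.
            for h in range(n_hidden):
                delta = err * w_out[h] * hidden[h] * (1.0 - hidden[h])
                w_out[h] -= lr * err * hidden[h]
                b_in[h] -= lr * delta
                for i in range(n_in):
                    w_in[h][i] -= lr * delta * x[i]
            b_out -= lr * err
        history.append(squared_error / len(samples))
    return w_in, w_out, history


# Toy data: inputs are (pos_1, neg_1, pos_2, neg_2); targets are ratings.
samples = [
    ([1, 0, 1, 0], 5.0), ([1, 0, 0, 1], 3.0), ([0, 1, 1, 0], 3.0),
    ([0, 1, 0, 1], 1.0), ([0, 0, 1, 0], 4.0), ([1, 0, 0, 0], 4.0),
]
w_in, w_out, history = train_bpnn(samples)
```

The returned `w_in` and `w_out` correspond to the input-to-hidden and hidden-to-output connection weights from which input importance can subsequently be derived.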
Given the training sample $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_M, y_M)\}$, $m = 1, 2, \ldots, M$, the impact of tourists' TSD sentiment on tourist satisfaction is calculated from the BPNN training as follows.
(1) Let $b$ denote the trained BPNN model. Let $w_{ih}^{Pos}$ and $w_{ih}^{Neg}$ denote the weights between the input nodes for the positive and negative sentiment of the $i$th TSD in BPNN $b$ and the $h$th hidden node, respectively (the blue line in Figure 3), where $i = 1, 2, \ldots, I$ and $h = 1, 2, \ldots, D$. Let $w_h$ denote the weight between the $h$th hidden node and the output node (the yellow line in Figure 3), $h = 1, 2, \ldots, D$. Let $W_i^{Pos}$ and $W_i^{Neg}$ denote the influence of the positive and negative sentiments of tourists towards each TSD on tourist satisfaction; $W_i^{Pos}$ and $W_i^{Neg}$ are obtained from Formulas (2) and (3), respectively.
(2) In order to reduce the overfitting problem and enhance the reliability of the results, we conduct 10-fold cross-validation on the dataset. According to Formulas (2) and (3), we calculate the sentiment weight information of the 10 trained BPNN models, respectively, and take their average value as the final required result, denoted as $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$, $i = 1, 2, \ldots, I$.
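The averaging over the 10 trained models in step (2) can be sketched as follows. Because Formulas (2) and (3) are not reproduced in this section, the helper `connection_weight` is only an assumed stand-in (a plain connection-weight product summed over hidden nodes) and may differ from the formulas actually used; both function names are hypothetical.

```python
def connection_weight(w_input_hidden, w_hidden_output):
    """Assumed stand-in for Formulas (2)/(3): sum over hidden nodes of
    (input-to-hidden weight) * (hidden-to-output weight)."""
    return sum(w_ih * w_h for w_ih, w_h in zip(w_input_hidden, w_hidden_output))


def average_over_folds(fold_weights):
    """Average per-TSD weights across the trained models.

    fold_weights: one list per fold, each holding one weight per TSD.
    Returns one averaged weight per TSD.
    """
    n_folds = len(fold_weights)
    n_tsd = len(fold_weights[0])
    return [sum(fold[i] for fold in fold_weights) / n_folds for i in range(n_tsd)]
```

The same averaging is applied separately to the positive-sentiment weights and to the negative-sentiment weights.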
(3) Based on the calculated $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$, we can evaluate the total impact (relative importance weight) of the $i$th TSD on tourist satisfaction. Let $R_i$ denote the range of influence of the $i$th TSD on tourist satisfaction. $R_i$ can be calculated by Formula (4) for $i = 1, 2, \ldots, I$.

TSD Category Recognition Based on Kano Model
According to the obtained $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$, as well as the basic principle of the Kano model, we propose an Effect-based Kano model (EKM), which can identify the category of each TSD from the perspective of tourists. The core idea of EKM is shown in Figure 4.
(1) In Figure 4a, positive sentiment is taken to mean that the performance of the TSD meets the requirements of tourists (the green rectangle in Figure 4a); in contrast, negative sentiment is taken to mean that the performance of the TSD does not meet the requirements of tourists (the red rectangle in Figure 4a). In addition, the online ratings of tourists indicate their overall satisfaction with the tourism services they enjoyed.
(2) In Figure 4b, with the introduction of $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$, the traditional Kano model framework is divided into two parts. The right side is the part related to positive sentiments, that is, the requirements for the TSD are fulfilled; here, $\overline{W}_i^{Pos}$ can be regarded as the influence of $t_i$ on the overall satisfaction of tourists when TSD $t_i$ is satisfied. Accordingly, the left side of Figure 4b is the part related to negative sentiments, that is, the requirements for the TSD are not fulfilled; here, $\overline{W}_i^{Neg}$ can be regarded as the influence of $t_i$ on the overall satisfaction of tourists when TSD $t_i$ is not satisfied. The detailed meanings of $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$ are as follows: (i) $\overline{W}_i^{Pos} > 0$ indicates that the overall satisfaction of tourists will increase when their requirements for $t_i$ are satisfied; (ii) $\overline{W}_i^{Pos} \le 0$ indicates that the overall satisfaction of tourists will not increase when their requirements for $t_i$ are satisfied; (iii) $\overline{W}_i^{Neg} \ge 0$ indicates that the overall satisfaction of tourists will not decrease when their requirements for $t_i$ are not satisfied; (iv) $\overline{W}_i^{Neg} < 0$ indicates that the overall satisfaction of tourists will decrease when their requirements for $t_i$ are not satisfied.
In Figure 4c, $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$ are taken as the horizontal axis and vertical axis, respectively. Therefore, the TSD represented as a curve in Figure 4b can be converted into a point in Figure 4c. Thus, according to the basic principles of the Kano model, combined with $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$, TSDs can be divided into five types in Figure 4c, explained as follows:
(i) If the influence of $t_i$ falls below the indifference threshold, this indicates that $t_i$ has a very small effect on the overall satisfaction of tourists, and $t_i$ is an indifferent attribute. This threshold determines whether a TSD is an indifferent attribute; the classification conditions for the other cases are given in (ii)-(v);
(ii) If $\overline{W}_i^{Pos} \le 0$ and $\overline{W}_i^{Neg} < 0$, then $t_i$ is a must-be attribute, that is, when the requirements of tourists for $t_i$ are satisfied, the overall satisfaction of tourists will not increase; when they are not satisfied, the overall satisfaction of tourists will decrease;
(iii) If $\overline{W}_i^{Pos} \le 0$ and $\overline{W}_i^{Neg} \ge 0$, then $t_i$ is a reverse attribute, that is, when the requirements of tourists for $t_i$ are satisfied, the overall satisfaction of tourists will not increase; when they are not satisfied, the overall satisfaction of tourists will not decrease;
(iv) If $\overline{W}_i^{Pos} > 0$ and $\overline{W}_i^{Neg} < 0$, then $t_i$ is a performance attribute, that is, when the requirements of tourists for $t_i$ are satisfied, the overall satisfaction of tourists will increase; when they are not satisfied, the overall satisfaction of tourists will decrease;
(v) If $\overline{W}_i^{Pos} > 0$ and $\overline{W}_i^{Neg} \ge 0$, then $t_i$ is an excitement attribute, that is, when the requirements of tourists for $t_i$ are satisfied, the overall satisfaction of tourists will increase; when they are not satisfied, the overall satisfaction of tourists will not decrease.
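The five-way classification can be sketched as a small decision function. The sign tests follow the meanings of the positive and negative weights given in this section; the form of the indifference test (an influence value compared against a threshold, both passed in by the caller) and the name `classify_tsd` are assumptions for illustration.

```python
def classify_tsd(w_pos, w_neg, influence, threshold):
    """Return the Kano category of one TSD.

    w_pos, w_neg: averaged positive and negative sentiment weights.
    influence, threshold: assumed inputs for the indifference test.
    """
    if influence < threshold:
        return "indifferent"
    if w_pos > 0:
        # Fulfilment raises satisfaction.
        return "performance" if w_neg < 0 else "excitement"
    # Fulfilment does not raise satisfaction.
    return "must-be" if w_neg < 0 else "reverse"
```

For example, a TSD with a non-positive positive-sentiment weight and a negative negative-sentiment weight is classified as must-be, provided its influence is not below the threshold.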

Empirical Research Based on Online Review Data
This section verifies the method proposed in Section 3 using online reviews posted by tourists. This study is carried out on a PC configured with 8 GB of memory, a 1.8 GHz dual-core Intel Core i5 processor, and an iOS operating system. Data processing and method implementation use Python. The following subsections first introduce the experimental data and then explain the experimental procedures and some important experimental results.

Data Source
The tourism industry is a pillar industry in Guizhou, China. For this study, we crawled 10,307 online reviews of tourist attractions in Guizhou posted between 1 January 2020 and 31 May 2021 from Ctrip (www.ctrip.com, accessed on 8 June 2021), one of the most popular online travel platforms in China. The time span of the dataset covers the initial period of the COVID-19 outbreak through to the period of normal management of the pandemic, so it can reflect the views of tourists on the services of tourist attractions in Guizhou during the COVID-19 period. Figure 5 is an example of the data collected. As can be seen from the figure, the collected data include the review text of tourists, the ratings of tourists (overall satisfaction), the date of the reviews, etc. The review shown in Figure 5 reads: "A major attraction with excellent overall management. One is that the arrangement of playing vehicles is good, and you can choose which one to play first, which helps to divert traffic. Another is that there are many places for dining in the scenic area. It is necessary to visit a scenic spot for 6 hours, and after seeing the affordable and high-quality products, KFC can meet the needs of different tourists without having to carry dry food. This is something that no major scenic spot can achieve after visiting so many places. This is the reason for giving praise besides the scenery."

Exploring Tourist Sentiments towards TSDs
According to the TSD extraction process in Section 3.1.1, we use the LDA model to extract TSDs from the 10,307 reviews. The LDA parameters are num_topics = 20, alpha = 0.1, eta = 0.01, and passes = 2000. By filtering the noise words under each topic, merging topics with similar meanings, and assigning a label to each topic, we finally obtain 18 TSDs, as shown in Table 6. In Table 6, $C_i$ ($i = 1, 2, \ldots, 18$) is the number of frequent words included in $t_i$ (i.e., the $i$th TSD), and the total frequency is the total number of times the words $word_{ij}$ in $t_i$ appear in all reviews. As can be seen from the table, 'Huangguoshu Waterfall' and 'sightseeing transportation in tourist attraction' are the two TSDs about which the tourists are most concerned, with each TSD appearing more than 7000 times in the total review data. Next, two TSDs, 'general feeling' and 'ticket service', appear 6400 times and 4800 times, respectively. Finally, the four TSDs of 'Pingba Cherry Garden', 'Zunyi Red Culture', 'Fanjingshan', and 'Minority Culture' all appear more than 3000 times.
According to Section 3.1.2, the sentiment orientation of the 10,307 reviews with respect to the 18 TSDs can be determined. The statistical results for the sentiment orientation of each TSD across all reviews are shown in Figure 6. As can be seen from the figure, for all TSDs, the number of positive reviews exceeds the number of negative reviews. Among them, $t_{16}$, i.e., the dimension of 'overall feeling', shows the largest number of positive sentiments, which is much higher than the other dimensions. In contrast, $t_5$, i.e., the dimension of 'scenic tourism transportation', has the most negative sentiments. Next, we convert the nominal encoded data into structured data according to Formula (1).
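The conversion from nominal codes to structured data can be sketched as below. The sketch assumes that each TSD in a review is coded 1 for positive sentiment, -1 for negative sentiment, and 0 when the TSD is not mentioned; this coding and the name `to_structured` are assumptions, while the output respects the constraint that each TSD contributes a positive and a negative 0/1 indicator whose sum is at most 1.

```python
def to_structured(nominal_codes):
    """Expand one review's per-TSD codes into (positive, negative) indicators.

    nominal_codes: one code per TSD; assumed 1 = positive, -1 = negative,
    0 = not mentioned. Returns a flat list of length 2 * number of TSDs.
    """
    row = []
    for code in nominal_codes:
        e_pos = 1 if code > 0 else 0
        e_neg = 1 if code < 0 else 0
        row.extend([e_pos, e_neg])
    return row
```

Under these assumptions, a review that is positive about the first TSD, silent about the second, and negative about the third becomes the row `[1, 0, 0, 0, 0, 1]`.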

Measuring the Influence of TSD Sentiment on Tourist Satisfaction
We used the obtained structured data to train the BPNN model (the number of nodes in the hidden layer is 37), where the network parameters are set as learning rate = 0.6, maximum allowable error = 0.01, and number of iterations = 1000. In order to overcome the overfitting problem and enhance the reliability of the results, we performed 10-fold cross-validation on the dataset and recorded the weight parameter information of the 10 trained BPNN models. The RMSE values of the 10 models are shown in Figure 7. It can be seen from the figure that the RMSE values of the 10 models are all small, indicating good prediction performance. Finally, according to Formulas (2) and (3), we calculate the importance information $W_i^{Pos}$ and $W_i^{Neg}$ ($i = 1, 2, \ldots, 18$) of the 10 BPNN models for each TSD, respectively, and take their average values as the final $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$ of each TSD, as shown in Table 7.
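The evaluation protocol above (10-fold cross-validation with RMSE per fold) can be sketched in plain Python. The shuffling, the way folds are formed, and the names `k_fold_indices` and `rmse` are assumptions for illustration; the paper does not state how the folds were drawn.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Every k-th shuffled index goes to the same fold, so fold sizes
    # differ by at most one.
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


def rmse(y_true, y_pred):
    """Root mean squared error between ratings and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Each fold serves once as the held-out set: a model is trained on the remaining nine folds, its RMSE is computed on the held-out fold, and its weights are recorded for the later averaging.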

Measuring the Influence of TSD Sentiment on Tourist Satisfaction
Based on the results in Table 7 and the EKM attribute classification process in Figure 4, the TSD categories of Guizhou tourism services are obtained, as shown in Figure 8. In the figure, 0.0056 (that is, 1/(10 × 18)) is the threshold that determines whether a TSD is an indifferent TSD (the blue area in Figure 8). As can be seen from the figure, most TSDs of Guizhou tourism services belong to the must-be attribute, and no TSD is identified as an excitement attribute. Specifically, indifferent TSDs include 'Huangguoshu Waterfall', 'Karst cave scenery', 'Huaxi Wetland Park', 'Zunyi Red Culture', and 'tourist traffic of tourist attractions'. Must-be TSDs include 'Fanjing Mountain', 'Miao Village', 'natural scenery', 'Guiyang landmarks', 'tour guides and hotels', 'ticket services', 'night views of Maotai Town', 'Pingba cherry blossom Garden', 'ethnic culture', and 'ancient town tours'. Performance TSDs include 'Qianling Mountain' and 'the overall feeling'. The only reverse TSD is 'adverse environmental factors'.
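As a check on the threshold value quoted above, the arithmetic for the 18 TSDs in this dataset works out as follows:

```latex
\[
\frac{1}{10 \times 18} = \frac{1}{180} \approx 0.0056
\]
```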

Systems 2023, 11, x FOR PEER REVIEW
According to the classification results of TSDs, we can determine the TSD priority order in the formulation of the tourism service promotion strategy. First of all, the above 10 must-be TSDs represent the tourists' basic requirements for Guizhou tourism services. When these requirements are not fulfilled, tourists will feel very dissatisfied. Therefore, relevant tourism managers should first examine must-be TSDs and try to fulfill the requirements associated with them. Then, since tourist satisfaction is directly proportional to the level of performance TSDs, relevant managers should consider fulfilling tourists' requirements related to performance TSDs on the premise of fulfilling the must-be TSDs. Finally, for reverse TSDs, managers should carefully examine these factors to avoid such situations as much as possible.
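The priority order described above can be sketched as a simple sort. The text ranks must-be TSDs first, performance TSDs second, and reverse TSDs last among those three; the positions given here to excitement and indifferent TSDs, as well as the names `PRIORITY` and `prioritize`, are assumptions for illustration.

```python
# Lower number = handled earlier. The ranks of "excitement" and
# "indifferent" are assumed; the text only orders the other three.
PRIORITY = {
    "must-be": 0,
    "performance": 1,
    "excitement": 2,
    "reverse": 3,
    "indifferent": 4,
}


def prioritize(tsd_categories):
    """Sort TSD names by category priority, then alphabetically.

    tsd_categories: dict mapping TSD name -> Kano category.
    """
    return sorted(tsd_categories, key=lambda name: (PRIORITY[tsd_categories[name]], name))
```

Applied to the classification results, such a sort would list the must-be TSDs first, followed by the performance TSDs, with the reverse TSD after them.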

Conclusions
Online tourism platforms provide an open and convenient channel for tourists to share their travel experiences. At the same time, for the tourism industry, these online travel reviews contain valuable information related to the quality of tourism services. This paper focuses on how to measure the key factors of customer satisfaction from online travel review data and then classifies these key factors to provide a basis for tourism service quality improvement strategies. To this end, this paper proposed a framework for modeling customer satisfaction from free review texts. First, this paper used the LDA model to explore the key dimensions of tourist satisfaction from online reviews. Next, based on the sentiment dictionary, we identified tourists' sentiment attitudes towards all dimensions from these reviews. Then, we used the BPNN to measure the complex relationship between tourists' sentiment orientations towards different dimensions and their satisfaction. Finally, according to the improved Kano model, namely EKM, multidimensional attribute classification was realized to support the analysis of tourism service quality improvement.

Theoretical Contribution
The contribution of this paper has three aspects. First, this paper contributes to the research on online product/service reviews in the field of Information Systems (IS). Most of the related literature studies the impact of online reviews on customer purchases [39,40], explores the antecedents of the usefulness of product reviews [41,42], or studies the factors influencing online review behavior [43]. This paper focuses on mining customers' service experience from online reviews, expands the scope of IS research, and puts forward an effective and feasible method to transform a large amount of online review data into useful business intelligence. Specifically, compared to [7,21-24], we model customer satisfaction more accurately through BPNN and classify the types of service attributes through the quantitative calculation process of EKM.
Secondly, this study contributes to the literature in the field of tourism. This study provides an effective methodological framework for analyzing tourist satisfaction from the perspective of customer demand. Customer satisfaction is usually evaluated by conducting market surveys or experiments to collect and analyze customer preference data. However, such customer preference surveys often require considerable time and cost. The framework provided in this study enables a dimension-level evaluation of tourism service satisfaction based on online travel review data. In particular, the BPNN-based calculation of the influence of tourist satisfaction dimensions can model more complex data situations and does not require the assumptions of a Gaussian distribution of satisfaction (online ratings) or of additive independence between all attribute sentiments, which is more realistic.
Thirdly, this study contributes to the literature in the field of market analysis. This study proposes an Effect-based Kano model (EKM), which is oriented by tourist satisfaction and realizes the classification and prioritization of tourists' requirements at the level of service attributes. By integrating the influence of customer sentiment on satisfaction in each service dimension with the traditional Kano model, EKM can analyze and visualize customer requirements for tourism services. The validity of EKM is empirically illustrated using the dataset in this paper.

Practical Contribution
This article also offers several valuable insights for the tourism management industry. Using the proposed method, this paper mines four types of information from online travel reviews: the TSDs of tourism services, tourist sentiments towards these TSDs, the impact of tourist sentiments towards TSDs on their satisfaction, and the category of each TSD. These four kinds of information can help managers in the tourism industry and can also serve as a reference analysis method for other product/service industries.
Firstly, the TSDs mined from online reviews usually represent the aspects of tourism services that tourists care most about. Therefore, according to the extracted TSDs, managers can know which tourism service dimensions are of most concern to tourists.
Secondly, tourists' views or sentiments on tourism service TSDs reflect their multidimensional perception of tourism services and can be regarded as the actual performance of the tourism service dimension provided.Therefore, managers can evaluate the performance of tourism services in each dimension based on customers' views or sentiments on the service dimensions.On this basis, managers can further understand the advantages and disadvantages of tourism services.
Thirdly, the proposed method enables managers to assess the influence of tourists' TSD sentiment on their satisfaction. This influence can be regarded as aggregated customer preferences, which are important for planning and decision-making related to service quality optimization.
Fourthly, the method in this paper enables managers to understand the different categories of tourism service TSDs, which is crucial for managing tourist demands and deciding service optimization strategies. Specifically, the service characteristics of tourism are divided into five categories: performance type, excitement type, must-be type, reverse type, and indifferent type. This understanding can help decision makers in the tourism industry decide the priority of TSD development or promotion and make more effective development plans for tourism service quality.

Limitations and Future Research Directions
There are also several limitations in this study, which deserve further attention. Firstly, this study assumes that all online reviews obtained from tourism social platforms are authentic. However, spam reviews, which may be manipulated or fake, are common on online platforms. At present, fake review recognition is receiving more and more research attention [44]. Therefore, combining effective fake review recognition methods with this study is a necessary and interesting line of research. Secondly, a potential problem of this study is the representativeness of reviewers on online tourism platforms; it is widely recognized that online reviewers may not be effective representatives of all target groups. Therefore, given the limited representativeness of the reviewers and reviews adopted, it is not reasonable to use online tourism reviews to infer the tourism service demands of all tourists. To reduce this concern, one possible direction for future research is to incorporate review data from multiple sources (multiple online tourism platforms) to reduce bias related to a single platform. Alternatively, we can collect some characteristic information about reviewers and adopt statistical methods to address the self-selection problem in online review data. Thirdly, this study analyzes all tourists as a whole. In fact, different consumer types often have different requirements for service attributes. In the future, we can further consider the customer type factor based on the current study. Fourthly, a potential research direction is to study other, potentially more effective methods for analyzing tourist sentiment, such as building a professional sentiment dictionary for the field of tourism, so as to improve the accuracy of the analysis results.

Figure 2. Framework for modeling tourist satisfaction based on online reviews.

3. Tourist satisfaction dimension (TSD): Tourists usually evaluate tourism services based on their perception of some important attributes of the tourism services they experienced. Similar to the research of Guo et al. [8], this paper defines these important attributes of tourism services as TSDs;
4. Classification of TSDs: In this paper, under the classification framework of the Kano model, TSDs are divided into five categories, namely, performance TSD, excitement TSD, must-be TSD, reverse TSD, and indifferent TSD, whose meanings correspond to the five basic attributes of the Kano model.
Based on Figure 2 and an explanation of the basic concepts, let's look at the three main parts of the framework.

Figure 3. Schematic diagram of BPNN structure.
Let $x_m = (E_{1m}^{Pos}, E_{1m}^{Neg}, \ldots, E_{im}^{Pos}, E_{im}^{Neg}, \ldots, E_{Im}^{Pos}, E_{Im}^{Neg})$ denote the structured data of $r_m$, that is, the emotional attitudes of tourists towards each TSD (a row of data in Table 5), where $E_{im}^{Pos}, E_{im}^{Neg} \in \{0, 1\}$ and $E_{im}^{Pos} + E_{im}^{Neg} \le 1$ ($i = 1, 2, \ldots, I$; $m = 1, 2, \ldots, M$). In addition, let $y_m$ denote the tourist satisfaction corresponding to the review $r_m$. Then, the training sample is composed of $x_m$ and $y_m$, which can be denoted as $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_M, y_M)\}$, $m = 1, 2, \ldots, M$. The following describes in detail the method of calculating the impact of tourists' TSD sentiment on tourist satisfaction based on the BPNN training. (1) Let $b$ denote the trained BPNN model. Let $w_{ih}^{Pos}$ and $w_{ih}^{Neg}$ denote the weights between the input nodes and the $h$th hidden node, and let $W_i^{Pos}$ and $W_i^{Neg}$ denote the influence of positive and negative sentiments of tourists towards each TSD on tourist satisfaction. $W_i^{Pos}$ and $W_i^{Neg}$ can be obtained from Formulas (2) and (3), respectively.

Figure 4. Core idea of the Effect-based Kano model (EKM), which identifies the category of each TSD from the perspective of tourists: (a) Basic Kano; (b) Weight; (c) Classification.

Figure 5. An online review posted by a tourist.

Figure 6. Statistical results of TSD sentiment orientations in the review data.
According to Formulas (2) and (3), we can calculate the importance information $W_i^{Pos}$ and $W_i^{Neg}$ ($i = 1, 2, \ldots, 18$) of the 10 BPNN models for each TSD, respectively, and take their average values as the final $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$ of each TSD, as shown in Table 7.

Table 1. Three examples of online travel text reviews.

Table 2. Examples of short sentence extraction about TSDs.

Table 3. Examples of Dalian University of Technology Chinese Emotion Vocabulary Ontology Database.

Table 4. Nominal coded data for each TSD in online travel reviews.

Table 5. Structured online travel reviews.

Table 6. The extracted tourist satisfaction dimension based on LDA.

Table 7. The final $\overline{W}_i^{Pos}$ and $\overline{W}_i^{Neg}$ for each TSD.