Exploring Bidirectional Performance of Hotel Attributes through Online Reviews Based on Sentiment Analysis and Kano-IPA Model
Appl. Sci. 2022, 12, 692

As people increasingly rely on online reviews when making hotel booking decisions, how to effectively improve customer ratings has become a major concern for hotel managers. Online reviews serve as a promising data source for enhancing service attributes in order to improve online bookings. This paper employs online customer ratings and textual reviews to explore the bidirectional performance (good performance in positive reviews and poor performance in negative reviews) of hotel attributes in terms of four hotel star ratings. Sentiment analysis and a combination of the Kano model and importance-performance analysis (IPA) are applied. Feature extraction and sentiment analysis techniques are used to analyze the bidirectional performance of hotel attributes in terms of four hotel star ratings from 1,090,341 online reviews of hotels in London collected from TripAdvisor.com (accessed on 4 January 2022). In particular, a new sentiment lexicon for the hospitality domain is built from numerous online reviews using the PolarityRank algorithm to convert textual reviews into sentiment scores. The Kano-IPA model is applied to explain customers’ rating behaviors and prioritize attributes for improvement. The results provide determinants of high/low customer ratings for hotels of different star ratings and suggest that hotel attributes contributing to high/low customer ratings vary across hotel star ratings. In addition, this paper analyzes the Kano categories and priority rankings of six hotel attributes for each hotel star rating to formulate improvement strategies. Theoretical and practical implications of these results are discussed at the end.


Introduction
Rather than relying on recommendations from relatives and friends as in the past, people now increasingly base hotel booking decisions on online reviews posted on various online travel platforms. Hotel online reviews are posted by numerous customers according to their experiences in hotels and are perceived as more objective, trustworthy and helpful than information provided by hotels [1,2]. Online reviews generally consist of online ratings and textual reviews. Online ratings signal customer satisfaction or dissatisfaction with hotels. Textual reviews contain customers' actual expectations, feelings and perceptions about hotel services. According to the bounded rationality model, customers are unable to elaborate and extract useful information from numerous and heterogeneous data, which drives them to prefer and rely more on ratings than on textual reviews [3]. As more and more potential customers regard online ratings as one of the direct references of hotel quality when selecting hotels, it is crucial for hotels to obtain high customer ratings in order to improve online bookings [4,5]. Therefore, exploring what contributes to the difference in online ratings between satisfied and dissatisfied customers is particularly important for hotels. In other words, to remain sustainably competitive in the hospitality industry, it is critical for hotels to understand the determinants of customer satisfaction and dissatisfaction, which are proxied by online ratings [6,7].
Existing studies have shown that the performance of multiple hotel attributes is strongly correlated with customer satisfaction [8][9][10]. Most studies have investigated the hotel attributes that lead to customer satisfaction and dissatisfaction through surveys [11][12][13]. Recently, with the development of data mining techniques, online reviews have become a promising data source for customer satisfaction analysis. Several scholars have analyzed attribute performance through online reviews using sentiment analysis methods, and hence found the determinants of customer satisfaction in the hotel industry [14,15]. However, these studies processed hotel reviews as a whole dataset, without distinguishing between positive and negative reviews. Processing hotel reviews as a whole makes it possible to compare the overall performance of multiple attributes from the perspective of all customers, but it cannot distinguish between contributors to customer satisfaction and factors resulting in customer dissatisfaction. Previous studies have found that dual-valence reviews (that is, reviews featuring both positive and negative sentiment) exist for hotels of one- to five-star ratings [16,17]. The presence of negative sentiment toward attributes in positive reviews and positive sentiment toward attributes in negative reviews was observed [18][19][20]. In other words, even if the performance of several hotel attributes does not meet customer expectations, customers can still be satisfied with the hotel and give it high ratings because of the good performance of other hotel attributes. Meanwhile, customers can be very dissatisfied with the hotel and give it low ratings when the performance of certain hotel attributes is poor, even though they think other hotel attributes perform well. Therefore, it is necessary to investigate the following question: Research Question 1 (RQ1).
Which hotel attribute with good performance contributes to high customer ratings and which hotel attribute with poor performance causes low customer ratings?
In fact, it should be pointed out that customers' expectations and perceptions vary across different market segments, such as different hotel star ratings [14,21]. Exploring the determinants of customer satisfaction and dissatisfaction in each market segment is beneficial for making more appropriate and precise strategies [10]. Moreover, comparing differences in attribute performance across different star hotels helps hotel managers understand customer demands for different star hotels when deciding whether to enter new markets. However, whether the hotel attribute contributing to high/low customer ratings varies across different star hotels has not been verified. Therefore, this study intends to investigate the following question: Research Question 2 (RQ2). Does the hotel attribute contributing to high/low customer ratings vary across different star hotels?
To answer the above two questions, it is necessary to analyze the effect of attribute performance on customer satisfaction. Customers' preferences, expectations and perceptions on each hotel attribute are influenced by comprehensive factors, thus driving positive and negative customer evaluations toward the bidirectional (good and poor) performance of hotel attributes [22]. Traditionally, one unit increase in good performance and one unit decrease in poor performance concerning a certain hotel attribute should cause the same change of customer satisfaction, thus the relationship between attribute performance and customer satisfaction is assumed to be linear or symmetric [23]. However, some studies have demonstrated that some attributes provide more satisfaction than dissatisfaction [24][25][26]. In other words, hotel attributes can have asymmetric effects on customer satisfaction [24]. The Kano model was proposed by Kano et al. (1984) to identify these non-linear or asymmetric relationships between attribute performance and customer satisfaction. The Kano model is often applied to classify hotel attributes into different categories in terms of customer demands, which is helpful for hotel managers to better understand customer expectations and perceptions [27,28]. Meanwhile, considering the limited hotel resources, it is critical to determine attribute priority to maximize customer satisfaction through service improvement. Many studies have shown that applying a combination of the Kano model and importance-performance analysis (IPA) in customer satisfaction analysis can not only analyze customer requirements toward service attributes, but also determine attribute priority [10][29][30][31][32][33]. The IPA is a common and effective technique to formulate improvement strategies according to the importance and performance of the attribute [34].
However, existing studies concerning the Kano-IPA model are mainly based on surveys, and few studies use online reviews as a data source for the Kano-IPA model. Two main reasons limit the application of the Kano-IPA model to online reviews. On the one hand, online textual reviews are unstructured and therefore need to be processed before they can be converted into usable structured data. On the other hand, there is a question of how to apply the processed data to the customer satisfaction model to obtain different Kano categories. Given that online reviews serve as a promising data source for analyzing and improving hotel services, this study intends to use feature extraction and natural language processing (NLP) techniques to apply the Kano-IPA model to online reviews.
In summary, this study aims to identify the well-performed attributes contributing to high customer ratings and the poorly performed attributes causing low customer ratings for different star hotels. To this end, we first distinguish between positive and negative reviews for different star hotels according to online ratings. Next, we apply feature extraction and sentiment analysis techniques to explore the bidirectional performance of hotel attributes. In particular, a new sentiment lexicon for the hospitality domain is built from numerous online reviews using the PolarityRank algorithm. To further understand customers' rating behaviors and demands for hotel service, this study applies the Kano-IPA model to online reviews for attribute classification and prioritization. We propose an approach to classify attributes into Kano categories, which facilitates the application of the Kano model to textual reviews. Lastly, a comparative analysis of attribute performance and priority rankings is carried out to enhance the understanding of customers' demands for different star hotels.
The remainder of this paper is organized as follows. Section 2 briefly reviews the relevant literature to provide the motivation for this study. Section 3 presents the framework and methodology employed in this study. Section 4 presents the results and provides some discussion of this study. Section 5 concludes and offers theoretical and practical implications, limitations, and directions for future research.

Studies on Hotel Online Reviews
Hotel online reviews, in the form of online ratings and textual reviews, represent customers' emotions and experience toward service quality based on their expectations against their actual experience. In general, most websites collect customer ratings and opinions on hotels by offering several critical attributes for evaluation. Many researchers have used these hotel attributes to explore customer behaviors. For example, Wang et al. [35] investigated the importance of six attributes offered by TripAdvisor.com (value, location, service, room, cleanliness and sleep quality) during the process of hotel selection decision-making. Liu et al. [9] verified the differences in preferences for these six hotel attributes between domestic and international tourists. Bi et al. [10] also used online reviews from TripAdvisor.com to analyze the asymmetric effect of the performance of these six attributes on overall customer ratings. Nicolau et al. [36] analyzed the influence of variations in the ratings of hotel attributes (comfort, staff, services, value for money and cleanliness) on the variation in the ratings of location to test the halo effect, where these attributes are offered by Booking.com (accessed on 4 January 2022) for evaluation. Evidently, online reviews contain a variety of information on service quality concerning hotel attributes. Thus, it is important to extract useful information from massive online reviews to help hotel management improve service quality.
Many previous studies focused on the analysis of numerical ratings, including overall and multi-attribute ratings. Though overall ratings can indicate customers' overall satisfaction in a straightforward way [37], multi-attribute ratings can provide a better understanding of the factors driving customer satisfaction for different segments within hotels [38]. Sharma et al. [39] classified multi-attribute ratings into positive, neutral and negative sentiments and then applied interval-valued neutrosophic TOPSIS for ranking hotels. Bi et al. [10] used both overall and multi-attribute ratings to explore the asymmetric effects of attribute performance on customer satisfaction. However, multi-attribute ratings are usually incomplete, which limits their use for obtaining more information about customers' feelings on the service quality of each attribute.
Online textual reviews, a kind of unstructured data source containing a wealth of information, including customers' preferences, expectations, feelings and perceptions toward hotels [6,38], have gained growing interest among scholars. In particular, with advances in NLP techniques, more and more studies based on text analysis have been conducted. In current hospitality research, topic analysis, which aims to extract the important aspects of a review, has been popular, and a number of topic mining algorithms have been applied. Guo et al. [40] used the Latent Dirichlet Allocation (LDA) topic modeling tool to analyze customers' preferences of hotel attributes. Hu et al. [41] adopted a structural topic model text analysis method to analyze the causes of customers' complaints for hotel service improvement. Wang et al. [35] extracted key factors of different attributes for ranking hotels using the term frequency-inverse document frequency (TF-IDF) and Word2Vec methods. While topic mining is useful to identify key service factors, it cannot reflect whether customers are satisfied with hotel service quality. Sentiment analysis of textual data, which focuses on extracting a review's sentiment polarity, such as positive, negative and neutral, can indicate customers' real emotions and satisfaction toward hotel services [42,43]. Therefore, some hospitality scholars have gradually applied both topic mining and sentiment analysis techniques to capture customers' concerns, emotions, satisfaction and complaints toward hotel services. Bi et al. [44] applied LDA and IOVO-SVM algorithms to identify hotel attributes and their sentiment strengths to conduct IPA plotting for attribute improvement strategies. Al-Smadi et al. [45] used the bidirectional Long Short-Term Memory (LSTM) to extract opinionated aspects and their polarity from Arabic hotels' reviews. Nie et al. [46] applied a semantic partitioned sentiment dictionary to obtain sentiment values of different attributes to rank hotels.
Existing studies suggested that online textual reviews indicate details on customers' demands and perceptions of hotel attributes [6,9]. Although textual reviews have been studied widely to extract critical hotel attributes and their sentiment [14,15,46], few studies have distinguished between positive reviews and negative reviews to identify well-performed attributes contributing to satisfied customers and poorly performed attributes resulting in dissatisfied customers. Therefore, this study attempts to explore the difference between the good performance and poor performance of the same attribute with respect to positive reviews and negative reviews concerning different star hotels through sentiment analysis.

Studies on Sentiment Analysis
Sentiment analysis has emerged as an important aspect of NLP. Sentiment analysis leverages a variety of NLP techniques to extract the sentiment expressed in texts and determine whether they are positive, negative or neutral [47,48]. Analysis of text sentiments has spread across many fields such as consumer information, marketing, books, application, social media, tourism destination and hotels [49][50][51][52][53][54][55]. The approaches to sentiment analysis can be mainly divided into two types, namely, machine learning and lexicon-based methods.
Machine learning methods represent documents as vectors in a feature space and classify them into predefined sentiment categories [56]. There are several machine learning methods for sentiment classification, such as naive Bayes (NB), maximum entropy and support vector machine (SVM). These classifiers usually use the bag-of-words (BoW) method comprising unigrams or n-grams to determine how the documents are represented [57], resulting in high dimensionality of the feature space. With the help of feature selection techniques, such as part-of-speech (POS) tagging, which aims to disambiguate text sense based on a lexical category, machine learning algorithms can reduce the high-dimensional feature space by eliminating noisy and irrelevant features. Existing studies concluded that some machine learning classifiers perform better than lexicon-based methods [58,59], but these methods have certain drawbacks: (1) classifiers trained for a domain-specific problem do not perform well in other domains [58]; (2) feature construction is critical but hard to implement [60]; and (3) these methods usually rely on a great volume of manually labeled training data [61]. Given these drawbacks, unsupervised methods such as lexicon-based methods are applied.
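To make the dimensionality point concrete, the following minimal sketch builds a bag-of-words representation from unigrams and then from unigrams plus bigrams. The two reviews are invented for illustration, the tokenizer is a plain whitespace split, and the helper names `ngrams` and `bag_of_words` are hypothetical; none of this is taken from the cited studies.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(text, max_n=1):
    """Count every 1..max_n-gram in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

# Toy reviews, invented for illustration only.
reviews = ["the room was clean", "the staff was not helpful"]

unigram_vocab = set()
bigram_vocab = set()
for review in reviews:
    unigram_vocab |= set(bag_of_words(review, max_n=1))
    bigram_vocab |= set(bag_of_words(review, max_n=2))

# The feature space grows as soon as bigrams are added to the unigrams.
print(len(unigram_vocab), len(bigram_vocab))
```

Even on two short sentences the vocabulary doubles once bigrams are included, which is why feature selection matters on a large review corpus.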
Lexicon-based methods are based on the assumption that the contextual sentiment orientation is the sum of the sentiment orientation of each word or phrase, obtained by matching each word or phrase with the entries of a sentiment lexicon and their associated sentiment scores [62]. In general, adjectives are used as indicators of the semantic orientation of a text [58]. More recently, verbs and nouns have also been compiled into sentiment dictionaries [63]. Such a lexicon or dictionary can be created manually, or automatically, using seed words to expand the list of words [64]. Abdulla et al. [65] built a lexicon for the Arabic language and the proposed approach gained better accuracy than other methods. Taboada et al. [58] constructed a dictionary incorporating intensification and negation to compute text sentiment scores, which is called the semantic orientation calculator (SO-CAL) approach. Dey et al. [66] developed an n-gram sentiment dictionary called Senti-N-Gram for automatic score calculation. Unlike machine learning methods, sentiment lexicons learned from a certain domain preserve the domain-based orientation of words, which provides greater accuracy for sentiment analysis tasks [67]. Furthermore, lexicon-based methods take the lexical and syntactical information in linguistic content into account in order to revise the sentiment valence [56]. That is, in the sentiment scoring process, negation, intensification, and the rhetorical roles of text segments are taken into account as well. Language-dependent features can also be considered in lexicon-based models [68].
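The additive assumption behind lexicon-based scoring, together with negation and intensification, can be sketched as follows. This is an illustrative simplification, not SO-CAL or the lexicon built in this paper: the words, scores and multipliers are invented, and negation is assumed to simply flip the sign of the following sentiment word.

```python
# Hypothetical mini-lexicon: entries and scores are invented for illustration.
LEXICON = {"clean": 2.0, "friendly": 2.0, "noisy": -2.0, "dirty": -3.0}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}   # assumed multipliers
NEGATORS = {"not", "never"}                      # assumed to flip the sign

def sentiment_score(text):
    """Sum the scores of lexicon words, scaling each by a directly preceding
    intensifier and flipping its sign if a negator comes right before that."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        score = LEXICON[token]
        j = i - 1
        if j >= 0 and tokens[j] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[j]]
            j -= 1
        if j >= 0 and tokens[j] in NEGATORS:
            score = -score
        total += score
    return total

print(sentiment_score("The room was not very clean."))
```

A review that mixes praise and complaints, such as "clean but noisy", sums to a value near zero under this scheme, which is one reason attribute-level scoring is more informative than a single score per review.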
In summary, machine learning sentiment analysis is trained on a particular dataset using features and may reach quite high accuracy in detecting the polarity of a text. However, it is highly dependent on labeled data, which limits its application. Unsupervised lexicon-based methods, such as knowledge-graph propagation and seed word-based methods, not only overcome the absence of labeled data, but are also able to extract domain-specific sentiment words [69,70]. Thus, lexicon-based methods are considered preferable for sentiment analysis in a certain domain, and in this study an unsupervised lexicon-based sentiment method is used for sentiment analysis to explore the bidirectional performance of hotel attributes.

The Kano Model
Traditionally, customer satisfaction has been regarded as one-dimensional or symmetric: the higher the perceived product/service quality is, the higher the customer's satisfaction is and vice versa [23]. However, continuous improvements in product/service attributes without considering what customers actually want may not engender a higher level of customer satisfaction. Some researchers argued that the relationship between attribute performance and customer satisfaction is nonlinear or asymmetric [23,24,71]. Consequently, Kano et al. (1984) introduced a two-dimensional model, called the Kano model, that clarifies the asymmetric and nonlinear relationship between product/service attribute performance and customer satisfaction, and classifies the attributes into five categories, namely, must-be factors, one-dimensional factors, attractive factors, indifferent factors and reverse factors [72]. Later, a simplified Kano model classified attributes into three factors, namely basic, performance and excitement factors, corresponding to the must-be, one-dimensional and attractive factors [73]; this simplified model, shown in Figure 1, has been widely used in different research domains.
1. Basic: These attributes are the basic requirements of a product/service. Customers are extremely dissatisfied when these attributes do not meet their expectations. However, when customer expectations are exceeded, customers remain neutral because they take these attributes for granted.
2. Performance: The performance of these attributes is positively and linearly related to customer satisfaction. In other words, customer satisfaction increases with the increase in attribute performance, and vice versa.
3. Excitement: When the performance of these attributes exceeds customer expectations, customers are satisfied, but they are not dissatisfied when these attributes are absent. Therefore, good performance of this category has a stronger impact on customer satisfaction than its poor performance.
Identifying different categories of attributes is beneficial for hotels to understand the determinants of customer satisfaction and dissatisfaction, and hence to improve service attributes effectively [29,74]. Many scholars have applied the Kano model to understand customer expectations and perceptions toward different service attributes in hospitality research, as shown in Table 1.
Table 1. Studies applying the Kano model in hospitality research.
| Study | Research objective | Classification method | Data source |
| | Explore nonlinear effects of service quality on customers' evaluation of three-star hotels in Rio de Janeiro, Brazil. | CIT and PRCA | TripAdvisor.com |
| [30] | Integrate IPA with the Kano three-factor theory to examine the difference of service attribute importance in different market segments using the case of luxury hotels in Macau. | Importance grid | Questionnaire |
| Beheshtinia and Farzaneh Azad (2017) [12] | Identify customer needs for the hotels and prioritize them using a combination of the SERVQUAL and Kano approaches. | Kano's method | Questionnaire |
| Cheng and Chen (2018) [11] | Analyze competitive qualities required for improvement to enhance service quality of motels in Taiwan. | Moderated regression | Questionnaire |
| Bi et al. (2020) [10] | Explore the asymmetric effects of attribute performance on customer satisfaction with respect to different market segments. | PRCA | TripAdvisor.com |
Most of these studies are based on questionnaire surveys and scarcely extract key attributes from big data such as online reviews. One problem limiting the application of the Kano model to big data is that it is hard to classify service attributes into Kano categories using existing methods. In current Kano model analysis, several methods are used to classify quality attributes. Kano et al. (1984) provided an approach using a structured questionnaire with functional and dysfunctional questions for each attribute [73]. The penalty-reward contrast analysis (PRCA) has been used widely to classify quality attributes by regression analysis with two sets of dummy variables for each attribute [25,73]. The moderated regression approach based on a five-point Likert scale, proposed by Lin et al. [77], uses regression coefficients to classify attributes. Another quantitative method called the "importance grid" has also been applied in a variety of studies; it compares the explicit and implicit importance of each attribute to categorize it into one of the three factors [30,78]. Qualitative data methods including the critical incident technique (CIT) and the "analysis of complaints and compliments" (ACC) have been applied to categorize attributes by comparing the difference in the frequency with which customers mention an attribute in a positive context or a negative context [76,79,80]. In conclusion, these methods distinguish between different types of attributes by comparing the impacts of good performance and poor performance of the attribute on customer satisfaction.
Most studies relied on questionnaire surveys when using the above Kano classification methods to categorize attributes, which indicates that existing classification methods may not be suitable for applying the Kano model to large volumes of textual reviews because such data are unstructured. Following the Kano model classification principle, this study aims to propose a novel approach to classify hotel attributes into Kano categories in text analysis. This new approach will support the application of the Kano model to large volumes of unstructured data to explore customer satisfaction.
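The classification principle shared by the existing methods, comparing the impact of good performance with the impact of poor performance, can be illustrated with a short sketch. It is not the approach proposed in this paper: the function name `kano_category`, the two impact magnitudes it takes as input, and the `tolerance` threshold are all assumptions made for illustration.

```python
def kano_category(reward, penalty, tolerance=0.1):
    """Classify an attribute into the simplified three-factor Kano scheme.

    reward:    gain in satisfaction when the attribute performs well
    penalty:   loss in satisfaction when the attribute performs poorly
    tolerance: hypothetical relative gap below which the two impacts
               are treated as symmetric
    Both impacts are expected as non-negative magnitudes.
    """
    if reward < 0 or penalty < 0:
        raise ValueError("impacts must be non-negative magnitudes")
    largest = max(reward, penalty)
    if largest == 0:
        return "unclassified"  # no measurable impact in either direction
    if abs(reward - penalty) / largest <= tolerance:
        return "performance"   # roughly symmetric impact
    # Asymmetric impact: the dominant direction decides the category.
    return "excitement" if reward > penalty else "basic"

print(kano_category(reward=0.2, penalty=0.8))
```

With these inputs the penalty dominates, so the attribute is treated as a basic factor; swapping the two values would make it an excitement factor.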

Importance-Performance Analysis
Importance-performance analysis (IPA), proposed by Martilla and James (1977), is a graphical tool to classify attributes for improvement and rank their priority based on the importance and performance of each product/service attribute [34,81]. This approach constructs a plot with two dimensions, the importance and the performance of each product/service attribute as perceived by consumers, and classifies attributes into four quadrants associated with different management strategies. An example of IPA is given in Figure 2. Quadrant I is labeled 'Keep up the good work', where attributes are considered highly important, and their performance is high. Attributes located in quadrant I can be considered as the major strengths of the product/service and should be maintained. Quadrant II is labeled 'Possible overkill', where attributes have low importance but high performance. The resources dedicated to these attributes may be excessive, so reallocating limited resources to other, more important attributes is appropriate. Quadrant III is labeled 'Low priority', where attributes have both low importance and low performance. Attributes in this quadrant are regarded as minor weaknesses and have a low priority for improvement. Quadrant IV is labeled 'Concentrate here', where attributes are considered highly important but perform poorly. Attributes in this quadrant are regarded as the major weaknesses and should be given a high priority for improvement.
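The four-quadrant assignment described above can be sketched in a few lines. The quadrant labels follow the text; the choice of the mean importance and mean performance as the crosshair, the function name `ipa_quadrants`, and the toy scores are assumptions made for illustration only.

```python
from statistics import mean

def ipa_quadrants(attributes):
    """Assign each attribute to an IPA quadrant.

    attributes: dict mapping attribute name -> (importance, performance).
    The crosshair is placed at the mean importance and mean performance,
    which is one common convention and an assumption here.
    """
    imp_cut = mean(imp for imp, _ in attributes.values())
    perf_cut = mean(perf for _, perf in attributes.values())
    labels = {}
    for name, (imp, perf) in attributes.items():
        high_imp = imp >= imp_cut
        high_perf = perf >= perf_cut
        if high_imp and high_perf:
            labels[name] = "I: Keep up the good work"
        elif high_perf:
            labels[name] = "II: Possible overkill"
        elif not high_imp:
            labels[name] = "III: Low priority"
        else:
            labels[name] = "IV: Concentrate here"
    return labels

# Toy (importance, performance) scores, invented for illustration.
scores = {
    "room": (0.9, 0.8),
    "value": (0.3, 0.7),
    "location": (0.2, 0.1),
    "service": (0.8, 0.2),
}
for name, quadrant in ipa_quadrants(scores).items():
    print(name, "->", quadrant)
```

Other crosshair choices, such as scale midpoints or medians, would shift attributes that lie near the boundaries, so the cut-off should be reported alongside any IPA plot.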
IPA is applied in a wide variety of research domains, partly due to the clear managerial strategies it provides on how to allocate resources and efforts [82], and also due to its ability to identify the strengths and weaknesses of product/service to guide management in taking effective measures to keep competitive [83]. In hospitality research, IPA is commonly integrated with other techniques, such as SERVQUAL [84,85], data envelopment analysis [86,87], partial least squares path modeling [88], and the Kano model [10][30][31][32][33]. Many scholars have applied a combination of the Kano model and IPA on customer satisfaction analysis. For example, Bi et al. [10] applied the Kano model and asymmetric IPA to explore the asymmetric effects of hotel attribute performance on customer satisfaction through online ratings. Jou and Day [31] integrated the Kano model and IPA into a three-dimensional IPA approach to identify the critical service attributes for hotel online booking through survey. Tseng [32] constructed an IPA-Kano model for classifying and diagnosing service attributes at the TPE airport. Pai et al. [33] combined the Kano model and IPA to investigate the critical service quality attributes to enhance customer satisfaction in the chain restaurant industry.
However, few hotel studies have conducted Kano-IPA analysis using online textual reviews. Furthermore, few studies have applied the Kano-IPA model to rank hotel attributes by improvement priority, and thereby guide resource allocation, across different hotel star ratings. These gaps in the literature need to be addressed. Thus, considering the effectiveness of the Kano model and IPA in providing constructive guidelines for hotels to enhance customer satisfaction, it is of great significance to explore the application of the Kano-IPA model to hotel textual reviews across different hotel star ratings.

Materials and Methods
The main objective of this study is to explore what contributes to the difference in hotel customer ratings for different star hotels. Specifically, this study identifies well-performed attributes contributing to high customer ratings and poorly performed attributes causing low customer ratings in terms of hotel star ratings by exploring the bidirectional performance of hotel attributes. This study also aims to apply the Kano-IPA model in online textual reviews for a better understanding of customers' rating behaviors and demands, and hence provides effective attribute improvement strategies for different star hotels.
In this section, we propose a methodology to realize the above objectives; the structure of the methodology framework is shown in Figure 3. First, we collected data from TripAdvisor.com and processed the data according to hotel star ratings and customer ratings. Second, sub-attributes of six hotel attributes (value, location, service, room, cleanliness and sleep quality) that customers mentioned in online reviews were extracted. Specifically, terms similar to each attribute and their similarity values were identified through the Word2Vec algorithm. Third, a sentiment lexicon for the hospitality domain was built with the PolarityRank algorithm to obtain the sentiment values of each attribute and sub-attribute. Fourth, well-performed attributes that contribute to customer satisfaction and poorly performed attributes that cause customer dissatisfaction were identified for hotels of different star ratings through sentiment analysis. Finally, the text mining results were used to conduct the Kano-IPA analysis for different star hotels. In particular, a novel approach for Kano model classification is proposed. Thus, improvement strategies and attribute priorities are provided for different star hotels.

[Figure 3. Structure of the methodology framework. Recoverable step labels: data cleaning; classifying the data according to hotel star ratings.]

Data Collecting and Processing
We collected online hotel reviews for London from TripAdvisor.com, which is the world's largest travel-sharing website. TripAdvisor.com contains millions of unbiased user-generated reviews from customers worldwide; thus, it is feasible to collect a large volume of online reviews. The data collection and processing steps in this paper are as follows.

First, hotels in London were selected as the data source for this research. London is one of the largest financial centers in Europe, as well as one of the world's most famous tourist destinations, attracting millions of customers from across the world. Statistically, London recorded 28.47 million bed nights of domestic tourists and 118.9 million nights of international visitors in 2019 [89,90].
Second, we crawled all available information at both hotel-level and review-level in London using a Python program. The hotels with fewer than 400 reviews in English were removed to ensure the credibility of this research sample. A total of 640 hotels with 1,090,341 reviews in English satisfied our requirements. Hotel-level information contains hotel name, star, rating, number of reviews and address. Each review-level data contains reviewer, travel type, posting time, stay time, textual review, overall rating and ratings on six hotel attributes (value, location, service, room, cleanliness and sleep quality).
Finally, we classified the hotel reviews into different datasets according to the hotel star ratings and review overall ratings. Following the studies in [9,91], we categorized the online reviews into four datasets according to the hotel star ratings, namely, two-star and below hotels (1-, 1.5-, 2- and 2.5-star hotels), three-star hotels (3- and 3.5-star hotels), four-star hotels (4- and 4.5-star hotels) and five-star hotels. Review classification based on the review overall rating is controversial [18]. The main point of contention is whether reviews with a rating of 3 should be classified as neutral or negative. Studies have shown that a rating of 3 is close to a service failure for most potential customers [18,92]. Therefore, in this study, the online reviews of each hotel star category were divided into two sub-datasets according to the review overall rating: reviews rated 1-3 as negative reviews and reviews rated 4-5 as positive reviews. Let $D_t^{neg}$ and $D_t^{pos}$ respectively denote the negative and positive datasets of t-star hotels, t = 2, 3, 4, 5. The final distribution of sub-datasets is shown in Table 2.
Several standard text preprocessing steps were then carried out using modules of the Natural Language Toolkit in the Python programming environment.
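The two-way split described above can be sketched as follows. This is a minimal illustration, not the authors' crawler or pipeline: the record fields `hotel_star`, `overall_rating` and `text` are hypothetical names chosen for the example.

```python
def star_group(hotel_star: float) -> int:
    """Map a hotel star rating to the dataset index t in {2, 3, 4, 5}."""
    if hotel_star < 3:
        return 2          # 1-, 1.5-, 2- and 2.5-star hotels
    if hotel_star < 4:
        return 3          # 3- and 3.5-star hotels
    if hotel_star < 5:
        return 4          # 4- and 4.5-star hotels
    return 5              # 5-star hotels


def polarity(overall_rating: int) -> str:
    """Ratings 1-3 count as negative, ratings 4-5 as positive."""
    return "neg" if overall_rating <= 3 else "pos"


def split_reviews(reviews):
    """Group review texts into the eight sub-datasets keyed by (t, polarity)."""
    datasets = {(t, p): [] for t in (2, 3, 4, 5) for p in ("neg", "pos")}
    for review in reviews:
        key = (star_group(review["hotel_star"]), polarity(review["overall_rating"]))
        datasets[key].append(review["text"])
    return datasets
```

Each key such as `(4, "neg")` then plays the role of $D_4^{neg}$ in the notation above.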

Sub-Attributes Selection
In this study, six key attributes including value, location, service, room, cleanliness and sleep quality [9,10,35,36,46] are selected to explore the role of their bidirectional performance on customer overall ratings. These six attributes are provided by TripAdvisor.com as significant factors for customers to review [35]. Hotel customers use a variety of elements to evaluate the performance of the same attribute [46,93,94]. For example, customers may use "locate", "place" and "distance" to describe the attribute "location". Therefore, extracting words that are semantically similar to each hotel attribute is essential to comprehensively understand customers' opinions.
In this study, we use the Word2Vec algorithm to extract words semantically similar to each hotel attribute from textual reviews. Word2Vec is a word-embedding method that can be used to compare the degree of semantic similarity between two words or two texts. Given a text corpus, Word2Vec learns a vector for each word in the vocabulary using the Continuous Bag-of-Words or the Skip-Gram neural network architecture [95]. Continuous Bag-of-Words is suitable for a small corpus, while Skip-Gram performs better on a large corpus. After the word vector model is trained, the similarity between words can be obtained. In this study, the gensim library, which provides a ready-made implementation of the Word2Vec algorithm, is used. We trained word vectors on the dataset of each hotel star category using the Skip-Gram model. With the trained Word2Vec model for each dataset, the similarity value between attribute $A_i$ and each word in the dataset is calculated, where i = 1, 2, ..., 6. The words whose similarity to attribute $A_i$ is greater than 0.5 are selected as the sub-attributes $B_{ij}$ of attribute $A_i$, where j = 1, 2, ..., P and P is the number of sub-attributes. Let $SS_{ij}$ denote the similarity value between sub-attribute $B_{ij}$ and attribute $A_i$.
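The thresholding step can be illustrated with a small, dependency-free sketch. It assumes that word vectors are already available as a plain dictionary (in practice they would come from the trained Skip-Gram model, for example gensim's `Word2Vec` with `sg=1`) and that similarity is measured by cosine similarity; the function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_sub_attributes(attribute, vectors, threshold=0.5):
    """Return {word: similarity} for words whose similarity to the
    attribute exceeds the threshold, sorted by decreasing similarity."""
    target = vectors[attribute]
    scored = {word: cosine(target, vec) for word, vec in vectors.items()}
    kept = {word: s for word, s in scored.items() if s > threshold}
    return dict(sorted(kept.items(), key=lambda kv: kv[1], reverse=True))


# Toy two-dimensional vectors, for illustration only.
toy_vectors = {
    "location": [1.0, 0.0],
    "place": [0.8, 0.6],
    "distance": [0.6, 0.8],
    "breakfast": [0.0, 1.0],
}
sub_attributes = select_sub_attributes("location", toy_vectors)
```

Note that the attribute term itself is returned with similarity 1, so it acts as a sub-attribute of itself.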

Sentiment Lexicon Creation
We used the PolarityRank algorithm, which has achieved reasonable accuracy without training for domain-specific sentiment analysis [63,96], to create a sentiment lexicon from hotel reviews. PolarityRank is an unsupervised sentiment analysis method based on PageRank. It takes the relevance between nodes into account and spreads both the positive PolarityRank ($PR^{+}$) and the negative PolarityRank ($PR^{-}$) of a node to other nodes along the weighted edges of a graph [63,96,97]. The main idea behind PolarityRank is to calculate two measures of relevance, a positive and a negative one, for each node in the graph [63].
Given a text, a graph can be built based on lexical and syntactic dependencies, which is known as a dependency-based parse tree in NLP. The lexical graph is defined as $G = (N, E)$, where $N = \{g_x\}$ is a set of nodes and $E$ is a set of bidirectional edges between pairs of nodes according to the syntactic dependencies and between all nodes contained in descendant branches. The edge between nodes $g_x$ and $g_y$ carries an associated weight denoted by $w_{xy}$. An example of a lexical graph is shown in Figure 4. After the graph is generated, the propagation of $PR^{+}$ and $PR^{-}$ for each node begins. The implementation steps are described in detail in Sections 3.3.1-3.3.3.


Selecting Candidate Sentiment Words
After text preprocessing, the words were lemmatized and tagged with their part of speech (nouns, verbs, adjectives, adverbs, pronouns, etc.). Previous studies selected the lemmatized nouns, adjectives and verbs as candidate sentiment words and discarded adverbs, on the grounds that adverbs merely alter the degree of polarity of the words they modify and do not carry an inherent sentiment polarity [96,98]. In fact, many adverbs do carry sentiment polarity; for example, the adverb "luckily" in the sentence "Luckily, there was one room available" expresses a positive emotion.
To accurately analyze customers' feelings, we used all the lemmatized nouns (n), verbs (v), adjectives (a) and adverbs (ad) as candidate sentiment words. The nodes of the graph, which correspond to candidate sentiment words from hotel reviews, are connected by bidirectional edges. Following Fernández-Gavilanes et al. [96], the co-occurrence frequency of nodes $g_x$ and $g_y$ in the whole dataset is assigned as the weight $w_{xy}$ of the edge joining $g_x$ and $g_y$.
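A minimal sketch of the edge-weighting step is given below. It assumes that co-occurrence is counted once per sentence in which both candidate words appear; the text does not state the counting window, so this is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(sentences, candidates):
    """Count, for each unordered pair of candidate words, the number of
    sentences in which both occur. Keys are alphabetically ordered pairs."""
    weights = Counter()
    for tokens in sentences:
        present = sorted(set(tokens) & candidates)
        for x, y in combinations(present, 2):
            weights[(x, y)] += 1
    return weights


# Toy lemmatized sentences, for illustration only.
toy_sentences = [
    ["room", "clean", "quiet"],
    ["room", "clean"],
    ["staff", "rude"],
]
toy_candidates = {"room", "clean", "quiet", "staff", "rude"}
w = cooccurrence_weights(toy_sentences, toy_candidates)
```

Because the edges are bidirectional, each pair is stored once and read in both directions.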

Assigning Initial Values to Candidate Sentiment Words
In this section, the candidate sentiment words are assigned an initial positive value $e^{+}$ and an initial negative value $e^{-}$ from SentiWordNet 3.0 using a Python program. SentiWordNet 3.0 is a general-purpose sentiment lexicon that is publicly available to researchers and provides three sentiment scores for each word, namely positive, negative and objective scores [99]. For each candidate sentiment word, we assigned the positive score from SentiWordNet 3.0 to $e^{+}$ and the negative score to $e^{-}$. For words not included in SentiWordNet 3.0, $e^{+}$ and $e^{-}$ are set to zero.

Calculation of $PR^{+}$ and $PR^{-}$
With weights for edges and pairs of initial sentiment values for nodes, the calculation of $PR^{+}$ and $PR^{-}$ can commence. Let $E(g_x)$ be the set of indices $y$ of the nodes for which there exists an edge to node $g_x$, and let $e_x^{+}$ and $e_x^{-}$ be the initial positive and negative values of node $g_x$, respectively. The parameter $\alpha$, a damping factor that ensures convergence, is set to 0.85 following the original definition of PageRank [63,97]. $PR^{+}$ and $PR^{-}$ are estimated by Equations (1) and (2). The propagation process stops when the calculation converges or the number of iterations reaches a fixed threshold. In this study, after testing this process, a maximum of 300 iterations is set as the stopping criterion.
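As an illustration of the propagation, the sketch below implements a PageRank-style update under explicit assumptions: all edge weights are non-negative co-occurrence counts, so that $PR^{+}$ and $PR^{-}$ propagate independently, and each node $g_x$ receives from a neighbour $g_y$ the share $w_{xy} / \sum_{z \in E(g_y)} w_{yz}$ of that neighbour's score. Equations (1) and (2) are not reproduced here, so this form should be read as an assumed variant rather than the exact update of the study; the function name, the tolerance `tol` and the toy inputs are hypothetical.

```python
def polarity_rank(weights, e_pos, e_neg, alpha=0.85, max_iter=300, tol=1e-8):
    """Propagate initial positive/negative values over an undirected graph.

    weights: {(x, y): w} with each unordered pair stored once and w > 0.
    e_pos, e_neg: initial values per word (missing words default to 0).
    """
    adj = {}
    for (x, y), w in weights.items():
        adj.setdefault(x, {})[y] = w
        adj.setdefault(y, {})[x] = w
    for node in set(e_pos) | set(e_neg):
        adj.setdefault(node, {})
    if not adj:
        return {}, {}

    out_sum = {n: sum(nb.values()) for n, nb in adj.items()}
    pr_pos = {n: e_pos.get(n, 0.0) for n in adj}
    pr_neg = {n: e_neg.get(n, 0.0) for n in adj}

    for _ in range(max_iter):
        new_pos, new_neg = {}, {}
        for x, neighbours in adj.items():
            in_pos = sum(w / out_sum[y] * pr_pos[y] for y, w in neighbours.items())
            in_neg = sum(w / out_sum[y] * pr_neg[y] for y, w in neighbours.items())
            new_pos[x] = (1 - alpha) * e_pos.get(x, 0.0) + alpha * in_pos
            new_neg[x] = (1 - alpha) * e_neg.get(x, 0.0) + alpha * in_neg
        # Largest change of any score in this iteration.
        delta = max(
            max(abs(new_pos[n] - pr_pos[n]), abs(new_neg[n] - pr_neg[n]))
            for n in adj
        )
        pr_pos, pr_neg = new_pos, new_neg
        if delta < tol:
            break
    return pr_pos, pr_neg


# Toy graph: "room" co-occurs with a positive and a negative seed word.
toy_weights = {("good", "room"): 2, ("bad", "room"): 1}
pr_pos, pr_neg = polarity_rank(toy_weights, {"good": 1.0}, {"bad": 1.0})
```

In this toy graph the seed words keep the polarity of their initial values, while the unseeded word "room" receives equal positive and negative scores.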

Calculation of Semantic Orientation
With the final values of $PR^{+}$ and $PR^{-}$, and following Cruz et al. [63], the semantic orientation SO of each candidate sentiment word is normalized by Equation (3). Finally, we dropped the candidate sentiment words with a zero SO. Thus, the sentiment lexicon from hotel reviews consists of the words with nonzero SO. Let the two-tuple $(g_k, SO_k)$ denote sentiment word $g_k$ and its sentiment value $SO_k$, where $SO_k \in [-5, 5]$ and k = 1, 2, ..., m, with m representing the number of words in the lexicon.
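One normalization that yields values in [-5, 5] is the scaled difference ratio $5\,(PR^{+} - PR^{-})/(PR^{+} + PR^{-})$. The sketch below uses this form purely as an assumption, since Equation (3) is not reproduced here; the function names and toy scores are hypothetical.

```python
def semantic_orientation(pr_pos, pr_neg, scale=5.0):
    """Assumed normalization: scaled difference ratio in [-scale, scale]."""
    total = pr_pos + pr_neg
    if total == 0:
        return 0.0
    return scale * (pr_pos - pr_neg) / total


def build_lexicon(pr_pos, pr_neg):
    """Keep only the words whose semantic orientation is nonzero."""
    lexicon = {}
    for word in pr_pos:
        so = semantic_orientation(pr_pos[word], pr_neg.get(word, 0.0))
        if so != 0:
            lexicon[word] = so
    return lexicon


toy_lexicon = build_lexicon(
    {"great": 3.0, "dirty": 0.0, "room": 1.0},
    {"great": 1.0, "dirty": 2.0, "room": 1.0},
)
```

Here "room" has equal positive and negative scores, so its SO is zero and it is dropped from the lexicon.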

Calculation of Sub-Attribute Sentiment Values
According to the principle of lexicon-based sentiment analysis, the polarity of a sentence can be obtained from the polarities of the words in that sentence [62]. To obtain the sentiment value of each sub-attribute from the different sub-datasets, we calculate the sentiment value of each sentence in each sub-dataset and record whether sub-attribute $B_{ij}$ occurs in that sentence. For a single dataset, let $\beta_q^l = (G_q^l, SO_q^l)$ be a two-tuple consisting of the qth sentiment word $G_q^l$ of the lth sentence and its sentiment value $SO_q^l$, where l = 1, 2, ..., L, with L denoting the number of sentences in the dataset, q = 1, 2, ..., Q, with Q denoting the number of sentiment words in the sentence, and $G_q^l$ belongs to the sentiment lexicon we created. Then, let $\beta^l = \{\beta_1^l, \beta_2^l, \ldots, \beta_Q^l\}$ be the set of pairs of sentiment words and their sentiment values in the lth sentence. For sub-attribute $B_{ij}$ occurring in the lth sentence, the sentiment value $S_{ij}^l$ of $B_{ij}$ in the lth sentence is calculated by Equation (4).
To improve the accuracy of sub-attribute sentiment polarities, it is important to take intensifiers and negators into account, since these words can affect the sentiment values [46,56]. The sentiment propagation for intensification and negation is described as follows.
Propagation 1: Intensification.
Intensifiers are linguistic terms that primarily combine with adjectives but also modify nouns, adverbs and verbs. These words influence the strength of the sentiment word, either enhancing or diminishing it. The most common way of identifying these valence shifters is to use a list of words, such as adverbs and adjectives, associated with fixed intensifier values [100,101]. In this study, we used a list of intensifiers adapted from Brooke, where each element is a modifier that emphasizes or attenuates words [102]. Let $\gamma_r$ represent the shift value of intensifier r, where r = 1, 2, ..., R. Following the above description of sentiment calculation, if there are intensifiers in the lth sentence of a dataset, the sum of their shift values $\gamma_l^{sum}$ is calculated; otherwise, $\gamma_l^{sum}$ is set to zero. The propagation of $S_{ij}^l$ is given by Equation (5), where l = 1, 2, ..., L, with L denoting the number of sentences in the dataset.
Propagation 2: Negation.
In sentiment analysis, negators are words such as "not" that cause negation. Negators can alter the meaning of a word or sentence, or provide a negation context, for example by converting an affirmative statement into a negative one. The most common way to process negators is to attach them to the nearest words [96]; e.g., in "This story is not interesting", the word "interesting" is converted into "NOT-interesting". In this processing method, negators are treated as polarity shifters that produce the opposite polarity. In other words, the polarity value is simply inverted if a polar expression falls within the negation scope [101]. Thus, if the term "perfect" is assigned a positive sentiment value of +4, "NOT-perfect" has a sentiment value of −4. However, some researchers hold that it is more reasonable to decrease the strength of sentiment words than to invert them directly [96,102]. We use a list of negators adapted from Brooke, where the negators act as sentiment shifters with a default shift value of 4 [102]. If there is at least one negator in the lth sentence, the negation propagation begins and is represented by Equation (6), where l = 1, 2, ..., L, with L denoting the number of sentences in the dataset.
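The sentence-level calculation with both propagations can be sketched as follows. Because Equations (4)-(6) are not reproduced here, the sketch makes three labeled assumptions: the base sentence value is the sum of the lexicon values of its words, intensification multiplies that value by $(1 + \gamma_l^{sum})$, and negation shifts the value by 4 toward the opposite polarity. The function name and the toy lexicon, intensifier and negator lists are hypothetical.

```python
def sentence_score(tokens, lexicon, intensifiers, negators, shift=4.0):
    """Lexicon-based sentence value with intensification and shift negation."""
    # Assumed base value: sum of the sentiment values of the sentence's words.
    base = sum(lexicon.get(token, 0.0) for token in tokens)

    # Assumed intensification: scale by (1 + sum of intensifier shift values).
    gamma_sum = sum(intensifiers.get(token, 0.0) for token in tokens)
    score = base * (1.0 + gamma_sum)

    # Assumed negation: shift by a fixed amount toward the opposite polarity
    # instead of flipping the sign.
    if any(token in negators for token in tokens):
        if score > 0:
            score -= shift
        elif score < 0:
            score += shift
    return score


toy_lexicon = {"perfect": 4.0, "clean": 2.0}
toy_intensifiers = {"very": 0.25}
toy_negators = {"not"}

s1 = sentence_score(["room", "very", "clean"], toy_lexicon, toy_intensifiers, toy_negators)
s2 = sentence_score(["not", "perfect"], toy_lexicon, toy_intensifiers, toy_negators)
s3 = sentence_score(["room", "not", "clean"], toy_lexicon, toy_intensifiers, toy_negators)
```

With shift negation, "not perfect" is weakened to a neutral value rather than inverted to −4, which reflects the argument for decreasing rather than inverting sentiment strength.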

Calculation of Attribute Sentiment Values
To obtain the purely positive sentiment value of attribute $A_i$ in each positive dataset, only the positive sentiment values of the sub-attributes under $A_i$ are retained. In other words, negative sub-attribute sentiment values in the positive dataset are re-assigned to zero. Correspondingly, a re-assigned sentiment value of sub-attribute $B_{ij}$ is defined for the lth sentence of the negative dataset. These two re-assigned values are computed for every sentence l = 1, 2, ..., L, with L denoting the number of sentences in the dataset. Given that sub-attribute $B_{ij}$ is a word semantically similar to attribute $A_i$ but not exactly equal to it, it is necessary to take the semantic similarity $SS_{ij}$ between $B_{ij}$ and $A_i$ into account when aggregating the sub-attribute values into the positive sentiment value $SC_i^{pos}$ and the negative sentiment value $SC_i^{neg}$ of attribute $A_i$.
In addition, we also calculate the sentiment value of each attribute without re-assignment for the subsequent analysis. The positive and negative datasets of the same hotel star rating are merged, and $SC_i$ represents the overall sentiment value of attribute $A_i$ without distinguishing positive from negative reviews, computed over all sentences l = 1, 2, ..., L, with L here indicating the total number of sentences in the review datasets of the given hotel star rating.
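The aggregation can be illustrated with the sketch below. It assumes that the attribute value is a similarity-weighted sum of sentence-level sub-attribute values, that negative values are zeroed in positive datasets, and, by symmetry, that positive values are zeroed in negative datasets; these are assumptions for illustration, as the corresponding equations are not reproduced here. The function name and toy inputs are hypothetical.

```python
def attribute_sentiment(sentence_values, similarity, keep=None):
    """Aggregate sentence-level sub-attribute values into an attribute value.

    sentence_values: {sub_attribute: [S_ij^l for each sentence containing it]}
    similarity:      {sub_attribute: SS_ij}
    keep: "pos" zeroes negative values, "neg" zeroes positive values,
          None keeps every value (overall sentiment).
    """
    total = 0.0
    for sub_attribute, values in sentence_values.items():
        for value in values:
            if keep == "pos" and value < 0:
                value = 0.0
            elif keep == "neg" and value > 0:
                value = 0.0
            total += similarity[sub_attribute] * value
    return total


toy_values = {"room": [2.0, -1.0], "bed": [4.0]}
toy_similarity = {"room": 1.0, "bed": 0.5}

sc_pos = attribute_sentiment(toy_values, toy_similarity, keep="pos")
sc_neg = attribute_sentiment(toy_values, toy_similarity, keep="neg")
sc_all = attribute_sentiment(toy_values, toy_similarity)
```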

Kano-IPA Analysis
In this study, the Kano-IPA analysis consists of three related parts. First, the six hotel attributes of each hotel star rating are classified into different categories in order to understand the effect of attribute performance on customer satisfaction. Second, we construct the IPA plot for hotels of different star ratings by analyzing the attributes' importance and performance. Finally, the attribute priority rankings for improvement and resource allocation are given, so that different improvement strategies are provided for hotels of different star ratings. A detailed description of the Kano-IPA analysis is given below.

Classifying Attributes into Kano Categories
In this study, a new approach to classifying hotel attributes into Kano categories is proposed. As described above, the positive sentiment value $SC_i^{pos}$ of attribute $A_i$ is obtained from customers whose expectations of hotel attribute $A_i$ have been met or even exceeded. Thus, $SC_i^{pos}$ indicates the customer satisfaction that attribute $A_i$ can bring when it performs well. Likewise, the negative sentiment value $SC_i^{neg}$ of attribute $A_i$ is obtained from customers who think the actual performance of the attribute has not met their expectations, and it represents the customer dissatisfaction that attribute $A_i$ causes when it performs poorly. The overall sentiment value $SC_i$ of attribute $A_i$ is obtained from all customers who stayed in hotels of the same star rating. Thus, $SC_i$ is regarded as the expected customer satisfaction that attribute $A_i$ should generate. Based on $SC_i^{pos}$, $SC_i^{neg}$ and $SC_i$, and following previous index-based classification methods [10,24], we define an index SI to compare the effects of an attribute's good performance and poor performance on customer satisfaction in hotels of the same star rating. The index $SI_i$ of attribute $A_i$ satisfies $SI_i \in [0, +\infty)$ and indicates the ratio of the customer satisfaction from good performance to the customer dissatisfaction from poor performance, each compared with the expected customer satisfaction from the overall performance of attribute $A_i$. To determine the Kano category of each hotel attribute, a cut-off point $\theta$ is defined subjectively. According to test results based on different assignment methods in these review datasets, we define $\theta = (SI_{MAX} - SI_{MIN})/6$, where $SI_{MAX}$ and $SI_{MIN}$ represent the largest and smallest values of the SI index among the six hotel attributes. Moreover, the mean of the SI index over the six hotel attributes is calculated and denoted by $\overline{SI}$.
Hence, hotel attributes can be classified into Kano categories as follows:
If $0 \le SI_i < \overline{SI} - \theta$, attribute $A_i$ is regarded as a basic factor, indicating that attributes in this category bring more customer dissatisfaction compared to other attributes.
If $\overline{SI} - \theta \le SI_i \le \overline{SI} + \theta$, attribute $A_i$ is regarded as a performance factor, indicating that attributes in this category bring equal or approximately equal customer satisfaction and dissatisfaction compared to other attributes.
If $SI_i > \overline{SI} + \theta$, attribute $A_i$ is regarded as an excitement factor, indicating that attributes in this category bring more customer satisfaction compared to other attributes.
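The three classification rules translate directly into code. The sketch below takes already computed SI values as input; the function name and the toy SI values are hypothetical and are not results from this study.

```python
from statistics import mean


def classify_kano(si):
    """Classify attributes as basic, performance or excitement factors.

    si: {attribute: SI value}. The cut-off is theta = (max - min) / 6
    and the reference point is the mean of the SI values.
    """
    theta = (max(si.values()) - min(si.values())) / 6
    si_bar = mean(si.values())
    categories = {}
    for attribute, value in si.items():
        if value < si_bar - theta:
            categories[attribute] = "basic"
        elif value <= si_bar + theta:
            categories[attribute] = "performance"
        else:
            categories[attribute] = "excitement"
    return categories


# Toy SI values, for illustration only.
toy_si = {
    "value": 0.5,
    "location": 1.0,
    "service": 1.1,
    "room": 0.9,
    "cleanliness": 1.0,
    "sleep quality": 1.5,
}
kano = classify_kano(toy_si)
```

In this toy example the mean SI is 1.0 and the cut-off is about 0.167, so only "value" falls below the lower bound and only "sleep quality" exceeds the upper bound.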

Constructing the IPA Plot
In this section, we construct an IPA plot of the six attributes. From Section 4.4, $SC_i$ indicates the overall performance of each attribute $A_i$, so the next task is to estimate the importance of each attribute. In this study, the term frequency-inverse document frequency (TF-IDF) algorithm is used to estimate the importance of each sub-attribute. TF-IDF is a statistical method that is widely used to evaluate the relative importance of a word to a particular document in a set of documents or a corpus [35,103]. A term's importance increases the more frequently it appears in the document, but decreases the more frequently it appears across the whole corpus. Based on the TF-IDF algorithm, we define $u_{ij}$ as the weight of sub-attribute $B_{ij}$. As mentioned above, sub-attribute $B_{ij}$ is semantically similar to attribute $A_i$, and the similarity $SS_{ij}$ indicates the degree of semantic proximity. Therefore, we adopted the method of Wang et al. [35] for computing attribute importance from $u_{ij}$ and $SS_{ij}$.
With the performance and importance of each attribute, the IPA plot can be constructed. The IPA plot is drawn with importance on the vertical axis and performance on the horizontal axis, with the crosshair positioned by the data-centered method [104], as shown in Figure 2. According to IPA, hotel managers should improve the attributes in Q IV and Q III in that order, maintain the attributes in Q I, and finally consider reducing investment in attributes in Q II [10,29].
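A data-centered quadrant assignment can be sketched as below. Two assumptions are made: the crosshair is placed at the mean performance and mean importance of the attributes, and the quadrant labels are inferred from the recommendations above (Q I: high importance and high performance; Q II: low importance and high performance; Q III: low importance and low performance; Q IV: high importance and low performance). Figure 2 is not reproduced here, so the labeling is an inference; the function name and toy values are hypothetical.

```python
from statistics import mean


def ipa_quadrants(performance, importance):
    """Assign each attribute to an IPA quadrant using mean-centered axes."""
    p_bar = mean(performance.values())
    i_bar = mean(importance.values())
    quadrants = {}
    for attribute in performance:
        high_p = performance[attribute] >= p_bar
        high_i = importance[attribute] >= i_bar
        if high_i and high_p:
            quadrants[attribute] = "Q I"    # maintain
        elif high_p:
            quadrants[attribute] = "Q II"   # consider reducing investment
        elif not high_i:
            quadrants[attribute] = "Q III"  # improve second
        else:
            quadrants[attribute] = "Q IV"   # improve first
    return quadrants


# Toy values, for illustration only.
toy_performance = {"room": 1.0, "value": 3.0, "service": 4.0, "location": 0.0}
toy_importance = {"room": 0.9, "value": 0.8, "service": 0.1, "location": 0.2}
quadrants = ipa_quadrants(toy_performance, toy_importance)
```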

Analyzing the Attribute Priority Rankings
Because hotel resources and efforts are limited, the detailed priority rankings for resource allocation within the same quadrant still need to be determined. The Kano model indicates that the effect of attribute performance on customer satisfaction varies across Kano categories. According to the product lifecycle, the attributes of a product or service are regarded as excitement, performance and basic factors [32], which provides a guideline for resource allocation. Specifically, the basic factors should be given first priority, the performance factors second priority, and the excitement factors the lowest priority [10,29]. Therefore, based on the integrated Kano-IPA model, the attribute priority rankings for resource allocation are as shown in Table 3.
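One way to combine the two orderings is a lexicographic sort: first by quadrant (Q IV, Q III, Q I, Q II), then by Kano category (basic, performance, excitement) within a quadrant. The sketch below assumes this combination for illustration; Table 3 is not reproduced here and may order some cells differently. All names and toy inputs are hypothetical.

```python
# Assumed orderings, taken from the recommendations in the text.
QUADRANT_ORDER = {"Q IV": 0, "Q III": 1, "Q I": 2, "Q II": 3}
KANO_ORDER = {"basic": 0, "performance": 1, "excitement": 2}


def priority_ranking(quadrants, kano):
    """Rank attributes by IPA quadrant first, then by Kano category."""
    return sorted(
        quadrants,
        key=lambda a: (QUADRANT_ORDER[quadrants[a]], KANO_ORDER[kano[a]]),
    )


toy_quadrants = {
    "room": "Q IV",
    "cleanliness": "Q IV",
    "value": "Q I",
    "service": "Q III",
}
toy_kano = {
    "room": "performance",
    "cleanliness": "basic",
    "value": "basic",
    "service": "excitement",
}
ranking = priority_ranking(toy_quadrants, toy_kano)
```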

Results of Sub-Attributes Selection
According to the process described in Section 3.2, the sub-attributes of each attribute and their similarity values are obtained from online reviews through the Word2Vec algorithm. The sub-attributes of each attribute are sorted by similarity value. Due to space limitations, Table 4 shows only the top 10 sub-attributes of the six attributes extracted from the five-star hotel reviews. In Table 4, "Similarity" indicates the similarity between a sub-attribute and the corresponding attribute. Since the six attribute terms also appear in textual reviews, they are also considered sub-attributes of themselves. For example, room is a sub-attribute of attribute room, with a similarity of 1. As the results in Table 4 show, some terms may be sub-attributes of two or more attributes. For example, the similarity between the term "bed" and attribute room is 0.5867, while the similarity between the term "bed" and attribute sleep quality is 0.6537. That is, the term "bed" is a sub-attribute of both room and sleep quality. This observation is similar to the sub-attribute (or key factor) selection findings of Wang et al. [35] and Nie et al. [46], indicating that the scopes of different attributes may overlap.

Results of Sentiment Lexicon from Hotel Reviews
According to the process given in Section 3.3, the PolarityRank algorithm is employed to create a sentiment lexicon from the corpus composed of all textual reviews after preprocessing.
Based on the selection criteria, the nouns, adjectives, verbs and adverbs identified by their POS tags are selected as candidate sentiment words. To ensure the efficiency of the PolarityRank algorithm, the final list of candidate sentiment words is composed of words that occur in at least 30 reviews. A total of 13,933 candidate sentiment words and the co-occurrence frequency of any two nodes in the whole dataset are obtained. Subsequently, the initial positive and negative sentiment values of each candidate word are assigned based on SentiWordNet. The candidate sentiment words with their POS, frequency and initial sentiment values are shown in Table 5. In Table 5, "Tag" indicates the POS of each candidate sentiment word, and "Number of Words" indicates the number of times the candidate sentiment word appears in the whole corpus.
The $PR^{+}$ and $PR^{-}$ values of each candidate sentiment word are then calculated by Equations (1) and (2). The PolarityRank propagation process ran until convergence. Subsequently, the SO of each candidate sentiment word is calculated by Equation (3). According to the results, the SO of some candidate sentiment words is equal to zero. A word with a zero SO is dropped because it is neutral and carries no sentiment polarity. Finally, a sentiment lexicon composed of 5837 sentiment words with nonzero SO is created for attribute sentiment analysis. Due to space limitations, only the top 10 positive and negative sentiment words are shown in Table 6. The resulting lexicon includes some words that are rarely used in daily life but nevertheless express emotions. These less common words are identified from the large volume of user-generated data in the hotel domain through the PolarityRank algorithm. The sentiment lexicon preserves terms particular to the hotel domain, and hence it is a preferable choice for sentiment analysis of hotel attributes to ensure greater accuracy [67].
The sentiment values of the sub-attributes are respectively obtained by Equations (4)-(10).
Each sub-attribute has bidirectional performance, represented by a positive sentiment value and a negative sentiment value. The top 20 well-performed sub-attributes with the strongest positive sentiment polarity and the top 20 poorly performed sub-attributes with the strongest negative sentiment polarity for the six attributes in five-star hotel reviews are shown in Tables 7 and 8. The results indicate that the bidirectional performance of the same sub-attribute may affect customer satisfaction differently. For example, for the sub-attribute "decorate" under attribute room, the positive sentiment value from positive reviews is 18,064.37, ranked 3, whereas the negative sentiment value from negative reviews is −61.57, ranked 17. That is, customers tend to give much more praise when "decorate" performs well, whereas they are probably not sensitive to its poor performance. This observation supports the existence of an asymmetric relationship between attribute performance and customer satisfaction [24,72].
By Equations (11)-(13), the sentiment values of each attribute in hotel reviews of different star ratings are calculated. The positive, negative and overall sentiment values of each attribute are given in Table 9 according to the hotel star ratings. According to the negative sentiment values of the six attributes in negative reviews, the ranking of poorly performed attributes is derived for hotels of each star rating. Three-star hotels and two-star and below hotels have the same ranking of poorly performed attributes, and four-star and five-star hotels also share the same ranking. That is, Room < Cleanliness < Location < Value < Service < Sleep quality in negative reviews of three-star and two-star and below hotels, and Room < Cleanliness < Location < Service < Value < Sleep quality in negative reviews of four-star and five-star hotels.
Similarly, according to the positive sentiment values of the six attributes in positive reviews, the ranking of well-performed attributes is derived for hotels of each star rating. That is, Cleanliness > Location > Room > Value > Service > Sleep quality in positive reviews of three-star and two-star and below hotels, Cleanliness > Room > Location > Service > Value > Sleep quality in positive reviews of four-star hotels, and Cleanliness > Room > Location > Service > Sleep quality > Value in positive reviews of five-star hotels.
To better analyze the antecedents of both high and low customer ratings, the percentages of negative and positive sentiment values of the six attributes are calculated for each hotel star rating, as shown in Figures 5 and 6. The results show that room, cleanliness and location account for about 75% of the sum of attribute negative sentiment values, and these three attributes also account for about 75% of the sum of attribute positive sentiment values for hotels of each star rating. Room, cleanliness and location are core attributes of hotels, in line with some prior research [35,94,105,106]. This finding also implies that the main contributors to high customer ratings and the main causes of low customer ratings are the same for hotels of each star rating, similar to the studies of Berezina et al. [94] and Kitsios et al. [107]. Value, service and sleep quality have less impact on both high and low customer ratings, contrary to some previous research. For instance, the study of Ban et al. [105] implied that intangible service has the greatest impact on customer satisfaction.
The results also imply that the percentages of positive/negative sentiment values concerning location, value and service fluctuate with hotel star rating. Thus, the effect of good/poor performance concerning location, value and service on high/low customer ratings varies across hotel star ratings. For location, good/poor performance contributes less to customer satisfaction/dissatisfaction in four-star hotels than in hotels of other star ratings. For value, the percentages of both positive and negative sentiment values gradually drop as hotel star ratings increase above three stars. That is, for value, poor performance in high-star (four-star and five-star) hotels does not cause as many complaints as in low-star (three-star and below) hotels, and good performance brings less satisfaction for customers of high-star hotels. This finding is consistent with the common-sense view that customers who choose low-star hotels lay greater emphasis on value for money [108], and that customers in high-star hotels may take good value for granted because they spend more [10]. On the contrary, for service, good/poor performance contributes markedly more to high/low customer ratings in high-star hotels than in low-star hotels. This agrees with earlier studies, which showed that the effect of service's poor performance on customer dissatisfaction increases with hotel level and that customers of luxury (i.e., four- and five-star) hotels emphasize good service [10,21]. Moreover, it is observed that the good performance of sleep quality has the potential to be an incentive for high customer ratings in five-star hotels.
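The percentage comparison behind Figures 5 and 6 can be illustrated with a minimal sketch. It assumes that each attribute's share is its absolute sentiment value divided by the sum of absolute values over the six attributes; the function name and the numbers below are hypothetical and are not taken from Table 9.

```python
def attribute_shares(sentiment_values):
    """Return each attribute's share (in percent) of the summed absolute sentiment values."""
    total = sum(abs(v) for v in sentiment_values.values())
    if total == 0:
        raise ValueError("sentiment values sum to zero")
    return {name: 100.0 * abs(v) / total for name, v in sentiment_values.items()}


# Hypothetical negative sentiment values for one star rating (illustration only).
negative_values = {
    "room": -400.0,
    "cleanliness": -200.0,
    "location": -150.0,
    "value": -100.0,
    "service": -100.0,
    "sleep quality": -50.0,
}

shares = attribute_shares(negative_values)
# Combined share of the three core attributes discussed in the text.
core_share = shares["room"] + shares["cleanliness"] + shares["location"]
print(f"room + cleanliness + location: {core_share:.1f}%")
```

Applying the same function to positive sentiment values gives the positive-review shares, so the two sets of percentages can be compared attribute by attribute.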
By comparing the lines in Figures 5 and 6 for a given attribute, it can be found that good and poor performance of the same attribute affect high and low customer ratings differently. For room and sleep quality, the effect of poor performance on low customer ratings is stronger than the effect of good performance on high customer ratings. On the contrary, for cleanliness, location and service, the effect of good performance on high customer ratings is stronger than the effect of poor performance on low customer ratings. For value, the effect of good performance on high customer ratings is stronger than the effect of poor performance on low customer ratings in low-star hotels, while the opposite is true for high-star hotels. Therefore, the results indicate that the effect of attribute performance on customer ratings is asymmetric, consistent with many previous studies [8,10,23,24,71,76]. Furthermore, the asymmetric effect of value's performance on customer ratings differs between high-star and low-star hotels.
By Equation (14), the SI values of the six attributes concerning the four hotel star ratings can be calculated, and the six attributes are further classified into three Kano categories, as shown in Table 10. The final classification of attribute categories is basically consistent with the relative effect of each attribute on customer satisfaction for hotels of the same star rating. On the whole, the categories of all attributes except value vary across hotel star ratings.
Specifically, value is always classified as an excitement factor, indicating that value can bring more satisfaction when it performs well regardless of hotel star rating. Unlike the study of Bi et al. [10], this study shows that value is an excitement factor, which supports the finding of Chiang et al. [13] that value and price are the attractive factor for four- and five-star hotels. Location is classified as a performance factor in hotels of three stars and below and as an excitement factor in four-star and five-star hotels. Compared with hotels of three stars and below, the good performance of location can bring more customer satisfaction for four-star and five-star hotels. Luxury hotel customers are willing to pay more for a convenient location [21]. Thus, customers in high-star hotels will be very satisfied when the performance of location, which is a core requirement, exceeds their expectations. Service and sleep quality, which change in the same way as hotel star rating increases, are classified as basic factors in hotels of three stars and below and as performance factors in four-star and five-star hotels. Thus, customers in hotels of three stars and below may not be sensitive to the good performance of service and sleep quality, but they are dissatisfied when the performance of service and sleep quality is poor. Meanwhile, customers in four-star and five-star hotels are sensitive to the bidirectional performance of service and sleep quality. Room is classified as an excitement factor in hotels of three stars and below and as a performance factor in four-star and five-star hotels. The good performance of room can bring customer satisfaction in hotels of each star rating, while poor performance of room can bring more customer dissatisfaction in four-star and five-star hotels than in hotels of three stars and below.
Cleanliness is classified as a performance factor in hotels of two stars and below and as a basic factor in three-, four- and five-star hotels. This result indicates that customers in two-star and below hotels may feel satisfied when the room is clean, but customers in hotels of other star ratings take the good performance of cleanliness for granted. These findings differ from those of Bi et al. [10], who classified service, sleep quality, room and cleanliness as basic factors in hotels of each type.

The IPA Plot
Based on the TF-IDF algorithm, we obtained the weight of each sub-attribute with respect to the six attributes concerning the four hotel star ratings. The importance of the six attributes for each of the four hotel star ratings was then calculated by Equation (15), as shown in Table 11. On the whole, the importance of value, service and sleep quality varies across hotel star ratings, while the importance of the other attributes fluctuates only slightly, and these attributes are considered very important for all hotels. This finding is consistent with previous research revealing that customers of high-star hotels are more likely to value some intangible attributes (i.e., service and sleep quality) [40]. Specifically, as hotel stars increase, the importance of value decreases, while the importance of service and sleep quality increases. In other words, customers who select high-star hotels pay more attention to service and sleep quality and consider value less important. On the contrary, customers who choose low-star hotels highly emphasize value but consider service and sleep quality less important. The importance of value shows a significant downward trend as hotel stars increase, coinciding with Zhao's [108] research.
With the obtained importance and performance of each attribute concerning the four hotel star ratings, the IPA plots can be constructed, as shown in Figure 7. Location and cleanliness are located in Q I in hotels of all star ratings, which indicates that location and cleanliness should be well maintained given their high importance and performance. Value, service and sleep quality are located in Q III in hotels of all star ratings, with low importance and performance, so they are of low priority for improvement. In contrast, room is located in Q IV in hotels of three stars, two stars and below, where it urgently needs improvement, while it is located in Q I in four-star and five-star hotels, indicating that it is a strength of these hotels.
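The quadrant assignment described above can be sketched as follows. This is a minimal illustration, not the procedure used to draw Figure 7: it assumes the crosshairs are placed at the mean importance and mean performance of the six attributes, and it labels the remaining quadrant (low importance, high performance) Q II by elimination. All names and numbers are hypothetical.

```python
from statistics import mean


def ipa_quadrants(importance, performance):
    """Assign each attribute to an IPA quadrant using mean-based crosshairs (an assumption)."""
    imp_cut = mean(importance.values())
    perf_cut = mean(performance.values())
    quadrants = {}
    for name in importance:
        high_imp = importance[name] >= imp_cut
        high_perf = performance[name] >= perf_cut
        if high_imp and high_perf:
            quadrants[name] = "Q I"    # high importance, high performance: maintain
        elif high_perf:
            quadrants[name] = "Q II"   # low importance, high performance
        elif high_imp:
            quadrants[name] = "Q IV"   # high importance, low performance: improve urgently
        else:
            quadrants[name] = "Q III"  # low importance, low performance: low priority
    return quadrants


# Hypothetical normalized scores for a low-star hotel group (illustration only).
importance = {"room": 0.25, "cleanliness": 0.24, "location": 0.22,
              "value": 0.11, "service": 0.10, "sleep quality": 0.08}
performance = {"room": 0.10, "cleanliness": 0.30, "location": 0.28,
               "value": 0.12, "service": 0.11, "sleep quality": 0.09}

quadrants = ipa_quadrants(importance, performance)
print(quadrants)
```

With these illustrative inputs, room falls in Q IV, cleanliness and location in Q I, and the remaining three attributes in Q III, which mirrors the pattern reported for low-star hotels.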

Suggestions for Attribute Improvement and Priority
With the obtained performance and importance of the six attributes, the attribute priority rankings for resource allocation concerning the four hotel star ratings are obtained by integrating the Kano categories of the six attributes with the IPA plot, as shown in Table 12. The attribute priority rankings are divided into two groups, namely, Room > Service > Sleep quality > Value > Cleanliness > Location for low-star (three stars, two stars and below) hotels, and Service > Sleep quality > Value > Cleanliness > Room > Location for high-star (five-star and four-star) hotels. For low-star hotels, room (an excitement factor) is given the first priority for improvement since it is very important but performs poorly from the perspective of customers. According to the sub-attribute results, effective measures to improve room performance include improving the facilities, tidiness, room size and soundproofing. The importance and performance of service, sleep quality and value are low. Service and sleep quality are basic factors, which can cause numerous complaints when they perform poorly, while value is an excitement factor, which can generate more customer satisfaction when it performs well. The importance of service is higher than that of sleep quality, so service and sleep quality are given the second and third priority for improvement, respectively. Regarding service, improving staff skills and attitude is important, and more professional, friendly and polite staff are needed. Moreover, hotel managers should pay attention to beds, pillows and soundproofing facilities to improve customers' sleep quality. Value is given the fourth priority for improvement. The importance of value is significantly higher in low-star hotels than in high-star hotels, which indicates that customers who choose low-star hotels are more likely to emphasize value for money. Offering a variety of discounts, reasonable prices and member rewards helps to enhance customer satisfaction. Lastly, cleanliness and location are the strengths of low-star hotels and should be well maintained. Since cleanliness is considered more important than location, cleanliness is given the fifth priority for resource allocation.
For high-star hotels, there are no attributes that should be improved urgently. However, it is still necessary to invest resources and effort in service, sleep quality and value. Service is given the highest priority for resource allocation for improvement. From Figure 7, it can be concluded that service performs much better in five-star hotels than in other hotels and that its importance gradually increases, but it is not yet a strength of five-star hotels. Unlike the common service aspects that should be strengthened in low-star hotels, some advanced service aspects need to be improved. For example, improving staffing levels and providing more proactive, pet-friendly and infant-related services and multilingual receptionists are preferable ways to obtain more customer satisfaction. Sleep quality (a performance factor) and value (an excitement factor) are given the second and third priorities, respectively, and their importance is very low. If possible, investing resources in improving sleep quality and value (i.e., improving bedding and soundproofing facilities, offering discounts and reasonable prices) can also improve hotel customer ratings. Cleanliness, room and location are the strengths of high-star hotels, and these attributes should be well maintained. In particular, room is a unique strength of high-star hotels, while it is a weakness of low-star hotels. This result is consistent with the hotel star rating system offered by the Automobile Association, in which good performance of room is a must-be requirement for hotels to be rated as high-star [109]. According to attribute categories, cleanliness, room and location are prioritized in that order for resource allocation because they are basic, performance and excitement factors, respectively.
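One reading of the prioritization logic that is consistent with the two rankings above can be sketched as a three-level sort: by IPA quadrant (Q IV first, then Q III, then Q I), then by Kano category (basic, performance, excitement), then by importance in descending order. This ordering rule is inferred from the discussion and is an assumption, as are the position given to Q II and the importance values, which are hypothetical.

```python
# Assumed ordering rules inferred from the discussion (not stated as a formula in the text).
QUADRANT_ORDER = {"Q IV": 0, "Q III": 1, "Q II": 2, "Q I": 3}
KANO_ORDER = {"basic": 0, "performance": 1, "excitement": 2}


def priority_ranking(attributes):
    """Sort attribute names by quadrant, then Kano category, then descending importance."""
    return sorted(
        attributes,
        key=lambda name: (
            QUADRANT_ORDER[attributes[name]["quadrant"]],
            KANO_ORDER[attributes[name]["kano"]],
            -attributes[name]["importance"],
        ),
    )


# Quadrants and Kano categories follow the text for high-star hotels;
# the importance values are hypothetical.
high_star = {
    "room": {"quadrant": "Q I", "kano": "performance", "importance": 0.25},
    "cleanliness": {"quadrant": "Q I", "kano": "basic", "importance": 0.24},
    "location": {"quadrant": "Q I", "kano": "excitement", "importance": 0.21},
    "service": {"quadrant": "Q III", "kano": "performance", "importance": 0.12},
    "sleep quality": {"quadrant": "Q III", "kano": "performance", "importance": 0.10},
    "value": {"quadrant": "Q III", "kano": "excitement", "importance": 0.08},
}

# Same structure for three-star hotels, again with hypothetical importance values.
three_star = {
    "room": {"quadrant": "Q IV", "kano": "excitement", "importance": 0.25},
    "cleanliness": {"quadrant": "Q I", "kano": "basic", "importance": 0.24},
    "location": {"quadrant": "Q I", "kano": "performance", "importance": 0.22},
    "service": {"quadrant": "Q III", "kano": "basic", "importance": 0.10},
    "sleep quality": {"quadrant": "Q III", "kano": "basic", "importance": 0.08},
    "value": {"quadrant": "Q III", "kano": "excitement", "importance": 0.11},
}

high_star_ranking = priority_ranking(high_star)
three_star_ranking = priority_ranking(three_star)
print(" > ".join(high_star_ranking))
print(" > ".join(three_star_ranking))
```

Under these assumptions the sort reproduces the two reported orderings; a different tie-breaking rule could give the same result, so the sketch should not be read as the only possible interpretation.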

Theoretical Implications
This study explored the bidirectional performance of hotel attributes by dividing online reviews into positive reviews and negative reviews. The Kano-IPA model was used to further understand customers' rating behaviors and demands for hotel service. The proposed five-phase methodology, which combines sentiment analysis with the Kano-IPA model, enriches the research on online hotel reviews. The main theoretical contributions are as follows.
First, this study explores the well-performed attributes contributing to high customer ratings and the poorly performed attributes causing low customer ratings. By dividing 1,090,341 online reviews into positive and negative reviews, the six attributes' good performance (positive sentiment values) in positive reviews and poor performance (negative sentiment values) in negative reviews are calculated through sentiment analysis. Our findings suggest that room, cleanliness and location are the most crucial determinants of both high and low customer ratings for hotels of all four levels. By contrast, the other attributes, namely value, service and sleep quality, have less impact on customers' rating behaviors. Therefore, the most crucial hotel attributes influencing customer satisfaction and dissatisfaction are exactly the same. Focusing on improving the quality of these general attributes, namely room, cleanliness and location, is the key to winning high customer ratings for all hotels.
Second, comparative analysis of attribute bidirectional performance concerning four hotel star ratings was conducted to verify the difference of hotel attributes contributing to high/low customer ratings among different hotel star ratings. This study indicates that the impact of several attributes on high/low customer ratings varies across different star hotels. On one hand, the impact of value and service's poor performance on low customer ratings varies across hotel star ratings. With the improvement in the hotel level, the impact of value's poor performance on low customer ratings shows a downward trend, while the impact of service's poor performance on low customer ratings shows an upward trend. For three-star and below hotels, value's poor performance contributes more to low customer ratings than service's poor performance. In contrast, for four and five star hotels, service's poor performance has greater impact on low customer ratings. On the other hand, the good performance in room, location, value, service and sleep quality contributes to high customer ratings differently among different star hotels, where the impact of value and service's good performance on high customer ratings shows a larger range of changes. Interestingly, for value and service, with the improvement in the hotel level, the impact of their good performance on high customer ratings shows the same trend as the impact of their poor performance on low customer ratings. These findings indicate that customers' expectations and perceptions on the good/poor performance of each attribute may vary across hotel star ratings. Thus, it is necessary to take hotel star ratings into consideration on customer satisfaction research.
Third, this study suggests that the effect of good performance on high customer ratings may not be equal to the effect of poor performance on low customer ratings for the same hotel attribute. In other words, the effect of attribute performance on customer satisfaction is asymmetric. For this reason, the Kano-IPA model was applied to better understand customer's rating behaviors and demands for hotel service. The Kano categories of five attributes (location, service, room, cleanliness and sleep quality) vary across different hotel star ratings. Furthermore, suggestions on priority for attribute improvement are formulated for hotels of the four star ratings according to the results of Kano-IPA model.
Fourth, this study proposes a methodology for hotel attribute sentiment analysis based on automated textual analysis techniques, including the Word2Vec and PolarityRank algorithms. A new sentiment lexicon was created from user-generated reviews based on the PolarityRank algorithm, contributing to sentiment analysis in the hotel domain. The advance in sentiment lexicon creation lies in the following two points. On the one hand, we adopted more word types (e.g., adverbs) than existing studies for PolarityRank propagation [63,96], which avoids missing some important sentiment words. On the other hand, the initial positive and negative sentiment values of each candidate sentiment word are assigned by a function from SentiWordNet instead of manually assigning sentiment values to positive and negative seed words, which is considered more objective and trustworthy. In addition, to the best of our knowledge, our sentiment lexicon built from the 1,090,341 textual reviews is an instructive application of the PolarityRank algorithm to a million-level dataset. Thus, the comprehensive and complete sentiment propagation provides a basis for more precise sentiment calculation.
Lastly, this study proposed a novel index approach for Kano model classification and further makes it possible to apply the Kano-IPA model to numerous textual reviews. The SI index is defined to represent the satisfaction-stimulating ability of any one hotel attribute. Then the six attributes are classified into three Kano categories by comparing each SI index with the average index value for hotels of each star rating. The proposed approach enriches the existing research on the classification of the Kano model. Additionally, based on the TF-IDF algorithm, the importance of each attribute is obtained to construct the IPA plot. This study is a preferable attempt to apply online reviews to explore the effects of attribute performance on customer satisfaction to understand customers' rating behaviors.

Practical Implications
As consumers' reliance on the Internet grows, online reviews are increasingly important since customers usually browse a lot of hotel reviews when making hotel choices. It is important to analyze how hotel attributes contribute to high and low customer ratings. This study enables hotel managers and hotel online platforms to understand customers' rating behaviors, expectations and perceptions on hotel attributes. Furthermore, our findings and discussions provide a reference for hotel managers to allocate resources for attribute improvement and prioritization to achieve higher customer ratings.
First, because the final attribute priority rankings for improvement are divided into two groups, two strategies for attribute improvement are given for low-star (three stars and below) and high-star (four- and five-star) hotels, respectively. For low-star hotels, room, which is an excitement factor, should be given the highest priority for resource allocation for improvement. Effective measures such as refurnishing, renovating, providing tidy and spacious rooms and proper decoration could be taken to improve room performance in order to enhance customer satisfaction. Service, sleep quality and value are of lower priority for improvement, and they are basic, basic and excitement factors, respectively. Some effective measures should be taken to enhance the performance of service and sleep quality in order to reduce customer dissatisfaction, for instance, staff training to improve work skills and attitude, and quality improvements in beds, pillows and soundproofing. With sufficient resources, low-star hotel managers should also provide attractive discounts or reasonable prices to customers, since value for money is highly important to them. For high-star hotels, though nothing calls for urgent improvement, there is still a need for better performance in service, sleep quality and value. Service and sleep quality are performance factors, and their importance is significantly higher for customers in high-star hotels. Service improvement (i.e., higher staffing levels, proactive, pet-friendly and infant-related services and multilingual receptionists) and better sleeping conditions (i.e., better bedding and soundproofing) are preferable methods to enhance customer satisfaction. Moreover, providing proper discounts and prices for customers is also needed.
Second, some strengths should be well maintained for hotels of different star ratings. For low-star hotel managers, cleanliness and location are the strengths that win customer satisfaction. Since cleanliness and location are performance or basic factors and of high importance for customers in low-star hotels, it is necessary to invest sufficient resources to ensure their high quality. For high-star hotel managers, cleanliness, room and location are the strengths that need to be well maintained. In contrast to customers in low-star hotels, cleanliness and location are, respectively, basic and excitement factors for customers in high-star hotels. Investing more in hotel location is a preferable way for high-star hotels to enhance customer satisfaction. While it is hard to change an existing location, convenient transportation services, such as free shuttles and attraction brochures, can be offered to improve access to attractions or transport stations. Additionally, room is a unique strength of high-star hotels, while it is a weakness of low-star hotels. These findings are in line with the hotel star rating system offered by the Automobile Association, in which room is a basic and quantitative indicator for hotel star rating [109]. Therefore, hotel managers should pay great attention to room improvement for higher star ratings.
Third, this study indicates that attribute improvement priorities are the same for hotels of three stars, two stars and below. However, compared with two-star and below hotels, the importance of service and sleep quality is higher but their performance is worse in three-star hotels. Service and sleep quality are basic factors, so their poor performance is more likely to cause great customer dissatisfaction. Customers pay more for a better hotel, so their expectations increase [110]. Thus, three-star hotel managers should pay more attention to improving performance in service and sleep quality to reduce customer dissatisfaction and further enhance their competitive strengths against two-star and below hotels. Similarly, five-star hotel managers should stay alert in the pursuit of higher service quality, since the SI index values of location, service, room, cleanliness and sleep quality show a downward trend compared to four-star hotels. This can be explained as follows: customers place much higher expectations on five-star hotels than on four-star hotels, so even very minor service failures can cause serious complaints. Compared to four-star hotels, five-star hotels need to invest resources to provide customers with more attentive service and better sleep quality.
Last but not least, for hotel online platforms, two aspects of practical significance are as follows. On the one hand, this study serves as references for online websites to recommend hotels to customers when they filter hotel star ratings. Our findings imply that customers have different expectations, preferences and demands for the six attributes when they choose hotels of different star ratings. Thus, different weights assigned to each hotel attribute according to hotel star ratings can be considered when designing the hotel recommendation system. On the other hand, we suggest that the six evaluation dimensions on the website should be upgraded. For example, considering the sub-attribute lists of room and cleanliness are similar, they can be merged into one dimension or given some notes for each attribute to help customers to distinguish between them.

Limitations
This study also has several limitations, which might serve as avenues for future research. First, the data were collected from one online travel website, which may not provide complete information about customers' opinions. In addition, not all customers write textual reviews and give ratings to hotels after leaving. Therefore, future research could collect hotel reviews from multiple online websites and from customers who book hotels offline. Second, although this study explores the differences in the categories and performance of six attributes across four hotel star ratings, attribute differences between traveling purposes or regions may exist. Customers with different traveling purposes and from different districts have different preferences on hotel attributes, which may influence attribute performance and further influence attribute classification in the Kano model. Classifying online reviews along such additional dimensions is a more complex task that is left for future research. Third, a hotel's star rating may move up or down when the hotel changes, for example through redecoration, management upgrades or becoming run-down. Although the cost of improving hotel star ratings is very high, some hotels may attempt to make efforts for higher star ratings. As a result, for some hotels, earlier online reviews may not appropriately reflect their quality under the current star rating. This will affect the results of the attribute bidirectional performance analysis across hotel star ratings. Thus, in future research it is preferable to select online reviews from the current star rating period or to exclude hotels whose star ratings have changed. Additionally, exploring the difference in determinants of customer satisfaction and dissatisfaction between previous and current hotel star ratings is a future research direction.
Finally, the attributes used in this study are the six evaluation dimensions on TripAdvisor.com, which may not include all topics expressed in textual reviews. To comprehensively understand customer demands, different categories of attributes can be extracted from textual reviews in future research.