Research on the Role of Influencing Factors on Hotel Customer Satisfaction Based on BP Neural Network and Text Mining

With the flourishing development of the hotel industry, the study of customer satisfaction based on online reviews and data has become a new research model. In this paper, customer reviews and ratings on Ctrip.com are used, and the TF-IDF and K-means algorithms are used to extract and cluster the keywords of the review texts. Ten first-level influencing factors of hotel customer satisfaction are determined: epidemic prevention, consumption emotion, convenience, environment, facilities, catering, target group, perceived value, price, and service. Based on a backpropagation (BP) neural network and weight matrix operations, an influencing factor analysis model of hotel customer satisfaction is constructed to explore the role of these factors. The results show that consumption emotion, perceived value, epidemic prevention, target group, and convenience significantly affect customer satisfaction, among which epidemic prevention is a new factor. Environment, facilities, catering, and service have relatively little effect on customer satisfaction, while price has the least effect. This study provides hotel management with a path and method for using online reviews to improve customer satisfaction and provides a theoretical basis for the study of online hotel reviews.


Introduction
At present, the hotel industry is very competitive, and a hotel that wants to stand out must value customer satisfaction. Customer satisfaction is an important barometer for assessing hotel performance, so a deep understanding of the factors behind customer satisfaction and dissatisfaction is essential for hotel management [1]. According to the service profit chain theory, the customer's consumption experience with a product directly determines customer satisfaction [2], which also affects the final value that customers bring to hotels. Hotel customer satisfaction refers to the result of comparing the quality of the products or services provided by hotels with customers' needs, and it has a huge impact on the image of hotel brands as well as on the sustainable development of their competitiveness [3].
Many scholars have studied the characteristics of hotel customer satisfaction. For example, Duan et al. studied the impact of dimensions such as service, catering, facilities, hygiene, location, and price of hotels on customer satisfaction [4]. Xiong and Xu examined the influence of factors such as the room, network, catering, and site selection of hotels on the comprehensive satisfaction of hotels [5]. Tsaur and Lin indicated that value, room, and service are the hotel factors of most concern to customers [6], while Pine and Phillips pointed out that consumers have different preferences for hotels of different grades: consumers of luxury hotels have strict web requirements, while guests of economy hotels have relatively
The remainder of the paper is organized as follows. Section 2 reviews the relevant literature. Section 3 constructs an analysis model of the influencing factors of hotel customer satisfaction based on natural language processing, a BP neural network, and weight matrix calculation, and conducts the experimental analysis. Section 4 provides conclusions and theoretical and managerial implications. Finally, limitations and future research directions are given in Section 5.

Customer Satisfaction
Expectation theory states that customer satisfaction is the feeling customers have when a product exceeds their inner expectations, and that they are disappointed and dissatisfied if their evaluation of a product does not exceed their original psychological expectations [22]. Many scholars have conducted research on customer satisfaction evaluation. Among the customer satisfaction evaluation models, the most classic are the American Customer Satisfaction Index (ACSI) and the European Customer Satisfaction Index (ECSI), and most of the satisfaction evaluation models proposed by later scholars improve and develop these two models. Tontini added an adjustment factor to the coefficient of customer satisfaction in the model, assigned greater weight to the factors that affect customer satisfaction, and evaluated customer satisfaction based on the weights of the influencing factors [23]. Shi refined the evaluation indexes in the model and reconstituted the satisfaction evaluation model system from different aspects [24]. Li and Wei proposed information satisfaction and information system satisfaction as two key factors in customer satisfaction evaluation models based on e-commerce characteristics [25]. Liu designed a customer satisfaction evaluation model for logistics enterprises using a fuzzy algorithm [26]. China has also accumulated some experience with satisfaction evaluation research, with the formal implementation of the China Customer Satisfaction Index (CCSI) in 2015. Yang and Wang improved the existing theoretical models of customer satisfaction evaluation, constructed a customer satisfaction evaluation system using structural equations, and finally validated the model using fuzzy hierarchical analysis [27]. Regarding evaluation methods, Schneider et al. summarized the satisfaction level of customers based on a customer satisfaction index model to evaluate customer satisfaction from a qualitative point of view [28], but their research lacks a basis in quantitative analysis. To break through the limitations of the original model and study customer satisfaction quantitatively, Chen et al. used fuzzy hierarchical analysis to evaluate customer satisfaction, determined the weights of the indicators by establishing a comparison matrix, and calculated customer satisfaction by degree of membership [29]. Sun et al. proposed using the Kano model to evaluate customer satisfaction, statistically analyzing the relationship between demand and customer satisfaction using surveys and establishing a mathematical model between demand and satisfaction [30].

Online Reviews and Hotel Customer Satisfaction
Identifying the causes of customer satisfaction and dissatisfaction has been an important area of research for researchers from multiple disciplines, including marketing and the hotel industry [31]. In the hotel environment, customers usually assess satisfaction based on the characteristics and services of the hotel. In other words, these characteristics can not only reveal customers' preferences but also characterize their satisfaction. Online reviews and customer ratings of hotels provide a new avenue to investigate customer satisfaction and an opportunity to observe its causes [32][33][34]. Positive reviews indicate customer satisfaction, while negative reviews indicate customer dissatisfaction [35,36]. Customer satisfaction arises from the comparison between the a priori expectation and the a posteriori perceived performance of a product or service [37]. If consumers perceive higher than expected performance, they are satisfied. In contrast, lower than expected product or service performance leads to dissatisfaction. Hotels are typical service enterprises. Thus, satisfaction is not only the intuitive satisfaction of consumers with regard to product quality, price, service, etc.; at a deeper level, it is the degree to which the services and products provided by hotels fit consumer expectations. Therefore, we consider customer satisfaction as an overall emotional reaction of consumers to the entire set of tangible facilities and intangible services.
Online reviews published by customers best reflect such emotional reactions and are the most direct expression of whether or not the customer is satisfied. One study showed that 73% of respondents preferred to read online consumer reviews about hotels rather than relying on hotels' descriptions of themselves [38]. Hundreds of millions of potential hotel customers visit such online reviews each year [39]. Existing evidence suggests that online users are affected by online reviews in their purchase decisions [40]. Therefore, online reviews have become a relevant source of information for understanding hotel customers. Customers consider online reviews as a reliable source of information when selecting hotels [41,42]. Interestingly, research on online reviews in Tripadvisor showed that online reviews are more important than traditional ways in explaining hotel customer satisfaction. Ye et al. conducted an empirical study using data from a large online travel service in China to determine the impact of online user generated reviews on room sales and business performance in hotels [43]. Vermeulen and Seegers applied the consideration set theory to simulate the impact of online hotel reviews on consumer choice and found that review price, hotel familiarity, and reviewer expertise affect consumer perceptions of hotels [44].
Online reviews have proven to be a rich source of information for analyzing customers' opinions, practices, and behaviors [45,46]. In the hotel industry, the promotion of online reviews has been found to increase customer awareness and scrutiny of the hotel [44,47,48]. Online travel reviews are becoming an influential source of information that affects customers' prepurchase evaluation of hotels. The core parts of online reviews are digital ratings and text reviews [49]. Another study emphasizes that the usefulness of reviews and recommendation centered prices are important factors affecting consumer behavior and purchase decisions [50]. The importance of online review has been widely recognized in the existing literature [51,52]. The existing research to a large extent solved the impact of various rating values on consumer preferences and behavior [53,54]. In addition, the decision-making process of customers is strongly influenced by online reviews [55]. Therefore, it is very important to study the reviews of online hotel recommendation websites to determine customers' behaviors and preferences, because it is an excellent tool to understand customer satisfaction [46,56]. In addition, one of the main advantages of studying customer reviews and ratings is that it can directly show customer satisfaction [32].

Text Analysis of Online Reviews
Compared with questionnaires, online text reviews are unstructured user generated content [57] that can reflect customer consumption experiences and perceptions in greater detail [35]. Previous studies of online reviews of hotels can be divided into two types.
(1) Some focus on the content of text reviews to find the attributes mentioned by hotel customers and their perceptions of the hotel accommodation experience. These attributes include room quality, staff attitudes and behaviors, location, access, values, food, etc. [35]. Experiences include customer satisfaction and dissatisfaction [33], with positive evaluations indicating customer satisfaction and negative evaluations indicating customer dissatisfaction. Many studies use text mining techniques, including content analysis [58], frequency analysis [59], text linkage analysis [33], and latent semantic analysis [35], which are used to identify attributes of hotel facilities and services that are of customer concern.
(2) The technical aspects of processing online text reviews about hotels have attracted more attention in the research field of online reviews. The big data era has accelerated the use of text mining techniques, which include statistical and machine learning approaches to inform decisions [60]. Text mining is the process of extracting and analyzing large amounts of data from different rules and scenarios to discover hidden patterns in a given dataset [61], which helps reveal hidden trends regarding products, customers, market trends, and other key factors for the success of the company [62]. Text mining analysis of online reviews can help hotels make predictions, such as review usefulness [63], customer satisfaction [64], and hotel performance [65]. Regarding the relationship between hotel text reviews and customer satisfaction, Geetha et al. focused on the emotional polarity of online customer reviews and find that it affects customer satisfaction [16]. He et al. used natural language pre-processing, text mining, and sentiment analysis techniques to analyze online hotel text reviews, and they found that the affective scores of the titles and contents of online customer reviews are highly correlated with the overall customer ratings of hotels [66]. Qu et al. also found that most attribute emotions derived from text reviews were significantly related to customers' overall ratings [67]. This also supports the findings of Kim et al., who showed that overall rating is the key predictor of hotel performance [68].
Since Chinese consumers are considered typical collectivists and their behaviors are different from those in western countries, what is the impact of online textual reviews of hotels on their satisfaction? Our study helps to address this question. We study the technical aspects of hotel text reviews. First, through text mining of online reviews, we classify and cluster the reviews to obtain the first-level influencing factors on customer satisfaction and conclude the second-level influencing factors on customer satisfaction by referring to the CCSI. Second, we construct the BP neural network and the weight matrix of the hotel customer satisfaction influencing factors model. Here, we take the second-level influencing factors as variables and utilize backpropagation (BP) neural network to do regression analysis on the overall customer satisfaction and calculate the effect size of the first-level influencing factors through the weight matrix of BP neural network.
Finally, according to the intensity of the role of first-level influencing factors, the facilities and services of hotels are adjusted to meet the needs of customers and improve the competitiveness of hotels.

Model and Experimental Analysis
This study includes the acquisition and pre-processing of text data and the application of TF-IDF, word2vec, and K-means. A BP neural network-based regression model of hotel customer satisfaction is constructed in combination with CCSI to obtain the difference in effect size of each influencing factor on hotel customer satisfaction. The online review text data of hotels were pre-processed, and the TF-IDF value of each word was calculated. This paper defines the 60 top-ranked terms by TF-IDF value as the second-level influencing factors of hotel customer satisfaction. K-means clustering was performed on the second-level influencing factors, points near the same centroid were assigned to the same class, and each class was named as a first-level influencing factor in combination with CCSI. The weight matrix calculation of the BP neural network makes it possible to analyze clearly which hotel characteristics customers pay attention to and how their emotions about these characteristics affect their satisfaction, thus helping hotel operators adjust their operation model to provide better services to customers. The research framework of this paper is shown in Figure 1.

Data Sources
This paper refers to the 2019-2020 China Travel Word of Mouth List released by Ctrip.com, which is compiled from data such as customer voting, searching, and browsing of destination content products, to select six target cities. Hotel choice is based on: (1) the criteria for selecting star hotels in the studies by Xiang et al. [69] and Geetha et al. [16]; (2) the 2019 global hotel leaders list issued by "Hotels", an authoritative media outlet in the global hotel industry; and (3) the audience of hotel brands in China. Ten hotel brands (covering 38 sub-brands) were selected. Meanwhile, the time frame was limited to one year considering the timeliness of reviews. We obtained 37,146 reviews from Ctrip.com, spanning November 2019 to November 2020 and covering six cities and 122 hotels. Specific information fields for reviews and ratings are shown in Figure 2.
In addition, three-star hotels and above were selected as the research object. From the perspective of hotels, the more comprehensive the star service is, the more emphasis is placed on enhancing the customer experience, thus the review content is very important for hotels to improve the service quality. From the customer perspective, star rated hotels are selected to pursue better services and experiences, and reviews are the best way for them to obtain information and give feedback.

Data Cleaning
First, we excluded invalid reviews: reviews that are too short, reviews with garbled characters, and reviews with no analytical significance (containing only symbols or emoticons but no words). Then, we applied word segmentation and a stop word library to segment the text and delete the stop words. Finally, 36,936 valid reviews were obtained after filtering the noise words. The city distribution of the reviews is shown in Table 1. The processing steps are as follows.
Step 1: Jieba word segmentation. Jieba segments text in the hotel field well, and its modular design is easy to use and to integrate into a system, which saves time without affecting the accuracy of segmentation [70]. Therefore, this paper selects the Jieba word segmentation tool to process the cleaned data and uses its Python interface to segment the text data.
Step 2: Stop word processing. We use the stop list of Harbin Institute of Technology to remove the modal particles in the text, as well as conjunctions such as "and" and "but". Based on the research data of this paper, generic and place-name words such as "Hotel", "Sanya", "Shanghai", "Beijing", "Xiamen", "Guangzhou", and "Hangzhou" are added to the stop list to further remove irrelevant words from the text.
Step 3: Noise word filtering. After the segmentation results are processed with stop words, the Sklearn filter function is loaded to process the noise data irrelevant to the topic, and dates, non-Chinese words, and numbers in the comments are filtered out to obtain the preliminary data processing results.
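Steps 2 and 3 can be illustrated with a minimal Python sketch. It assumes the review has already been segmented into tokens (for example with `jieba.lcut(text)`); the stop list shown here, the `min_tokens` threshold, and the function names are hypothetical and are not taken from the paper, which uses the Harbin Institute of Technology list and a Sklearn-based filter.

```python
import re

# Hypothetical, tiny stop list for illustration only; the paper uses the
# Harbin Institute of Technology list extended with place names.
STOP_WORDS = {"酒店", "三亚", "上海", "的", "和", "但是"}

# Matches tokens made entirely of CJK ideographs, so dates, numbers,
# punctuation, and non-Chinese words are treated as noise.
CHINESE_ONLY = re.compile(r"^[\u4e00-\u9fff]+$")


def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Drop stop words and non-Chinese noise tokens from a segmented review."""
    return [t for t in tokens if t not in stop_words and CHINESE_ONLY.match(t)]


def is_valid_review(tokens, min_tokens=2):
    """Treat a review as valid if enough tokens survive cleaning.

    min_tokens is an assumed threshold; the paper does not state one.
    """
    return len(clean_tokens(tokens)) >= min_tokens
```

For example, `clean_tokens(["酒店", "房间", "很", "干净", "2020", "wifi"])` keeps only the three Chinese content tokens, because "酒店" is in the stop list and the last two tokens are not Chinese words.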

Determine the Influencing Factors of Hotel Customer Satisfaction
(1) The TF-IDF algorithm, combined with theory related to hotel customer satisfaction, is used to obtain the second-level influencing factors of hotel customer satisfaction. The TF-IDF algorithm meets the need for keyword extraction: the more important a word is to a text, the more likely it is to be a keyword of that text [71,72]. The TF-IDF value is calculated according to Equation (1):

$$\mathrm{TFIDF}_{ij} = \frac{n_{ij}}{\sum_{k} n_{kj}} \times \log\frac{|D|}{1 + |D_i|} \tag{1}$$

where $n_{ij}$ is the number of times word $i$ appears in text $j$; $\sum_{k} n_{kj}$ is the total number of word occurrences in text $j$; $|D|$ is the total number of texts in the text set; and $|D_i|$ is the number of texts in the text set in which word $i$ appears. Adding 1 to the denominator is a Laplacian smoothing that prevents a zero denominator for new words that did not appear in the corpus, which enhances the robustness of the algorithm. After text pre-processing, the TF-IDF value of each word is computed. An existing corpus is first used for training, and the TF-IDF algorithm is then loaded and trained on the corpus data [73]. During training, word filtering retains only nouns as keywords, and, in combination with theory and research related to hotel customer satisfaction, 60 second-level influencing factors of hotel customer satisfaction are obtained. The experimental results are shown in Table 2.
(2) The Word2Vec tool was used to vectorize the words of the second-level influencing factors. Based on a neural network model, Word2Vec makes good use of word context and the internal structure of sentences to map words into low-dimensional, dense real-valued vectors. To obtain the word vectors for a field, we first pre-processed the corpus of that field, then used the API of the Gensim module to load Word2Vec and train the word vector model, which yields a vector for each word, and finally extracted the vector representations of the 60 feature words obtained in the previous step.
(3) Euclidean distances between the word vectors were calculated, the K-means clustering algorithm was used to cluster the word vectors, and the silhouette coefficient was used to examine the separation and compactness of the clusters to evaluate the clustering quality. The value of K was adjusted over repeated experiments, and the silhouette coefficient reached its maximum value of 0.741 at K = 10, indicating that this value of K was optimal. The clustering results are plotted in Figure 3, where the centroids are numbered in sequence, the points assigned to the same centroid form one class corresponding to a first-level influencing factor, and the abscissa is dimensionless and represents the boundary range determined by the values of the second-level influencing factor word vectors. Based on the results of the text cluster analysis, combined with theory and research related to customer satisfaction, the influencing factors of hotel customer satisfaction are classified into 10 categories: epidemic prevention, consumption emotion, convenience, environment, facilities, catering, target group, perceived value, price, and service. The 60 second-level influencing factors are then classified under the first-level influencing factors, and the specific results are shown in Table 3.
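The silhouette criterion used in step (3) to choose K can be sketched in plain Python as follows. This is only an illustration of how the coefficient is computed for a given cluster assignment; the K-means step itself is not shown, and the function names and the convention of scoring singleton clusters as 0 are assumptions rather than details from the paper.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Under this sketch, one would run the clustering for each candidate K, compute the coefficient for the resulting labels, and keep the K with the largest value, which is the selection rule the paper describes.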

Table 3. Columns: First-Level Influencing Factors; Second-Level Influencing Factors.
(4) We determine the values of the first-level and second-level influencing factors in each review. First, the values of the second-level influencing factors are calculated. Each cleaned review is compared with the words of the second-level influencing factors; if a second-level influencing factor appears in the review, that factor takes the value 1 for this review, and so on for every second-level influencing factor. Finally, the value of each first-level influencing factor for the review is the sum of the values of its corresponding second-level influencing factors.
(5) Customer satisfaction rating: On Ctrip.com, the review rating is the customer satisfaction rating of the hotel. Each customer evaluates the hotel as a rating of their satisfaction of the hotel, and the review rating has a value range of 1-5 points, where 5 represents the highest satisfaction. The review rating is directly crawled from the hotel's webpage, and the customer review rating is shown in Figure 2. Customer rating reflects the overall satisfaction during the hotel consumption experience, corresponding to the comments made to the hotels, and, therefore, the rating can well reflect customer satisfaction with the hotels [74].
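Steps (1) and (4) above can be sketched in Python as follows. The TF-IDF function follows the smoothed form described for Equation (1), assuming the usual length-normalized term frequency; the factor-to-word mapping is a hypothetical stand-in for Table 3, and all function names are illustrative.

```python
import math
from collections import Counter


def smoothed_tfidf(docs):
    """docs: list of token lists. Returns one {word: tf-idf} dict per text.

    Uses log(|D| / (1 + |D_i|)) as the inverse document frequency, i.e.
    with 1 added to the denominator as described in the text.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        result.append({
            word: (count / total) * math.log(n_docs / (1 + doc_freq[word]))
            for word, count in counts.items()
        })
    return result


# Hypothetical mapping for illustration; the actual one is given in Table 3.
FACTOR_WORDS = {
    "service": {"前台", "服务"},
    "catering": {"早餐"},
}


def factor_values(tokens, factor_words=FACTOR_WORDS):
    """Each second-level word present in the review counts as 1; the
    first-level value is the sum over its second-level words."""
    present = set(tokens)
    return {
        factor: sum(1 for word in words if word in present)
        for factor, words in factor_words.items()
    }
```

Note that repeated mentions of the same word in one review still contribute only 1, which matches the presence-based scoring described in step (4).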

Regression Model and Experiment Based on BP Neural Network
The first-level influencing factors of customers' online reviews are taken as independent variables (input) and customer satisfaction scores as dependent variables (output), which are divided into two parts: training set and test set. According to the training set, the customer satisfaction regression model based on BP neural network is constructed by setting and adjusting the corresponding parameters combined with empirical formula. The test set is used to calculate the regression analysis results of the model and verify the trained BP regression model.

Training Set and Test Set
K-fold cross validation is used to determine the training set and test set, which helps avoid overfitting and underfitting [75]. In this paper, K = 10: the 36,936 online hotel reviews are randomly divided into 10 subsets based on the satisfaction level, and each subset serves once as the test set while the remaining nine form the training set, so that 10 different pairs of training set and test set are obtained.
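A split of this kind can be sketched in plain Python as below. The sketch interprets "based on the satisfaction level" as stratifying by rating so that each fold has a similar rating mix; that interpretation, the function name, and the fixed seed are assumptions, not details confirmed by the paper.

```python
import random
from collections import defaultdict


def stratified_kfold(ratings, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k folds.

    Indices are grouped by rating, shuffled within each group, and dealt
    round-robin into k folds so every fold has a similar rating mix.
    """
    rng = random.Random(seed)
    by_rating = defaultdict(list)
    for idx, rating in enumerate(ratings):
        by_rating[rating].append(idx)

    folds = [[] for _ in range(k)]
    slot = 0  # running counter keeps fold sizes balanced across groups
    for indices in by_rating.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[slot % k].append(idx)
            slot += 1

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, test
```

Each review index appears in exactly one test fold, so averaging the model error over the 10 folds uses every review once for evaluation.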

BP Network Structure and Experiments
To set the BP network structure, the number of nodes in the input layer, the number of hidden layers, the number of nodes in each hidden layer, and the number of nodes in the output layer were determined. BP neural network models can contain multiple hidden layers, but more hidden layers are not always better. The nodes of the hidden layer hold the knowledge that the BP model learns from the dataset, which allows the model to represent complex nonlinear relationships in the data. If the number of hidden layer nodes is too small, the model construction will fail. Conversely, if the number of hidden layer nodes is too large, the model will overfit, which makes the prediction error too large. Therefore, overfitting should be avoided as far as possible during training while sufficiently high network performance and generalization ability are ensured. The most basic principle for determining the number of hidden layer nodes is to take the most compact structure possible, i.e., the smallest number of hidden layer nodes that meets the accuracy requirements. The number of nodes was chosen to maximize the performance of the BP neural network. An empirical formula is adopted to determine the approximate range of the number of hidden layers and nodes, which avoids blind selection and makes the selection more effective [76]. In this paper, the number of hidden layer nodes is determined with reference to empirical Equation (2).
$$S = \sqrt{m + n} + a \tag{2}$$

where S is the number of nodes of the hidden layer, m is the number of nodes of the input layer, n is the number of nodes of the output layer, and a is a constant between 1 and 10. According to the data features of this paper, the input layer corresponds to the first-level influencing factors, which are epidemic prevention, consumption emotion, convenience, environment, facilities, catering, target group, perceived value, price, and service. Thus, the number of input layer nodes is m = 10. The output layer corresponds to the review rating, so the number of output layer nodes is n = 1. Drawing on empirical Equation (2) and after several trials, we found that, when the number of hidden layers is 1 and the number of hidden layer nodes is 9, the BP neural network has the smallest mean square error (MSE) and the fitting coefficient R is closest to 1. Therefore, we set the number of hidden layers to 1 and the number of hidden layer nodes to 9. The model of the BP neural network is shown in Figure 4.
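As a quick check of the empirical rule, the following Python sketch lists the hidden layer sizes it suggests, assuming the common form S = sqrt(m + n) + a with integer a from 1 to 10. Rounding to the nearest integer and the function name are assumptions made for illustration.

```python
import math


def hidden_node_candidates(m, n, a_min=1, a_max=10):
    """Candidate hidden-layer sizes from S = sqrt(m + n) + a.

    Rounding to the nearest integer is an assumption; the paper only
    says the formula gives an approximate range.
    """
    base = math.sqrt(m + n)
    return [round(base + a) for a in range(a_min, a_max + 1)]
```

With m = 10 and n = 1, sqrt(11) is about 3.32, so the candidates run from 4 to 13, and the value of 9 nodes reported in the paper lies inside this range.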
The experiments were run in the MATLAB R2016b Neural Network Toolbox, with sigmoid as the hidden layer activation function, purelin as the output layer activation function, and trainlm as the training function. With K = 10, the 36,936 online reviews were normalized and randomly split into 10 parts, yielding 10 different combinations of training and test sets. The maximum number of training iterations is 10,000. The initial value of the momentum factor is 0.95, and the initial weight value is 0.3.
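The data preparation described above can be sketched in a few lines. This is not the authors' MATLAB code: it is a Python illustration in which the helper names are hypothetical, min-max scaling stands in for the unspecified normalization, and the random seed is arbitrary.

```python
import random


def min_max_normalize(values):
    """Scale numbers to [0, 1]; min-max scaling is an assumption here."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs from a random K-way partition."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 36,936 reviews split into K = 10 parts, each part used once as the test set
splits = list(k_fold_indices(36936, k=10))
```

Each of the 10 splits holds out one part for testing and trains on the other nine, so every review appears in exactly one test set.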
The basic idea of the BP neural network algorithm is gradient descent: a gradient search adjusts the weights so that the error between the actual and expected output values of the network is minimized, yielding the best input and output weights. After training the neural network, the ideal model is obtained. The MSE in the training process is 0.00466, so the network error meets the requirements. The fitting coefficient is R = 0.9279, indicating that the predicted values are very close to the actual values and that the fit is good. To investigate the overall regression prediction ability of the BP neural network on customer satisfaction, we calculated the goodness of fit and MSE of the model and compared them with those of a deep belief network (DBN), a support vector machine (SVM), and a random forest (RF), as shown in Table 4. The experimental results show that the BP neural network model has a smaller MSE than the other three methods and the largest fitting coefficient. Therefore, the BP neural network model performs better than DBN, SVM, and RF in the analysis of influencing factors of customer satisfaction.
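For readers who want to reproduce the two evaluation metrics, a minimal sketch follows. It assumes that the fitting coefficient R is the Pearson correlation between predicted and actual values, which the text does not state explicitly; the sample values are hypothetical and serve only as an illustration.

```python
import math


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation; assumed here to be the 'fitting coefficient R'."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical normalized ratings, for illustration only
actual = [0.2, 0.4, 0.6, 0.8, 1.0]
predicted = [0.25, 0.35, 0.65, 0.75, 0.95]
mse = mean_squared_error(actual, predicted)
r = pearson_r(actual, predicted)
```

A lower MSE and an R closer to 1 both indicate a better fit, which is the basis for the comparison in Table 4.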

The Effect of First-Level Factors
We adopt Equation (3) to calculate the relative strength of the first-level influencing factors, i.e., the effect size of the input variables (first-level influencing factors) on the output variable (customer satisfaction) [19,77,78].
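The equation itself does not survive in this text. A form consistent with the definitions that follow and with the signed (positive and negative) strengths reported later would be the one below; the normalization by the sum of absolute values is an assumption drawn from that description, not a transcription of the original equation.

```latex
R_{ji} = \frac{\sum_{k} W_{ki}\, W_{jk}}
              {\sum_{i} \left| \sum_{k} W_{ki}\, W_{jk} \right|}
\qquad (3)
```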
where W_ki is the weight between the kth hidden unit and the ith input unit, and the W_ki form the input weight matrix. W_jk is the weight between the jth output unit and the kth hidden unit, and the W_jk form the output weight matrix. R_ji is the relative strength between the ith input variable and the jth output variable. It is a statistic whose value is the ratio of the strength of the relationship between the ith input variable and the jth output variable to the total strength of all input and output variables.
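The weight matrix operation can be illustrated with a small sketch. It assumes that the strength of input i on output j is the sum over hidden units of W_ki times W_jk, divided by the sum over all inputs of the absolute values of those sums; this reading follows the verbal definition above but is not taken from the original equation. The function name and the toy weights are hypothetical.

```python
def relative_strength(w_input, w_output):
    """Relative strength R_ji of each input i on each output j.

    w_input[k][i]  : weight between hidden unit k and input unit i (W_ki)
    w_output[j][k] : weight between output unit j and hidden unit k (W_jk)
    Normalizing by the sum of absolute values is an assumption of this sketch.
    """
    n_hidden = len(w_input)
    n_inputs = len(w_input[0])
    result = []
    for w_j in w_output:
        # Signed strength of each input routed through every hidden unit
        raw = [sum(w_input[k][i] * w_j[k] for k in range(n_hidden))
               for i in range(n_inputs)]
        total = sum(abs(v) for v in raw)
        result.append([v / total for v in raw])
    return result


# Hypothetical weights: 2 hidden units, 3 inputs, 1 output
w_input = [[0.5, -0.2, 0.1],
           [0.3, 0.4, -0.6]]
w_output = [[1.0, -0.5]]
strengths = relative_strength(w_input, w_output)
```

Under this reading the absolute strengths for each output sum to 1, and the sign of each value gives the polarity of the corresponding factor.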
Table 5 lists the estimated input layer to hidden layer weights W_ki, and Table 6 lists the estimated hidden layer to output layer weights W_jk. N1-N9 in Tables 5 and 6 indicate the nine nodes of the hidden layer. R_ji, calculated by Equation (3), accounts for the relative strength between input and output variables in the BP neural network model. The resulting strength of the effect of each first-level influencing factor in online reviews on customer satisfaction is shown in Table 7.

The analysis shows that epidemic prevention, consumption emotion, convenience, catering, target group, perceived value, price, and service are positively related to customer satisfaction, while environment and facility are negatively related to it. The relative strengths of consumption emotion, perceived value, epidemic prevention, target group, and convenience are greater than 0.06, and these factors significantly affect customer satisfaction. The relative strength of perceived value is markedly greater than that of the other factors, indicating that the value that hotels offer to customers has the greatest impact on customer ratings. The second is consumption emotion: customers value the emotional experience that the hotel's services bring to them. It is worth noting that epidemic prevention emerged as a new factor affecting customer satisfaction. It has not appeared in previous studies, but it is in line with travelers' concern for their own health and their accommodation environment during the current outbreak. Environment, facility, catering, and service have less impact on customer satisfaction, while price has the least impact. This may be because customers who choose star-rated hotels can better afford them: price has some effect on their satisfaction, but only a few customers explicitly express concern about price in their reviews.

Conclusions
This study analyzed 36,936 online hotel reviews by text mining and, based on the clustering results and the CCSI, summarized 10 first-level factors affecting hotel customer satisfaction: epidemic prevention, consumption emotion, convenience, environment, facility, catering, target group, perceived value, price, and service. A regression model combining a BP neural network and a weight matrix was constructed to analyze the relationship between the first-level influencing factors and customer satisfaction and to provide practical suggestions for enhancing customer satisfaction.
The above experimental analysis yields some interesting findings. Through text mining of online hotel reviews, we obtained some new second-level influencing factors, which are shown in Table 3. Combining these with the CCSI yields some new first-level influencing factors of hotel customer satisfaction, also shown in Table 3. By modeling with the BP neural network and the weight matrix, we obtained the relative strength of the first-level influencing factors (Table 7), and the results show that the effects of some influencing factors are not consistent with existing studies. This paper emphasizes the role of consumption emotion, perceived value, epidemic prevention, target group, and convenience in customer perception of and satisfaction with hotels, which further supports the conclusion of Wu et al. [79], who incorporated perceived value into customer satisfaction evaluation indicators. Our findings show that perceived value directly affects customer satisfaction and plays an important role. At the same time, we also found some factors that rarely appear in previous studies on hotel satisfaction, such as epidemic prevention, target group, and consumption emotion. Epidemic prevention reflects current customer concerns about infectious disease prevention, with customers focusing on physical health and self-protection throughout their travel. For example, when hotels provide disposable medical masks free of charge, they bring care and warmth to customers, which becomes an important influencing factor for customer satisfaction. Nilashi classified tourist groups into four categories, namely travelling solo, travelling with family, travelling as a couple, and travelling with friends, and suggested that different needs arise from different groups of customers [80], which is similar to the conclusion of our study.
Target group was often mentioned in online reviews. It refers to the travel companions that customers are concerned about, such as families, parents with children, children, and friends. This suggests that customers generally enjoy travel time with loved ones and are more concerned about the feelings and experiences of their companions than about their own. Consumption emotion refers to a series of emotional reactions that arise during product use and the consumption experience. Every link in the consumption process affects customers' consumption emotion, and whether customers' emotional experiences are good or bad directly determines whether their satisfaction is high or low. Convenience is also an important factor influencing customer satisfaction, and proximity to an airport or a subway has become key when people choose hotels. An airport helps customers reach their destination city quickly, and a subway helps them travel through the city and avoid congestion. Price is a factor that often emerges in satisfaction research [81,82]. In previous studies, price is one of the important factors affecting customer satisfaction and purchase decisions in hotels [3]. However, this study shows that price has relatively little impact on hotel customer satisfaction. This suggests that customers who choose hotels with three or more stars generally have strong spending power and are not price sensitive. Compared with price, customers place more weight on perceived value and consumption experience. Environment, facility, and catering, which are frequently observed in previous studies on customer satisfaction, still influence customers' ratings of the hotel, but they are less influential than consumption emotion, perceived value, epidemic prevention, target group, and convenience.
The experiment in this paper shows that environment and catering are negatively related to customer satisfaction, which indicates that customer dissatisfaction is mainly focused on environment and catering. The negative correlation between environment and satisfaction may arise because some hotels ignore customers' need for shopping and entertainment near their accommodation. Customers have a strong demand for shopping during travel, like to take distinctive local goods home, and prefer to buy them at larger shopping malls. Catering is negatively related to satisfaction because many customers are not satisfied with the hotel breakfast, feeling that it offers too little variety and lacks fruit.

Theoretical Significance
First, this paper combines the Chinese customer satisfaction index, natural language processing, a BP neural network, and weight matrix calculation to construct a new theoretical model for studying the influencing factors of hotel customer satisfaction, and then discusses the proposed research questions. The main purpose of this study was to identify and verify the factors that affect hotel customer satisfaction based on online reviews, as well as to explore the strength and polarity of their influence. The model of hotel customer satisfaction constructed in this paper can be extended to other research areas in the future.
Second, from the perspective of how influencing factors are acquired, the authentic evaluations that customers express in online reviews are extracted using text mining techniques, which makes data acquisition more objective and authentic than traditional questionnaires. Customers sometimes do not fill out questionnaires but instead go to their hotel booking site to share their positive or negative experiences. Therefore, extracting customers' feelings and experiences from online hotel reviews is the premise and key to research on customer satisfaction.

Third, from a methodological perspective, the BP neural network and the weight matrix can be used in the study of hotel customer satisfaction. BP neural networks employ a data-driven adaptive approach that requires only a few a priori assumptions about the model. Meanwhile, the weight matrix calculation makes the neural network more suitable than a linear model for describing the relationship between complex variables. The BP neural network and the weight matrix in this study can help hotels discover the decisive factors of customer satisfaction by analyzing a large number of online reviews. Therefore, combining text mining, a BP neural network, and a weight matrix to study hotel customer satisfaction is a meaningful attempt.

Management Implications
Review rating is a direct measure of customer satisfaction. Text reviews are indirect measures of customer perceptions and satisfaction that adequately reflect customer consumption experiences. The results of this study can motivate hotels to mine more attributes from customer text reviews and to investigate in depth the relationship between online reviews and customer satisfaction. Online reviews produce a strong word-of-mouth effect that influences the reservation decisions of future customers. This study analyzed the factors in online reviews that affect hotel customer satisfaction, which yields several practical management implications.
First, new needs of customers should be identified from online reviews in a timely manner. A crawler can obtain customers' online reviews in near real time, and natural language processing can uncover new factors affecting customers. A BP neural network and a weight matrix can then be used to accurately calculate the size and polarity of the influence of these new factors. This study shows that epidemic prevention has become a new factor affecting customer satisfaction, and its relative strength is even higher than that of convenience. During the COVID-19 period, epidemic prevention is a concern for everyone. Hotels need to provide epidemic prevention measures to meet customers' new needs, for example, providing disposable medical masks free of charge every day during the hotel stay and disinfecting rooms and public areas daily.
Second, hotels should implement precise online marketing. The results of this study indicate that the target group has a high relative strength and is one of the key factors affecting customer satisfaction. Different types of customers have different consumption preferences, and target customers can be accurately identified through online reviews. Therefore, hotel managers should provide differentiated services to meet the demands of different customers as fully as possible. Hotels can adjust to time and holidays and combine the differing demands and consumption pain points of different groups to offer a differentiated portfolio of products, e.g., parent-child rooms, couple rooms, and business rooms.
Third, hotels must keep up with changes in customer consumption. This study showed that customers of hotels with three or more stars are insensitive to price, which means that managers cannot rely excessively on price to raise customers' perceived value. Instead, they should devote more time and effort to other factors, enhancing customer satisfaction by improving service quality and facility quality and by creating a more convenient environment.
Fourth, hotels should profit through cooperation. Hotels should work with neighboring businesses, as this study showed that malls, pedestrian streets, and the quantity and quality of attractions are all important external factors for a hotel. In addition to accommodation, customers also want to shop and play during travel. Hotels can cooperate with local malls or specialty shops to launch hotel-branded souvenir sets. Hotels can also cooperate with scenic spots and run free direct shuttle buses from the hotel to the scenic spots.

Future Research Directions
The limitations of this paper lie in its research scope and general applicability. The data come from online hotel reviews for six cities on Ctrip.com, and whether these results are applicable to other regions and web ordering platforms remains to be verified. Whether the conclusion that price is not a major factor affecting customer satisfaction applies to hotels in other regions also remains to be examined. Future research can investigate hotel reviews from other regions and web ordering platforms, using review data from other websites (e.g., www.booking.com and www.tripadvisor.cn).