Investigating Key Attributes in Experience and Satisfaction of Hotel Customer Using Online Review Data

Abstract: With the development of social media, customers share their experiences, which spread rapidly in the form of online reviews. Online reviews have therefore become a significant information source affecting customers' purchase intention and behavior, and it is important to understand the customer experience expressed in them in order to maintain sustainable customer satisfaction and loyalty. The purpose of this study is to investigate the key attributes of that experience and the structural relationships among them. To accomplish this purpose, a total of 6596 hotel reviews were collected from Google (google.com). A frequency analysis using text mining was performed to identify the most frequently mentioned attributes. In addition, semantic network analysis, factor analysis, and regression analysis were applied to understand the experience and satisfaction of hotel customers. As a result, the top 99 keywords were divided into four groups: "Intangible Service", "Physical Environment", "Purpose", and "Location". The factor analysis reduced the original 64 keywords to 22 keywords and grouped them into five factors: "Access", "F&B (Food and Beverage)", "Purpose", "Tangibles", and "Empathy". Based on these results, theoretical and practical implications for sustainable hotel marketing strategies are suggested.


Introduction
Nowadays, it is common for customers to review a product or service online. In an online review, customers leave text comments with a numeric rating to briefly indicate their evaluation of the product or service [1]. Such a review can be regarded as word of mouth (WOM). In the Internet era, the effect of WOM has been further enhanced in the form of electronic word of mouth (eWOM), which has developed substantially [2]. eWOM offers a new way to identify the main attributes of service quality from a customer's perspective. It summarizes the customer's experience and is usually written voluntarily, without any economic cost or external stimulus [3]. Customers who have experienced a particular service use eWOM to help other customers make the right choice. Therefore, the service experience described in eWOM implies the main attributes and quality levels of the product or service that the customer considers [4]. Accordingly, online review mining research is actively under way to extract customer information needed to develop new products

Customer Experience and Satisfaction
Customer satisfaction is a complex customer experience in the service industry and can be defined as an evaluation of what the customers have experienced [5]. Understanding what customers expect from a service is important because those expectations provide the standard of comparison against which customers judge an organization's performance [7]. Service quality can be defined as a customer's overall impression of the relative efficiency of the organization [21]. In addition, customer satisfaction can be defined as an experience formed on the basis of a specific service encounter, and it contributes to loyalty, repeat purchase, positive WOM, and ultimately higher profitability [22].
Hotel selection attributes are the attributes that a customer considers important, among the many attributes a hotel has, when choosing a hotel [23]. Hotel selectivity is a determinant that the customer particularly prefers and considers, and it is defined as a target for assessing satisfaction and dissatisfaction [20,24]. Therefore, identifying a customer's hotel selection attributes is part of the effort to improve the quality of service and increase customer satisfaction in order to gain competitive advantage [25].

Electronic Word of Mouth (eWOM)
eWOM is an Internet-based development of WOM in which reviews are delivered online [19]. It refers to the online exchange of product information and experience [21]. Unlike traditional WOM, eWOM is highly influential because it can be written and edited without limitations of time and space. Through eWOM, customers share their product reviews with others via social network services (SNS) and online communities, regardless of commercial interests. Unlike one-sided advertising, actual reviews and specific information from experienced customers greatly influence customers' selection attributes [26]. Through the Internet, people can freely produce and share their experiences and opinions with potential customers at any time.
Services are characterized by inseparability, in that purchase and consumption take place at the same time, and by intangibility, in that they are invisible, unlike ordinary products [27]. Therefore, customers want more specific and practical information to minimize the risk of a poor choice. To that end, people seek more diverse and specific opinions from a wider group of experienced people through Internet searches, beyond the limited information they get from the people around them [28]. According to Klein [29], this is because customers rely more on the reviews of actual customers when purchasing experience products than when buying general merchandise. Hu et al. [17] stated that online reviews are far more interesting, reflect the best information, and are more reliable than information provided by the company. In other words, customers are more sympathetic to the experiences other customers have left behind than to the promotional information provided by the company. Online reviews are thus used as a new source of information by other customers looking for information about a hotel. Accordingly, reviewers play a role as opinion leaders, whether or not they intend to.

Text Mining and Semantic Network Analysis
Text mining refers to discovering previously unknown, useful patterns and knowledge in text using information retrieval, information extraction, and natural language processing techniques [30]. The general process of text mining consists of data collection, data refining, data analysis, and management information system, as shown in Figure 1. Data collection identifies and clarifies the type of information that researchers seek to find. At this stage, it is important to limit the range of data to be collected and to be familiar with the characteristics of the keywords. Data refining converts unstructured text data into structured forms. In the analysis stage, the text is analyzed with techniques such as information extraction, clustering, and categorization; the results are then utilized in a management information system and accumulated as knowledge [31,32].
Research using text mining in the hospitality industry has been active recently. Joseph and Varghese [33] analyzed user reviews with text mining to understand various aspects that drive customer satisfaction. Hu et al. [17] analyzed 27,864 hotel reviews in New York City, and the results showed that customer complaints about high-end hotels are mainly related to service issues. Kuhzady and Ghasemi [34] used text mining to analyze online hotel reviews for Mazandaran province in Iran. Berezina et al. [35] analyzed the factors of satisfaction and dissatisfaction of U.S. hotel customers, and Philander and Zhong [36] used text mining to analyze customers' feelings about a Las Vegas resort on Twitter.
Semantic network analysis can be described as the use of network analytics techniques based on shared meaning as opposed to paired associations of behavioral or perceived communication links [37–39]. In other words, text is coded into a network to identify the structural relationship between words. Semantic network analysis can identify the inherent meaning of the context that has not been revealed. Moreover, it is possible to understand the degree of influence of a particular word through frequency and cluster analysis of words and to analyze how a particular word affects the relationships between groups [40,41]. Semantic network analysis, as a method of quantitative textual analysis, provides an impressive theoretical and methodological foundation with which to describe the semantic nature of the online review [42].

Data Collection
The data collection procedure used in this study was as follows. The first step was to collect the research data. The online hotel reviews were obtained from google.com, the largest search engine in the world, which makes online hotel reviews easy to access and share. The data were collected by web crawling, with a web crawler written in Python 2.7. The server operating system was Ubuntu 16.04 LTS from google.com. Several types of data were collected, including hotel brand, writer's identification code, written date, overall score, and review text, as shown in Figure 2. The overall score was defined as customer satisfaction in this study.
Sustainability 2019, 11, x FOR PEER REVIEW
Initially, 19,332 reviews were collected from the top 25 hotels in the world, ranked according to TripAdvisor (tripadvisor.com). The top 25 hotels and their numbers of reviews are presented in Table 1. After deleting online reviews in languages other than English, 6597 online reviews remained, from which a total of 189,571 words were extracted. The data cover the five years from August 2014 to July 2019.

Data Analysis
First, text mining was applied to obtain word frequencies from the online hotel reviews. Articles, prepositions, and pronouns, which carry no meaning for this purpose, were excluded, and words pertaining to hotel experience were retained in the refined data. The top 99 frequent words were manually selected after refining the collected data. Furthermore, the distribution of overall satisfaction scores was examined to ascertain the general satisfaction level of the hotel experience, and this overall score was used as the dependent variable because it can be treated as the main output variable. In addition, a word matrix (keywords × keywords) was derived for further data analysis.
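The frequency count and the keywords × keywords matrix described above can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the stop-word list, the three sample reviews, and the function names are all hypothetical, and co-occurrence is assumed here to mean that two keywords appear in the same review.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical stop-word list and sample reviews, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "in", "at", "of", "to"}

reviews = [
    "The staff were friendly and the room was clean",
    "Great breakfast and friendly staff",
    "The room was close to the beach",
]

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def word_frequency(docs):
    """Count how often each remaining word occurs across all reviews."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def cooccurrence_matrix(docs, keywords):
    """For each keyword pair, count the reviews that contain both words."""
    index = {k: i for i, k in enumerate(keywords)}
    matrix = [[0] * len(keywords) for _ in keywords]
    for doc in docs:
        present = sorted(set(tokenize(doc)) & set(keywords), key=index.get)
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix

freq = word_frequency(reviews)
# Rank by descending frequency, breaking ties alphabetically for determinism.
top_keywords = sorted(freq, key=lambda w: (-freq[w], w))[:3]
matrix = cooccurrence_matrix(reviews, top_keywords)
```

In the study itself the top 99 words were selected manually after refinement, so an automatic cut-off like the one above would only be a starting point.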
Secondly, the semantic network analysis of these top 99 frequent words was performed using the UCINET 6.0 package with its visualization tool, NetDraw. Freeman's degree centrality and eigenvector centrality were chosen to characterize the semantic network. CONCOR (CONvergence of iterated CORrelation) analysis was then conducted to obtain the subgroups of these words, so as to understand how they correlate with each other and to identify the facets that customers are interested in.
Lastly, the results of the semantic network analysis were synthesized to select words for factor analysis and linear regression analysis with a dummy variable. Factor analysis was performed to derive the main factors affecting hotel satisfaction, using 64 of the 99 top-frequency words; the resulting factors served as variables for the linear regression analysis. The linear regression analysis, comprising the five independent variables retrieved from the factor analysis and the overall score for each review as the dependent variable, was employed to test the hypothesis: The hotel experience represented in online reviews can be used to explain customer satisfaction.

Semantic Network Analysis
The semantic network analysis identifies the relationship between words and expresses the connection between them. The centrality and CONCOR analyses of keywords were performed. In the online hotel review, among the top 99 frequent words, the results of an analysis of the degree and eigenvector centrality of the words are described in Table 3.
Degree centrality is a simple centrality measure that counts how many neighbors a node has. It indicates how many connections a word has; the more connections a word has, the greater its impact on other words and the more dominant it can be [43]. Eigenvector centrality extends degree centrality by considering not only the number of connected words but also how important those connected words are. Thus, it is a useful indicator for finding the most influential central node in the network [44], and it is sometimes used to measure a node's influence in the network. It is obtained through matrix calculations, as the principal eigenvector of the network's adjacency matrix. In the results, 'staff', 'service', and 'room' ranked at the top in both degree and eigenvector centrality. The word 'breakfast' ranked lower in frequency and degree centrality than in eigenvector centrality. The words 'spa', 'facility', 'quality', 'bar', and 'amenity' ranked higher in degree and eigenvector centrality than in frequency. However, the word 'luxury' ranked higher in frequency than in degree and eigenvector centrality.
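The difference between the two measures can be illustrated with a small sketch. The four-word network and its tie weights below are invented for illustration and are not taken from Table 3; the study computed these measures in UCINET, whereas this sketch uses a plain power iteration, which is one common way to approximate the principal eigenvector.

```python
# Hypothetical weighted co-occurrence network (symmetric, zero diagonal).
words = ["staff", "service", "room", "luxury"]
adjacency = [
    [0, 3, 2, 0],  # staff
    [3, 0, 2, 0],  # service
    [2, 2, 0, 1],  # room
    [0, 0, 1, 0],  # luxury
]

def degree_centrality(adj):
    """Freeman degree for valued ties: the sum of each node's tie weights."""
    return [sum(row) for row in adj]

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate the principal eigenvector by power iteration.

    Scores are rescaled so that the largest score equals 1.
    """
    n = len(adj)
    scores = [1.0] * n
    for _ in range(iterations):
        new = [sum(adj[i][j] * scores[j] for j in range(n)) for i in range(n)]
        norm = max(new) or 1.0
        new = [v / norm for v in new]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores

degree = degree_centrality(adjacency)
ev = eigenvector_centrality(adjacency)
```

In this toy network 'staff', 'service', and 'room' all have the same degree, yet 'room' receives a lower eigenvector score because one of its ties goes to the weakly connected 'luxury'. This is the sense in which eigenvector centrality weighs the importance of a word's neighbors.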
CONCOR analysis discovers patterns in the connections between words: the greater the similarity between the connection patterns of two words, the greater their degree of structural equivalence. It forms clusters of keywords that are similar to each other [45]. In other words, CONCOR analysis repeatedly computes correlations to find groups with a certain level of similarity. This study identifies the blocks of nodes according to the correlation coefficients of the matrix of co-occurring keywords and forms clusters of similar keywords [5]. The keywords were taken from the frequency ranking and used to construct the matrix. To visualize the results, NetDraw in the UCINET 6.0 program was applied. The nodes are presented as blue squares whose sizes indicate their frequency, and the network shows the connectivity between them.
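The idea of iterating correlations until the matrix separates into blocks can be sketched as below. This is a simplified, single-split illustration under stated assumptions, not the UCINET implementation: the matrix values are invented, the diagonal is assumed to hold each word's own frequency so that row profiles are comparable, and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_matrix(m):
    """Correlate every row profile with every other row profile."""
    return [[pearson(r1, r2) for r2 in m] for r1 in m]

def concor_split(matrix, max_iter=100, tol=1e-9):
    """Iterate correlations of correlations, then split nodes by sign.

    Returns two lists of row indices; the first contains node 0.
    """
    corr = correlation_matrix(matrix)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = correlation_matrix(corr)
    first = [i for i, v in enumerate(corr[0]) if v > 0]
    second = [i for i, v in enumerate(corr[0]) if v <= 0]
    return first, second

# Hypothetical keyword matrix: staff, service, pool, spa.
keywords = ["staff", "service", "pool", "spa"]
cooccurrence = [
    [9, 5, 1, 0],
    [5, 8, 0, 1],
    [1, 0, 7, 4],
    [0, 1, 4, 6],
]
first, second = concor_split(cooccurrence)
```

Applying the same split again to each resulting block would yield up to four groups, which matches the number of groups reported in this study, although the exact settings used in UCINET are not stated.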
The result of the CONCOR analysis is visualized in Figure 5. Four groups emerged, intricately interwoven with each other. Based on the words in each group, the groups were named "Intangible Service", "Physical Environment", "Location", and "Purpose".

Factor Analysis
Factor analysis can discover the commonalities among these keywords and show how variables are connected through the variance of keywords within the same online hotel review. The purpose of the factor analysis is to reduce a large number of variables to a smaller number of factors using an oblique rotation process. Common factorial criteria were used in extracting the factors, and the minimum factor loading was set to 0.400 in the final model. The factors also had to have eigenvalues greater than 1.0 and had to explain a substantial percentage of total variance. Eight elements had low scores, below 0.3, and ten elements loaded on two factors. Therefore, a total of 18 elements were excised from the 64 keywords. Finally, after five rounds of elimination, five factors with 22 keywords, covering 22.099% of all variance, were generated to represent the hotel experience. Table 5 shows the results of the factor analysis, with a KMO (Kaiser-Meyer-Olkin) value of 0.651, which is higher than 0.6. Therefore, the use of factor analysis was verified as suitable for this study. The Bartlett's test of sphericity value (χ²) was 113,025.397, with overall significance of the correlation matrix (p < 0.001). This result showed that the correlation matrix was not an identity matrix, and the data were taken to be multivariate normal and fit for exploratory factor analysis. The five factors were named "Access (Factor 1)", "F&B (Factor 2)", "Purpose (Factor 3)", "Tangibles (Factor 4)", and "Empathy (Factor 5)". Factor 1 contains 'locate', 'town', 'Amira', and 'walk', which are related to accessible location. Factor 2 contains 'food', 'breakfast', 'drink', and 'dining', which are related to F&B in the hotel. Factor 3 concerns purpose and contains 'weekend', 'birthday', 'anniversary', and 'honeymoon'. Factor 4 consists of aspects concerning tangibles, such as 'pool', 'spa', 'beach', 'restaurant', 'bar', and 'room'. Factor 5 contains 'staff', 'care', 'service', and 'friend', which are related to empathy.
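For reference, the two suitability checks reported above are commonly defined as follows. These are the standard textbook forms rather than formulas taken from this study, and the notation is assumed here: n is the number of observations, p the number of variables, R the correlation matrix with entries r_ij, and q_ij the partial correlations between variables i and j.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} q_{ij}^{2}},
\qquad
\chi^{2} = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln \lvert R \rvert,
\qquad
df = \frac{p(p - 1)}{2}
```

KMO approaches 1 when the partial correlations are small relative to the simple correlations, which is why values above a threshold such as 0.6 are read as supporting factor analysis. A large χ² with a small p-value rejects the hypothesis that R is an identity matrix.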

Linear Regression Analysis
After the factor analysis, the relationship between customer experience and satisfaction was estimated by regression analysis, as shown in Table 6. The model has five independent variables, Access (A), F&B (FB), Purpose (P), Tangibles (T), and Empathy (E), and one dependent variable, Customer Satisfaction (CS). The overall variance explained by the five predictors was 12% (R² = 0.120), and the standard error of the estimate was 0.51021. The correlation between the independent and dependent variables was rather low because many factors affecting customer experience and satisfaction might not have been included among the five factors due to their low frequency in the online hotel reviews. In regression models related to opinion mining, it is impossible to include all relevant variables when estimating output variables, such as opinion, from text mining data. Therefore, the R² value can be low [46–48]. A prior study reported an R² value of 12.5%; that study analyzed online review data for washing machines and also used regression analysis and factor analysis [5].
"F&B (FB, β = 0.036, p = 0.003)", "Purpose (P, β = 0.029, p = 0.017)", and "Empathy (E, β = 0.087, p < 0.001)" are significant at the p < 0.05 level and positively related to customer satisfaction. In order to check for possible correlations among the predictors, multicollinearity statistics were computed. The tolerance levels were less than 1.00, and the variance inflation factors (VIF) of the predictors were between 1.00 and 1.10; that is, the predictors were not significantly correlated with each other. Therefore, based on the unstandardized β, the regression equation can be expressed in the general form CS = b0 + bA·A + bFB·FB + bP·P + bT·T + bE·E, where b0 is the intercept and the remaining b terms are the unstandardized coefficients of the five factors.
The "Empathy (E)" factor holds the highest standardized coefficient, which means that this aspect of the hotel customer's experience is the factor most strongly and significantly associated with customer satisfaction. Reviews such as "Fantastic natural surroundings and friendly staff give you the best time" and "The thing that made this place so great for us was the staff service, especially our personal Concierge" reflect hotel experience based upon "Empathy" attributes.
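The multicollinearity figures above can be read through the usual definitions of tolerance and VIF. The notation is assumed here: R_j² is the coefficient of determination obtained when predictor j is regressed on the other four predictors.

```latex
\mathrm{Tolerance}_j = 1 - R_j^{2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}} = \frac{1}{\mathrm{Tolerance}_j}

% Worked bound for the largest reported VIF of 1.10:
\mathrm{VIF}_j \le 1.10
\;\Longrightarrow\;
R_j^{2} \le 1 - \frac{1}{1.10} \approx 0.091,
\qquad
\mathrm{Tolerance}_j \ge \frac{1}{1.10} \approx 0.909
```

Under these definitions, a VIF of at most 1.10 means that no more than about 9% of the variance of any one factor is explained by the other factors, which is consistent with the statement that the predictors are not significantly correlated.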

Discussion
This study was conducted to help enhance customers' experience and satisfaction by analyzing online hotel reviews. For the online hotel review data analysis, the first step was extracting keywords by text mining and the second was calculating the frequency of the words used by customers. Based on the frequency analysis, the degree and eigenvector centrality of the top 99 frequent words were analyzed to examine their connections and identify the most influential keywords among them. CONCOR analysis was then performed to group them. As a result, the top 99 keywords were divided into four groups, namely, "Intangible Service", "Physical Environment", "Purpose", and "Location". They were visualized by drawing networks and nodes using NetDraw in UCINET 6.0. In addition, the study conducted factor analysis and linear regression analysis to understand the relationship between the extracted factors and customer satisfaction. The factor analysis reduced the original 64 keywords to 22 keywords and grouped them into five factors: "Access", "F&B", "Purpose", "Tangibles", and "Empathy". The clusters from the CONCOR analysis can be related to the factors from the factor analysis: "Intangible Service" with "Empathy", "Physical Environment" with "Tangibles", and "Location" with "Access".
First of all, the factor with the highest beta coefficient in the linear regression analysis was "Empathy", and its related words in the factor analysis were 'staff', 'service', 'care', and 'friend'. In particular, 'staff' and 'service' were the most frequent words in the online hotel reviews, and in the CONCOR analysis the "Intangible Service" group was the largest group compared with "Physical Environment", "Purpose", and "Location". According to Lee [49], the service from hotel staff, rather than luxurious or new facilities, is the most important attribute in satisfying customers. In addition, Han and Chung [50] found that customers were satisfied by excellent service from hotel staff rather than by the physical environment, such as room cleanness, comfort, and room condition. The results of this study agree with many prior studies in showing that intangible service has the greatest impact on customer experience and satisfaction. Therefore, service by staff is an essential key attribute for creating a good reputation in the service industry and remains a part of the hotel that must be managed at all times to maintain the hotel's image. It is thus important to improve the attitude of employees through systematic service training. Providing an appropriate working environment that enhances employee satisfaction, and thereby leads to better service for customers, can be another way.
The second highest beta value in the linear regression analysis was for "F&B", and its related words in the factor analysis were 'food', 'breakfast', 'drink', and 'dining'; these keywords also ranked very high in the frequency analysis. Hotels with a high satisfaction index should also have well-equipped F&B facilities, and restaurants that provide high-quality food to customers are significant. In particular, since keywords for breakfast were derived, it will be helpful for hotel businesses to focus on breakfast rather than on other meals.
The academic implication of this study is that it extends the application area of semantic network analysis. Given the significance of the hotel segment in the tourism industry, this study empirically explores hotel experience and satisfaction through big data analytics. In this way, the hotel industry has the opportunity to gain an understanding of the attributes found in online reviews and to develop corresponding marketing strategies to its advantage. Understanding online reviews as a manifestation of customers' experiences can help the hotel industry identify the main attributes required to achieve positive post-purchase behaviors and to minimize negative intentions. Thus, online reviews not only provide an efficient way for the hotel industry to collect feedback from hotel customers, but also provide an opportunity to discover how to generate positive intent after the experience. To create a high satisfaction score and positive eWOM, the hotel industry should consider "F&B", "Purpose", and "Empathy". Among them, "Empathy" was the most influential attribute in the regression analysis. These key factors may be used to examine customer satisfaction or to test theoretical models in order to better understand hotel customers' behavior.
In practice, the analysis of online reviews can be used as a marketing tool by managers, since customer reviews are an important source for a hotel seeking to improve service and to design promotions that support profit. The analysis also indicates the level of importance of these service attributes, so the hotel industry can allocate its resources accordingly. Online review analyses can provide a reliable satisfaction assessment. Hotels can also use this method to analyze their competitors' customer reviews so that they can benchmark themselves against competitors in terms of customer satisfaction. These reviews can be used for sustainable strategic marketing decisions against competitors.
However, this study has limitations in data collection. Firstly, the data collected in this study are limited, because the sample covers only the top 25 hotels in the world. Future research should therefore collect online review data from more hotels to generalize the findings. Secondly, the collected text was analyzed based on the frequency of individual words; therefore, it is difficult to capture the additional meaning of words in context. Future studies are expected to carry out further analysis of positive and negative expressions, together with sentiment analysis, to better understand the customer's experience and satisfaction and thus to provide stronger strategies to the hotel industry.