A Hybrid Method with TOPSIS and Machine Learning Techniques for Sustainable Development of Green Hotels Considering Online Reviews

: This paper proposes a hybrid method for online reviews analysis through multi-criteria decision-making, text mining and predictive learning techniques to ﬁnd the relative importance of factors a ﬀ ecting travelers’ decision-making in selecting green hotels with spa services. The proposed method is developed for the ﬁrst time in the context of tourism and hospitality by this research, especially for customer segmentation in green hotels through customers’ online reviews. We use Self-Organizing Map (SOM) for cluster analysis, Latent Dirichlet Analysis (LDA) technique for analyzing textual reviews, Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) for ranking hotel features, and Neuro-Fuzzy technique to reveal the customer satisfaction levels. The impact of green hotels with spa and non-spa services on travelers’ satisfaction is investigated for four travelling groups: Travelled solo, Travelled with family, Travelled as a couple and Travelled with friends. The proposed method is evaluated on the travelers’ reviews on 152 hotels in Malaysia. The ﬁndings of this study provide an important method for travelers’ decision-making for hotel selection through User-Generated Content (UGC) and help hotel managers to improve their service quality and marketing strategies. Regression (MLR) techniques for methods comparison. The experiments were performed using a tenfold cross-validation approach. The results


Introduction
Over the past decade, green practices have been important in the hospitality and tourism sector [1,2] and gained much market interest in this sector for green or eco-friendly products/services [3,4]. Because of the increased interest among customers in tourism, friendly products/services have attracted much attraction among hotel managers and service providers [5]. To be successful in this marketing, detecting the customer behavioral intentions in purchasing green products is important as it can assist them in the decision-making process to receive high-quality services [6]. In addition, it is important to investigate the factors for tourists' intentions to participate in green tourism [7].
Green tourism is defined as "tourism that enhances the local cultural elements, operating under the control of local communities, providing employment and maintaining the economic benefits within the local communities" [8]. Green tourism and its associated businesses have been one of the most profitable hospitality sectors for many tourist destinations. In such a progressively competitive market, the main emphasis of hoteliers and practitioners is to magnetize more green travelers (i.e., travelers who are willing to stay in environmentally friendly hotels) by motivating them to make repeat purchases through service strategies/efforts. Therefore, identifying important factors influencing green tourists' purchase decision-making behaviour and exploring their specific role are becoming more and more critical for any destination country and its related green economy sectors.
Customers' satisfaction has long been regarded as a significant factor in explaining customer purchase behavior. This variable actively contributes to forming favorable intentions toward a business and affecting loyalty and retention of customers [9]. If users assess their overall consumption experiences positively, their level of satisfaction and readiness/willingness to repurchase will rise [10]. Providing satisfactory experiences for tourists is a key success in the international tourism market, as well as in the wellness tourism industry. Studies in the wellness tourism market consider satisfaction as a cognitive or emotional judgment prior to purchase [11,12] and after consumption evaluation [13]. However, in marketing literature, satisfaction is defined as "a favorable reaction emerging from a positive assessment of consumption experiences" [14]. This definition is more commonly agreed upon by many researchers, and thus generally followed in existing literature [15]. Despite the importance of customer satisfaction, studies surrounding travellers' behavior intentions remains insufficient in the literature [16]. In spite of the significant progress in the tourism market, only a limited number of studies focused on users' behaviour, preference, and satisfaction toward green tourism products. In fact, a question still remains with regard to the causal relationship between the green hotels with wellness services and customer satisfaction.
The process of extracting consensus opinions from online reviews and analyzing big amounts of travellers' information in social networking sites has been a challenge in many studies in the tourism and hospitality context. In fact, extracting the dimensions of satisfaction through opinion mining from text-based review comments has been one of the main aims of these studies [17,18]. This study continues the previous work on the online reviews analysis for extracting useful information from the past users' experiences on the quality of green hotels services and their facilities for understanding the consumer preferences and demand. We focus on the online reviews and ratings of travellers on the green hotels with spa services and non-spa hotels. We further investigate the role of eco-friendly (green) products on the travellers' satisfaction. In addition, we try to reveal the impact of eco-friendly services on four types of travelers: Travelled solo, Travelled with family, Travelled as a couple and Travelled with friends. From a marketing perspective, the proposed method can be effective in customers' segmentation to assist the hotels' managers in better understanding the customers' need in eco-friendly hotels.
To perform the data analysis, we apply several data mining and learning techniques. To better extract the knowledge from the travellers' reviews, we adopt a clustering analysis. In addition, to effectively extract the dimensions of satisfaction from the hotels' reviews, we use an unsupervised learning approach, Latent Dirichlet Analysis (LDA), which is based on Bayesian learning algorithm. To find the importance level of spa service on the customers' satisfaction, we use Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) for ranking hotel features and a neuro-fuzzy algorithm, Adaptive Neuro-Fuzzy Inference System (ANFIS) in Matlab (R2016a, the Mathworks Inc., Natic, MA, USA), to learn from the extracted dimensions and general attributes of the eco-friendly hotels with spa and non-spa services. Therefore, the research hypothesis which this research aims Sustainability 2019, 11, 6013 3 of 21 to investigate is: "There is a difference in the customers' satisfaction level in Malaysian eco-friendly hotels with spa and non-spa services" Overall, the contributions of this work are as follows: We develop a new method through clustering, text mining, multi-criteria decision-making and supervised learning approaches for customer segmentation in eco-friendly hotels. Multi-criteria decision-making approaches are effective in assessing the alternatives where many criteria are available. The combination of these techniques for the customer segmentation in eco-friendly hotels is developed for the first time in the context of tourism and hospitality by this research. In contrast with previous research which is mainly based on qualitative and qualitative statistical analysis through the interview and survey, the proposed method uses machine learning techniques for revealing the customers' satisfaction level and predicting their preferences through customers' online reviews.
We use the clustering approach to segment the customers according to their online reviews on the quality aspect of green hotels. In addition, we use Self-Organizing Map (SOM) [19], which is a robust clustering technique based on the neural network approach for customers' segmentation from big datasets. i.
We use a text mining approach, LDA [20], to discover satisfaction dimensions from text-based online reviews. The LDA has shown its effectiveness in text-based reviews in e-commerce [21] and tourism and hospitality [17]. Detecting customers' behaviour from user-generated content is important as, in many social networking sites, this way is extensively used to collect customers' feedback on the products. In the context of non-green tourism, several studies have been conducted to reveal the customers' satisfaction from customers' online reviews [22,23]. However, there is a lack of research on the detection of customers' behaviour in green tourism from online reviews. This research fills this gap by developing a new method using the LDA with the aid of supervised machine learning technique. ii.
We use a Multi-Criteria Decision-Making approach, TOPSIS [24], for ranking hotel features in each segment discovered by SOM clustering technique. This technique has been widely used in decision-making problems [24][25][26]. In contrast with the other Multi-Criteria Decision-Making approach (e.g., Analytic Hierarchy Process and Analytic Network Process), which are based on pairwise comparisons, TOPSIS technique is suitable for data analysis of this research as the data are collected in a five-point Likert scale to represent the travelers' preferences. iii.
We use a neuro-fuzzy approach, ANFIS [27], to predict the customers' preferences through their online reviews in each segment discovered by SOM clustering technique. In this research, ANFIS, which takes the advantages of neural network and fuzzy logic approaches to learn the prediction models, is used for preference prediction through customers' numerical ratings.
The rest of the paper is organized as follows. Section 2 provides related work. Section 3 provides the proposed research method. In Section 4, we provide the results of our data analysis. In Section 5, we provide the discussions. Finally, discussions and conclusions are provided in Section 6.

Related Work
Customer segmentation has been important in marketing research [28,29]. Accordingly, many methods have been developed by machine learning, statistical, and decision-making techniques in this field [30][31][32]. Such methods have also been effective in the tourism context, especially in sustainable and wellness tourism [33].
In [32], the authors focused on the customer knowledge mining for tourism for new product development. They used the Apriori algorithm to discover association rules and clustering analysis for data mining. In [34], the authors conducted a study on tourist market segmentation with linear and non-linear techniques. They used SOM neural networks for segmenting tourist markets and backpropagation neural networks for tourists' classification. In [33], the authors developed a method for travellers' segmentation and choice prediction through online reviews. They used TOPSIS and machine learning techniques and evaluated the method on the travelers' online reviews of Wellington's hotels in New Zealand. In a spa hotel context, the authors in [35] focused on the effect of atmospheric cues and involvement on pleasure and relaxation. Accordingly, they proposed a new model through several constructs: Atmospheric cues, Centrality to lifestyle, Pleasure, Relaxation, Satisfaction, Self-expression and Word-of-mouth. They evaluated the model presented through the thermal spa hotels. They performed the data analysis using a structural equation model approach through Partial Least Squares (PLS). They found that atmospheric cues and involvement are important antecedents of relaxation and pleasure. In addition, they found that the feeling of relaxation is more important to lead to satisfaction than pleasure. In [36], the authors conducted a study on food and beverage service quality in spa hotels. They developed an instrument and validated through a self-administrated questionnaire which was distributed to the 331 customers at four different spa hotels in Balikesir. The results of their study showed six quality dimensions: assurance and employee knowledge, healthy and attractive food, empathy, tangibles, responsiveness of service delivery, and reliability. In a study by [37], the authors focused on stainable hotel practices and nationality, and investigated its effect on guests' satisfaction and intention to return. They performed 329 surveys and collected data from a unique upscale tourist destination in Mexico, San Miguel de Allende. They found that indicating green practices significantly impacts guests' satisfaction levels and return intentions for Americans, Mexicans, and other nationalities. In [38], the authors conducted a study on improving consumer satisfaction in green hotels. They investigated the roles of perceived warmth, corporate social responsibility motive, and perceived competence. They found that when service delivery is successful, the satisfaction is relatively higher for green hotels versus the non-green hotels. They also found the positive impact of perceived warmth and competence to be a mediator for the relationship between service outcomes and consumer satisfaction and behavioral intentions. In [39], the authors focused on the customers' satisfaction in eco-friendly hotels. They developed a new method based on multi-criteria collaborative filtering: Higher Order Singular Value Decomposition (HOSVD), ANFIS and SOM. Also, they used the Classification and Regression Tree (CART) technique to find important features in the segments identified by SOM. To achieve the research objectives, they evaluated the method on the data collected from TripAdvisor. They found that there is a positive relationship between the customers' satisfaction and quality of services in eco-friendly hotels in different customers' segments. In [40], the authors developed a method for market segmentation and travel choice prediction in spa hotels. They developed the method using SOM, HOSVD, and CART. They used SOM and HOSVD for clustering and similarity calculation tasks, CART for travel choice prediction. The method was evaluated on the data collected from TripAdvisor. The results of their study found six spa hotel customer segments including: Wellness Seekers, Steam Room Seekers, Spa Treatment Seekers, Mineral Baths Seekers, Jacuzzi Seekers, and Cosmetic Seekers.

Methodology
This research relies on Multi-Criteria Decision-Making (MCDM) and learning techniques, SOM, LDA, TOPSIS and ANFIS. The schematic diagram of the proposed method is shown in Figure 1. In the first step of data analysis, we used LDA to discover the main satisfaction dimensions with their relative importance form the text-based reviews. Then, we applied SOM to find the customers' segments from the numerical ratings. To rank the hotels' attributes, we applied TOPSIS on each segment and ANFIS to predict the customers' preferences based on the satisfaction dimensions and general quality aspect of hotels. It is believed that clustering by the use of SOM technique can significantly improve the effectiveness of ANFIS in predicting customers' preferences in each segment. In the following section, we briefly introduce the components of the proposed method.

Latent Dirichlet Allocation (LDA)
LDA was proposed in [20] as an unsupervised machine learning technique for feature extraction. LDA is considered as a generative probabilistic model of a corpus. This technique is widely used in topic modelling, which is able to discover the principal topics from big data, massive volumes of unstructured text data. In fact, LDA can be used to effectively discover a mixture of topics from the huge number of documents (i.e., online reviews). A mixture of topics can be those aspects which influence travellers' satisfaction on the service quality of a specific hotel. In this study, we use LDA to identify an optimum number of dimensions (also referred to as "topic" in LDA literature) with the related words in each dimension from the travellers' online reviews.

Latent Dirichlet Allocation (LDA)
LDA was proposed in [20] as an unsupervised machine learning technique for feature extraction. LDA is considered as a generative probabilistic model of a corpus. This technique is widely used in topic modelling, which is able to discover the principal topics from big data, massive volumes of unstructured text data. In fact, LDA can be used to effectively discover a mixture of topics from the huge number of documents (i.e., online reviews). A mixture of topics can be those aspects which influence travellers' satisfaction on the service quality of a specific hotel. In this study, we use LDA to identify an optimum number of dimensions (also referred to as "topic" in LDA literature) with the related words in each dimension from the travellers' online reviews.

Adaptive Neuro-Fuzzy Inference System (ANFIS)
Fuzzy logic and neural network are widely used in prediction problems. ANFIS is a useful technique based on fuzzy logic and neural network approaches [27]. It takes the advantages of both fuzzy set theory in applying rule-based systems and neural networks in automatic learning from data. A fuzzy inference system in the ANFIS technique consists of if-then rules, membership functions, inference mechanism (called fuzzy reasoning), and couples of input-output. In Figure 2, a structure of a fuzzy inference system is presented. As seen from this figure, in the first step, the inputs are fuzzified to produce their degrees of truth. In the second step, the degree of truth of the consequents is obtained by combining this information through inference rules. In the last step, final output is obtained by defuzzification.

Adaptive Neuro-Fuzzy Inference System (ANFIS)
Fuzzy logic and neural network are widely used in prediction problems. ANFIS is a useful technique based on fuzzy logic and neural network approaches [27]. It takes the advantages of both fuzzy set theory in applying rule-based systems and neural networks in automatic learning from data. A fuzzy inference system in the ANFIS technique consists of if-then rules, membership functions, inference mechanism (called fuzzy reasoning), and couples of input-output. In Figure 2, a structure of a fuzzy inference system is presented. As seen from this figure, in the first step, the inputs are fuzzified to produce their degrees of truth. In the second step, the degree of truth of the consequents is obtained by combining this information through inference rules. In the last step, final output is obtained by defuzzification.

TOPSIS
TOPSIS is an MCDM technique widely used for decision-making problems [24,41]. As a compensatory aggregation method, this technique is used to select the best based on hard cut-offs. In TOPSIS, m attributes for candidate networks can be presented in the form of a matrix Figure 2. A structure of a fuzzy inference system for revealing customer satisfaction.

TOPSIS
TOPSIS is an MCDM technique widely used for decision-making problems [24,41]. As a compensatory aggregation method, this technique is used to select the best based on hard cut-offs. In TOPSIS, m attributes for n candidate networks can be presented in the form of a matrix A (n×m) as show in in Equation (1). Therefore, the procedure for TOPSIS is presented in Figure 3.

Data Collection
This research has considered TripAdvisor for data collection [23,42]. The data were collected from the hotels' webpages, which have been provided by TripAdvisor. A crawler has been designed and the data were collected through the URLs of Malaysian hotels' information. The crawler was designed to obtain the main information such as hotel information, traveller information, the trip information, and the travellers' ratings and reviews on the hotels from TripAdvisor.
By the use of the crawler, in total, we collected 17,024 records from 152 hotels (4-star and 5-star hotels). The data were preprocessed and the useless records have been removed from the database. In addition, in this stage, we removed non-English reviews from the datasets. Furthermore, the records which do not include the ratings on the hotels features (e.g., check-in/front desk, value (cost-benefit), sleep quality, service, cleanliness, rooms, location) have been removed from the

Data Collection
This research has considered TripAdvisor for data collection [23,42]. The data were collected from the hotels' webpages, which have been provided by TripAdvisor. A crawler has been designed and the data were collected through the URLs of Malaysian hotels' information. The crawler was designed to obtain the main information such as hotel information, traveller information, the trip information, and the travellers' ratings and reviews on the hotels from TripAdvisor.
By the use of the crawler, in total, we collected 17,024 records from 152 hotels (4-star and 5-star hotels). The data were preprocessed and the useless records have been removed from the database. In addition, in this stage, we removed non-English reviews from the datasets. Furthermore, the records which do not include the ratings on the hotels features (e.g., check-in/front desk, value (cost-benefit), sleep quality, service, cleanliness, rooms, location) have been removed from the database. In addition, we have eliminated those tuples which do not provide the hotels' and travel information (e.g., travelling groups). We consider two types of green hotels, green hotels with spa services and green hotels with non-spa services, for the travellers of four main groups, Travelled with friends, Solo travelers, Travelled with family and Travelled as a couple. In Table A1 of Appendix A, full information on the travellers' ratings on seven criteria of hotels is presented. In Table 1, we have provided the information of four travelling groups in 152 green hotels with spa and non-spa services.

Results and Discussions
We first applied LDA for discovering the main dimensions of satisfaction from the text-based reviews. The results are presented in Figures 4 and 5, respectively, for green hotels with non-spa and spa services. These figures reveal the relative importance of each dimension/topic (satisfaction dimension) for two types of hotels in four traveling groups, Travelled with Friends, Solo Travelers, Travelled with Family and Travelled as a Couple. The main satisfaction dimensions were: Spa treatments, Healing water, Mineral baths, Foot massage, Wellness, Sauna, Stress, Refreshing, Face treatment, Body treatment, Useful for the skin, Steam room, Ice cold water, Infrared sauna, Hot beds, Warm Jacuzzis, Warm atmosphere, Mineral water and Indoor pool. The relative importance of each dimension for green hotels with non-spa services is provided in Table A2 of Appendix A. Appendix, full information on the travellers' ratings on seven criteria of hotels is presented. In Table 1, we have provided the information of four travelling groups in 152 green hotels with spa and non-spa services.

Results and Discussions
We first applied LDA for discovering the main dimensions of satisfaction from the text-based reviews. The results are presented in Figure 4 and Figure 5, respectively, for green hotels with non-spa and spa services. These figures reveal the relative importance of each dimension/topic (satisfaction dimension) for two types of hotels in four traveling groups, Travelled with Friends, Solo Travelers, Travelled with Family and Travelled as a Couple. The main satisfaction dimensions were: Spa treatments, Healing water, Mineral baths, Foot massage, Wellness, Sauna, Stress, Refreshing, Face treatment, Body treatment, Useful for the skin, Steam room, Ice cold water, Infrared sauna, Hot beds, Warm Jacuzzis, Warm atmosphere, Mineral water and Indoor pool. The relative importance of each dimension for green hotels with non-spa services is provided in Table  A2 of Appendix.  The data were divided into two main groups, green hotels with spa and green hotels with non-spa services. Then, we applied SOM clustering for each group. The learning rate for SOM was set to 0.05. In addition, the coefficient of determination (R 2 ) values [43] for SOM map quality were, respectively, 0.821 and 0.845 for green hotels with spa and green hotels with non-spa, indicating that high quality of segments have been generated by SOM. For both green spa and non-spa hotels, data were clustered in four segments with least similarity between the segments. Segment 1, Segment 2, Segment 3, and Segment 4 included, respectively, 4004 (39.4%), 2586 (25.4%), 1179 (11.6%), and 2406 23.6%) records for green spa hotels, and 1686 (24.6%), 2680 (39.1%), 698 (10.2%), and 1785 (26.1%) records for green non-spa hotels.
In Figure 6 (a-d) and Figure 7 (a-d), we provided the frequency of overall ratings on green spa and non-spa holes based on four traveling groups in four segments. Overall, in four segments of green spa hotels, it can be found that the majority of the travellers have received high-level service quality, especially in two groups, Travelled as a couple and Travelled solo. In addition, it can be found that, in four segments of green non-spa hotels, the satisfaction level is relatively lower compared to the green spa hotels. These segments are further analyzed using Adaptive Neuro-Fuzzy Inference System, as a neuro-fuzzy system, to reveal the relative importance of each dimension on the travellers' satisfaction. The data were divided into two main groups, green hotels with spa and green hotels with non-spa services. Then, we applied SOM clustering for each group. The learning rate for SOM was set to 0.05. In addition, the coefficient of determination (R 2 ) values [43] for SOM map quality were, respectively, 0.821 and 0.845 for green hotels with spa and green hotels with non-spa, indicating that high quality of segments have been generated by SOM. For both green spa and non-spa hotels, data were clustered in four segments with least similarity between the segments. Segment 1, Segment 2, Segment 3, and Segment 4 included, respectively, 4004 (39.4%), 2586 (25.4%), 1179 (11.6%), and 2406 23.6%) records for green spa hotels, and 1686 (24.6%), 2680 (39.1%), 698 (10.2%), and 1785 (26.1%) records for green non-spa hotels.
In Figures 6a-d and 7a-d, we provided the frequency of overall ratings on green spa and non-spa holes based on four traveling groups in four segments. Overall, in four segments of green spa hotels, it can be found that the majority of the travellers have received high-level service quality, especially in two groups, Travelled as a couple and Travelled solo. In addition, it can be found that, in four segments of green non-spa hotels, the satisfaction level is relatively lower compared to the green spa hotels. These segments are further analyzed using Adaptive Neuro-Fuzzy Inference System, as a neuro-fuzzy system, to reveal the relative importance of each dimension on the travellers' satisfaction.
In addition to the above analysis, we applied the TOPSIS technique on the numerical ratings for each travelling group (Travelled with Friends, Solo Travelers, Travelled with Family and Travelled as a Couple) for green hotels with spa and non-spa services. The data of each cluster were considered in the matrices for TOPSIS analysis. Each value of these matrices considered a value ranged from 1 to 5. The TOPSIS technique is applied to rank the hotels' criteria selection in four segments. An example of ratings on 7 criteria of hotels is presented in Figure 8. The results of TOPSIS for ranking seven criteria of spa green hotels and non-spa green hotels in four segments are respectively presented in Tables 2  and 3. and non-spa holes based on four traveling groups in four segments. Overall, in four segments of green spa hotels, it can be found that the majority of the travellers have received high-level service quality, especially in two groups, Travelled as a couple and Travelled solo. In addition, it can be found that, in four segments of green non-spa hotels, the satisfaction level is relatively lower compared to the green spa hotels. These segments are further analyzed using Adaptive Neuro-Fuzzy Inference System, as a neuro-fuzzy system, to reveal the relative importance of each dimension on the travellers' satisfaction.  for each travelling group (Travelled with Friends, Solo Travelers, Travelled with Family and Travelled as a Couple) for green hotels with spa and non-spa services. The data of each cluster were considered in the matrices for TOPSIS analysis. Each value of these matrices considered a value ranged from 1 to 5. The TOPSIS technique is applied to rank the hotels' criteria selection in four segments. An example of ratings on 7 criteria of hotels is presented in Figure 8. The results of TOPSIS for ranking seven criteria of spa green hotels and non-spa green hotels in four segments are respectively presented in Table 2 and Table 3.     In the last step of data analysis, the data of each cluster were used in the Adaptive Neuro-Fuzzy Inference System to reveal the importance level of each factor on the travellers' satisfaction level. In addition, the importance of spa services on travellers' satisfaction level in four travelling groups, Travelled with Friends, Solo Travelers, Travelled with Family, and Travelled as a Couple, was assessed by the Adaptive Neuro-Fuzzy Inference System in each cluster. To do so, a Sugeno-type fuzzy inference system was developed and the data for both green spa and non-spa hotels in four segments were trained by the Adaptive Neuro-Fuzzy Inference System. In the Adaptive Neuro-Fuzzy Inference System, we used a hybrid learning algorithm, which is a combination of least-squares algorithm and gradient descent algorithm for parameters identification in learning the prediction models. This approach also converges much faster than other learning approaches. In addition, we used a 5-layer Adaptive Neuro-Fuzzy Inference System in constructing the prediction models. The network architecture and model parameters in the 5-layer Adaptive Neuro-Fuzzy Inference System were: 200 training epochs, four training and four test datasets for each type of hotels, and Gaussian membership functions.
The results of ANFIS are presented in Figures 9-12. The results of revealing the impact of service quality on travellers' overall satisfaction showed that, in general, the level of satisfaction in green spa hotels is higher than green non-spa hotels. In all segments, the results reveal that the groups of travellers are more satisfied with the quality of services in green hotels with spa in relation to the green hotels with non-spa. The results obtained from the service quality ratings and travellers' experiences in their stay in hotels show that the hotels with spa services can contribute to high levels of satisfaction.
Adaptive Neuro-Fuzzy Inference System, we used a hybrid learning algorithm, which is a combination of least-squares algorithm and gradient descent algorithm for parameters identification in learning the prediction models. This approach also converges much faster than other learning approaches. In addition, we used a 5-layer Adaptive Neuro-Fuzzy Inference System in constructing the prediction models. The network architecture and model parameters in the 5-layer Adaptive Neuro-Fuzzy Inference System were: 200 training epochs, four training and four test datasets for each type of hotels, and Gaussian membership functions.
The results of ANFIS are presented in Figures 9-12. The results of revealing the impact of service quality on travellers' overall satisfaction showed that, in general, the level of satisfaction in green spa hotels is higher than green non-spa hotels. In all segments, the results reveal that the groups of travellers are more satisfied with the quality of services in green hotels with spa in relation to the green hotels with non-spa. The results obtained from the service quality ratings and travellers' experiences in their stay in hotels show that the hotels with spa services can contribute to high levels of satisfaction.      To show the effectiveness of using clustering and the ANFIS model in the prediction of customers' preferences, we performed several experiments on the data collected from TripAdvisor and compared them with the other supervised learning methods. Specifically, we selected the Neural Network (NN), Support Vector Regression (SVR), and Multiple Linear Regression (MLR) techniques for methods comparison. The experiments were performed using a tenfold cross-validation approach. The results for method comparisons were provided for R 2 and Mean Absolute Error (MAE) for each method. The results revealed that the method which uses clustering with the aid of the SOM and ANFIS techniques (MAE = 0.073; R 2 = 0.923) has provided better

Discussion
Sustainability has been found to be an important issue among tourism and hospitality stakeholders and become an increasingly popular field of research since the late 1980s [44]. Sustainability is viewed as holding considerable promise to address the problems of tourism negative impacts. The studies on sustainable tourism find that consumers are nowadays more demanding than before considering hotels with green initiatives and features [45]. Previous studies also show that there is a positive relationship between the customers' satisfaction and behavioral intentions [36,38]. Furthermore, the findings of previous research on tourism reveal the positive impact of green practice on customers' satisfaction and loyalty [46].
Customer segmentation has been an important issue in marketing, especially in tourism [39]. It helps the managers effectively discover the customers' satisfaction level from their services and detect their choice preferences [40]. Previous studies on customers' segmentation and preference prediction are mainly developed through survey data [47]. However, it is found that the use of machine learning techniques can be more effective in big data analysis and customer segmentation [48]. This has been further investigated in previous studies with the aid of multi-criteria decision-making approaches [33,49]. In a tourism context, this issue was also extensively investigated [33,39,50]. However, this issue is fairly investigated in tourism and hospitality through social big data. In addition, the use of machine learning techniques to analyze social big data in eco-friendly hotels for customers' preference learning and segmentation is investigated in a few number of studies.
Accordingly, this research focused on the use of machine learning and multi-criteria decision-making techniques to develop a new method for customer segmentation in eco-friendly hotels. The method was evaluated on the travellers' online reviews on 152 hotels in Malaysia. From a marketing perspective, the results of our study demonstrated that green initiatives lead to greater consumer satisfaction, which is in line with previous findings reporting the positive impact of eco-friendly hotels on customers' satisfaction [39]. In addition, our findings demonstrated that the level of satisfaction in green spa hotels is higher than green non-spa hotels. Interestingly, the results revealed that, in all travelling groups (Travelled with Friends, Solo Travelers, Travelled with Family and Travelled as a Couple), there is a positive relationship between the quality of services in green hotels and their satisfaction level. It was further demonstrated that the satisfaction levels in these groups for spa green hotels are higher than the satisfaction levels in the non-spa green hotels. From a methodological perspective, the results showed that the use of machine learning techniques with the aid of multi-criteria decision-making approaches can better predict the customers' preferences in eco-friendly hotels in relation to the method which solely relies on predictive learning techniques. In addition, the results revealed that the methods which use a clustering approach can better reveal the customers' preferences in eco-friendly hotels. This shows the effectiveness of clustering in customer segmentation for customers' preferences prediction [39]. This was confirmed with the comparisons of the proposed method with the other predictive learning techniques in the literature.

Conclusions
This paper investigated customer satisfaction in green hotels through travellers' online review analysis. We used machine learning techniques to analyze the online text-based reviews and numerical data analysis. SOM was used for travelers' segmentation in both green spa and non-spa hotels. LDA was used for the identification of the main satisfaction dimensions (topics) from the text-based reviews. We also used TOPSIS to rank the main criteria for green hotel selection from numerical reviews. The neuro-fuzzy system was then used to reveal the relative importance of the hotel features on the customers' satisfaction and find the importance role of spa services and its impact on the customers' satisfaction. We investigated the role of spa services on the travellers' overall satisfaction in four main travelling groups, Travelled with Friends, Solo Travelers, Travelled with Family and Travelled as a Couple. The data were collected from Malaysian green spa and non-spa hotels in the TripAdvisor platform. The results showed that, in general, the travellers' satisfaction was relatively higher than in green spa hotels in relation to the green non-spa hotels. This result was proved in four travelling groups, Travelled with Friends, Solo Travelers, Travelled with Family and Travelled as a Couple. In our experimental analysis, we observed that travellers are relatively more satisfied in green spa hotels and accordingly provide a higher degree of overall ratings to the hotels with spa services. This research has some limitations. First, this study has considered only the data from TripAdvisor to investigate the role of spa services on customers' satisfaction. The data from other social networking sites such as trivago.com, booking.com, and trip.com can further confirm the results of this study. Second, this study only considered the Malaysian green spa and non-spa hotels to validate the proposed hypothesis. The data from hotels registered in TripAdvisor for other developed and developing countries may better reveal the significance of spa services on customers' satisfaction. Third, although our method was implemented to assess the travellers' satisfaction in green hotels, the proposed method can also be implemented for other types of hotels (e.g., non-green hotels with spa services) and then comparisons are made for the travellers' satisfaction levels in different hotels (i.e., comparisons for travellers' satisfaction levels in green hotels with spa services and non-green hotels with spa services). Fourth, four travelling groups have been considered for hypothesis evaluation. Future studies can investigate the role of spa services on other travelling groups such as business travellers. Fifth, this study does not investigate the role of spa services on the travelers' satisfaction from the gender perspective. This can be an interesting investigation for future studies in the tourism context. Sixth, our method was based on non-incremental learning techniques. The use of deep learning approaches, Random Forest (RF) and incremental learning or the development of our method with incremental updates, especially in LDA and neuro-fuzzy learning techniques, can be effective in online reviews analysis. This limitation could be a rich avenue for future research. Besides the TOPSIS method, we can also consider group decision-making (GDM) and other multiple criteria decision-making methods such as the double normalization-based multiple aggregation (DNMA) method [51] and the gained and lost dominance score (GLDS) method [52], to solve such problems. Finally, multicollinearity phenomenon may occur due to the high correlations among the features of dataset which should be solved through appropriate learning techniques.