1. Introduction
In the tourism industry today, customer satisfaction and loyalty play a critical role in business success, and the effective management of customer feedback has therefore become a key strategy for gaining a competitive advantage [
1,
2]. With the proliferation of digital platforms, customers tend to share their hotel experiences through online reviews, and these reviews provide valuable insights for businesses to evaluate service quality and understand customer expectations [
3,
4,
5]. Businesses with high service standards, such as five-star hotels, can use this feedback to improve guest experiences, tailor their services to customers’ needs, and strengthen their brand reputation. As the volume and variety of online feedback increase, the use of advanced analytical tools to draw meaningful conclusions from large datasets becomes imperative. This transformation in consumer behaviour is supported by recent studies showing that online hotel reviews significantly influence both brand perception and booking decisions [
6,
7].
Online reviews are not only a reflection of experiences, but also content that has a direct impact on the online reputation of the business [
8,
9]. Careful analysis of these reviews provides important feedback to hotel managers in terms of improving service processes, understanding customer needs, and gaining a competitive advantage [
10]. Such analyses play a critical role in the continuous improvement of service quality, especially for businesses with high standards, such as five-star hotels. Natural language processing technologies that have developed in recent years have made it possible to analyze large volumes of customer data more quickly and effectively; thus, managerial decision-making processes can be supported in a data-based manner [
11]. Online reviews carry important clues not only about service performance, but also about consumer values and expectations, the environmental sensitivities of consumers, and the social responsibility practices of the business [
12]. A systematic analysis of implicit sustainability indicators in the comments, such as ‘cleanliness, protection of the natural environment, energy and water saving, use of local products, reduction of waste’ [
13,
14], provides a meaningful data source not only in terms of service quality evaluations but also in terms of inferences regarding the sustainability performance of hotel businesses.
Sustainable tourism is a multidimensional approach that aims to increase the well-being of local people while encouraging the protection of natural resources [
15]. This approach includes economic sustainability and social participation, along with environmental responsibility, and it leads to an increasing number of tourists turning to environmentally friendly practices [
12,
16]. However, there is not enough empirical evidence in the literature on how tourists’ perceptions of sustainability practices are formed and how these perceptions are reflected in their hotel preferences [
17,
18].
The assumption that customer reviews are not limited to evaluations of service quality but also reflect their perceptions of the hotel’s environmental and social responsibilities can be evaluated within the scope of stakeholder theory [
19]. This perspective considers customers not only as consumers but also as active stakeholders who direct the sustainability practices of the business, and evaluates customer reviews as natural reflections of these expanded stakeholder demands. Understanding how sustainability principles manifest themselves in customer experiences is increasingly becoming a strategic issue not only for environmentally friendly businesses but also for the entire tourism sector [
14]. However, previous studies have mostly focused on eco-hotel businesses with an environmental certification; how sustainability practices are reflected in customer reviews in widely preferred large-scale hotel businesses has been largely neglected [
17,
18]. In addition, many studies have resorted to traditional methods, such as manual coding or regression, to analyze online reviews. These methods are both time-consuming and subjective in their interpretation of large datasets.
At this point, deep learning-based sentiment analysis offers a powerful tool in analyzing implicit perceptions of both service quality and sustainability themes by classifying customer comments objectively and automatically [
3,
11,
20]. However, the use of these techniques from a sustainability perspective in the hotel industry is quite limited. This deficiency indicates that sustainability should be evaluated not only as an environmental initiative but also in an integrated manner, alongside customer satisfaction and loyalty.
The main purpose of this study is to examine online customer reviews of five-star hotels using a deep learning-based sentiment analysis method, in order to provide a data-based framework for customer perceptions of both service quality and sustainability. A total of 15,522 customer reviews of selected five-star hotels in Antalya were analyzed, and the emotional expressions and thematic structures in these reviews were classified using deep learning-based sentiment analysis, topic modelling, and information extraction techniques. In particular, the effects of sustainability-related elements, such as cleanliness, natural environment, energy efficiency, waste management, and personnel behaviour, on customer satisfaction were analyzed. Thus, the study not only makes the emotional dimensions of customer satisfaction visible but also makes a holistic contribution to the sustainable tourism literature. The study also shows that perceptions of sustainability matter in hotel businesses that are not certified eco-hotels but hold an important place in the sector.
3. Material and Method
Effective customer feedback management and analysis are of great importance for gaining a competitive advantage and improving customer experiences. This study evaluated customer opinions by analyzing 15,522 customer reviews of selected five-star hotels in Antalya. The methodology of the study includes the following basic steps (Figure 1). First, the dataset was created, and the raw data were then pre-processed using text-processing techniques. The study was divided into text mining and machine learning phases. Feature extraction algorithms were applied to extract relevant features from user reviews, while text mining techniques were used to identify important keywords and themes within the reviews. Using machine learning methods, customer hotel reviews were classified by sentiment analysis; the outputs of this stage were labelled as positive, negative, and neutral emotions. In addition, emotional experiences were revealed by classifying the reviews using the deep learning-based sentiment analysis method.
The aim of this study is to analyze hotel customer reviews using artificial intelligence-based analysis methods and to determine how they are shaped in the context of service quality, customer satisfaction, and sustainable tourism. The main objectives of the study are as follows:
To identify indicators of sustainability themes in hotel customer reviews.
To categorize customer reviews into positive, negative, and neutral sentiment categories with deep learning-based sentiment analysis.
To identify prominent themes in customer reviews and relate them to customer satisfaction using LDA topic modelling.
To determine the impact of sustainable hospitality practices on customer loyalty.
3.1. Dataset
This study focuses on hotels in Antalya (Turkey), one of the world’s major tourism destinations, which must adapt to new technologies for travel planning and destination selection. With a 648,285-bed capacity [
37], Antalya hosted 16,084,737 tourists in 2023, making it the fourth most visited city in the world [
38]. In addition to its significant contribution to the national economy, it is one of the world’s most competitive destinations with its quality tourism investments.
The study’s dataset comprises customer reviews from the OtelPuan website. OtelPuan, which is a part of the ETS Group serving Turkish tourism, is a website that hosts many reviews and has applications that encourage customers to comment. The website verifies and shares comments and ratings in accordance with the principle of impartiality [
39].
In this study, the data was collected only from the OtelPuan platform, and three five-star hotels were analyzed. However, different data sources (e.g., Booking.com, TripAdvisor) can be included in future studies to expand the scope of the research and increase the generalizability of the findings. Similarly, analyzing hotels from different regions and in different categories would provide the opportunity to evaluate customer opinions from a broader perspective. Accordingly, the current findings of this study need to be interpreted in the context of this specific region and platform.
The three five-star hotels in Antalya with the highest ratings on the OtelPuan website were selected for analysis: the hotels listed on the platform were ranked by rating, and information was collected for the top three. Because it was not considered appropriate to disclose the hotel names, the hotels were coded as HB1, HB2, and HB3 (HB standing for hotel business). The hotel information is presented in Table 1.
Both the scores and the number of reviews were considered in evaluating the hotels. These two factors reflect overall customer satisfaction and the breadth of each hotel’s customer base. When the detailed scores were analyzed, HB1 was shown to have the highest score (Table 1). This high score indicates a very high level of customer satisfaction and suggests that the service, facilities, and overall experience are satisfactory. HB1 received 4304 comments, which represents an appropriate sample size; because this is the lowest count among the three hotels, the evaluation of the hotel may be relatively limited, although this does not reduce the reliability of the available comments. The high score, which is supported by the reviews, indicates that the hotel has created a very favourable impression regarding the customer experience. However, it is assumed that if the ratings were supported by additional positive reviews, customer satisfaction would be further reinforced.
The hotel with the second highest score in the study is HB2. This hotel’s high score indicates that the services offered by the hotel are considered satisfactory and valuable by the customer. The 4966 reviews for the hotel provide sufficient data to evaluate the hotel’s customer experience and include feedback from a wide range of customers. As a result, both the high score and the high number of reviews indicate that the hotel has strong customer satisfaction.
Among the hotels included in the study, HB3 stands out with a medium–high score of 8.70. This score indicates that the hotel has a good level of customer satisfaction but needs improvement in some areas. When we look at the number of comments, we can see that HB3 has the highest number, at 6252 comments, indicating that a wide range of customers expressed their opinions of the hotel. Although the high number of reviews indicates that the hotel has a wide customer base and receives feedback from this audience, the slightly lower score compared to other hotels may indicate that some customers’ expectations are not fully met.
In summary, HB1 stands out for its high score and holds the leading position in terms of customer satisfaction when the hotels are compared on scores and review counts alone. HB2 is characterized by strong satisfaction, with a high score and a large customer base, while HB3 reached the largest customer base, with the highest number of reviews, but has a slightly lower score than the other hotels, indicating that it needs improvement in some areas. These data contain important information about both the service quality and the customer experiences provided by the hotels and can guide potential customers in their decision-making. In this context, it is suggested that hotels shape their strategic communication policies in light of these evaluations.
During the data cleaning process, various filtering techniques were applied to detect spam comments and content that may have been generated by a bot. Comments with very short or meaningless phrases were removed, comments with excessive repetitive patterns were examined, and hotel review consistency was checked. In addition, the z-score method and statistical outlier analysis were used to detect anomalies before sentiment analysis and topic modelling.
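The filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the three-word minimum, the repetition ratio of 0.5, the z-score threshold of 3, and the choice of review length as the variable tested for outliers are all assumptions introduced here.

```python
import statistics


def is_too_short(text, min_words=3):
    """Flag comments with fewer than min_words tokens (threshold is an assumption)."""
    return len(text.split()) < min_words


def is_repetitive(text, max_ratio=0.5):
    """Flag comments in which a single token makes up more than max_ratio of all tokens."""
    tokens = text.lower().split()
    if not tokens:
        return True
    most_common = max(tokens.count(t) for t in set(tokens))
    return most_common / len(tokens) > max_ratio


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs((v - mean) / stdev) > threshold]


def clean_reviews(reviews, z_threshold=3.0):
    """Drop short or repetitive comments, then drop length outliers (assumed variable)."""
    kept = [r for r in reviews if not is_too_short(r) and not is_repetitive(r)]
    if not kept:
        return []
    lengths = [len(r.split()) for r in kept]
    outliers = set(zscore_outliers(lengths, z_threshold))
    return [r for i, r in enumerate(kept) if i not in outliers]
```

In practice the thresholds would need to be tuned on a manually inspected sample, since an overly strict filter can remove short but genuine comments.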
The reviews of the selected hotels are shown in
Table 2.
The customer reviews used in this study were obtained anonymously from the OtelPuan platform. Within the scope of the study, only publicly available and accessible data were used, and only comments that did not contain personal data were analyzed. In addition, to comply with ethical norms, comments were anonymized, and any information that could directly identify individuals was removed. Regarding the ethical use of data, relevant academic rules and guidelines for open access data analysis were taken into consideration.
While the data were collected from only three five-star hotels in Antalya, these hotels were selected based on their high review volume, verified authenticity through the OtelPuan platform, and relevance for evaluating customer satisfaction in luxury tourism. Although the dataset is geographically limited, the methodology developed in this study—particularly the integration of deep learning and text mining for sustainability detection—is scalable and can be applied to other locations and categories of accommodation. Future studies are encouraged to apply the same framework to multi-regional or cross-platform datasets to enhance generalizability.
3.2. Text Mining
Text mining, a technique within data mining, involves extracting valuable insights from extensive amounts of text data. It treats text as a valuable source of data and seeks to derive structured information from it. The primary focus of text mining is to derive statistical findings from textual data, often employing natural language processing (NLP) for feature extraction [
40,
41].
Text mining studies involve the analysis of text as a data source, aiming to extract structured data through various techniques such as text classification, segmentation, topic extraction, part-of-speech tagging, sentiment analysis, text summarization, and entity relationship modelling [
42].
Special preprocessing techniques were applied to account for the morphological structure of Turkish. Stemming, suffix parsing, and stop-word removal were performed. In addition, the Zemberek library was used to reduce intra-word variation. These preprocessing steps helped the model to process word meanings more accurately and improved the accuracy of sentiment analysis.
3.2.1. Text Pre-Processing
Data pre-processing involves transforming raw data into a suitable format for analysis. For this purpose, the dataset was cleaned by removing all unnecessary, noisy, and missing data, as well as unwanted attributes [
43].
The main purpose of punctuation removal in text mining is to facilitate the analysis of text. Punctuation contributes little to the semantic content used for classification, but it makes the work of analysis tools more difficult. Therefore, removing punctuation makes text easier to process and analyze. Special characters can clutter data and make it difficult to analyze text [
44]. Identifying characters in text that are not letters or numbers and removing them prior to analysis can help reduce clutter. In text analytics, emojis can provide important clues that reflect users’ emotions and attitudes. However, some case studies have also shown that removing uninformative emojis improves the accuracy of text classification. In this study, emojis were removed from the comments, since a text-based system was developed. Words such as prepositions, conjunctions, and pronouns, which occur frequently in the text but carry little information for classification, were also removed. Removing these words before analyzing the text improves the performance of the analysis. Another step is stemming, which reduces words derived from the same root to a single form so that their variants are counted together [
43].
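A minimal sketch of the cleaning steps described above (lower-casing, removal of punctuation, special characters and emojis, and stop-word removal) is given below. The stop-word set is a tiny illustrative sample rather than the list used in the study, and stemming is deliberately left out because it depends on a Turkish morphological analyzer such as Zemberek.

```python
import re

# Illustrative sample only; the stop-word list used in the study is not given.
STOP_WORDS = {"ve", "bir", "bu", "da", "de", "için", "ama"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lower-case, strip punctuation/special characters/emojis, drop stop words."""
    # Note: str.lower() does not apply Turkish dotted/dotless I rules.
    text = text.lower()
    # Keep only Unicode word characters and whitespace; this removes
    # punctuation, symbols, and emojis in a single pass.
    text = re.sub(r"[^\w\s]", " ", text)
    # The underscore counts as a word character, so it is removed separately.
    text = text.replace("_", " ")
    tokens = text.split()
    return [t for t in tokens if t not in stop_words]


tokens = preprocess("Otel çok temiz ve sessiz! 😊")
```

Because `\w` is Unicode-aware for Python strings, Turkish letters such as ç, ş, and ğ are preserved while emojis are discarded.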
3.2.2. Feature Extraction
In text mining, feature extraction involves the extraction of valuable information from text. The goal of this process is to convert unprocessed textual data into numerical features that can be utilized by machine learning algorithms. The data must be expressed as a feature vector that can be processed and made comprehensible by the computer. One of the fundamental processes in text mining is transforming a text into a feature vector that accurately represents it [
45]. Feature extraction focuses on extracting attributes that represent the meaning, structure, or content of text data, and usually includes the following methods:
The Bag of Words (BoW) approach considers each word in the text as a feature and tallies the frequencies of the words. Each document is transformed into a word vector. The TF-IDF (term frequency–inverse document frequency) approach assigns significance to words by computing how frequently a specific word appears in a document (term frequency) and how uncommon it is across all documents (inverse document frequency). The n-grams approach captures sequences (grams) of a specified length (n) as features. For instance, it captures pairs of words (2-grams or bigrams) or trios of words (3-grams or trigrams). The study utilizes the n-gram approach.
An n-gram is a sequence of n consecutive units, either words or characters, taken from a text string; n-gram analysis is used to discover, compare, or count such sequences within a dataset. N-gram-based classification, in its character-based form, relies on the frequency with which character n-grams occur in a document. In this study, n-grams of several lengths are taken, namely 2-grams, 3-grams, and 4-grams. The unigram model relies on the frequency of individual words, without considering their order. The bigram model considers the preceding word, and the trigram model considers the two preceding words. N-grams are also a valuable tool for predicting letter sequences in speech recognition tasks [
46,
47].
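To make the n-gram idea concrete, the sketch below extracts word-level and character-level n-grams and counts them. The function names are hypothetical, and the default sizes of 2, 3, and 4 simply mirror the lengths mentioned above; how the study combined these features is not specified.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return all runs of n consecutive characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(tokens, sizes=(2, 3, 4)):
    """Count word n-grams of every requested size in one Counter."""
    counts = Counter()
    for n in sizes:
        counts.update(word_ngrams(tokens, n))
    return counts


bigrams = word_ngrams(["the", "room", "was", "clean"], 2)
```

A text shorter than n yields no n-grams, so very short reviews contribute nothing to the higher-order features.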
3.2.3. Term Weighting
Documents must be expressed in numbers to be understood by computers. This step is crucial in the machine learning process for analyzing documents. Term frequency (TF) refers to the frequency of a term’s repetition within a document. TF, as a method, is utilized to determine the weight of terms in a document. Due to the varying lengths of different documents, a term will occur more frequently in longer documents compared to shorter ones. Therefore, TF is usually divided by the document length as a normalization method [
48,
49]. Equation (1) is used to calculate the TF value.
TF-IDF, which stands for term frequency–inverse document frequency, is utilized to represent a document as a vector. It is a statistically derived weighting factor that expresses the significance of a term within a document based on its frequency. Essentially, TF-IDF serves as a numerical gauge of a word’s relevance to the document. TF-IDF assumes that words are independent of each other and therefore does not express the semantic relationship between words [
50,
51]. The TF-IDF value is determined by the following calculation:
Inverse document frequency (IDF) helps to identify the nature of a word (a term, conjunction, punctuation mark, or stop word) by analyzing its frequency across multiple documents [
49]. Equation (3) is used to calculate the IDF value.
The occurrence frequency of word t in document d is denoted as frequency(t, d), while the number of documents in corpus D that contain word t is represented by Count(d∈D:t∈d). Terms that appear in few documents yield a high IDF score, whereas terms that appear in many documents result in a low IDF score. A high TF-IDF value indicates that the term is frequently present in a small subset of documents. Conversely, if the term appears in all documents, its TF-IDF value is minimized [
52]. Upon computing the TF and IDF values for every word in the text, the weight of each word is determined using the formula specified in Equation (4).
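The equations referred to in this subsection are not reproduced in the text. One standard formulation that is consistent with the definitions given (length-normalized term frequency, document-count-based IDF, and their product as the term weight) is sketched below. The symbol w(t, d), the use of |D| for the number of documents in the corpus, and the unspecified logarithm base are assumptions; the study may use a smoothed or otherwise modified variant.

```latex
\[
\mathrm{TF}(t,d) = \frac{\mathrm{frequency}(t,d)}{\sum_{t' \in d} \mathrm{frequency}(t',d)}
\]
\[
\mathrm{IDF}(t,D) = \log \frac{|D|}{\mathrm{Count}(d \in D : t \in d)}
\]
\[
w(t,d) = \mathrm{TF}(t,d) \times \mathrm{IDF}(t,D)
\]
```

As a worked example under these assumptions, a term that occurs 3 times in a 100-word review has TF = 0.03; if it appears in 10 of 1000 reviews, then with a base-10 logarithm IDF = log(100) = 2, giving a weight of 0.06.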
3.3. Machine Learning
Machine learning is a field within artificial intelligence that empowers computers to learn from data without explicit programming, enabling them to make decisions or predictions based on this acquired knowledge. Using algorithms and models, machine learning is focused on detecting patterns and connections in data and utilizing this knowledge to make predictions or decisions with new data. There are three primary types of machine learning: supervised, unsupervised, and reinforcement learning [
53]. In supervised machine learning, the model is trained on labelled input and output pairs, learning to predict the correct output using the input data. In unsupervised machine learning, the model operates on unlabelled data, seeking to uncover concealed structures and patterns within the data. In reinforcement machine learning, the model executes a sequence of actions within an environment and receives rewards based on the outcomes of these actions. The goal is to develop a strategy that maximizes rewards over time [
54].
This study utilizes a deep learning method trained in a supervised manner. Deep learning models can work with both categorical and continuous data and learn feature representations directly from the data, which reduces the need for manual feature engineering and gives them a built-in form of feature selection. The risk of overfitting can be limited with regularization techniques such as dropout, which makes deep learning a flexible modelling tool [
55].
In this study, deep learning-based models are preferred for sentiment analysis. While traditional machine learning methods (e.g., Naïve Bayes, Support Vector Machines, or Random Forest) generally have lower computational cost, deep learning models are more successful, especially on large datasets.
While traditional methods usually require feature engineering based on n-grams and word frequency, deep learning-based models (e.g., RNN and LSTM) are better able to learn the contextual relationships between words and provide higher accuracy rates in sentiment analysis. Therefore, in our study, we primarily focus on using deep learning models that perform better in the context of natural language processing (NLP).
Moreover, our goal in this study is not only to achieve the highest accuracy rate but also to develop a model that can analyze the content of customer reviews in more detail. Deep learning methods are preferred in this study as they are better able to learn semantic relationships in the data compared to traditional machine learning algorithms. However, in future studies, a comprehensive comparison with traditional machine learning algorithms is recommended to evaluate the model selection in a more systematic way.
Although classical machine learning algorithms (e.g., SVM, Naïve Bayes) are frequently used in the sentiment analysis literature, these models are usually based on context-free, n-gram-based representations and cannot adequately reflect the morphosyntactic richness, especially in Turkish. The deep learning-based approach used in this study was preferred to contextually analyze sustainability-orientated themes in the comments. A comparative analysis with classical methods is a research direction that will be undertaken in future studies.
Deep Learning
Deep learning is a machine learning method that can automatically learn complex tasks through artificial neural networks operating on large datasets. This approach learns hierarchical representations of data through neural network architectures that transmit information across layers and perform advanced mathematical operations to make sense of high-level features [
56,
57].
Deep learning, which provides effective results in many tasks, such as feature extraction, pattern recognition, classification, and prediction, can automate and self-optimize learning processes by minimizing human intervention [
58]. Artificial Neural Networks (ANNs) are a computational technology inspired by the information processing of the human brain. Deep learning is a sub-branch of machine learning that enables multilayer neural networks to learn from large amounts of data [
59].
Deep learning models optimize the learning process by automatically updating the weight values through a backpropagation algorithm. The weights are first initialized randomly, and the data then move from the input layer to the output layer [
60]. This process is called feedforward, and each layer processes the input data into more abstract and meaningful features. Once the difference between the model’s predicted outputs and the actual values is calculated, the backpropagation mechanism kicks in, and the error signal propagates backward to the previous layers. The weight values are updated to minimize the error. This iterative process allows the model to make more accurate predictions over time [
61,
62].
Deep learning is also widely used in natural language processing tasks, such as text classification, text generation, machine translation, sentiment analysis, and speech recognition. RNN-based models, developed to understand and process the complexity of natural language, learn language structure by analyzing the context relationships between sequential data [
63].
In this study, recurrent neural networks (RNNs) are used to analyze sequential datasets. RNN models can store information from previous steps, considering the sequential nature of the data. Unlike traditional feed-forward neural networks, RNNs retain information from previous time steps and use it to predict the current step. This feature is particularly advantageous for applications such as time series, audio data, and natural language processing. During the model training, the weight values are dynamically updated using stochastic gradient descent (SGD) and back-propagation algorithms [
64,
65].
RNN models use feed-forward and back-propagation mechanisms to analyze the dependencies between input and output and capture long-term relationships. In traditional feed-forward networks, each piece of input data is processed independently, whereas in RNNs, previous inputs are stored in the model’s memory and influence current predictions. For example, by processing the letters of a word individually, the RNN model can make inferences about the completion of the word [
66,
67].
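The way a recurrent network carries information from earlier steps into the current one can be illustrated with a scalar toy version of the recurrence h_t = tanh(w_x·x_t + w_h·h_{t-1} + b). This is a plain RNN step written for illustration only, not the LSTM architecture used in the study, and the weight values are arbitrary assumptions.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence and return the hidden state at each step."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


states = rnn_forward([1.0, 0.0, 0.0])
```

Although the second and third inputs are zero, the corresponding hidden states are non-zero, because the first input is still carried forward through the recurrent weight. A feed-forward network that processes each input independently would have no such memory.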
The deep learning model used for sentiment analysis is based on a recurrent neural network (RNN) architecture built with long short-term memory (LSTM) cells. The dataset was randomly divided into three subsets: 60% for training, 10% for validation, and 30% for testing. This approach was preferred over a simple train–test split to improve model robustness and reduce overfitting. The validation set was used for hyperparameter tuning, and the test set was used for the final evaluation. Dropout (with rates between 0.2 and 0.5) was used to prevent overfitting, and the model was trained with the Adam optimization algorithm at a learning rate of 0.001. The success of the model was verified by cross-validation and comparison with different baseline models (e.g., SVM, Naïve Bayes). The results, supported by the cross-validation findings, show that the model has high accuracy and can reliably classify customer reviews.
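The 60/10/30 partition described above could be produced along the following lines. The function name and the fixed random seed are assumptions added for reproducibility of the sketch; the study does not state how the random split was implemented.

```python
import random


def split_dataset(items, train_pct=60, val_pct=10, seed=42):
    """Shuffle items and split them into train/validation/test lists.

    The test set receives whatever remains after the first two subsets.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(1000))
```

Integer arithmetic is used for the subset sizes so that rounding never causes the three parts to overlap or to drop an item.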
To evaluate the generalization ability and robustness of the deep learning model, a 10-fold cross-validation technique was employed. The dataset was randomly divided into 10 equal parts. In each iteration, the model was trained on nine subsets and validated on the remaining one. This process was repeated 10 times, ensuring that each subset served as the validation set once.
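The fold rotation described above can be sketched as a generator of index lists. The function name and seed are hypothetical, and when the dataset size is not divisible by 10 the folds differ in size by at most one item, which is an assumption about how "equal parts" is handled.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(25, k=10))
```

Each index appears in exactly one validation fold, so every review is used for validation once and for training in the other nine iterations.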
The average results obtained from the cross-validation are as follows: R² = 0.973, MAE = 0.011, and RMSE = 0.018. These values closely align with those from the test set, indicating consistent and reliable performance across different data partitions. This confirms that the deep learning model used in this study maintains high accuracy and stability, reinforcing its suitability for sentiment prediction in hotel reviews.
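For reference, the three reported metrics can be computed as in the sketch below. These are regression metrics, so the example assumes that the model outputs a continuous sentiment score that is compared with a numeric target; that interpretation, and the function names, are assumptions rather than details given in the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


score = r_squared([1, 2, 3, 4], [2, 2, 3, 3])
```

In a 10-fold setting, each metric would be computed on every validation fold and then averaged across the folds.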
3.4. Sentiment Analysis
Sentiment analysis involves systematically analyzing the emotional content of text data and lies at the intersection of natural language processing (NLP), data mining, and machine learning. Sentiment analysis was developed to discover the emotion-laden meaning of language in texts and to quantitatively evaluate these emotional expressions. This method automatically categorizes positive, negative, or neutral emotional states in texts, enabling the identification of emotional trends in large-scale datasets [
11].
Sentiment analysis is performed through dictionary-based methods, machine learning-based methods, and hybrid approaches. In dictionary-based methods, a sentiment value (e.g., positive or negative) is assigned to each word in a designated dictionary [
68]. The overall sentiment score of the text is calculated by matching its words with those in the dictionary and summing their emotional contributions. However, this method has limitations: it treats words independently of context and may be inadequate for understanding more complex language structures.
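A dictionary-based scorer of the kind described above can be sketched in a few lines. The lexicon below is a made-up English sample with hypothetical values, used only to show the mechanics and the context limitation; it is not a resource used in the study.

```python
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"clean": 1, "friendly": 1, "delicious": 1, "dirty": -1, "noisy": -1, "rude": -1}


def lexicon_score(tokens, lexicon=LEXICON):
    """Sum the sentiment values of all tokens found in the lexicon."""
    return sum(lexicon.get(t, 0) for t in tokens)


def lexicon_label(tokens, lexicon=LEXICON):
    """Map the summed score to positive, negative, or neutral."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = lexicon_label("the room was not clean".split())
```

The phrase "not clean" is labelled positive here because the negation is ignored, which is exactly the context-independence problem noted above.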
Machine learning-based methods rely on machine learning algorithms trained on a pre-labelled text dataset. During the training process, the model learns specific word patterns in the texts and their associations with sentiment classes. The trained model predicts the sentiment states in new and unlabelled texts [
69]. Algorithms like Naive Bayes, Support Vector Machines (SVM), and Logistic Regression are frequently employed in machine learning-based sentiment analysis. This method is successful in capturing the more nuanced and contextual meanings of language.
Hybrid methods are approaches that aim to obtain more powerful and accurate results by combining more than one sentiment analysis technique [
70]. These methods blend the benefits of dictionary-based and machine learning-based approaches with the aim of mitigating the limitations inherent when using each method on its own.
Sentiment analysis is an effective technique for revealing emotional tendencies in large-scale text data. It is an important tool for understanding the emotional depths and nuances of language, whether used by businesses to understand customer satisfaction or in social media analyses [
71]. Sentiment analysis helps to obtain more comprehensive and accurate results using a combination of different methods, making an important contribution to data-driven decision-making processes.
3.5. Topic Modelling
Topic modelling is a machine learning technique that analyzes large amounts of text data to automatically identify key themes or topics in the content. This method plays a particularly effective role in making sense of complex and unstructured text data. At present, it is frequently preferred for analyzing large volumes of text data, such as social media, customer reviews, and news articles.
The goal of topic modelling is to discover implicit (hidden) topics in the text and transform these topics into an interpretable format. This approach helps to better understand the content of documents, speeding up information retrieval and supporting classification processes. Topic modelling, which is especially used in fields such as data mining, knowledge management, and natural language processing (NLP), is performed with various algorithms such as Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorisation (NMF).
Latent Dirichlet Allocation (LDA), one of the most widely used topic modelling methods, identifies hidden topics in documents by analyzing the word distributions in each document. To identify so-called “hidden” topics, LDA calculates the probability that the words in documents belong to certain topics. This approach assumes that each document may consist of a mixture of different topics. Thus, it becomes clear which documents are more dominated by certain topics. For example, in a news dataset, LDA can be used to extract topics such as health, sports, economy, etc., and determine to which topics each document is related.
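To make the mixture assumption concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is illustrative only: the text does not state which LDA implementation or inference method was used, and the function name `lda_gibbs`, the toy documents, and the values of `alpha`, `beta`, and `n_iter` are assumptions made for this example.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0, n_top=3):
    """Fit LDA with collapsed Gibbs sampling on tokenised documents.

    docs is a list of token lists. Returns (theta, top_words), where theta[d]
    is the estimated topic mixture of document d and top_words[k] lists the
    most frequent words assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * n_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                        # tokens per topic
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample: P(topic | doc) * P(word | topic), up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [w for w, c in topic_word[k].most_common(n_top) if c > 0]
        for k in range(n_topics)
    ]
    return theta, top_words


# Toy corpus (hypothetical, already tokenised).
toy_docs = [
    ["match", "team", "goal", "team"],
    ["doctor", "hospital", "health", "doctor"],
    ["team", "goal", "league"],
    ["health", "hospital", "vaccine"],
]
theta, top_words = lda_gibbs(toy_docs, n_topics=2, n_iter=100, seed=1)
```

Each row of `theta` is a probability distribution over topics for one document, which is how the dominant topic of a document can be read off. For real corpora, an established library implementation would normally be preferred over a hand-written sampler.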
Topic modelling has many applications in different sectors. In social media analysis, it is possible to understand which topics are popular and which topics attract more attention through user comments and posts. In this way, brands can better understand users’ demands and complaints and make strategic decisions. In analyses of customer reviews, topic modelling can be used to classify users’ general opinions, thus providing valuable insights into customer satisfaction and brand reputation. In news analyses, popular topics can be identified through news headlines and content, and the topics on the agenda can be followed more systematically.
3.6. Identification of Sustainability-Related Content
In order to link customer feedback to sustainable tourism themes, specific keywords and patterns associated with sustainability were extracted and tracked throughout the sentiment and topic modelling process. These included terms such as “eco-friendly”, “green”, “clean”, “waste”, “energy”, “natural”, “quiet”, “organic food”, “recycling”, and “environment”. The presence of these terms was used to detect underlying sustainability-related sentiments, even when the reviews were not explicitly focused on green tourism. Additionally, we applied filtering techniques to isolate comments that directly or indirectly referenced sustainability dimensions. This approach allowed us to explore how guests perceive and value sustainable practices in hotels not formally recognized as eco-accommodations.
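The keyword-based filtering described above can be sketched as follows. This is a minimal illustration, assuming case-insensitive whole-word matching on the English form of the terms listed in the text; the exact matching rules used in the study are not specified, and the function names and sample reviews are hypothetical.

```python
import re

# Terms as listed in the text; the matching strategy below is an assumption.
SUSTAINABILITY_TERMS = [
    "eco-friendly", "green", "clean", "waste", "energy",
    "natural", "quiet", "organic food", "recycling", "environment",
]

# One case-insensitive pattern with word boundaries around each term.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in SUSTAINABILITY_TERMS) + r")\b",
    re.IGNORECASE,
)


def sustainability_terms_in(review):
    """Return the sorted set of sustainability terms found in one review."""
    return sorted({match.group(0).lower() for match in PATTERN.finditer(review)})


def filter_sustainability_reviews(reviews):
    """Keep only the reviews that mention at least one tracked term."""
    return [review for review in reviews if PATTERN.search(review)]


# Hypothetical sample reviews.
sample_reviews = [
    "The garden was green and quiet.",
    "Breakfast was late.",
    "Recycling bins on every floor.",
]
sustainability_subset = filter_sustainability_reviews(sample_reviews)
```

Whole-word matching does not catch inflected forms such as "cleanliness"; matching on stems would be one way to broaden coverage.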
4. Results
A dataset comprising customer reviews of selected hotels from otel.com was used for the investigation. Deep learning was used to analyze and evaluate sentiment. For the analysis, the Python programming language was employed.
Data cleansing was the study’s initial phase. To derive meaning from unstructured data, text mining projects must first carry out data cleansing. This procedure prepares the data for analysis by eliminating unnecessary information.
The second step was stemming, that is, reducing words to their stems or roots. In text mining, stemming identifies the significant root of each word to facilitate text analysis. Numerous applications, including text summarization, sentiment analysis, subject identification, and machine translation, employ this procedure. Stemming makes text analysis more precise and effective because it ensures that words with different inflexions are treated as the same term. Turkish texts are stemmed using a variety of techniques, including statistical techniques, dictionary-based techniques, and mixed methods.
The dataset was randomly divided into three subsets: 60% for training, 10% for validation, and 30% for testing. The training set was used to fit the deep learning model, while the validation set served to tune hyperparameters and monitor overfitting during training. The final performance of the model was evaluated using the independent test set, which was not used during training or validation. This three-way split approach enhances the model’s robustness and generalization ability, ensuring a more reliable evaluation compared to a simple train-test division.
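A random 60/10/30 split of this kind can be sketched in a few lines. The function name, the fixed seed, and the use of integer indices in place of reviews are assumptions for illustration; the text does not describe how the split was implemented or whether it was stratified.

```python
import random


def split_60_10_30(items, seed=42):
    """Shuffle items and split them into 60% train, 10% validation, 30% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding issues
    n_val = n * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Toy example: 100 review indices stand in for the reviews themselves.
train_ids, val_ids, test_ids = split_60_10_30(range(100))
```

Because the three subsets are slices of a single shuffled list, no review can appear in more than one subset.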
Parameter settings in deep learning involve optimizing the structure of the network by setting certain hyperparameters. The deep learning-based RNN model used for sentiment analysis was specified with a word embedding dimension between 100 and 300, and LSTM or GRU cells were used to process sequential data. The hidden layers of the model contained from 128 to 512 hidden units, and a dropout rate of 0.2 to 0.5 was applied to prevent overfitting. The model was trained with the Adam optimization algorithm at a learning rate of 0.001, using the binary_crossentropy or categorical_crossentropy loss function. Whether the model was bidirectional was determined by the dataset used; the maximum sequence length was kept between 100 and 200, and the vocabulary size was limited to 10,000–50,000 words. The model was trained for 10 to 50 epochs, with the batch size varied between 32 and 128.
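The reported ranges can be written down as a search space from which candidate configurations are drawn. The sketch below assumes a simple random search, which the text does not state; the dictionary keys and the function name `sample_config` are hypothetical, and only the numeric ranges and options come from the description above.

```python
import random

# Ranges and options as reported in the text; the key names are illustrative.
SEARCH_SPACE = {
    "embedding_dim": (100, 300),
    "cell_type": ["LSTM", "GRU"],
    "hidden_units": (128, 512),
    "dropout": (0.2, 0.5),
    "bidirectional": [True, False],
    "loss": ["binary_crossentropy", "categorical_crossentropy"],
    "max_sequence_length": (100, 200),
    "vocabulary_size": (10_000, 50_000),
    "epochs": (10, 50),
    "batch_size": (32, 128),
}

# Settings reported as fixed.
FIXED = {"optimizer": "adam", "learning_rate": 0.001}


def sample_config(rng):
    """Draw one candidate configuration from the search space."""
    config = dict(FIXED)
    for name, spec in SEARCH_SPACE.items():
        if isinstance(spec, list):          # categorical option
            config[name] = rng.choice(spec)
        elif isinstance(spec[0], float):    # continuous range
            config[name] = round(rng.uniform(*spec), 2)
        else:                               # integer range, inclusive
            config[name] = rng.randint(*spec)
    return config


candidate = sample_config(random.Random(0))
```

Each sampled configuration would then be trained on the training set and compared on the validation set, keeping the test set untouched until the final evaluation.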
The metrics for comparing classification performance and their respective formulas are presented below [
72,
73].
Precision indicates the confidence level of the classifier’s positive predictions. It is computed as the ratio of correctly classified positive samples to all samples classified as positive.
Recall, also called sensitivity, is the proportion of correctly detected positive samples among all actual positive samples. It indicates how completely the classifier identifies the positive class [
74]. An optimal classifier would have a sensitivity value of one.
Accuracy is the most commonly used metric in classification. It shows how often the classifier generates correct predictions by expressing the proportion of correctly classified samples among all samples [
75]. This rate ranges from 0 to 1, where 0 is the lowest value and 1 the maximum.
The F-measure is computed as the harmonic mean of precision and recall and therefore balances the two metrics. It provides an overall performance metric of the classifier and is widely used in the literature to compare classifier results [
76].
In the equations, TP (true positives) is the number of comments that the classifier correctly categorized as positive. FP (false positives) is the number of comments that were labelled as positive but are not. TN (true negatives) is the number of comments that the classifier correctly categorized as negative. FN (false negatives) is the number of comments that were labelled as negative but are not. The classification model’s performance measurement findings in this study are displayed in
Table 3.
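For reference, the four metrics can be computed from the confusion-matrix counts using their standard definitions, as in the sketch below. The function names and the example counts are hypothetical; the underlying TP, FP, TN, and FN counts of the study are not given in the text.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)             # TP / (TP + FP)
    recall = _ratio(tp, tp + fn)                # TP / (TP + FN)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1_score(precision, recall),
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=90, fp=10, tn=80, fn=20)

# F1 recomputed from the precision and recall reported for H2.
h2_f1 = f1_score(0.973, 0.986)
```

Applied to the precision and recall reported for H2 (97.3% and 98.6%), the harmonic mean comes to roughly 97.9%, in line with the F1 score reported for that hotel.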
The metrics were calculated using the test set, which constituted 30% of the total dataset. The model was trained on 60% and validated on 10% of the data to ensure robust hyperparameter tuning and prevent overfitting.
The data in
Table 3 and
Figure 2 report the model’s performance for the three hotels using four metrics: precision, recall, accuracy, and F1 score. Hotel H1 shows a strong performance, with 94.5% precision, 98.5% recall, an overall accuracy of 96.3%, and an F1 score of 96.4%. Hotel H2 shows a very high performance, with 97.3% precision and 98.6% sensitivity, an overall accuracy of 92.9%, and an F1 score of 97.9%, indicating that the model classified the reviews of H2 well and with a good balance between precision and recall. Hotel H3 shows 97.2% precision, 97.3% sensitivity, and a balanced F1 score of 97.2%, but lags the other hotels with an overall accuracy of 89.6%. In general, H1 stands out with a high accuracy rate, H2 has the highest precision and sensitivity values, and H3 has a high F1 score despite its lower accuracy rate. These results reveal that the model classified the reviews of H1 and H2 more effectively, while H3 presented some classification difficulties.
4.1. Results of Sentiment Analysis
Table 4 shows the sentiment classification of customer reviews of the three hotels.
Table 4 shows that all three hotels have a high proportion of positive feedback in their customer reviews. HB3 has the highest number of positive comments (4626), while HB1 has the lowest number of positive comments (3443). In terms of negative comments, HB3 has the highest number of negative comments (1172), while HB1 has the lowest number of negative comments (753). In terms of neutral comments, HB3 received the most comments (454), while HB1 received the fewest neutral comments (108).
These results provide important clues about the overall customer satisfaction achieved by the hotels, and it can be said that HB3 has a wider range of customer feedback and probably experiences more customer interactions. However, its higher number of negative comments compared to other hotels may indicate that the hotel has some shortcomings in terms of service quality or meeting customer expectations.
Figure 3 shows a comparison of the sentiment analysis results for HB1, HB2, and HB3 hotels. The data provides the ability to score each hotel based on the rate of positive, negative, and neutral comments.
When each hotel is analyzed individually, it can be seen that the rate of positive comments for HB1 is 80%. This hotel has a high customer satisfaction rate, indicating that it provides the kind of service and quality that its patrons desire. The large majority of positive reviews indicates that the expectations of most customers are fulfilled and that the brand’s name and reputation compensate for deficiencies in some areas. Although complaints are relatively rare compared with positive comments, the fact that 17.5% of comments were negative shows that a notable share of online perceptions of this hotel is unfavourable. The low percentage of neutral remarks (2.5%) suggests that visitors typically express a clear opinion on whether they are satisfied or not.
When looking at comments about HB2, approximately 76% are positive. According to this assessment, the hotel provides good service and typically leaves its guests satisfied. This rate also gives an important impression of the hotel’s corporate reputation. A total of 19.5% of comments are critical of the hotel, a slightly higher percentage than for HB1. This means there is scope for the hotel to improve in several areas, but overall satisfaction remains high. Neutral comments account for 4.5% of HB2’s reviews, a higher share than for HB1, which suggests that some customers had mixed experiences.
The rate of positive reviews for HB3 is 74%, which is lower than for the other two hotels. Nevertheless, the satisfaction levels and service quality experienced by HB3’s guests can still be considered high. The negative review rate is 18.75%, which shows that some guests of the hotel are not satisfied with the services. Compared to the other two hotels, the neutral review rate is higher (5.25%).
According to the analysis of positive, negative, and neutral comments for the three hotels, differences were observed in terms of customer satisfaction. HB1 has the highest customer satisfaction rate and is the leader in terms of positive comments. HB2, on the other hand, has a high rate of positive feedback but may require improvement in some areas to increase customer satisfaction. HB3 has a lower percentage of positive reviews, and a higher percentage of neutral reviews compared to the other two hotels, indicating potential areas for improvement to increase customer satisfaction. Overall, all three hotels are successfully managing customer satisfaction, with HB1 standing out with the highest satisfaction rate, while there are opportunities for HB3 to improve customer satisfaction.
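The percentage shares discussed above follow directly from the sentiment counts. As a small worked example, the sketch below recomputes the shares for HB1 from the counts reported for that hotel (3443 positive, 753 negative, and 108 neutral comments); the function name is an assumption.

```python
def sentiment_shares(counts):
    """Convert sentiment counts into percentage shares, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts reported for HB1.
hb1_counts = {"positive": 3443, "negative": 753, "neutral": 108}
hb1_shares = sentiment_shares(hb1_counts)
print(hb1_shares)
```

The result reproduces the 80%, 17.5%, and 2.5% shares quoted for HB1.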
4.2. Results of Text Mining
In this section, we explore the occurrence rates of positive, negative, and neutral words in customer comments. The lists highlight the top five most-used words for each sentiment category, chosen from an analysis of the comments.
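Extracting the most frequent words per sentiment category can be sketched with a simple counter. The tokenisation rule, the small stop-word list, the function names, and the sample comments are assumptions for illustration; the preprocessing actually applied in the study (cleansing and stemming) is more involved.

```python
import re
from collections import Counter

# Minimal illustrative stop-word list (an assumption, not the study's list).
STOPWORDS = {"the", "and", "was", "were", "a", "is", "of", "in", "very"}


def top_words(comments, n=5):
    """Return the n most frequent non-stop-words across the given comments."""
    counts = Counter()
    for comment in comments:
        # Runs of letters only; this also keeps non-ASCII letters such as ş or ğ.
        tokens = re.findall(r"[^\W\d_]+", comment.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(n)


def top_words_by_sentiment(labelled_comments, n=5):
    """Group (text, label) pairs by label and rank the words within each group."""
    grouped = {}
    for text, label in labelled_comments:
        grouped.setdefault(label, []).append(text)
    return {label: top_words(texts, n) for label, texts in grouped.items()}


# Hypothetical labelled comments.
labelled = [
    ("The staff was friendly and the food was great", "positive"),
    ("Great food, helpful staff", "positive"),
    ("The room was dirty", "negative"),
]
ranking = top_words_by_sentiment(labelled)
```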
Table 5 shows the frequency of the positive words identified in these hotel reviews.
In these three hotels, staff, food, and service quality stand out among the common elements that customers express satisfaction with. In the first hotel (HB1), customers were particularly satisfied with the staff and dining experiences, with a high level of positive feedback on these elements. In the second hotel (HB2), it is understood that in addition to the food, staff attention and activities for children increase customer satisfaction. In the third hotel (HB3), in addition to food and staff, customers were also satisfied with the reasonable price and location. Overall, these comments point to the success of the hotels in terms of service quality and responsiveness to customer expectations.
Table 6 shows the frequency of negative words identified in these hotel reviews.
The negative comments expressed by customers in these three hotels indicate a need for improvement in certain services and facilities. In the first hotel (HB1), the condition of the rooms, staff attention, cleanliness, and the pool were criticized. In the second hotel (HB2), in addition to staff attention and room conditions, negative comments were made on specific issues, such as food quality, pool condition, and window cleaning. In the third hotel (HB3), negative feedback on the condition of the rooms, restaurant and service quality, customer relations, and staff behaviour stand out. This feedback indicates that hotel management needs to make improvements in areas such as hygiene, staff training, and service quality to increase customer satisfaction.
Table 7 shows the frequency of neutral words identified in these hotel reviews.
Neutral reviews by customers at these three hotels describe the overall experience and offer evaluations without positive or negative sentiment. At the first hotel (HB1), the food, prices, and guest experience were discussed in a neutral manner, and the crowded dining areas were also noted. At the second hotel (HB2), neutral reviews addressed the rooms’ features, the helpfulness of the staff, and whether the hotel would be recommended overall. At the third hotel (HB3), factors such as the condition of the rooms, the parking and beverage services, and performance were evaluated in a neutral manner. These comments describe various aspects of the hotel services and facilities and provide an overall perspective on potential areas for improvement or noteworthy elements.
4.3. Results of Topic Modelling
Topic modelling is a technique used to analyze documents to uncover prevalent themes and effectively group them into clusters. In this case, the Latent Dirichlet Allocation (LDA) method was employed for topic modelling. The comments from hotel reviews were analyzed and organized into three distinct topics.
The factors were determined not only from word frequency but also from the topics produced by LDA. In this way, the main themes in customer reviews were identified, and their relationship with customer satisfaction was analyzed. In addition, word importance weights (TF-IDF) were calculated to determine the importance of the factors.
This analysis is visually represented in
Figure 4, which shows the topic modelling for HB1.
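The TF-IDF weighting mentioned above can be illustrated with a short sketch. It uses the basic unsmoothed form (term frequency multiplied by the logarithm of the inverse document frequency); the text does not say which TF-IDF variant was applied, and the function name and toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for tokenised documents.

    documents is a list of token lists. Returns one {term: weight} dictionary
    per document, with tf = count / document length and idf = ln(N / df).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # each document counts once per term

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenised reviews (hypothetical).
toy_reviews = [
    ["staff", "friendly", "food"],
    ["food", "tasty", "food"],
    ["room", "clean", "staff"],
]
tfidf_weights = tf_idf(toy_reviews)
```

A term that appears in many reviews, such as "food" here, receives a lower inverse document frequency than a term confined to a single review, so frequent but uninformative words are down-weighted.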
Figure 4 shows that customer reviews of a hotel are analyzed under three main headings using topic modelling. The first heading (46.3%) includes comments that focus on the quality of the hotel, the service level, and customer satisfaction in general, with terms such as “hotel,” “good,” “satisfied,” “food,” and “staff” being prominent. The second heading (28.1%) is centred around words such as “food,” “staff,” “rooms,” “rooms,” and “facility”, and reflects evaluations of the hotel’s physical facilities, such as food and room quality. The third heading (25.6%) includes comments on the hotel’s esthetics, restaurant, and food variety, using terms such as “quality,” “design,” “variety,” and “restaurant.” This analysis reveals that customer comments are grouped around three main themes: overall service quality, physical facilities, and the esthetic features of the hotel, and allows for a detailed assessment of the impact of each of these themes on customer satisfaction.
Figure 5 shows the results of a topic modelling study on hotel reviews. The reviews are categorized into three main topics, and the top 30 terms for each topic are presented. The first topic (43.5%) focuses on the overall satisfaction with the hotel’s food, rooms, and services, with terms such as “food,” “hotel,” and “staff” being prominent. The second topic (40.5%) focuses on issues related to social relations, such as the attitude of hotel employees towards guests and the family atmosphere, where words such as “hotel,” “staff,” and “thank” stand out. The third topic (16.1%) deals with service-orientated aspects such as the attitudes of hotel employees and the physical location of the hotel, where terms such as “staff,” “hotel”, “thank,” and “room” are prominent. Overall, this analysis shows in detail how customer feedback addresses different aspects of hotel services and what factors influence customer satisfaction.
Figure 6 shows the results of the topic modelling study of hotel reviews. These are categorized into three main topics, which analyze the main themes of the customer reviews. The first topic (36.2%) covers elements related to overall satisfaction, such as the hotel experience, general atmosphere, and staff relations, where terms such as “hotel,” “us,” “everything,” and “day” are prominent. The second topic (34.6%) includes comments on hotel amenities, service quality, and seasonal factors, where terms such as “good,” “recommend,” “staff,” and “performance” are important. The third topic (29.2%) refers to more specific service elements, such as hotel prices, cleanliness, and facilities for children; terms such as “good,” “hotel,” “price,” and “staffed” are noteworthy. This analysis was conducted to identify the impact of customer feedback on hotel services and to examine the contribution of different service elements to customer satisfaction.
The results of the topic modelling with LDA show that customer comments are clustered around certain themes. For example, at HB1, the factors most associated with customer satisfaction were staff attention, food quality, and cleanliness. In HB2, on the other hand, comments focused on social interaction, a family-friendly environment, and hotel services, indicating that customer satisfaction is addressed from different perspectives. In HB3, factors such as price–performance balance, location-based advantages, and the comfort of the rooms came to the fore. In addition, in order to better understand the relationship between the identified topics and customer satisfaction, sample customer comments are provided for each topic. For example, in HB1, statements such as ‘The cleaning and food were very good, and the staff was very attentive’ indicate that the hotel left a positive impression in the eyes of the customer.
The sentiment and topic modelling results revealed several sustainability-orientated themes within the guest reviews. For instance, terms such as “cleanliness”, “quiet environment”, “natural landscape”, and “eco-conscious staff” appeared prominently in positively labelled comments. In HB1 and HB2, guests frequently mentioned satisfaction with green areas, recycling bins, energy-saving systems, or the general environmental atmosphere of the hotel. Topic clusters also captured concerns about waste, food variety (organic/healthy options), and noise pollution. These findings suggest that, even when hotels do not have an official eco-label, customers perceive and value sustainable features, which strongly correlate with positive sentiment and customer loyalty.
4.4. Results of Word Cloud
A word cloud was created for each hotel’s reviews, and word frequency was calculated for each sentiment. Word frequency quantifies the number of times a certain word appears in a text. This measure indicates which words in the text are used more or less frequently. Word frequency analysis is used in many fields, such as linguistics, cryptography, and computer science. For each sentiment (positive and negative), word clouds and frequencies were created and analyzed in detail.
This was made more understandable through data visualization using the Seaborn 0.13.2 library of the Python 3.11.5 language. Data visualization is the presentation of abstract information in graphical form. It is intended to make complex and scattered data presented in classical format easily understandable and interpretable, with easily perceivable visuals. Word clouds are usually textual visuals created with different colours and patterns after an analysis of word frequencies [
77,
78].
A total of 4304 reviews were written for HB1, and its otel.com score was 9.6 out of 10. In the study, the comments were analyzed using text mining techniques, and 3443 of them were classified as positive, 753 as negative, and 108 as neutral. The word cloud of customer reviews for HB1 is shown in
Figure 7.
In
Figure 7, the word cloud provides a visual representation of the terms that hotel customers use most frequently in their reviews. Prominent words in the image include terms such as “hotel”, “food”, “staff”, “satisfied”, and “service.” These keywords reflect the elements that customers value most in their hotel experience. Words such as “staff” and “satisfied” indicate that satisfaction with staff and overall service quality are frequently emphasized in customer reviews. Furthermore, words such as “food”, “restaurant”, and “clean” reveal that specific service components such as food quality and cleanliness also play an important role in customer satisfaction. In analyzing customer feedback, this word cloud provides valuable insights into the areas where hotel service quality needs to be improved. The word cloud of customer reviews for HB2 is shown in
Figure 8.
In
Figure 8, the word cloud shows the frequency and importance of the words in hotel guests’ comments. Prominent words in the image include “hotel”, “staff”, “food”, and “friendly”, indicating that guests most often comment on the service quality of the hotel staff and food. The prominence of words such as “staff” and “friendly” indicates that the staff are characterized by their positive qualities and hospitality. Other frequently used words include “service”, “recommend”, and “quality”, while adjectives such as “successful”, “beautiful”, and “spacious” reflect the overall level of satisfaction with the hotel. This indicates that guests generally had a positive experience and found the hotel successful in terms of both service quality and physical facilities.
The word cloud of customer reviews for HB3 is shown in
Figure 9.
In
Figure 9, the word cloud reflects the analysis of hotel guests’ comments based on their experiences. The most prominent words in the image include terms such as “hotel”, “food”, “satisfied,” “staff”, and “performance.” The prominence of these words indicates guests’ satisfaction with the hotel’s service quality, food options, and overall performance. Words such as “satisfied” and “nice” imply that visitors provide generally positive feedback and that there is a high level of satisfaction, particularly with the staff and service quality. However, words such as “price” and “location” also appear to be important, suggesting that guests consider price and location when choosing a hotel. Overall, the word cloud gives the impression that the positive aspects of the hotel predominate, and that customer satisfaction and service quality are high.
The three analyzed word clouds provide important insights into hotel service quality and customer satisfaction by reflecting comments regarding hotel guests’ experiences. Keywords such as “hotel”, “staff”, and “food” are prominent in all three word clouds, indicating that guests are focused on service quality, staff, and food options. Words such as “satisfied” and “performance” in
Figure 9 emphasize the overall satisfaction of customers and the performance of the hotel, while the words “friendly” and “recommend” in
Figure 8 highlight the hospitality of hotel staff and positive reviews. In
Figure 7, words such as “service” and “clean” emphasize the importance of specific elements, such as service quality and cleanliness. What the three word clouds have in common is that they show that customers generally have a positive experience and are satisfied with the service quality of the hotels. However, factors such as “price” and “location” also stand out in
Figure 9, implying that price and location play an important role in hotel selection. Overall, these word clouds present the strengths of the hotel services and the salient elements of customer satisfaction in a comparative way.
4.5. Evaluating the Role of Sustainability Perceptions in Hotel Guest Satisfaction
During the sentiment and topic modelling processes, a set of sustainability-related terms was tracked (e.g., “clean”, “natural”, “green”, “eco”, “quiet”, “recycling”, “energy-saving”). These keywords were used to identify customer reviews that implicitly or explicitly referenced environmentally friendly practices and sustainable hotel features. A filtered subset of the dataset containing these sustainability-orientated comments was then analyzed to assess their emotional tone and potential indicators of customer loyalty.
The findings reveal that 84% of sustainability-tagged reviews were classified as positive, a notably higher proportion than the overall positive sentiment rate in the full dataset. This suggests that customers tend to respond more favourably when they perceive hotels as environmentally conscious or offering nature-friendly services. In addition to the emotional tone, these reviews more frequently included expressions related to loyalty behaviours, such as willingness to return (“I will come again”, “next year again”) and active recommendations (“I would definitely recommend”, “I advised my friends”).
Moreover, topic modelling further supported this connection: terms associated with sustainability appeared prominently within clusters linked to high satisfaction themes, including hotel cleanliness, green areas, noise control, and eco-conscious staff behaviour. This indicates that even in hotels not officially certified as eco-friendly, guests detect and value environmentally responsible practices.
These results demonstrate that sustainability is not merely an ethical or operational consideration but a tangible factor influencing emotional satisfaction and customer retention. In other words, when hotels incorporate green practices—whether through energy-efficient infrastructure, waste reduction, or maintaining natural ambiance—they not only contribute to environmental goals but also enhance perceived service quality and brand loyalty.
Thus, sustainability-orientated practices emerge as key differentiators in the hospitality experience, reinforcing both customer satisfaction and competitive advantage. These findings underscore the strategic importance of integrating sustainable development goals into hotel management, not only for ecological responsibility but also to deepen guest engagement and build long-term brand equity.
5. Discussion
The sentiment analysis results provide operational insights into the emotional dynamics of customer feedback and the critical service factors affecting satisfaction. This study analyzed 15,522 online reviews of three five-star hotels in Antalya using deep learning-based sentiment analysis. The findings indicate that key drivers of satisfaction include hotel facilities, food quality, and staff professionalism. Positive customer reviews frequently highlight elements such as cleanliness, entertainment, and attentive service, whereas negative comments often focus on food variety, pricing, and service deficiencies. These patterns offer hotel managers a reliable basis to assess and enhance their service quality, while also contributing to the broader understanding of online reputation management in sustainable tourism.
In the study, deep learning was used for sentiment analysis classification. The model performance of the three hotels was evaluated using various metrics: precision, recall, accuracy and F1 score. H1 was a strong performer, with a high precision of 94.5% and recall of 98.5%, an overall accuracy of 96.3%, and an F1 score of 96.4%. H2 achieved a very high performance, with 97.3% precision and 98.6% sensitivity values, achieving an overall accuracy of 92.9% and an F1 score of 97.9%. This reveals that the model effectively evaluated H2 and provided a good balance. H3 was behind the other hotels, with an overall accuracy of 89.6%, although it achieved a precision of 97.2% and a sensitivity of 97.3%, and a balanced performance, with an F1 score of 97.2%. In general, H1 stands out with a high accuracy rate, while H2 has the highest precision and sensitivity values. Although H3 has a high F1 score, its accuracy is lower, indicating that the model faced some classification difficulties with H3.
From a theoretical perspective, this study provides a novel framework by integrating sustainability-oriented sentiment analysis with deep learning applications in the hospitality sector. Among the positive words that stand out in customer comments about hotels, it can be seen that the expressions hotel, food, staff, satisfied, restaurant, service, entertainment, cleanliness, and helpful are frequently used. These words are important elements that reflect positive guest thoughts and reinforce guests’ experiences. Gao et al. showed that positive online hotel reviews are related to features such as service, value, rooms and cleanliness [
9]. Phillips et al. stated in their study in Switzerland that positive online comments have a high impact on hotel performance and that room quality, internet, and building design are key factors [
7]. Ban et al. stated that positive online hotel reviews are related to features such as intangible service, physical environment, location, access, food and beverages, and empathy [
6]. Chaves et al. found that, in online evaluations of small and medium-sized hotels in Portugal, room, staff, and location were the primary features, and cleanliness, friendliness, helpfulness, and centrality of location were other main features [
79].
These findings have several practical implications for hotel managers. For example, HB2 has the highest F1-score, indicating that the model classified this hotel’s reviews more accurately. However, this is not directly equivalent to customer satisfaction. Instead, it may indicate that reviews of a hotel with a high F1-score contain more prominent positive and negative themes. Hotel managers can use these analyses to identify the prominent elements in customer reviews and improve service quality accordingly.
When the general averages are examined, the average satisfaction score given by customers to hotels using the scoring system is 91%; however, when the emotional states in the comments are examined, it is seen that the average satisfaction rate is 77%. This difference between the scores given by customers to hotels and the comment-based analysis of their satisfaction is due to negative emotions. Among the words expressing negative emotions, food and variety, staff, food, price, crowd, and management stand out. These words show that the hotel inadequately meets customer expectations in some areas. This may create a negative situation for the corporate reputation of hotels. Criticisms about staff, cleanliness, food quality, and general service standards may make potential customers less likely to choose the hotel and make existing customers less likely to return. Rose and Blodgett stated that negative online hotel reviews are more likely to be related to controllable factors [
80]. Min et al. stated that negative online hotel reviews may be related to issues such as poor customer service, lack of empathy, and ineffective management [
81]. Phillips et al. [
7] revealed that negative online reviews were related to hotel features such as room quality, internet service, and building quality. Gao et al. [
9] stated that features such as service, rooms, and cleanliness were noted in negative comments. According to Boo and Busser [
21], negative online review content is related to hotel features such as location, service, and overall experience. Similarly, Liang et al. stated that negative online hotel reviews are more likely to be related to hotel features such as location, price, and service quality [
10]. Tsai et al. likewise emphasized that negative online hotel reviews may be related to certain hotel features, such as location, price, and customer service [
34].
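The gap between rating-based and comment-based satisfaction discussed in this paragraph can be made concrete with a short sketch. The Python example below is illustrative only: the ratings and sentiment labels are invented, the 10-point rating scale is an assumption, and the function names are hypothetical. It expresses both measures as percentages so that they can be compared directly.

```python
def rating_satisfaction(ratings, max_rating=10):
    """Mean numerical rating as a percentage of the maximum rating."""
    return 100 * sum(ratings) / (len(ratings) * max_rating)


def sentiment_satisfaction(labels, positive="positive"):
    """Percentage of comments whose sentiment label is positive."""
    return 100 * sum(label == positive for label in labels) / len(labels)


# Toy reviews (assumption): the last guest gives a high score but writes
# a comment that a sentiment model would label as negative.
ratings = [9, 10, 8, 9]
labels = ["positive", "positive", "positive", "negative"]

rating_pct = rating_satisfaction(ratings)
sentiment_pct = sentiment_satisfaction(labels)
gap = rating_pct - sentiment_pct

print(f"rating-based: {rating_pct:.0f}%")
print(f"comment-based: {sentiment_pct:.0f}%")
print(f"gap: {gap:.0f} percentage points")
```

A positive gap of this kind arises when guests award high scores while still voicing complaints in the text, which is the pattern that the comment-based analysis is intended to surface.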
While the information obtained from customers about the company’s products and services on online platforms helps businesses develop new strategies to gain a sustainable competitive advantage, it also makes it easier for them to manage their corporate reputation [
82]. Corporate reputation is important for hotel businesses as it helps them differentiate themselves from their competitors and send signals to their stakeholders about their performance [
9]. This study offers hoteliers insight into how online reviews relate to overall customer ratings. It is important for hotel managers to improve their products and services in line with customer feedback. The results of this research may enable hotel managers to improve the factors affecting online corporate reputation based on customer comments.
This study extends the literature by introducing a deep learning-based framework that is capable of identifying implicit sustainability-related sentiments in customer reviews. Most prior studies on sustainable tourism rely on data from explicitly green-labelled hotels or on surveys focused on environmental attitudes. In contrast, our approach leverages unstructured textual data from mainstream five-star hotels and uncovers how environmental and sustainability factors influence customer satisfaction even in conventional hospitality settings. This positions the study as a bridge between sentiment analytics and sustainable service management. The approach provides insights into guest expectations and opens new pathways for incorporating sustainability metrics into broader hotel reputation and service evaluations. By using AI-based sentiment analysis to extract sustainability perceptions from customer reviews, hotels can develop real-time dashboards that monitor guest sentiment regarding eco-practices, cleanliness, staff behaviour, and environmental ambiance. This enables more targeted improvements in line with sustainable tourism goals.
This study also has some limitations that could be addressed in future research. First, the online comments used in this study were collected from the OtelPuan platform. To obtain more comprehensive information, online reviews from multiple platforms (such as Booking.com, TripAdvisor, and Ctrip) could be used. Second, this study covers three hotels located in a certain region of Antalya. The number of hotels, the number of regions, and the types of hotels could be increased to improve the generalizability of future work. Third, this study used only online review data. More in-depth results could be obtained by combining a variety of data sources, including hotels’ internal data and review data. It would also be interesting to apply a similar approach to other businesses that make up the tourism industry, such as travel agencies, airlines, cruise ships, and restaurants. Although this dataset is limited to a specific region, the analytical model proposed in this study offers a scalable foundation for future applications across different geographies and hotel categories.
The findings of the study can also inform sustainable hospitality policies. For example, sustainability-orientated comments can be analyzed to determine the impact of environmentally friendly practices on customer satisfaction. In future research, it is recommended to evaluate the comments of customers staying in environmentally friendly hotels as a separate category and to examine the impact on customer loyalty in these hotels. The selection of the three hotels examined in this study, based on their high review volume and relevance to luxury tourism, is methodologically justified. However, it is acknowledged that customer expectations, satisfaction drivers, and sustainability perceptions may vary across different hotel categories (e.g., boutique or eco-certified hotels) and geographical or cultural contexts.
While the current findings offer a meaningful starting point, future studies would benefit from expanding the dataset to include a broader and more diverse range of hotels across multiple regions. Such an approach would allow for a more comprehensive understanding of customer sentiment and loyalty, particularly in relation to sustainability practices. This expansion would also enhance the external validity of the model and support the development of more generalizable and actionable strategies for the tourism industry.