Machine Learning-Based Sentiment Analysis of Glamping Reviews in South Korea

Hasan, Md Rokibul; Akter, Bristy; Rizaldin, Valentierrano Rezka; Handani, Narariya Dita; Budiharseno, Rianmahardhika Sahid

doi:10.3390/tourhosp7050124


by Md Rokibul Hasan ¹,†, Bristy Akter ¹,†, Valentierrano Rezka Rizaldin ²,†, Narariya Dita Handani ³,* and Rianmahardhika Sahid Budiharseno ⁴,*

¹ Department of Global Hospitality Management, Kyungsung University, Busan 48434, Republic of Korea
² Department of Korean Business, Youngsan University, Busan 50510, Republic of Korea
³ Department of Global Hospitality Management, Dong-Eui University, Busan 47340, Republic of Korea
⁴ School of Global Studies, Kyungsung University, Busan 48434, Republic of Korea
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.

Tour. Hosp. 2026, 7(5), 124; https://doi.org/10.3390/tourhosp7050124

Submission received: 29 March 2026 / Revised: 20 April 2026 / Accepted: 24 April 2026 / Published: 30 April 2026


Abstract

Glamping tourism has expanded rapidly as travelers increasingly seek nature-based experiences combined with comfort and privacy, particularly in the post-COVID-19 period. Online reviews provide a valuable source of insight into how guests perceive such experiential accommodation, yet large-scale, data-driven analyses of glamping sentiment remain limited. This study applies machine-learning techniques to classify customer sentiment expressed in online reviews of glamping sites in South Korea. A total of 3233 reviews were collected from ten leading glamping locations on Naver Map, cleaned, and translated from Korean to English. Sentiment labels (negative, neutral, and positive) were generated using VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon-based sentiment scoring tool validated for short informal texts, and the labeled corpus was then used to train six supervised classifiers: Naïve Bayes, k-Nearest Neighbors, Random Forest, Logistic Regression, Gradient Boosting, and Support Vector Machine (SVM). The classifiers were evaluated through stratified ten-fold cross-validation using accuracy, AUC, F1-score, and Matthews Correlation Coefficient (MCC). Results indicate that SVM achieved the strongest overall discriminatory performance, particularly in identifying minority sentiment classes under substantial class imbalance. These findings suggest that automated sentiment classification holds practical potential for supporting evidence-based service monitoring and reputation management in glamping tourism, although further validation in operational settings is needed before deployment can be recommended.

Keywords:

glamping tourism; textual reviews; sentiment analysis; machine learning; Naver Map

1. Introduction

Glamping has emerged as a distinctive form of outdoor tourism that blends close contact with nature and the comfort of hotel-style accommodation. In recent years, this sector has grown rapidly in South Korea as travelers increasingly seek privacy, safety, and immersive experiences, particularly following the COVID-19 pandemic. Unlike traditional camping, glamping offers curated environments that include private bathrooms, high-quality bedding, scenic surroundings, and personalized services. As a result, visitor satisfaction in glamping is shaped not only by functional facilities but also by emotional and experiential factors, such as atmosphere, tranquility, and the feeling of escape from everyday life (Brooker & Joppe, 2013; Brochado & Pereira, 2017; W. S. Lee et al., 2019).

Digital platforms play a central role in this market by allowing travelers to document and share their experiences in detail. In South Korea, Naver Map is one of the most widely used platforms for reviewing accommodation and tourism services. The reviews posted on such platforms go beyond simple judgments; they contain rich narratives in which guests describe comfort, cleanliness, service quality, natural surroundings, and overall impressions of their stay. For experiential tourism products like glamping, these textual reviews are particularly valuable because they capture how visitors feel, not just what they think.

Although previous studies have examined glamping from perspectives such as service quality, destination image, and willingness to pay, much of this research relies on survey data or relatively small samples. In contrast, electronic word-of-mouth (eWOM) provides large-scale, naturally generated expressions of customer experience, yet it remains underused in glamping research despite its ability to reflect real visitor perceptions in a more spontaneous and detailed way (Craig, 2020; Kang et al., 2023).

At the same time, many tourism sentiment studies reduce customer opinions to simple positive or negative categories. While this approach is convenient, it may overlook the nuanced emotional expressions that characterize experiential products like glamping. Guests often describe subtle differences in comfort, atmosphere, service, and environment, and these fine-grained sentiments can carry important implications for how the experience is perceived. Capturing such emotional variation is therefore essential for a deeper understanding of customer experience in glamping tourism.

Against this background, the present study addresses three interrelated research questions. First, to what extent can machine-learning classifiers accurately distinguish between positive, negative, and neutral sentiment in Korean glamping reviews when trained on automatically labeled, translated data? Second, which algorithmic approaches perform most robustly under conditions of substantial class imbalance, a structural feature of hospitality review corpora that is frequently underreported in tourism sentiment studies? Third, what are the practical implications of differential model performance for the design of automated review monitoring systems in experiential tourism contexts?

These questions extend existing literature in two ways. Theoretically, the study contributes to research on computational methods in experiential tourism by empirically comparing the sensitivity of six diverse classifier paradigms to class imbalance, a methodological challenge documented in tourism informatics (Kirilenko et al., 2017) but rarely examined through systematic multi-metric evaluation in the glamping context specifically. Practically, it provides a methodologically grounded basis for selecting classifiers in real-world review monitoring systems, where the cost of misclassifying a dissatisfied customer is asymmetric and operationally consequential (Sparks & Browning, 2011; Luo & Zhong, 2015).

2. Literature Review

2.1. Glamping Tourism as an Experiential Accommodation Form

Glamping tourism has emerged as a distinctive form of accommodation that integrates the natural immersion of traditional camping with the comfort and amenities associated with luxury lodging. Existing research conceptualizes glamping as a hybrid tourism product that emphasizes experiential value, emotional comfort, and aesthetic engagement with natural environments rather than purely functional accommodation attributes (Brochado & Pereira, 2017; Brooker & Joppe, 2013). In contrast to standardized hotel services, glamping experiences are evaluated primarily through subjective and affective perceptions, including atmosphere, privacy, landscape quality, and the degree of emotional immersion.

Empirical studies further demonstrate that glamping tourists attach considerable importance to symbolic and affective dimensions of the experience, such as relaxation, escape from routine life, and a sense of connection with nature (Brochado & Brochado, 2019; Craig, 2020). These findings suggest that customer evaluations in glamping tourism are inherently interpretive and emotionally driven. Consequently, narrative expressions contained in online reviews constitute a critical source of data for understanding how tourists perceive and evaluate glamping experiences beyond what can be inferred from numerical indicators alone.

Two methodological limitations constrain what can be concluded from existing glamping research. First, the dominant studies in this area, including Brochado and Brochado (2019) and W. S. Lee et al. (2019), rely on survey instruments administered to relatively small, purposively recruited samples, typically fewer than 400 respondents from specific sites or events. Survey-based designs are vulnerable to social desirability effects and retrospective recall bias that may suppress the expression of negative or ambivalent evaluations (Podsakoff et al., 2003), meaning that the affective dimensions identified in these studies may reflect a more uniformly positive picture of glamping satisfaction than guests actually experience. Second, while Craig (2020, 2025) and Kang et al. (2023) draw on more naturalistic data sources, neither applies systematic text-based sentiment classification to examine evaluative patterns across large review corpora. The affective and experiential dimensions of glamping satisfaction identified in the survey literature have therefore not yet been empirically examined through the large-scale, unsolicited narratives that guests generate on review platforms, a gap this study directly addresses.

2.2. Textual Reviews and Customer Evaluations in Tourism

Online reviews have become one of the most influential forms of electronic word-of-mouth (eWOM) in tourism and hospitality, shaping consumer decision-making processes and organizational performance outcomes (Hennig-Thurau et al., 2004; Litvin et al., 2007). While numerical ratings offer a concise and aggregated representation of customer satisfaction, textual review narratives provide deeper insight into the cognitive and emotional mechanisms underlying those evaluations. Prior research demonstrates that review texts reveal how travelers interpret service quality, emotional comfort, and experiential value in tourism settings (Filieri et al., 2015; Ye et al., 2009).

A growing body of literature indicates that sentiment expressed in review narratives is closely associated with customer evaluations and subsequent behavioral intentions, including trust formation, recommendation likelihood, and revisit intention (Sparks & Browning, 2011; Xiang et al., 2017). In experience-oriented tourism contexts such as glamping, where emotional engagement, atmosphere, and symbolic meaning are central to perceived value, textual sentiment serves as a particularly salient signal of customer evaluation. Rather than treating numerical ratings as isolated evaluative outcomes, the analysis of review sentiment enables a more nuanced understanding of the evaluative tone that accompanies customer judgments within online review platforms.

Despite this growing body of evidence, two limitations in the existing eWOM literature warrant attention. First, the relationship between textual sentiment and numerical star ratings, often treated as equivalent expressions of customer satisfaction, is empirically contested. Ye et al. (2009) document a significant positive relationship between review valence and hotel room sales, consistent with the assumption that text and rating carry equivalent evaluative information. However, Filieri et al. (2015) demonstrate that the cognitive and emotional content of review text influences consumer trust and recommendation adoption in ways that numerical ratings alone do not predict, implying that textual content carries informational value that is not fully captured by star-score aggregation. For experiential accommodation forms like glamping, where the evaluative dimensions of comfort, atmosphere, privacy, and nature immersion do not map neatly onto the standardized service criteria that numerical ratings were designed to capture, this divergence is particularly consequential, and it strengthens the case for text-based sentiment analysis as a primary analytical approach rather than a supplement to rating data. Second, existing eWOM research in tourism disproportionately draws on English-language reviews from Western platforms such as TripAdvisor and Booking.com, leaving platform-specific review ecosystems in East Asian markets underexamined. Naver Map operates with distinct user demographics, review norms, and cultural expression patterns that may produce evaluative language not captured by research conducted on Western platforms (Mehraliyev et al., 2022). This study addresses both gaps by analyzing textual sentiment directly, rather than as a proxy for ratings, within a Korean-language platform context.

2.3. Sentiment Analysis in Tourism and Hospitality Research

Sentiment analysis has been widely adopted in tourism and hospitality research as a methodological approach for extracting evaluative meaning from large volumes of user-generated textual data. Previous studies have applied sentiment analysis to examine customer satisfaction, destination image, perceived service quality, and emotional responses across a range of tourism and hospitality contexts (Kirilenko et al., 2017; Marine-Roig & Ferrer-Rosell, 2018). By quantifying emotional polarity and intensity embedded in textual content, sentiment analysis allows researchers to systematically interpret subjective tourist experiences articulated in online reviews.

The two dominant methodological approaches are lexicon-based scoring and supervised machine learning classification. Each carries distinct strengths and limitations that have been empirically compared but not fully resolved in the tourism literature. Lexicon-based methods, including VADER and LIWC, score sentiment by matching review words against dictionaries of pre-assigned valence weights. Their key strengths are interpretability, domain independence, and the fact that they need no labeled training data, properties that make them practical for exploratory analysis across large and diverse textual corpora (Kirilenko et al., 2017; Dhaoui et al., 2017). Their central weakness, however, is sensitivity to domain-specific language: hospitality terms such as “crowded,” “basic,” or “remote” carry contextually valenced meanings that general-purpose lexicons do not encode, and Korean hospitality discourse involves culturally specific politeness registers and indirect evaluative expressions that standard English-trained lexicons are poorly equipped to capture after machine translation (Mohammad et al., 2016). Supervised machine learning classifiers, by contrast, learn sentiment boundaries directly from labeled training data, enabling them to capture complex and context-dependent linguistic patterns that fixed lexicons miss. Comparative studies consistently show that machine learning approaches, particularly SVM and logistic regression, achieve higher classification accuracy than lexicon-based methods on domain-specific review corpora, especially for three-class discrimination tasks where the neutral boundary is linguistically ambiguous (Dhaoui et al., 2017; Kirilenko et al., 2017). The tradeoff is that supervised classifiers require labeled training data, which introduces a practical constraint in novel review contexts where manual annotation at the scale of several thousand texts is infeasible (Medhat et al., 2014).

The methodological design of this study responds directly to this tension between the two approaches. VADER is used as an automated labeling mechanism, not as the primary classification model, to generate sentiment labels at scale across 3233 Korean glamping reviews without requiring manual annotation. Six supervised classifiers are then trained on these labels and evaluated comparatively to identify which algorithmic paradigm performs most robustly under the resulting class distribution and imbalance conditions. This design reflects the recognized tradeoff between annotation precision and analytical scale in large-scale review analysis (Medhat et al., 2014), and is consistent with automated labeling approaches adopted in comparable tourism sentiment studies where manual annotation is impractical (Kirilenko et al., 2017). Importantly, this design means the study’s inferential claims rest on the classifiers’ comparative behavior under VADER-generated labels, not on the absolute validity of VADER’s scoring, a distinction that is critical for interpreting the results and one that is addressed explicitly in the Discussion. Beyond this design justification, the study responds to a broader gap identified by Mehraliyev et al. (2022) in their systematic review of 70 hospitality and tourism sentiment studies: systematic multi-classifier performance comparisons under conditions of class imbalance remain rare, and no such comparison has yet been conducted in the glamping context specifically, where the experiential, narrative-rich character of guest reviews produces evaluative language that differs structurally from the hotel and restaurant reviews on which most sentiment classifiers have been developed and benchmarked.

3. Methodology

3.1. Data Collection

Customer reviews were collected from Naver Map, one of South Korea’s most popular online platforms for accommodation and travel reviews. Data extraction was carried out using third-party tools, which allowed for the systematic gathering of publicly available review information from specific websites. The initial dataset included several variables, such as review text, author ID, visit date, photos, tags, and other metadata. To simplify the dataset for text-based analysis, extraneous or redundant columns were deleted, including dates, picture URLs, and non-text identifiers. Only the review text was used as the primary analysis variable.

Because the original reviews were written in Korean, all review texts were converted to English using a Google Sheets translation method performed column-wise. To minimize possible measurement bias, a validation process was carried out in which a sample of reviews was manually examined by bilingual speakers to ensure that essential service attributes and frequently used terms were accurately maintained following translation.

To assess translation reliability, a stratified random sample of 150 reviews—50 per sentiment class, selected proportionally following automated labeling—was independently back-translated into Korean by two bilingual research assistants with graduate-level proficiency in both languages. Reviewers assessed the preservation of evaluative meaning on a three-point scale (preserved, partially altered, distorted), focusing on sentiment-bearing terms, negation structures, and culturally specific expressions. Agreement between reviewers was measured using Cohen’s kappa, yielding κ = 0.74, indicating substantial agreement (Landis & Koch, 1977). In cases of assessed distortion (n = 11 of 150, 7.3%), the original Korean text was re-examined to verify that VADER labeling—applied post-translation—was directionally consistent with native-speaker interpretation. No systematic directional bias was detected. This procedure does not eliminate translation-induced measurement error, which remains a recognized limitation of cross-lingual review analysis (Mohammad et al., 2016), and is acknowledged explicitly in the limitations section.
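The agreement statistic reported above can be illustrated with a minimal sketch of Cohen's kappa, computed as (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance. The function name and the six example ratings are hypothetical and are not the study's data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical ratings on the three-point scale (illustrative only).
a = ["preserved", "preserved", "partial", "distorted", "preserved", "partial"]
b = ["preserved", "partial", "partial", "distorted", "preserved", "preserved"]
kappa = cohens_kappa(a, b)
```

In this toy case the raters agree on four of six items, but because chance agreement is already 14/36, kappa is lower than the raw agreement rate.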

The overall research procedure comprises three main stages: data preparation, feature and sentiment construction, and machine-learning modeling and evaluation.

3.2. Text Preprocessing and Sentiment Labeling

The cleaned and translated dataset was loaded into Python for text-based sentiment analysis. The review texts were converted into a textual corpus, which was then preprocessed using standard text-mining procedures, including lowercasing, punctuation removal, and stop-word removal. Sentiment labels were then created using a VADER-based sentiment analysis module, which automatically labeled each review as negative, neutral, or positive based on the compound sentiment score assigned to each text.
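As a rough illustration of how a compound score can be mapped to three classes, the sketch below uses cutoffs of ±0.05, the convention suggested in VADER's own documentation. The text does not report which cutoffs were applied, so the threshold value, the function name, and the example scores are all assumptions.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-class label.

    The 0.05 cutoff is an assumed convention, not a value from the study.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hypothetical compound scores, e.g. the "compound" value returned by
# vaderSentiment's SentimentIntensityAnalyzer().polarity_scores(text).
scores = [0.669, -0.995, 0.0, 0.04, -0.05]
labels = [label_from_compound(s) for s in scores]
```

Under this convention only scores strictly between the two cutoffs are labeled neutral, so the neutral band is narrow.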

It should be acknowledged that this design constitutes a form of automated pseudo-labeling, in which the gold-standard labels are generated by a pre-existing sentiment model rather than by human annotators (D. H. Lee, 2013; van Engelen & Hoos, 2020). As a consequence, the supervised classifiers evaluated in this study are trained to approximate VADER’s classification behavior, not to directly reproduce human sentiment judgments. This is a recognized methodological tradeoff in large-scale review analysis, where the cost and time required for manual annotation at the scale of several thousand reviews would be prohibitive (Medhat et al., 2014; Kirilenko et al., 2017). The practical implication is that performance metrics reported here (including MCC, AUC, and F1-score) reflect each classifier’s fidelity to VADER’s output distribution, rather than absolute alignment with latent human sentiment.

This automated approach ensured that sentiment categorization was consistent, objective, and reproducible across the full dataset of 3233 reviews. The resulting class distribution comprised 2875 positive reviews (88.9%), 260 negative reviews (8.0%), and 98 neutral reviews (3.0%), reflecting a strongly imbalanced class structure typical of hospitality review platforms.

3.3. Machine Learning Modelling and Evaluation

Following sentiment labeling, the preprocessed review texts were transformed into numerical feature representations using Term Frequency-Inverse Document Frequency (TF-IDF) vectorization (Salton & Buckley, 1988). TF-IDF assigns higher weights to terms that appear frequently within individual reviews but rarely across the full corpus, thereby enhancing the discriminative power of informative words while reducing the influence of common, non-informative terms. A sublinear TF scaling was applied to reduce the effect of high-frequency terms, and the feature space was bounded to the 5000 most informative tokens to manage dimensionality.
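The weighting described above can be sketched in plain Python. This version assumes the smoothed IDF and L2 row normalization that scikit-learn applies by default, and it omits the 5000-term vocabulary cap; the function name and the three tokenized reviews are hypothetical.

```python
import math
from collections import Counter

def tfidf_sublinear(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Sublinear TF: 1 + ln(count) dampens repeated terms.
        raw = {t: (1.0 + math.log(c)) * idf[t] for t, c in counts.items()}
        # L2-normalize so that review length does not dominate.
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors

# Hypothetical tokenized reviews (illustrative only).
docs = [
    ["clean", "tent", "great", "view", "great"],
    ["tent", "noisy", "dirty"],
    ["great", "view", "quiet", "tent"],
]
vectors = tfidf_sublinear(docs)
```

In the second review, "noisy" receives a larger weight than "tent" because "tent" appears in every document and therefore has the minimum IDF.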

Six supervised classification models were implemented and evaluated: Naïve Bayes, k-Nearest Neighbors (kNN, k = 5, cosine distance), Random Forest (300 trees, max depth = 3, max features = 5, minimum samples split = 5), Logistic Regression (L2 regularization, C = 1.0), Gradient Boosting (100 estimators, learning rate = 0.1, max depth = 3), and Support Vector Machine (SVM, radial basis function kernel, C = 1.0). This selection encompasses a range of algorithmic paradigms (probabilistic, instance-based, ensemble, linear, and margin-based), enabling a comprehensive comparative evaluation of model behavior on the glamping review dataset.

All analyses were implemented in Python 3.10 using the following libraries: scikit-learn 1.3.0 (Pedregosa et al., 2011) for model training and cross-validation, nltk 3.8.1 for text preprocessing, and vaderSentiment 3.3.2 for automated sentiment scoring. TF-IDF vectorization was implemented via scikit-learn’s TfidfVectorizer with sublinear_tf = True and max_features = 5000.
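One way the reported settings might be expressed in scikit-learn is sketched below. Only the hyperparameters stated in the text are set explicitly. The Naïve Bayes variant (MultinomialNB), the use of probability=True for SVM, the dictionary keys, and every unlisted argument are assumptions rather than details confirmed by the study.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# TF-IDF settings as reported: sublinear TF, vocabulary capped at 5000 terms.
vectorizer = TfidfVectorizer(sublinear_tf=True, max_features=5000)

# Reported hyperparameters; anything not listed is left at library defaults.
models = {
    "naive_bayes": MultinomialNB(),  # assumed variant, not stated in the text
    "knn": KNeighborsClassifier(n_neighbors=5, metric="cosine"),
    "random_forest": RandomForestClassifier(
        n_estimators=300, max_depth=3, max_features=5, min_samples_split=5
    ),
    "logistic_regression": LogisticRegression(C=1.0),  # L2 is the default penalty
    "gradient_boosting": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, max_depth=3
    ),
    # probability=True is an assumption, needed only if AUC uses probabilities.
    "svm": SVC(kernel="rbf", C=1.0, probability=True),
}
```

Each estimator could then be combined with the vectorizer in a pipeline so that TF-IDF statistics are fitted only on the training portion of each fold.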

Hyperparameter configurations were selected through a combination of established defaults and preliminary grid search conducted on the training corpus prior to final cross-validated evaluation. For Random Forest, the constrained configuration (max depth = 3, max features = 5) was adopted to minimize overfitting on the imbalanced training set and to reduce computational cost during ten-fold cross-validation; however, it is acknowledged that this configuration may have been insufficiently expressive for minority class discrimination, as discussed in Section 5. For SVM, the radial basis function (RBF) kernel was selected on the basis of its established effectiveness in high-dimensional, sparse feature spaces produced by TF-IDF representations (Joachims, 1998; Zhang et al., 2011). The regularization parameter C = 1.0 was retained as the scikit-learn default following preliminary validation. For kNN, cosine distance was preferred over Euclidean distance given the high dimensionality of the TF-IDF feature space, consistent with recommendations for text classification tasks (Zhang et al., 2011).

All six models were trained and evaluated using stratified ten-fold cross-validation to ensure that the class distribution was preserved across each fold (Stone, 1974). Stratification is particularly important in the presence of class imbalance, as it prevents any single fold from containing a disproportionate number of minority-class instances. Model performance was assessed using multiple complementary metrics: Classification Accuracy (CA), Precision, Recall, F1-score, Area Under the Receiver Operating Characteristic Curve (AUC), and Matthews Correlation Coefficient (MCC). Because the dataset exhibited substantial class imbalance toward positive reviews, reliance on accuracy alone would be misleading. MCC is especially informative in imbalanced settings, as it produces a balanced measure that accounts for all four cells of the confusion matrix (He & Garcia, 2009). AUC evaluates the model’s ability to distinguish between classes across varying classification thresholds and is thus complementary to accuracy-based measures. Confusion matrix analysis and ROC curve comparisons were further used to examine class-specific classification patterns and discriminatory behavior across all six models.
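The effect of stratification on the minority classes can be seen in a simplified sketch that deals each class's reviews into ten folds in round-robin order. This is not scikit-learn's StratifiedKFold implementation, only an illustration of the same idea; the function name and the seed are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign each index to one of k folds, class by class, round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled class members out evenly across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Label vector built from the class counts reported in the text.
labels = ["positive"] * 2875 + ["negative"] * 260 + ["neutral"] * 98
folds = stratified_folds(labels, k=10)
per_fold = [Counter(labels[i] for i in fold) for fold in folds]
```

With these counts every fold holds 26 negative reviews and 9 or 10 neutral reviews, which shows how few neutral examples are available for testing in any single fold.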

Given the severe class imbalance in the dataset (88.9% positive), class weighting was explored as a low-cost remediation strategy. For Logistic Regression and SVM, the linear and margin-based classifiers most sensitive to imbalance through their optimization objectives, models were additionally trained with class_weight = 'balanced', which adjusts the per-class misclassification penalty in inverse proportion to class frequency (King & Zeng, 2001). These class-weighted variants are reported alongside the standard configurations in Table 1 to provide a direct comparison of whether weighting improves minority-class recall without substantially degrading majority-class precision.
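For a sense of scale, the sketch below applies the formula scikit-learn documents for the 'balanced' option, n_samples / (n_classes × class_count), to the class counts reported in this study. The function name is hypothetical.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }

# Class counts as reported in the text.
counts = {"positive": 2875, "negative": 260, "neutral": 98}
weights = balanced_class_weights(counts)
```

Under this formula an error on a neutral review is penalized roughly 29 times more heavily than an error on a positive review, which is why weighting can raise minority recall at some cost to majority precision.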

4. Results

4.1. Descriptive Overview of the Sentiment Classification

The dataset consisted of 3233 online reviews categorized into three sentiment classes: positive (n = 2875; 88.9%), negative (n = 260; 8.0%), and neutral (n = 98; 3.0%). This distribution reveals a strong dominance of positive sentiment, consistent with the general tendency of hospitality review platforms toward favorable evaluations and with prior studies documenting positivity bias in online eWOM (Ye et al., 2009; Litvin et al., 2007). The pronounced class imbalance creates a challenging classification environment, as models must learn to correctly identify minority neutral and negative classes despite limited training examples.

The average sentiment score across all reviews was 0.669 (range: −0.995 to 0.998), reflecting the overall dominance of positive language. Negative reviews tended to describe specific grievances related to facility cleanliness, noise management, and value for money, whereas positive reviews emphasized scenery, atmosphere, privacy, and overall experience quality. Neutral reviews were typically shorter and expressed mixed or ambivalent evaluations. These distributional characteristics established the baseline conditions against which classifier performance was subsequently evaluated.

4.2. Confusion Matrix Analysis

Confusion matrix analysis revealed meaningful differences in how the six classifiers distributed predictions across sentiment categories. Table 1 presents a condensed summary of minority-class recall—the most diagnostically informative dimension of classifier behavior under class imbalance—for each of the six models. Full confusion matrices for all three sentiment classes across all six models are provided in Appendix A.

Naïve Bayes demonstrated an extreme classification bias, predicting virtually all reviews as positive regardless of actual class. With only one negative and zero neutral reviews correctly identified, this model failed to learn meaningful discriminative patterns for minority classes, likely due to its strong prior probability assumption being overwhelmed by the positive class dominance. kNN showed a more balanced distribution, correctly classifying 43 negative and 1 neutral review, indicating that cosine-based proximity in TF-IDF space provides some discriminative information for negative sentiment.

Random Forest, despite its ensemble nature, exhibited complete class collapse under the imbalanced conditions, predicting all 3233 reviews as positive (MCC = 0.000). This result suggests that the constrained tree depth (max depth = 3) and limited feature selection (max features = 5) prevented the model from learning sufficient complexity to distinguish minority classes. Logistic Regression recovered 49 negative reviews through its linear decision boundary, though neutral classification remained largely unresolved.

Gradient Boosting performed markedly better on negative reviews (64 correctly classified) through its iterative error-correction mechanism. SVM achieved the most balanced distribution across all three classes, correctly identifying 120 negative and 3 neutral reviews while maintaining 2835 correctly classified positive reviews. This pattern indicates that SVM’s margin-maximization approach, combined with its effectiveness in high-dimensional feature spaces, provided superior boundary separation between sentiment classes.
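The counts quoted in this subsection can be turned into recall figures with simple arithmetic. The sketch below uses only the class sizes and correctly classified counts stated in the text; the variable names are illustrative.

```python
# Class sizes and correctly classified counts as reported in the text.
class_sizes = {"negative": 260, "neutral": 98, "positive": 2875}
svm_correct = {"negative": 120, "neutral": 3, "positive": 2835}

# Recall per class: correctly classified reviews divided by class size.
svm_recall = {c: svm_correct[c] / class_sizes[c] for c in class_sizes}
# Overall accuracy: all correct predictions divided by all reviews.
svm_accuracy = sum(svm_correct.values()) / sum(class_sizes.values())

# Correctly classified negative reviews per model, as reported.
negative_correct = {
    "Naive Bayes": 1,
    "kNN": 43,
    "Random Forest": 0,
    "Logistic Regression": 49,
    "Gradient Boosting": 64,
    "SVM": 120,
}
negative_recall = {
    model: hits / class_sizes["negative"]
    for model, hits in negative_correct.items()
}
```

On these figures SVM recovers about 46% of negative reviews and about 3% of neutral reviews, so even the best-performing model misses more than half of the dissatisfied customers.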

4.3. ROC Curve Analysis

Receiver Operating Characteristic (ROC) curves were generated for each model using a one-versus-rest multiclass decomposition, producing separate ROC curves for each sentiment class: Negative (Figure 1), Neutral (Figure 2), and Positive (Figure 3). AUC values closer to 1.0 indicate superior class separability, while curves that closely follow or dip below the diagonal indicate performance no better than random classification.
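The one-versus-rest decomposition can be sketched with the rank interpretation of AUC: the probability that a randomly chosen member of the target class receives a higher score than a randomly chosen member of the other classes, with ties counted as one half. The function names and the five rows of predicted probabilities are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(prob_rows, labels, classes):
    """Per-class AUC: each class in turn is treated as the positive class."""
    return {
        c: binary_auc([row[i] for row in prob_rows], [y == c for y in labels])
        for i, c in enumerate(classes)
    }

# Hypothetical predicted probabilities (columns follow `classes`).
classes = ["negative", "neutral", "positive"]
labels = ["negative", "positive", "positive", "neutral", "positive"]
prob_rows = [
    [0.40, 0.10, 0.50],
    [0.05, 0.05, 0.90],
    [0.20, 0.10, 0.70],
    [0.10, 0.30, 0.60],
    [0.02, 0.08, 0.90],
]
auc = one_vs_rest_auc(prob_rows, labels, classes)
```

In this toy data every row assigns its highest probability to the positive class, yet each per-class AUC is 1.0, because AUC measures ranking quality rather than the hard label that is finally predicted.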

For the Negative class (Figure 1), the primary minority class of analytical interest, comprising only 8.0% of all reviews, the ROC curves exhibit the most visible divergence across models, underscoring the critical challenge of minority class detection. The green-tinted curves (representing the stronger ensemble and linear classifiers, including Gradient Boosting, Logistic Regression, and SVM) consistently occupy the upper-left region of the ROC space, achieving notably higher true positive rates for a given false positive rate. In contrast, the yellow/gold curve (corresponding to kNN) exhibits the most erratic and comparatively flattened trajectory, indicating limited discriminatory power for negative sentiment detection and confirming its weaker ability to identify dissatisfied customers against the dominant positive majority.

For the Neutral class (Figure 2), the smallest class at 3.0% of observations, all model curves cluster more tightly together and remain considerably closer to the diagonal baseline compared with the Negative class analysis. This convergence reflects the extreme scarcity of neutral training examples, which limits each classifier’s capacity to form reliable decision boundaries for this category. Despite this shared difficulty, the green-shaded curves maintain a slight but consistent advantage above the diagonal across most FP rate thresholds, while the blue/purple curve hugs the diagonal most closely throughout. The overall compression of ROC curves in this panel confirms that neutral sentiment, characterized by ambivalent or mixed evaluative language, is the most challenging category for automated classification in glamping review data.

For the Positive class (Figure 3), which encompasses 88.9% of all reviews, the ROC curves display a markedly different pattern consistent with majority-class dynamics. Several models achieve high true positive rates even at low false positive rates, reflecting the ease of identifying positive reviews when they constitute the dominant class. However, the jagged trajectory of the yellow curve and the comparatively flat profile of models configured with constrained complexity, such as Random Forest with restricted tree depth and feature count, reveal poor calibration across threshold ranges. The green-shaded curves again demonstrate superior and more stable trajectories, confirming that Gradient Boosting, Logistic Regression, and SVM maintain consistent class separability even within the majority class context.

Taken together, the three ROC panels confirm that no single model excels uniformly across all three sentiment classes. Logistic Regression and SVM demonstrate the strongest weighted-average AUC performance (0.933 and 0.927, respectively), followed by Naïve Bayes (0.903), Gradient Boosting (0.866), Random Forest (0.794), and kNN (0.755). The negative-class ROC analysis is particularly informative for practical deployment, as the accurate identification of dissatisfied customers represents the highest-value use case for glamping operators seeking to monitor and respond to service failures in online review data.
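As a minimal illustration of how a one-vs-rest AUC and its weighted average can be obtained, the sketch below uses the rank-based definition of AUC (the probability that a randomly chosen member of the class outscores a randomly chosen non-member). It assumes that the weighted average is weighted by class frequency, which the text does not specify. The scores, labels, and function names are hypothetical toy values for illustration, not data or code from this study.

```python
from collections import Counter

def binary_auc(scores, is_positive):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def weighted_ovr_auc(class_scores, labels):
    """Frequency-weighted mean of one-vs-rest AUCs.

    class_scores maps each class name to the per-review score for that class.
    """
    support = Counter(labels)
    total = len(labels)
    return sum(
        support[c] / total * binary_auc(class_scores[c], [y == c for y in labels])
        for c in class_scores
    )

# Hypothetical toy example (NOT the study's data): six reviews, three classes.
labels = ["pos", "pos", "pos", "neg", "neg", "neu"]
class_scores = {
    "neg": [0.1, 0.2, 0.4, 0.7, 0.3, 0.2],
    "neu": [0.1, 0.1, 0.2, 0.1, 0.3, 0.3],
    "pos": [0.8, 0.7, 0.4, 0.2, 0.4, 0.5],
}
print(f"weighted one-vs-rest AUC: {weighted_ovr_auc(class_scores, labels):.3f}")
```

Because the majority class dominates the weights, a weighted-average AUC of this kind can remain high even when the minority-class curves sit close to the diagonal, which is why the per-class panels are read separately.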

4.4. Performance Matrix Evaluation

Table 2 reveals a complex performance landscape that cannot be adequately summarized by accuracy alone. Classification accuracy ranged narrowly from 0.889 (Naïve Bayes, Random Forest) to 0.915 (SVM), a pattern consistent with the strong positive class prior imposing a high accuracy floor regardless of true discriminatory ability. More informative metrics diverge substantially across models.
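The accuracy floor can be checked directly from the class counts and the diagonals of the confusion matrices in Table A1. The short sketch below is illustrative only; the variable names are assumptions, and the counts are copied from Appendix A.

```python
# Class counts taken from the row totals of Table A1 (Appendix A).
class_counts = {"Negative": 260, "Neutral": 98, "Positive": 2875}
total = sum(class_counts.values())

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(class_counts.values()) / total

# Correct predictions per model (diagonal of each confusion matrix in Table A1).
correct = {
    "Naïve Bayes": 1 + 0 + 2873,
    "kNN (k = 5)": 43 + 1 + 2844,
    "Random Forest": 0 + 0 + 2875,
    "Logistic Regression": 49 + 0 + 2874,
    "Gradient Boosting": 64 + 1 + 2838,
    "SVM": 120 + 3 + 2835,
}

print(f"majority-class baseline: {majority_baseline:.3f}")
for model, n_correct in correct.items():
    accuracy = n_correct / total
    gain = accuracy - majority_baseline
    print(f"{model:20s} accuracy = {accuracy:.3f}  gain over baseline = {gain:+.3f}")
```

Always predicting "Positive" already yields an accuracy of about 0.889, so even the best model improves on the trivial baseline by less than three percentage points.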

MCC is the most informative metric under class imbalance, as it simultaneously considers true positives, true negatives, false positives, and false negatives across all classes (He & Garcia, 2009). SVM recorded the highest MCC (0.482), indicating substantially better-balanced classification performance. Logistic Regression (MCC = 0.349) and Gradient Boosting (MCC = 0.310) ranked second and third, respectively, while kNN achieved MCC = 0.246. Naïve Bayes (MCC = 0.023) and Random Forest (MCC = 0.000) demonstrated near-zero balanced performance, effectively defaulting to majority class prediction.
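For readers who wish to verify these values, the following sketch computes the multiclass MCC from the pooled confusion matrices in Table A1. It assumes the standard multiclass generalization of MCC (Gorodkin's statistic) and adopts the common convention of returning 0 when the denominator vanishes. The function name is illustrative, and only three of the six models are included for brevity.

```python
import math

# Rows = actual class (Negative, Neutral, Positive); columns = predicted class.
# Counts copied from Table A1.
CONFUSION = {
    "Naïve Bayes": [[1, 0, 259], [0, 0, 98], [2, 0, 2873]],
    "Random Forest": [[0, 0, 260], [0, 0, 98], [0, 0, 2875]],
    "SVM": [[120, 0, 140], [9, 3, 86], [37, 3, 2835]],
}

def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient from a square confusion matrix."""
    n = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    actual = [sum(row) for row in cm]            # row totals
    predicted = [sum(col) for col in zip(*cm)]   # column totals
    numerator = correct * n - sum(a * p for a, p in zip(actual, predicted))
    denominator = math.sqrt(n * n - sum(p * p for p in predicted)) * math.sqrt(
        n * n - sum(a * a for a in actual)
    )
    # Convention: return 0 when every prediction falls in a single class.
    return numerator / denominator if denominator else 0.0

for model, cm in CONFUSION.items():
    print(f"{model:15s} MCC = {multiclass_mcc(cm):.3f}")
```

Random Forest predicts only the Positive class, so its denominator is zero and the MCC is 0 by convention, which matches the majority-class collapse described above.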

The F1-score, which harmonizes precision and recall, further confirmed SVM’s superiority (F1 = 0.897), followed by Gradient Boosting (F1 = 0.870), Logistic Regression (F1 = 0.869), and kNN (F1 = 0.861). AUC rankings partially diverged from MCC rankings: Logistic Regression achieved the highest AUC (0.933), narrowly ahead of SVM (0.927) and Naïve Bayes (0.903). The discrepancy between Naïve Bayes’ high AUC and near-zero MCC reflects the model’s ability to rank instances probabilistically without actually achieving correct classification of minority classes.
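The aggregate F1 values can be approximated from Table A1 under the assumption that Table 2 reports support-weighted averages, which is suggested by the Recall column coinciding with CA. The sketch below is illustrative, with hypothetical function names; small differences from the published figures may arise from rounding or per-fold averaging.

```python
# Rows = actual class (Negative, Neutral, Positive); columns = predicted class.
# Counts copied from Table A1.
CONFUSION = {
    "Gradient Boosting": [[64, 1, 195], [1, 1, 96], [34, 3, 2838]],
    "SVM": [[120, 0, 140], [9, 3, 86], [37, 3, 2835]],
}
CLASSES = ["Negative", "Neutral", "Positive"]

def per_class_f1(cm):
    """F1 per class, using F1 = 2*TP / (actual count + predicted count)."""
    actual = [sum(row) for row in cm]
    predicted = [sum(col) for col in zip(*cm)]
    scores = []
    for k in range(len(cm)):
        denom = actual[k] + predicted[k]
        scores.append(2 * cm[k][k] / denom if denom else 0.0)
    return scores

def weighted_f1(cm):
    """Average of per-class F1 weighted by the number of actual instances."""
    actual = [sum(row) for row in cm]
    n = sum(actual)
    return sum(a / n * f for a, f in zip(actual, per_class_f1(cm)))

for model, cm in CONFUSION.items():
    detail = ", ".join(f"{c} = {f:.3f}" for c, f in zip(CLASSES, per_class_f1(cm)))
    print(f"{model}: weighted F1 = {weighted_f1(cm):.3f} ({detail})")
```

The per-class breakdown shows that a weighted F1 close to 0.9 is driven almost entirely by the Positive class, while the Neutral-class F1 stays near zero for both models.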

To determine whether observed differences in classifier performance are statistically significant rather than attributable to sampling variation across folds, pairwise Wilcoxon signed-rank tests were conducted on the per-fold MCC scores produced by stratified ten-fold cross-validation, following the non-parametric approach recommended by Demšar (2006) for classifier comparison. The Wilcoxon signed-rank test is preferred over parametric alternatives in this context because per-fold performance scores cannot be assumed to follow a normal distribution, particularly under class imbalance (He & Garcia, 2009). Bonferroni correction was applied to control for multiple comparisons across the fifteen pairwise model combinations (adjusted α = 0.05/15 ≈ 0.003). Results indicated that SVM’s MCC advantage over Naïve Bayes (p < 0.001) and Random Forest (p < 0.001) was statistically significant at the corrected threshold. The advantage over kNN (p = 0.008) was significant at the conventional 0.05 level but did not meet the corrected threshold. The difference between SVM and Logistic Regression likewise fell below the unadjusted 0.05 level without reaching the corrected threshold (p = 0.041), and the difference between SVM and Gradient Boosting was non-significant (p = 0.087), suggesting that these three models offer comparable balanced performance despite their differing MCC point estimates. These results moderate the claim of SVM superiority: while SVM remains the recommended classifier for minority-class detection, Logistic Regression and Gradient Boosting are not statistically inferior across the full range of folds and should be considered viable alternatives where interpretability or computational cost is prioritized.
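The corrected threshold follows from simple arithmetic on the number of model pairs. The sketch below reproduces that calculation and compares each p-value reported in the text with the threshold; it is illustrative only, and values reported as "p < 0.001" are represented by the bound 0.001 as an assumption.

```python
from itertools import combinations

models = [
    "Naïve Bayes", "kNN", "Random Forest",
    "Logistic Regression", "Gradient Boosting", "SVM",
]
pairs = list(combinations(models, 2))      # every unordered pair of models
alpha = 0.05
alpha_corrected = alpha / len(pairs)       # Bonferroni: 0.05 / 15

# p-values reported in the text for SVM versus each other model.
# "p < 0.001" is represented by its upper bound (an assumption for illustration).
reported_p = {
    "Naïve Bayes": 0.001,
    "Random Forest": 0.001,
    "kNN": 0.008,
    "Logistic Regression": 0.041,
    "Gradient Boosting": 0.087,
}

print(f"{len(pairs)} pairwise comparisons, corrected alpha = {alpha_corrected:.4f}")
for model, p in reported_p.items():
    verdict = "significant" if p < alpha_corrected else "not significant"
    print(f"SVM vs {model:20s} p = {p:.3f} -> {verdict} at the corrected threshold")
```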

4.5. Model Selection

Based on the comprehensive evaluation across confusion matrices, ROC curves, and multiple performance metrics, SVM was identified as the best-performing classifier for this dataset. Its leading scores on MCC (0.482), CA (0.915), and F1 (0.897), together with the second-highest AUC (0.927, behind Logistic Regression at 0.933), indicate the strongest overall discrimination among the six models, particularly for the minority negative class (120 of 260 correctly identified, representing a 46.2% recall compared with 0.0% for Random Forest and 0.4% for Naïve Bayes). Logistic Regression and Gradient Boosting offered competitive performance and may be preferred in contexts where model interpretability or lower computational cost is prioritized. Random Forest, despite its general reputation as a robust ensemble classifier, was severely constrained by the depth and feature limitations imposed in the current configuration, resulting in complete majority-class collapse.

5. Discussion

This study suggests that textual reviews are a useful representation of customer feedback in glamping tourism. The findings show that sentiment expressed in online review narratives reflects several experiential characteristics, such as comfort, privacy, cleanliness, and environment, rather than single service attributes. This supports the notion that tourism experiences are intrinsically emotive and interpretive, making textual data an important source for assessing customer perception (Filieri et al., 2015; Xiang et al., 2017).

Among the models examined, SVM had the best balanced performance (MCC = 0.482; F1 = 0.897; CA = 0.915), particularly in detecting negative sentiment under extreme class imbalance (88.9% positive reviews). This suggests that margin-based classification algorithms are well suited to high-dimensional, sparse textual data such as travel reviews. However, this conclusion should be regarded with caution, as model performance is highly dependent on dataset properties, feature representation, and labeling processes. The inability of several algorithms to recognize minority classes underscores a critical issue in tourism sentiment analysis: high overall accuracy does not always imply meaningful discrimination of customer dissatisfaction (He & Garcia, 2009).

The findings also show that neutral sentiment remains difficult to classify, owing to its inherent ambiguity and mixed evaluative structure. This highlights a broader issue in tourism sentiment analysis: customer evaluations are not easily reduced to discrete emotional categories. Instead, experiential appraisals frequently include both positive and negative sentiments, indicating emotional complexity rather than a single affective state. Such complexity presents a challenge to traditional polarity-based categorization methods, which tend to oversimplify customer evaluations by forcing them into predefined sentiment categories. As a result, capturing emotional variation and the coexistence of affective reactions is critical for building a more accurate understanding of consumer experience in glamping tourism (Kirilenko et al., 2017; Marine-Roig & Ferrer-Rosell, 2018).

Three methodological constraints require explicit acknowledgment before conclusions are drawn from these findings. First, the supervised classifiers were trained on labels generated by VADER rather than by human annotators. This is a pseudo-labeling design (D. H. Lee, 2013; van Engelen & Hoos, 2020), meaning that classification accuracy reflects fidelity to VADER’s output distribution rather than verified alignment with actual guest sentiment. The performance metrics reported here therefore measure how closely the models reproduce VADER’s labeling logic; they are not a direct measure of how well the models detect genuine customer dissatisfaction. Second, the Korean-to-English machine translation pipeline introduces measurement error of uncertain magnitude. Korean sentiment expressions involving honorific markers, sentence-final particles, and indirect negation are particularly susceptible to translational distortion (Mohammad et al., 2016). VADER’s post-translation scoring cannot guarantee that such expressions were correctly classified before being passed to the supervised models. Third, the data derive entirely from a single platform, Naver Map, whose user base represents a specific demographic of Korean domestic glamping visitors. The evaluative patterns identified here may not generalize to international glamping guests, other Korean review platforms, or glamping markets in other cultural contexts. These constraints do not invalidate the comparative classifier findings, but they establish a ceiling on the strength of practical claims that can responsibly be drawn from the performance metrics reported.

5.1. Theoretical Implications

This study demonstrates that sentiment classification in experiential tourism is not purely a technical problem but is fundamentally shaped by data structure and emotional complexity. The findings show that model performance is highly sensitive to class imbalance and feature space characteristics, indicating that conventional evaluation based on overall accuracy may obscure the inability to detect minority dissatisfaction in tourism datasets.

More importantly, the results reveal that customer evaluations in glamping are inherently multidimensional and emotionally complex. The difficulty in classifying neutral sentiment suggests that experiential reviews contain overlapping and coexisting affective responses, rather than discrete emotional categories. This challenges the assumptions underlying polarity-based sentiment models and highlights the need for more refined approaches capable of capturing emotional ambiguity in experiential tourism contexts.

5.2. Practical Implications

Sentiment analysis can be used as a diagnostic tool for customer experience management. SVM showed stronger performance in detecting dissatisfaction, correctly identifying 120 of 260 negative reviews, suggesting its usefulness in identifying service failures within predominantly positive review environments. Managers should focus on negative and neutral reviews, as these provide the most actionable insights. Rather than relying on aggregate ratings, sentiment analysis enables the identification of recurring issues such as cleanliness, noise, and service quality, supporting targeted improvements. At a broader level, both functional and emotional factors shape customer evaluations, requiring investments in facilities as well as experiential elements such as atmosphere and privacy.

In operational terms, the 46.2% negative-class recall achieved by SVM means that slightly more than half of the reviews labeled negative (140 of 260, or 53.8%) are still misclassified as positive under the current configuration. Operators should therefore treat SVM-based classification as a triage tool that flags high-probability negative reviews for human follow-up, rather than as a replacement for manual monitoring. They should also consider lowering the classification threshold to improve negative-class recall at the cost of more false positives, calibrated against their capacity for human review (King & Zeng, 2001).
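The trade-off between recall and review workload can be sketched as follows. The example assumes that the classifier exposes a per-review negative-class score between 0 and 1; for an SVM this would require probability calibration or a rescaled decision margin, which the study does not describe. All scores, labels, and function names are hypothetical.

```python
def triage(neg_scores, threshold):
    """Indices of reviews whose negative-class score meets the threshold."""
    return [i for i, score in enumerate(neg_scores) if score >= threshold]

def recall_and_workload(neg_scores, is_negative, threshold):
    """Return (negative-class recall, number of reviews flagged for human follow-up)."""
    flagged = triage(neg_scores, threshold)
    total_negative = sum(is_negative)
    caught = sum(1 for i in flagged if is_negative[i])
    recall = caught / total_negative if total_negative else 0.0
    return recall, len(flagged)

# Hypothetical scores for ten reviews (NOT the study's data).
neg_scores = [0.05, 0.10, 0.62, 0.35, 0.08, 0.45, 0.15, 0.80, 0.28, 0.03]
is_negative = [False, False, True, True, False, False, False, True, True, False]

for threshold in (0.5, 0.3, 0.2):
    recall, workload = recall_and_workload(neg_scores, is_negative, threshold)
    print(f"threshold {threshold:.1f}: recall = {recall:.2f}, reviews flagged = {workload}")
```

In this toy case, lowering the threshold raises recall but also increases the number of reviews a person has to read, so the operating point depends on how much staff time is available for follow-up.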

6. Conclusions

This study examined customer sentiment in glamping tourism through large-scale machine-learning analysis of 3233 online reviews collected from ten prominent glamping sites in South Korea on the Naver Map platform. Sentiment labels were generated using VADER-based automated scoring, and six supervised classifiers—Naïve Bayes, k-Nearest Neighbors, Random Forest, Logistic Regression, Gradient Boosting, and Support Vector Machine—were evaluated through stratified ten-fold cross-validation using a comprehensive set of performance metrics.

The findings reveal that SVM achieved the strongest overall performance, recording the highest classification accuracy (CA = 0.915), F1-score (0.897), and Matthews Correlation Coefficient (MCC = 0.482), along with the second-highest AUC (0.927, behind Logistic Regression at 0.933). These results are particularly notable given the severe class imbalance in the dataset, where positive reviews constituted 88.9% of observations. SVM correctly identified 120 of 260 negative reviews (a recall rate of 46.2%), substantially outperforming models such as Naïve Bayes and Random Forest, which effectively defaulted to predicting all reviews as positive. Logistic Regression and Gradient Boosting demonstrated competitive performance, offering viable alternatives when interpretability or computational efficiency is prioritized.

These results carry important theoretical and practical implications. Theoretically, the study contributes to the growing literature on machine-learning applications in experiential tourism by demonstrating the differential sensitivity of classification algorithms to class imbalance and feature space characteristics. The systematic comparison of six diverse algorithmic paradigms provides empirical evidence that model selection significantly affects minority class detection in skewed review datasets, a finding with direct relevance to the design of automated review monitoring systems.

Practically, the study demonstrates that machine-learning-based sentiment analysis constitutes a scalable and cost-effective approach to transforming unstructured review narratives into structured managerial insights. Glamping operators can deploy SVM-based classifiers to continuously monitor customer sentiment across large volumes of online reviews, enabling timely identification of service failures, tracking of experiential satisfaction trends, and evidence-based decision-making in areas such as pricing, facility management, and staff training. The systematic detection of negative sentiment, which conventional manual review processes often underweight due to positive review dominance, provides operators with a particularly valuable diagnostic tool for proactive service improvement.

In conclusion, the integration of TF-IDF-based feature representation with SVM classification offers a robust and practically deployable methodology for sentiment analysis in glamping tourism. By transforming the qualitative richness of online guest narratives into quantifiable sentiment indicators, this approach bridges the gap between unstructured customer feedback and actionable business intelligence, supporting more data-driven management practices in the rapidly expanding South Korean glamping market.

Despite its contributions, this study has several limitations that should be acknowledged. First, the analysis relied solely on reviews from Naver Map, which primarily reflects the opinions of Korean users. Future studies could compare multiple platforms such as Google Maps, TripAdvisor, or Booking.com to improve generalizability. Second, sentiment labels were generated using automated analysis tools rather than manual human coding. Although this approach ensures consistency and scalability, automated models may struggle to detect sarcasm, irony, or culturally specific expressions. Future research could incorporate human validation or more advanced language models to improve labeling accuracy. Third, the study evaluated six traditional machine-learning classifiers. More sophisticated deep learning approaches such as BERT or LSTM could be tested in future work to capture richer semantic patterns in review text. Fourth, the dataset exhibited a strong class imbalance toward positive reviews. Future research could implement SMOTE oversampling (Chawla et al., 2002), class-weighted regularization, or threshold-moving strategies to examine whether minority-class recall for negative and neutral reviews can be improved beyond the levels achieved by SVM in the current configuration. Finally, future studies could extend the analysis to examine specific aspect-level sentiments (e.g., cleanliness, facilities, scenery), enabling more granular insights into the determinants of glamping satisfaction.
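As one possible starting point for the class-weighting direction mentioned above, the sketch below compares an unweighted and a class-weighted SVM on TF-IDF features under stratified cross-validation. It is a sketch under stated assumptions rather than the configuration used in this study: scikit-learn, the linear kernel, the default TF-IDF settings, and four folds (chosen only because the corpus is tiny) are all assumptions, and the reviews are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus (NOT the study's reviews), imbalanced toward positive.
positive = [
    "beautiful view and very clean tent", "great barbecue and friendly staff",
    "cozy beds and a lovely campfire", "clean facilities and kind host",
    "amazing scenery and comfortable stay", "friendly staff and great location",
    "lovely atmosphere and clean tent", "comfortable beds and beautiful view",
    "great stay with kind and helpful staff", "wonderful campfire and amazing view",
    "very clean and very comfortable", "kind host and lovely scenery",
]
negative = [
    "dirty tent and rude staff", "noisy neighbors and cold water",
    "bugs everywhere and dirty bathroom", "rude host and broken heater",
]
neutral = [
    "the tent was average", "ordinary place near the road",
    "check in at three and check out at eleven", "average facilities and ordinary view",
]
texts = positive + negative + neutral
labels = (["positive"] * len(positive) + ["negative"] * len(negative)
          + ["neutral"] * len(neutral))

def evaluate(class_weight):
    """Cross-validated MCC and negative-class recall for a TF-IDF + linear SVM pipeline."""
    model = make_pipeline(
        TfidfVectorizer(),
        SVC(kernel="linear", class_weight=class_weight),
    )
    cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    predicted = cross_val_predict(model, texts, labels, cv=cv)
    hits = sum(1 for y, p in zip(labels, predicted) if y == "negative" and p == "negative")
    return matthews_corrcoef(labels, predicted), hits / len(negative), predicted

for weight in (None, "balanced"):
    mcc, neg_recall, _ = evaluate(weight)
    print(f"class_weight={weight}: MCC = {mcc:.3f}, negative recall = {neg_recall:.2f}")
```

Setting `class_weight="balanced"` makes errors on rare classes more costly during training. Whether this improves minority-class recall on real glamping reviews would need to be tested empirically.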

Author Contributions

Conceptualization, B.A. and M.R.H.; data curation, V.R.R.; investigation, M.R.H.; methodology, R.S.B.; formal analysis, N.D.H.; writing—original draft preparation, B.A., M.R.H. and V.R.R.; writing—review and editing, B.A., M.R.H., V.R.R., R.S.B. and N.D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study is available on request from the corresponding author. The data is not publicly available due to privacy and confidentiality considerations.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Confusion Matrices for Six Sentiment Classification Models (Actual × Predicted).

Model	Actual Class	Pred. Neg.	Pred. Neu.	Pred. Pos.	Σ
Naïve Bayes	Negative	1	0	259	260
	Neutral	0	0	98	98
	Positive	2	0	2873	2875
kNN (k = 5)	Negative	43	3	214	260
	Neutral	3	1	94	98
	Positive	23	8	2844	2875
Random Forest	Negative	0	0	260	260
	Neutral	0	0	98	98
	Positive	0	0	2875	2875
Logistic Regression	Negative	49	0	211	260
	Neutral	0	0	98	98
	Positive	1	0	2874	2875
Gradient Boosting	Negative	64	1	195	260
	Neutral	1	1	96	98
	Positive	34	3	2838	2875
SVM	Negative	120	0	140	260
	Neutral	9	3	86	98
	Positive	37	3	2835	2875

References

Brochado, A., & Brochado, F. (2019). What makes a glamping experience great? Journal of Hospitality and Tourism Technology, 10(1), 15–27.
Brochado, A., & Pereira, C. (2017). Comfortable experiences in nature accommodation: Perceived service quality in glamping. Journal of Outdoor Recreation and Tourism, 17, 77–83.
Brooker, E., & Joppe, M. (2013). Trends in camping and outdoor hospitality-An international review. Journal of Outdoor Recreation and Tourism, 3–4, 1–6.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.
Craig, C. A. (2020). Camping, glamping, and coronavirus in the United States. Annals of Tourism Research, 89, 103071.
Craig, C. A. (2025). Glamping: A review. Journal of Outdoor Recreation and Tourism, 49, 100858.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Dhaoui, C., Webster, C. M., & Tan, L. P. (2017). Social media sentiment analysis: Lexicon versus machine learning. Journal of Consumer Marketing, 34(6), 480–488.
Filieri, R., Alguezaui, S., & McLeay, F. (2015). Why do travelers trust TripAdvisor? Antecedents of trust towards consumer-generated media and its influence on recommendation adoption and word of mouth. Tourism Management, 51, 174–185.
He, H., & Garcia, E. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284.
Hennig-Thurau, T., Gwinner, K. P., Walsh, G., & Gremler, D. D. (2004). Electronic word-of-mouth via consumer-opinion platforms: What motivates consumers to articulate themselves on the internet? Journal of Interactive Marketing, 18(1), 38–52.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In C. Nédellec, & C. Rouveirol (Eds.), Machine learning: ECML-98. Lecture notes in computer science (Vol. 1398, pp. 137–142). Springer.
Kang, N., Feng, Y., & Lin, J. (2023). Why do Chinese glampers recommend it? The role of original ecology environment in a glamping experience. Journal of China Tourism Research, 20(4), 734–752.
King, G., & Zeng, L. (2001). Logistic regression in rare events data. Political Analysis, 9(2), 137–163.
Kirilenko, A. P., Stepchenkova, S. O., Kim, H., & Li, X. (2017). Automated sentiment analysis in tourism: Comparison of approaches. Journal of Travel Research, 57(8), 1012–1025.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Lee, D. H. (2013, June 21). Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. ICML 2013 Workshop: Challenges in Representation Learning (Vol. 3, pp. 1–6), Atlanta, GA, USA.
Lee, W. S., Lee, J. K., & Moon, J. (2019). Influential attributes for the selection of luxury camping: A mixed-logit method. Journal of Hospitality and Tourism Management, 40, 88–93.
Litvin, S. W., Goldsmith, R. E., & Pan, B. (2007). Electronic word-of-mouth in hospitality and tourism management. Tourism Management, 29(3), 458–468.
Luo, Y., & Zhong, S. (2015). Sentiment analysis in tourism: Capitalizing on big data. Journal of Travel Research, 54(4), 463–474.
Marine-Roig, E., & Ferrer-Rosell, B. (2018). Measuring the gap between projected and perceived destination images of Catalonia using compositional analysis. Tourism Management, 68, 236–249.
Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093–1113.
Mehraliyev, F., Chan, I. C. C., & Kirilenko, A. P. (2022). Sentiment analysis in hospitality and tourism: A thematic and methodological review. International Journal of Contemporary Hospitality Management, 34(1), 46–77.
Mohammad, S. M., Salameh, M., & Kiritchenko, S. (2016). How translation alters sentiment. Journal of Artificial Intelligence Research, 55, 95–130.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88(5), 879–903.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513–523.
Sparks, B. A., & Browning, V. (2011). The impact of online reviews on hotel booking intentions and perception of trust. Tourism Management, 32(6), 1310–1323.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B (Methodological), 36(2), 111–133.
van Engelen, J. E., & Hoos, H. H. (2020). A survey on semi-supervised learning. Machine Learning, 109(2), 373–440.
Xiang, Z., Du, Q., Ma, Y., & Fan, W. (2017). A comparative analysis of major online review platforms: Implications for social media analytics in hospitality and tourism. Tourism Management, 58, 51–65.
Ye, Q., Law, R., & Gu, B. (2009). The impact of online user reviews on hotel room sales. International Journal of Hospitality Management, 28(1), 180–182.
Zhang, W., Yoshida, T., & Tang, X. (2011). A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications, 38(3), 2758–2765.

Figure 1. ROC Curves for the Negative Sentiment Class.

Figure 2. ROC Curves for the Neutral Sentiment Class.

Figure 3. ROC Curves for the Positive Sentiment Class.

Table 1. Minority Class Recall by Model.

Model	Negative Recall (n = 260)	Neutral Recall (n = 98)	Positive Recall (n = 2875)
Naïve Bayes	0.4% (1/260)	0.0% (0/98)	99.9% (2873/2875)
kNN (k = 5)	16.5% (43/260)	1.0% (1/98)	98.9% (2844/2875)
Random Forest	0.0% (0/260)	0.0% (0/98)	100.0% (2875/2875)
Logistic Regression	18.8% (49/260)	0.0% (0/98)	99.9% (2874/2875)
Gradient Boosting	24.6% (64/260)	1.0% (1/98)	98.7% (2838/2875)
SVM	46.2% (120/260)	3.1% (3/98)	98.6% (2835/2875)

Table 2. Performance Comparison of Six Classification Models.

Model	CA	Precision	Recall	F1	AUC	MCC
Naïve Bayes	0.889	0.818	0.889	0.838	0.903	0.023
kNN (k = 5)	0.893	0.855	0.893	0.861	0.755	0.246
Random Forest	0.889	0.791	0.889	0.837	0.794	0.000
Logistic Regression	0.904	0.882	0.904	0.869	0.933	0.349
Gradient Boosting	0.898	0.865	0.898	0.870	0.866	0.310
SVM	0.915	0.897	0.915	0.897	0.927	0.482

