Urban Well-Being Assessment Based on Tourist Emotional Space Analysis: The Case of Harbin

Lu, Xu; Lu, Jingqun; Huang, Shan; Zhan, Mingsong

doi:10.3390/buildings16091695


¹ School of Architecture and Urban Planning, Shenyang Jianzhu University, Shenyang 110168, China

² School of Architecture and Civil Engineering, Shenyang University, Shenyang 110044, China

* Author to whom correspondence should be addressed.

Buildings 2026, 16(9), 1695; https://doi.org/10.3390/buildings16091695

Submission received: 23 March 2026 / Revised: 11 April 2026 / Accepted: 22 April 2026 / Published: 25 April 2026

(This article belongs to the Special Issue Urban Wellbeing: The Impact of Spatial Parameters—2nd Edition)


Abstract

In people-centered urban planning, enhancing the well-being of residents and tourists is one of the core objectives. Tourist emotion serves not only as a key indicator of the tourism experience but also indirectly reflects the quality of a city’s public spaces and built environment. In recent years, user-generated content has provided abundant data for understanding human emotional responses in urban environments, while deep learning models offer new technological pathways for extracting spatial–emotional associations from such data. However, existing research lacks a systematic evaluation of emotion analysis models from an urban spatial perspective and their application to uncover the relationship between emotional distribution and spatial characteristics in specific urban contexts. Based on a dataset of 9419 manually annotated travel reviews from Harbin, this study developed a multi-level evaluation framework and conducted a systematic comparison of seven emotion analysis models. This study then screened for the optimal model combinations based on two dimensions—spatial location and emotion polarity—to create a model matching matrix for mapping Harbin’s emotion map. Subsequently, a regression analysis was performed to examine the relationship between emotions and built environment elements. The results show that the ERNIE model demonstrated the best overall performance. Road density, green space density, and accommodation facility density were positively correlated with emotion, while POI diversity showed a negative correlation. This study demonstrates that emotion analysis technology can serve as a valuable analytical tool for identifying spatial patterns of sentiment, thereby offering empirical support for optimizing spatial design parameters and advancing a more people-centered approach to urban development.

Keywords: urban well-being; emotion analysis; deep learning; user-generated content; spatial parameters

1. Introduction

China’s urbanization rate has risen steadily, from 18% in 1978 to over 67% in 2024 [1]. However, as globalization has spread, urban development has long centered on construction while neglecting human emotions, severely affecting human well-being [2]. Accurately assessing the spatial mechanisms through which the urban environment influences human emotions is therefore a core issue for sustainable urban development [3,4]. Yet existing research predominantly evaluates urban well-being through spatial parameters of the physical environment and lacks in-depth exploration of people-centered public perception [5]. Deepening the shift from the material dimension to the perceptual dimension is key to understanding the man–land relationship and, in turn, to enhancing human well-being [6]. Adopting “emotion” as a lens for understanding human well-being, and examining how public emotion is generated, is thus essential for informing and optimizing design decisions and for enhancing the quality and image of cities.

Tourist emotion serves as an effective perspective for measuring human spatial well-being [7]. Its advantage stems from the unique interaction between tourists and urban space, manifested in two main ways. First, the relatively dispersed distribution of tourist trajectories constitutes a systematic sampling of urban space [8], revealing patterns of human perception across different urban environments. Second, tourists do not reside in the city long-term, so their emotional responses to urban spaces tend to be more immediate and intense, making them easier to perceive, collect, and analyze [9]. Although scholars have explored the mechanisms linking emotions and urban spaces from residents’ perspectives [5,10,11], research on mapping and interpreting urban emotion maps from tourists’ viewpoints remains relatively scarce [7]. This study therefore adopts tourist emotion as its entry point to reveal the mechanisms through which urban spaces influence human well-being.

The acquisition of emotion data sources has undergone a transformation from traditional methods to online data. Traditional research methods mainly relied on questionnaires to construct subjective indicators such as cognition [12], or inferred public emotion by analyzing travelogs, interviews and other texts on tourism websites [13].

The massive scale of user-generated content (UGC) data calls for more efficient methods of quantifying emotion [7,14,15]. Early research relied primarily on lexicon-based methods (such as SentiWordNet and HowNet) and traditional machine learning approaches (such as SVM with TF-IDF features), which exhibited limited generalization capabilities. The rise of deep learning technologies (such as CNNs and LSTMs) has enabled end-to-end automatic learning of textual features, significantly enhancing models’ ability to capture deep semantic meaning and establishing them as the mainstream approach in emotion analysis. In recent years, pre-trained models such as BERT and ERNIE have achieved breakthrough progress in tasks like emotion analysis by acquiring powerful general semantic representations through self-supervised learning on ultra-large-scale corpora.

However, existing research primarily focuses on comparing model performance across general domains, with a lack of systematic evaluation and classification of different spatial features and emotion polarities. This limitation hinders the precise application and optimized selection of these advanced technologies in tourism emotion analysis practices.

This study aims to establish a deep learning model evaluation and selection framework for urban spatial emotion analysis. This framework will establish an optimized technical approach tailored to the characteristics of urban text data by systematically comparing differences in recognition performance, efficiency, and robustness across various architectural models. Using Harbin as a case study, this empirical research reveals the spatial distribution characteristics of its tourism emotion. It explores the potential connections between tourism emotion and urban space, providing data insights to enhance urban well-being.

Theoretically, this study establishes a model evaluation and selection methodology applicable to urban emotion assessment, bridging the methodological gap between existing spatial parameter research and big data emotion analysis. In practice, research findings can provide data support for urban design and planning, helping to identify key spatial elements that influence emotional well-being. This offers a scientific basis for optimizing public spaces, enhancing the allocation of service facilities, and improving the urban living environment, thereby promoting the realization of urban well-being among city users. This study demonstrates that emotion analysis technology can serve as an effective tool for bridging spatial design decisions with human subjective well-being, thereby advancing people-centered urban development. This study aims to address the following key issues:

How can we evaluate and select multiple deep learning models to establish a novel emotion analysis method suitable for urban well-being assessment?
What are the spatial differentiation patterns of urban emotion maps and the underlying mechanisms linking them to the built environment?

2. Literature Review

2.1. Urban Well-Being

Human well-being (HWB) remains a vague concept without a universally accepted definition and is often subject to differing interpretations [16]. The World Health Organization states that “individuals or groups must be able to fulfill their aspirations, meet their needs, and adapt to or cope with their environment in order to achieve a state of complete physical, mental, and social health” [17]. Human well-being is clearly closely linked to the environment, especially in urban areas [18].

The impact of urban environmental factors on well-being has been extensively demonstrated across multiple dimensions. Research indicates that urban environmental elements are key components of natural infrastructure for enhancing residents’ well-being [19]. These mechanisms of action extend beyond ecological regulation. They also have a profound impact by promoting mental health—such as stress relief and attention restoration—and social cohesion [20]. Therefore, as urban density increases and urban development continues, it is crucial to study the role of urban environmental factors in promoting urban well-being [18].

The assessment of urban well-being has evolved into a comprehensive framework encompassing multiple dimensions. Current international research generally examines four interconnected dimensions. Physical health focuses on examining the relationship between green space exposure and physical activity, chronic disease risk, and all-cause mortality [21]. Mental health represents the fastest-growing research area in recent years, primarily grounded in restorative environmental theory to explore how green spaces improve stress levels, emotional well-being, and cognitive function [22]. Social well-being examines how public spaces foster social interaction, enhance community belonging, and strengthen social capital [23]. Environmental well-being situates human health within the framework of ecosystem services, emphasizing the foundational role of supply services like clean air, comfortable climate, and biodiversity in supporting residents’ quality of life [24]. These dimensions collectively form a holistic perspective for understanding “urban well-being.” Therefore, within this framework and from a mental health perspective, this study systematically evaluates the distribution of emotional spaces in cities and their relationship with the built environment, providing empirical evidence for understanding urban well-being.

In summary, while urban well-being research traditionally focuses on residents, transient populations such as tourists are equally important yet underexplored. Tourists offer more objective environmental perceptions, unclouded by long-term habituation. However, psychological and emotional studies from their perspective remain scarce. This study addresses this gap by adopting tourist emotions as a measurable well-being indicator. Developing a widely quantifiable “emotional mapping” technology that integrates spatial characteristics and behavioral trajectories enables more vivid, real-time assessments of how urban environments impact the well-being of diverse populations. This approach provides intelligent insights for designing and governing truly people-centered cities.

2.2. Emotion Map

Emotion is a vital product of human interaction with the built environment, profoundly shaping how we perceive, experience, and evaluate places [25]. Scholars widely recognize that emotions constitute an indispensable component in the evaluation of urban spaces, reflecting people’s direct responses to these environments [26]. The emotion map serves as a vital tool for exploring the relationship between humans and the urban environment, offering a visual representation of the distribution of emotions across urban spaces [27].

In the tourism sector, emotional imagery serves as a key factor influencing destination satisfaction and the willingness to revisit [28]. Emotion, as the primary driver of tourism consumption, provides a crucial link for exploring models for building tourism-friendly cities. Several researchers have interpreted the mechanisms linking emotions and urban spaces from the residents’ perspective [5,10,11,29]. In contrast, research on mapping and interpreting urban emotion maps from the tourist’s perspective remains relatively limited. Therefore, constructing affective maps from the tourist’s perspective serves as a crucial complement to urban well-being research.

Research on urban emotion maps has gradually achieved breakthrough progress. The earliest emotion analysis models originated from user review analysis on e-commerce websites and have since evolved into a vast and diverse array of models [30,31]. With the widespread use of UGC data and the proliferation of deep learning technologies, a multitude of emotion analysis models have emerged. Commonly used models for emotion analysis include RNN, CNN, LSTM, BERT, Inception-v4, DeepSentiBank, and SnowNLP [7,14,32,33,34]. For example, in the early stages of urban emotion research, Resch et al. introduced TwEmLab, an interdisciplinary approach for extracting citizens’ emotions in different locations within a city [35]. Following the widespread adoption of pre-trained models, Tas et al. used a fine-tuned BERT model to perform aspect-based sentiment analysis on geo-located crowd-sourced urban evaluations, enabling the identification and spatial visualization of positive and negative urban aspects at the city scale [36]. Zhang et al. used the Shanghai COVID-19 outbreak as a case study to pioneer the large-scale application of ERNIE in emotion monitoring on Weibo during emergency situations in megacities, validating ERNIE’s superior semantic understanding capabilities in specialized domains [37].

However, in the field of spatial analysis, no systematic research consensus has yet been established regarding which emotion analysis model should be selected for different spatial types. To this end, this study comprehensively compares seven representative emotion analysis models and systematically evaluates their performance and variations across different urban spatial contexts. It further explores a multi-scenario, highly robust urban emotion analysis framework.

2.3. Method Evolution

This study selected seven representative models across three categories for a systematic performance comparison (Table 1): SnowNLP from traditional machine learning tools; LSTM, BiLSTM, and RCNN from classical deep learning models; and BERT, RoBERTa, and ERNIE from pre-trained models.

The core paradigm of traditional machine learning methods is “feature engineering + classification algorithms.” First, unstructured text is converted into structured feature vectors through manual design or statistical methods. Common text representation methods include Bag-of-Words, TF-IDF, and N-gram models [38]. Subsequently, these feature vectors are fed into classical classification algorithms for training and prediction. The most commonly used ones include Naive Bayes, SVM, and Maximum Entropy [39]. Building on this foundation, the growing demand for emotion analysis applications has led to the emergence of a series of specialized, packaged emotion analysis tools and libraries. A representative example is the SnowNLP library, widely used in Chinese language processing, which is based on traditional machine learning algorithms such as Naive Bayes and has been pre-trained on a dataset consisting primarily of shopping reviews [40].

Deep learning models rely on their specific neural network architecture theories to achieve breakthroughs in emotion analysis. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) provide powerful end-to-end learning paradigms for emotion analysis from the theoretical perspectives of “local feature combination” and “sequence dependency modeling,” respectively, replacing the feature engineering dependent on manual design in traditional methods [41,42]. Among these, BiLSTM (Bidirectional Long Short-Term Memory Network) is a standard and important variant of the basic LSTM architecture, addressing long-range dependencies in recurrent neural networks through gating mechanisms. Its bidirectional nature enables the model to encode a word’s context simultaneously from both forward and backward directions, thereby capturing richer sequence information than unidirectional LSTMs. This makes it the premier sequence modeling approach in the era of deep learning [43].

Pre-trained models such as BERT, RoBERTa, and ERNIE mark a theoretical paradigm shift from “task-specific modeling” to “general representation fine-tuning.” Their core theoretical innovation lies in the self-attention mechanism within the Transformer architecture. This mechanism enables the model to dynamically assess the importance of all words in the input sequence while processing each word, thereby modeling contextual relationships globally and in parallel. Theoretically, this approach outperforms RNNs’ sequential processing [44]. Self-supervised pre-training tasks such as Masked Language Modeling (MLM) enable models to learn universal linguistic patterns and world knowledge from vast amounts of unlabeled data, forming deep context-aware word vector representations.

3. Method

3.1. Study Area and Data Preparation

3.1.1. Study Area

The subject of this study is Harbin City in Heilongjiang Province, located in northeastern China, which possesses unique climatic characteristics and abundant tourism resources. In recent years, Harbin’s tourism industry has steadily increased its influence. During the 2024–2025 winter season, Harbin City welcomed a cumulative total of 90.357 million tourists, representing a year-on-year increase of 9.7%. Inbound tourist arrivals increased by 94.2% year-on-year, with total tourism revenue showing significant growth [45]. Moreover, Northeast China is renowned for the warmth, humor, and hospitality of its dialects. Dialect-specific emotional expression is therefore one of the focal points of this study when evaluating emotion in travel reviews. The study area encompasses the inner ring of Harbin, covering five administrative districts: Daoli District, Daowai District, Nangang District, Songbei District, and Xiangfang District (Figure 1).

3.1.2. Data Acquisition and Preprocessing

This study primarily uses reviews from UGC as its data source. MaFengWo and Ctrip are China’s two major travel social networking sites, allowing users to share reviews about travel experiences and locations. Using the Skieer 8 software, this study collected and filtered 225 attraction names and geographic coordinates in Harbin, along with 49,010 valid reviews, from the Ctrip travel platform (Table 2).

3.1.3. Data Annotation

To ensure the rigor and feasibility of the research, all reviews were stratified and sampled based on their ratings and publication dates. A total of 20% of the reviews were manually annotated for deep learning, while the remaining 80% were used for large-scale predictions after training. Then, the manually annotated data was stratified and sampled at an 8:1:1 ratio to form a training set of 7842 entries, a validation set of 980 entries, and a test set of 980 entries. The training set is used for the initial training of the emotion analysis deep learning model. The validation set is used for hyperparameter tuning and early stopping during model training to prevent overfitting. The test set is used for the final, fair performance evaluation of all models.
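The 8:1:1 partition described above can be sketched as follows. This is a minimal illustration that assumes the strata are the emotion labels (the text does not state which variable was used for this second stratification); the function name `stratified_split` and the toy label counts are hypothetical and are not the study's data.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists, splitting each label group by `ratios`."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)  # shuffle within the stratum only
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Toy labels with a hypothetical class mix (assumption, not the study's distribution).
labels = ["positive"] * 700 + ["neutral"] * 200 + ["negative"] * 100
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 800 100 100
```

Splitting inside each stratum keeps the class proportions of the three sets close to those of the annotated pool, which matters when one class dominates.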

This study employed a manual annotation strategy for data preprocessing. Three college students were recruited as annotators to label the reviews independently. They were given an annotation guide covering the annotation task, the emotion polarity categories, the meaning of the robustness tag, and the operating procedure. The annotators were trained first: the researchers explained the guide and discussed any questions with them. Then, 50 reviews randomly selected from the corpus were pre-annotated, and the results were discussed until a consensus understanding was reached [46].

Every annotator applied two labels per review:

Emotion polarity: Negative/neutral/positive;
Robustness tag: Whether the text contains noise or tourism domain language.

Each review was annotated by all three annotators. The labeling process took one week. After annotation, the results were collected, organized, and manually checked for omissions and errors, which were then corrected by the annotators to produce the complete annotation results.

3.2. Emotional Assessment System

3.2.1. Construction of the Evaluation System

This study selected multiple indicators from four aspects—overall performance, category-level performance, robustness, and efficiency—to evaluate model performance (Table 3).

Overall performance indicates the fundamental accuracy of a model’s emotion classification and is central to its usability. Accuracy, as the most intuitive macro-level metric, refers to the proportion of samples that the model predicts correctly overall. It reflects comprehensive classification capabilities and provides a preliminary benchmark for evaluating model performance.

Given that emotion data often exhibits an imbalanced distribution where positive instances outnumber negative ones, this study introduced a category-level performance evaluation dimension and employed the Macro-F1 score as the core metric. This dimension is evaluated using two fundamental metrics: precision and recall. Precision refers to the proportion of samples predicted by the model to belong to a certain class that actually belong to that class, measuring the reliability of the model’s prediction results. Recall refers to the proportion of samples correctly identified by a model within a given class relative to the total number of genuine samples in that class, measuring the model’s coverage capability for a specific category. Macro-F1 is calculated by first computing the F1 score (the harmonic mean of precision and recall) for each emotion category, then taking its unweighted arithmetic mean. This metric ensures equal weighting for all key categories in the evaluation, preventing the neglect of critical minority groups and thereby providing a more comprehensive assessment of the model’s classification performance.
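The Macro-F1 computation described above can be illustrated with a short sketch. The class names follow the three polarity labels used in this study; the function name `macro_f1` and the toy predictions are illustrative assumptions, chosen to show how a model that always predicts the majority class can look acceptable on accuracy while scoring poorly on Macro-F1.

```python
def macro_f1(y_true, y_pred, classes=("negative", "neutral", "positive")):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall (0 when both are 0).
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Toy example (hypothetical): a classifier that predicts "positive" for every review.
y_true = ["positive"] * 8 + ["neutral"] + ["negative"]
y_pred = ["positive"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, round(macro_f1(y_true, y_pred), 4))
```

In this toy case accuracy is 0.8, but the neutral and negative classes each contribute an F1 of zero, so the macro average is pulled down to roughly 0.30.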

Robustness is evaluated across two dimensions, noise tolerance and domain adaptation, which assess the model’s stability on non-standard text and in specialized contexts, respectively. Noise-interference robustness (NIR) evaluates how stable a model’s performance remains when it is confronted with noisy text. This study selected 666 reviews from the dataset that contained typos, dialectal expressions, internet slang, and other forms of noise to construct a noise-interference subset. Domain adaptation robustness (DAR) evaluates a model’s ability to understand specialized knowledge and specific contexts within the tourism domain. This study selected 2553 reviews containing expressions specific to the tourism domain from the dataset to construct a domain adaptation subset. The core screening criteria for this subset include tourism-specific terms such as attractions, cuisine, and activities, along with specific expressions found in travel reviews. For each model, this study computed the accuracy on each subset, compared it with the accuracy on the standard test set, and calculated the performance retention rate; a higher retention rate indicates stronger robustness on the corresponding dimension.
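As a rough illustration of the retention-rate idea, the sketch below assumes the rate is the ratio of subset accuracy to standard test-set accuracy. The text does not give the exact formula, so this definition, the function names, and the numbers are all assumptions.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the annotated label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def retention_rate(subset_accuracy, test_accuracy):
    """Assumed definition: accuracy on a robustness subset relative to the standard test set."""
    if test_accuracy <= 0:
        raise ValueError("test_accuracy must be positive")
    return subset_accuracy / test_accuracy

# Hypothetical accuracies for one model (not values reported in the study).
noise_subset_acc = 0.81
standard_test_acc = 0.90
print(round(retention_rate(noise_subset_acc, standard_test_acc), 3))
```

A value close to 1 would mean the model loses little accuracy on the noisy or domain-specific subset compared with the standard test set.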

The efficiency of a model determines the speed of its research iterations and the feasibility of its practical deployment. Specific metrics include Model Parameter Count (MPC), Average Training Time Per Epoch (ATTPE), and Convergence Epochs (CE). The MPC reflects both the model’s complexity and its storage requirements. The ATTPE measures the computational cost required to progress from the initial dataset to the completion of model fitting. CE records the number of training cycles required for the model to achieve optimal performance on the validation set, directly reflecting its speed of learning from data and reaching stability. All efficiency metrics were measured under a standardized hardware and software testing environment to ensure fair comparisons. The comprehensive dimension of “efficiency” aims to provide a critical basis for “performance–cost” trade-offs in model selection for both research and practical applications.

To comprehensively and fairly evaluate the overall capabilities of emotion analysis models across different technical approaches, this study constructed a multidimensional evaluation system based on the Analytic Hierarchy Process (AHP) (Table 4). The system encompasses overall performance, category-level performance, efficiency, and robustness. A judgment matrix was constructed by comparing each of the four primary indicators at the indicator level with the target level. Then, using the sum-product method, a column-wise normalized matrix was formed to derive the weight values for each indicator.
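The sum-product step can be sketched as follows: each column of the judgment matrix is normalized to sum to one, and the row means of the normalized matrix give the weights. The pairwise judgments below are hypothetical placeholders, not the values behind Table 4, and the indicator keys are shorthand introduced for this example.

```python
def ahp_weights(matrix):
    """Sum-product method: normalize each column, then average across each row."""
    n = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    normalized = [[matrix[r][c] / col_sums[c] for c in range(n)] for r in range(n)]
    return [sum(row) / n for row in normalized]

# Hypothetical reciprocal judgment matrix (entry [i][j] = importance of i relative to j).
indicators = ["overall", "category", "robustness", "efficiency"]
judgments = [
    [1,     1,     2,     4],
    [1,     1,     2,     4],
    [1 / 2, 1 / 2, 1,     2],
    [1 / 4, 1 / 4, 1 / 2, 1],
]

weights = ahp_weights(judgments)
for name, w in zip(indicators, weights):
    print(name, round(w, 4))
```

Because this toy matrix is perfectly consistent, the resulting weights are simply proportional to 4:4:2:1; a real judgment matrix would normally also be checked for consistency before the weights are used.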

3.2.2. Model Training

The seven representative models were trained, validated, and tested separately. Based on the evaluation framework, their overall performance, category-level performance, noise robustness, and efficiency metrics were determined.

To ensure the fairness and scientific rigor of model evaluation, this study implemented strict control variable principles in its experimental design.

(1): The training, validation, and testing of all deep learning models were conducted in a unified and controlled experimental environment. The code for this study was implemented using Python 3.10. The deep learning framework employed was PyTorch 2.4 (CUDA 12.1). All computational tasks were performed on a single GPU server equipped with an NVIDIA GeForce RTX 3060 8 GB VRAM (NVIDIA Corporation, Santa Clara, CA, USA) to ensure computational efficiency for model training, particularly for fine-tuning large-scale pre-trained models.
(2): During model training, hyperparameters were tuned separately for each model based on validation set performance. For pre-trained models (BERT, RoBERTa, and ERNIE), the AdamW optimizer was used with a weight decay of 0.01. The initial learning rate was searched from {2 × 10⁻⁵, 3 × 10⁻⁵, 5 × 10⁻⁵}. The optimal learning rates were 2 × 10⁻⁵ for BERT and RoBERTa, and 3 × 10⁻⁵ for ERNIE. For classical deep learning models (BiLSTM, LSTM, and RCNN), the Adam optimizer was used with an initial learning rate searched from {1 × 10⁻⁴, 5 × 10⁻⁴, 1 × 10⁻³}, and the optimal value was 1 × 10⁻³ for all three models. The optimal hyperparameters were selected based on the highest Macro-F1 score on the validation set.
(3): All models utilized the same manually annotated training set, validation set, and test set, ensuring complete consistency between the learning materials and evaluation benchmarks. Additionally, during fine-tuning and training, all models were configured with the same random seed (seed = 42) and training epochs (epoch = 20).
(4): All models were trained using an early stopping strategy with a fixed patience threshold of five epochs. That is, a model was considered converged when the validation loss failed to decrease for five consecutive training epochs. Training was then terminated automatically, and the model reverted to the parameters that produced the lowest validation loss.
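The early stopping rule in item (4), combined with the 20-epoch cap in item (3), can be sketched in a framework-independent way. The function below only simulates the stopping logic over a given list of validation losses; it is not the study's training code, and the loss values are invented for illustration.

```python
def early_stopping_epochs(val_losses, patience=5, max_epochs=20):
    """Return (best_epoch, stopped_epoch), both 1-indexed, for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    stopped_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stopped_epoch = epoch
        if loss < best_loss:
            # In real training, the model weights would be checkpointed here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, stopped_epoch

# Hypothetical validation losses: the minimum is reached at epoch 4.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.54, 0.56, 0.60, 0.61]
print(early_stopping_epochs(losses))  # (4, 9)
```

Here training stops after epoch 9, five epochs after the best validation loss, and the epoch-4 parameters would be restored. Under this reading, the best epoch corresponds to what the evaluation system calls Convergence Epochs.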

Through the aforementioned rigorous experimental controls, this study systematically compared the inherent capability differences among various model architectures under identical tasks and datasets. The results serve as a reliable basis for model selection.

3.3. Relationship Between the Built Environment and Emotions

This study examined the mechanisms through which various elements of the built environment influence tourists’ emotions, focusing on the inner urban area within Harbin’s Third Ring Road. Through spatial unit delineation and indicator quantification, it conducted a correlation analysis between emotional value and built environment factors [47]. Because attractions within Harbin’s Third Ring Road are distributed relatively uniformly, emotional heterogeneity resulting from different built environments can be observed. Therefore, this study used urban roads as boundaries to divide the area within the Third Ring Road into 167 study units of similar size. This approach enables a more detailed analysis of the mechanisms linking tourist emotions to the built environment (Figure 2).

Finally, the mean emotion scores and built environment indicators for each study unit were calculated separately, and a correlation analysis was conducted. The following six built environment indicators were selected: POI diversity, road density, accommodation facility density, green space density, building density, and volume ratio. This study used SPSS 31.0.2.0 software to conduct a Pearson correlation analysis to preliminarily assess the degree of linear association between various built environment factors and tourist emotions. Additionally, a linear trend line was added to the scatter plot to help identify potential patterns of relationships between variables.
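For readers who want to reproduce this kind of analysis outside SPSS, the Pearson coefficient and the least-squares trend line can be computed directly. The sketch below is a plain-Python illustration with made-up values for four study units; the variable names and numbers are assumptions, not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def trend_line(x, y):
    """Least-squares slope and intercept, as used for a scatter-plot trend line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x

# Hypothetical per-unit values: road density and mean emotion score.
road_density = [2.0, 4.0, 6.0, 8.0]
mean_emotion = [0.55, 0.60, 0.70, 0.75]

r = pearson_r(road_density, mean_emotion)
slope, intercept = trend_line(road_density, mean_emotion)
print(round(r, 3), round(slope, 3), round(intercept, 3))
```

The sign of `r` matches the sign of the trend-line slope, so the two outputs give a consistent first impression of whether an indicator rises or falls with emotion score.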

4. Results

4.1. Model Performance Evaluation

4.1.1. Comparison of Multidimensional Performance Metrics

Overall performance analysis indicated that all deep learning models outperformed the traditional tool SnowNLP in accuracy, demonstrating varying degrees of performance improvement. Pre-trained models based on the Transformer architecture demonstrated significantly higher accuracy than other deep learning models, with RoBERTa achieving the highest score of 0.9173, while both BERT and ERNIE exceeded 0.9. Classical deep learning models ranked second, with BiLSTM achieving the best performance in this group at 0.8888.

In category-level performance analysis, models exhibited significant performance differences when confronting the challenge of imbalanced data. Using Macro-F1 as the key metric, this study found that RCNN and LSTM exhibited significant positive class bias, with Macro-F1 barely exceeding 0.3 and even falling below SnowNLP’s performance. This indicated that such models tend to learn patterns from the majority class in the data while struggling to effectively identify minority class samples. By incorporating bidirectional context information, the BiLSTM model achieved an improved Macro-F1 of 0.6587 and delivered effective F1 scores across all minority categories, mitigating the bias inherent in unidirectional models. In comparison, pre-trained models demonstrated the strongest fine-grained discrimination capabilities, with both ERNIE and RoBERTa achieving Macro-F1 scores exceeding 0.75. This confirmed that the robust semantic prior knowledge acquired through extensive pre-training effectively addresses imbalanced data.

Robustness analysis examined model stability across two dimensions, noise-interference and domain adaptation, revealing that pre-trained models still performed best while traditional machine learning approaches yielded the poorest results. ERNIE demonstrated superior performance in noise-interference robustness, achieving a score of 0.9333. BERT and RoBERTa demonstrated superior DAR, achieving a score of 0.9228. This indicated that pre-trained models demonstrated the most stable and reliable understanding of non-standard text and domain-specific contexts. The robustness of the classical deep learning models remained above 80%, with BiLSTM achieving the best performance and a robustness score of 0.8799.

The efficiency comparison results differed from the robustness results, with classical deep learning models outperforming pre-trained models. LSTM achieved the highest efficiency score, with the fewest parameters and the fastest convergence. Pre-trained models generally scored low on efficiency metrics because of their massive parameter counts and long average training time per epoch (ATTPE). These results illustrate the efficiency advantage of lightweight models and the computational cost that large models incur to achieve high performance.

Based on the comprehensive evaluation results (Figure 3), model selection should balance the core requirements of the specific application scenario. When optimal overall accuracy, fine-grained discrimination capability, and robustness are required and sufficient computational resources are available, pre-trained models such as ERNIE should be prioritized. In scenarios with severely constrained resources, or where only coarse-grained trend analysis is required, BiLSTM can be considered as a lightweight alternative that offers high efficiency with acceptable accuracy. For tasks involving the precise identification of minority emotions, the Macro-F1 metric must be prioritized, and models such as R-CNN or basic LSTM should be avoided. Additionally, SnowNLP can serve as a rapid baseline tool for initial exploration. Overall, for in-depth research and practice in tourist emotion analysis, pre-trained models are the recommended technical approach.

4.1.2. Comparative Model Assessment Based on the Evaluation Framework

For each model, the standardized scores across the four primary metrics—overall performance, category-level performance, efficiency, and robustness—are multiplied by the respective weights determined via the aforementioned AHP method. The four weighted scores are then summed to yield the model’s comprehensive emotion evaluation score. The formula can be expressed as

\text{Comprehensive Score} = \sum_{i=1}^{4} \left( \text{Indicator}_{i}\ \text{Score} \times \text{Indicator}_{i}\ \text{Weight} \right)
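The weighted sum above can be sketched in a few lines of Python. The weights and standardized scores below are hypothetical placeholders chosen only to make the arithmetic visible; they are not the AHP weights or model scores reported in this study.

```python
def comprehensive_score(scores, weights):
    """Weighted sum of standardized indicator scores.

    `scores` and `weights` map each primary indicator name to a float.
    The weights are expected to sum to 1.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same indicators")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weights[name] for name in scores)


# Hypothetical weights (assumption, not the study's AHP result)
example_weights = {
    "overall_performance": 0.4,
    "category_level_performance": 0.3,
    "robustness": 0.2,
    "efficiency": 0.1,
}

# Hypothetical standardized scores for a single model (assumption)
example_scores = {
    "overall_performance": 0.9,
    "category_level_performance": 0.7,
    "robustness": 0.8,
    "efficiency": 0.3,
}

example_total = comprehensive_score(example_scores, example_weights)
# 0.9*0.4 + 0.7*0.3 + 0.8*0.2 + 0.3*0.1 = 0.76
```

With weights like these, a model can score poorly on efficiency and still rank first overall, as long as the accuracy-related indicators carry most of the weight.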

This study performed the aforementioned calculations on seven models (Table 5). The results showed that the pre-trained model group significantly outperformed others, with ERNIE demonstrating the best overall performance. Although the BiLSTM model was not pre-trained, its bidirectional structure demonstrated relative advantages in emotion classification, with overall performance second only to the pre-trained models. The composite scores of other classical deep learning models and traditional machine learning models ranged between 0.51 and 0.58, revealing a significant gap.

4.2. Presentation of Prediction Results

4.2.1. Quantitative Distribution of Predicted Emotion

This study conducted an empirical emotion analysis of tourist reviews in Harbin using each of the trained models (Table 6). In the predictions from the pre-trained models and BiLSTM, the ratio of the three emotion categories was approximately 10:1:1, which closely matched the ratio observed in the manually annotated data from stratified sampling. This indicated that these models maintained strong generalization capabilities in real-world scenarios, effectively identifying the distribution of emotion polarity in tourist reviews. However, the prediction results of the LSTM and R-CNN models exhibited severe imbalance, categorizing nearly all reviews as positive and demonstrating a pronounced tendency toward excessive optimism. Meanwhile, the traditional machine learning tool SnowNLP showed insufficient capability in identifying positive and neutral emotion, with its predicted proportion of negative emotion skewed significantly upward.

To further investigate the quantitative distribution characteristics of different models in emotion prediction, this study plotted the relationship between the number of reviews and emotion scores for each attraction across the seven models (Figure 4). Among these, the plots generated by BERT, RoBERTa, ERNIE, and BiLSTM exhibited nearly identical distribution patterns. Overall, most scenic spots were clustered in areas identified as low-tourist but high-emotion, suggesting that while these sites may lack popularity, they have earned favorable tourist evaluations. In contrast, a limited number of scenic spots achieved both high visibility and strong positive emotion. For example, Harbin Polar Park and the Songhua River Sightseeing Cableway maintained an average emotion score above 0.9 despite receiving nearly 3000 reviews, indicating high levels of tourist satisfaction and sustained attention. Notably, negative emotion predominantly appeared in scenic spots with few reviews. For instance, niche attractions such as Water Pavilion Cloud Sky and Bella Castle Kingdom each received approximately 10 reviews, with average emotion scores as low as −0.4. This suggests that a small review base makes the overall emotion mean more susceptible to individual negative evaluations, rendering niche sites with inadequate services or experiences more vulnerable to negative feedback.
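The sensitivity of small review bases can be made concrete with a short worked example. It assumes, purely for illustration, that positive, neutral, and negative reviews are scored +1, 0, and -1 and that a site's emotion score is the plain mean; the exact conversion used in this study is not specified here.

```python
def mean_emotion_score(n_positive, n_neutral, n_negative):
    """Mean score under the assumed mapping positive=+1, neutral=0, negative=-1."""
    total = n_positive + n_neutral + n_negative
    if total == 0:
        raise ValueError("at least one review is required")
    return (n_positive - n_negative) / total


# One negative review among 10 reviews versus one among 3000 (hypothetical counts)
small_site = mean_emotion_score(9, 0, 1)     # (9 - 1) / 10 = 0.8
large_site = mean_emotion_score(2999, 0, 1)  # (2999 - 1) / 3000
```

Under this mapping, replacing one positive review with a negative one lowers the mean by 2/n, so the same single complaint moves a 10-review site by 0.2 but a 3000-review site by less than 0.001.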

4.2.2. Generation of Emotion Maps

This study spatially visualized emotion scores for Harbin City based on the geographic locations of reviews about scenic spots in ArcGIS 10.8. Since the data distribution is predominantly concentrated in urban built-up areas, the analysis results presented here are based solely on 33,868 scenic spot reviews within Harbin’s Third Ring Road. This study performed kernel density analysis on positive, neutral, and negative emotion points to obtain spatial distribution maps of each emotion category predicted by the respective models. To make the spatial distribution of emotion more intuitive, this study further performed weighted spatial interpolation on emotion values based on the proportion of emotion categories. After converting emotions into numerical values, this study obtained a comprehensive emotional spatial distribution map predicted by various models (Table 7).
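As a rough illustration of the kernel density step, the sketch below evaluates a density surface at one location from weighted review points. It assumes a quartic kernel and a single fixed bandwidth; the search radius, cell size, and weighting actually used in ArcGIS for this study are not given, so all parameter values and names here are hypothetical.

```python
import math


def quartic_kernel_density(x, y, points, bandwidth):
    """Kernel density at (x, y) from points given as (px, py, weight).

    Uses the 2D quartic kernel K(u) = (3 / pi) * (1 - u^2)^2 for u < 1,
    which integrates to 1 over the unit disk. Points farther away than
    `bandwidth` contribute nothing.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    total = 0.0
    for px, py, weight in points:
        u_squared = ((x - px) ** 2 + (y - py) ** 2) / bandwidth ** 2
        if u_squared < 1.0:
            total += weight * (3.0 / math.pi) * (1.0 - u_squared) ** 2
    return total / bandwidth ** 2


# Hypothetical positive-emotion review points in projected coordinates (metres)
positive_points = [(0.0, 0.0, 1.0), (300.0, 0.0, 1.0), (5000.0, 5000.0, 1.0)]

# Density at the origin with an assumed 1000 m search radius
density_at_origin = quartic_kernel_density(0.0, 0.0, positive_points, 1000.0)
```

Evaluating this function over a regular grid, once per emotion category, gives the kind of per-category density surface described above; the third point lies outside the search radius and does not affect the value at the origin.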

The spatial distributions of the emotion predictions varied markedly across models, which directly affects the reliability of spatial analysis. The prediction distributions of BERT, RoBERTa, ERNIE, and BiLSTM were plausible and highly consistent, corroborating one another. They exhibited high reliability and stability, making them suitable for in-depth spatial analysis. In contrast, the LSTM and R-CNN models suffered from severe class imbalance, causing their predictions to be overly biased toward positive emotion. This resulted in emotion distribution maps that merely reflect the spatial density of review points and fail to reveal differences in emotion. Additionally, the category-level emotion distribution maps predicted by SnowNLP, a traditional machine learning tool, closely resembled those of the pre-trained models.

4.3. Integration of Optimal Models and Generation of a Composite Emotion Map

4.3.1. Comparative Model Performance Across Emotion Polarities

To investigate how the seven emotion models differ across the three emotion polarities, this study conducted a more detailed statistical comparison of their performance for each polarity. Overall, all models performed well in recognizing positive emotion, with RoBERTa ranking first with an F1 score of 0.9626. For negative emotion detection, performance varied considerably among the models; RoBERTa remained the leader with an F1 score of 0.8118, demonstrating strong capability in capturing negative emotion. In contrast, most models struggled to identify neutral emotion: only ERNIE exceeded 0.5, with an F1 score of 0.5357, while all other models scored below 0.5. This indicated that neutral emotion, owing to its ambiguity and context dependence, presented a common challenge in emotion classification. In addition, while LSTM and R-CNN performed well in predicting positive emotion, their F1 scores for neutral and negative emotion were both close to zero. This indicated that such models exhibited significant class prediction bias when dealing with imbalanced data: they tended to predict all samples as belonging to the more prevalent positive class, causing recognition of the minority classes to fail almost entirely. This phenomenon further supports the conclusions drawn earlier regarding the models’ performance on imbalanced datasets (Figure 5).

4.3.2. Evaluation of Model Accuracy Across Spatial Rings

Given the performance differences among the seven emotion models across spatial layers, this study used a comparative analysis to identify the distinctive strengths of each model in emotion recognition at different spatial locations. The study area was divided into three concentric zones (Zone 1, Zone 2, and Zone 3, referred to below as the first, second, and third rings), and the prediction accuracy of each model within each zone was calculated. Overall, model accuracy increased from the innermost to the outermost ring: it was generally lowest within the first ring, improved markedly in the second ring, and peaked in the third ring. This pattern may stem from the high degree of functional mix in the city center, the diverse composition of visitors, and the wide range of emotional expressions, all of which place higher demands on a model’s discriminative ability; in contrast, the outer rings are primarily devoted to scenic area functions, with relatively clear emotional tendencies, making it easier for a model to make accurate judgments. In the core area within the First Ring Road, BERT led by a wide margin with an accuracy of 0.9091. In the second and third rings, the advantages of pre-trained models were particularly evident, with accuracy reaching approximately 0.9 in both cases. This pattern of spatial differentiation suggests that, in practical applications, an appropriate emotion analysis model should be selected based on the urban functional characteristics of the target area (Figure 6).
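The zone-level comparison amounts to grouping labelled reviews by zone and computing accuracy within each group. A minimal sketch follows, assuming each review has already been assigned to a zone; the records and zone names are hypothetical.

```python
from collections import defaultdict


def accuracy_by_zone(records):
    """Accuracy per zone from (zone, true_label, predicted_label) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for zone, true_label, predicted_label in records:
        total[zone] += 1
        correct[zone] += int(true_label == predicted_label)
    return {zone: correct[zone] / total[zone] for zone in total}


# Hypothetical labelled reviews for one model (assumption, not study data)
example_records = [
    ("zone_1", "positive", "positive"),
    ("zone_1", "negative", "positive"),
    ("zone_2", "neutral", "neutral"),
    ("zone_3", "positive", "positive"),
    ("zone_3", "negative", "negative"),
]

zone_accuracy = accuracy_by_zone(example_records)
```

Running the same function once per model yields a model-by-zone accuracy table of the kind summarized in Figure 6.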

4.3.3. Generation of the Composite Emotion Map

This study evaluated the models’ applicability along two dimensions: emotion polarity and spatial location. Combining the evaluation results from these two dimensions, this study constructed a model matching matrix that lists, in each cell, the three best-performing models for that specific combination of conditions. This matrix provided a visual representation of the performance advantages and applicability of different models under the dual constraints of “emotion and spatial context.” Based on the matrix, the optimal model prediction for each cell was selected to generate an integrated emotion map (Figure 7).
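One way to picture the model matching matrix is as a lookup keyed by (polarity, zone) that stores a ranked list of models. The sketch below is a hypothetical construction: it assumes the two dimensions are combined by averaging each model's polarity-level F1 and zone-level accuracy, which is an illustrative rule and not necessarily the one used in this study. Model names and scores are placeholders.

```python
def build_matching_matrix(polarity_f1, zone_accuracy, top_k=3):
    """Rank models for every (polarity, zone) cell.

    polarity_f1:   {polarity: {model: F1 score}}
    zone_accuracy: {zone: {model: accuracy}}
    The combined score is the mean of the two values (assumed rule).
    Ties are broken alphabetically by model name.
    """
    matrix = {}
    for polarity, f1_by_model in polarity_f1.items():
        for zone, acc_by_model in zone_accuracy.items():
            shared = set(f1_by_model) & set(acc_by_model)
            combined = {m: (f1_by_model[m] + acc_by_model[m]) / 2 for m in shared}
            ranked = sorted(combined, key=lambda m: (-combined[m], m))
            matrix[(polarity, zone)] = ranked[:top_k]
    return matrix


# Hypothetical scores for three placeholder models (assumption, not study data)
toy_polarity_f1 = {"negative": {"model_a": 0.8, "model_b": 0.6, "model_c": 0.7}}
toy_zone_accuracy = {"zone_1": {"model_a": 0.7, "model_b": 0.95, "model_c": 0.9}}

toy_matrix = build_matching_matrix(toy_polarity_f1, toy_zone_accuracy, top_k=2)
```

The first entry of each cell is the model whose prediction would be used for reviews falling in that polarity and zone when assembling a composite map.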

Analysis clearly showed that all emotion clusters were located on the northern side, with positive and negative emotion zones alternating east–west across the Sun Island area along the Songhua River. Positive emotion was highly concentrated in two core areas: one was the natural leisure hub represented by Sun Island Scenic Area; the other was the central historical and cultural core comprising St. Sophia Cathedral, Central Avenue, the Flood Control Monument, and numerous museums and art galleries. Negative emotions were scattered across multiple peripheral areas: first, the modern commercial entertainment zone on the west side, centered around Harbin Sunac Resort and Harbin Snow Wonderland; second, the extended experience zone surrounding Sun Island Scenic Area, featuring the Russian-style town, Snow Expo, and scenic cable cars; third, the Dragon Tower urban landmark district on the east side. Additionally, two weaker negative clusters existed on either side of the core area: one centered around Gogol Street and Shanhaiguan Street, which are distinctive commercial districts, and the other centered around the Northeast Tiger Forest Park as a themed ecological attraction. This “core–periphery, east–west alternation” distribution pattern clearly revealed the intrinsic connection between visitors’ emotional responses and the type, location, and experiential quality of tourism resources.

4.4. Correlation Mechanism Between Emotion and Built Environment Factors

Finally, this study explored the linear relationship between tourists’ emotions and various elements of the built environment. Pearson correlation analysis showed that emotional value was positively correlated with road density and green space density, with Pearson coefficients of 0.210 and 0.236, respectively. This suggests that tourists’ emotions may be positively associated with accessibility and green open spaces. There was a weak positive correlation between tourist emotion and the density of accommodation facilities. This suggests that a higher density of accommodation facilities is only weakly associated with better tourist emotion; tourists may focus more on service quality and value for money than simply on density. There was a weak negative correlation between tourist emotion and POI diversity. This suggests that areas with excessive functional mixing or a high degree of commercialization may have a slightly negative impact on the tourist experience due to overcrowding, noise, or homogenized competition. However, the correlations of building density and floor area ratio with emotion scores were not significant, with Pearson coefficients below 0.01. This suggests that these physical indicators of the built environment might influence tourists’ perceptions primarily through indirect or nonlinear pathways, rather than by directly affecting their immediate emotions (Figure 8).
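For reference, the Pearson coefficient used in this analysis can be computed as in the minimal sketch below. The input values are hypothetical and do not reproduce the study's data; the variable names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical per-cell values (assumption, not study data)
green_space_density = [1.0, 2.0, 3.0, 4.0]
emotion_score = [1.0, 3.0, 2.0, 4.0]

r_example = pearson_r(green_space_density, emotion_score)  # 4 / sqrt(5 * 5) = 0.8
```

Coefficients near 0.2, as reported above for road and green space density, indicate a positive but weak linear association, which is why the interpretation in the text stays cautious.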

5. Discussion and Conclusions

5.1. Discussion

This study established a multidimensional evaluation framework that encompasses overall performance, category-level performance, robustness, and efficiency. It systematically compared the performance of seven technical approaches—ranging from traditional machine learning to pre-trained models—in the task of emotion analysis of city tourism reviews, and conducted an empirical analysis and visualization of the city’s emotion map based on the emotion prediction results. The study further analyzed the mechanisms linking built environment factors to emotions. The research confirmed the following.

First, the ERNIE model, leveraging its knowledge-enhanced pre-training strategy, demonstrated the best performance in tasks involving noise interference and domain adaptation, achieving an overall score of 0.7612, making it suitable for city emotion mapping in the tourism sector. This finding is consistent with the conclusions of Zhang et al., who found that ERNIE performed best on Chinese emotion analysis tasks [37]. However, the findings of this study differ from those of Rehman et al. [48], who argued that the CNN-RNN hybrid architecture performs strongly in text emotion classification, whereas this study found that classical deep learning models such as R-CNN perform poorly in tourism emotion analysis. The primary reason for this discrepancy lies in the characteristics of the data sources. The CNN-RNN hybrid architecture performs well on general-purpose text datasets with relatively balanced distributions, whereas travel review datasets typically exhibit significant class imbalance, colloquial expressions, and a mix of dialects and internet slang. These characteristics place higher demands on a model’s robustness and its ability to learn from a small number of samples. Therefore, the methodological framework of this study is better suited to emotion analysis tasks in the specific field of travel reviews, enabling a more accurate capture of the emotional characteristics of urban tourism spaces.

Second, Harbin’s emotional map reveals a distinct spatial differentiation pattern characterized by a “core–periphery, east–west alternation” structure: positive emotions are highly concentrated in the Sun Island Natural Recreation Core and the Central Street historical and cultural core along the Songhua River, while negative emotions are scattered across the modern commercial and entertainment districts to the east and west, the themed park extension area, and the vicinity of urban landmarks. This finding is consistent with the results of studies on the perception of urban spatial emotions by Huang Shan et al. [29], as well as the conclusions of Wang Meng’s study on the emotions of tourists in historic and cultural districts. The accurate identification of the aforementioned characteristics of emotion space differentiation confirms that the emotion analysis model developed in this paper is highly suited to tourism review data.

Third, this study confirmed that emotion scores are significantly positively correlated with road density, green space density, and accommodation facility density, and weakly negatively correlated with POI diversity. In contrast, neither building density nor floor area ratio shows a significant correlation with emotion scores. These findings are consistent with those of Mouratidis regarding residents’ emotions, which indicate that access to green spaces is consistently linked to higher levels of subjective well-being among urban residents [49]. However, in contrast to this study’s conclusion that building density has no significant effect on visitors’ emotions, Frey found that residents in areas of high residential density were significantly more likely to suffer from poor mental health [50]. This difference stems primarily from variations in the study subjects and the measurement scales used. Residents’ perceptions of building density are more closely tied to the convenience of daily life, whereas tourists place greater emphasis on the cultural atmosphere and visual experience of a space. Consequently, the mechanisms through which building density influences the emotions of these two groups differ. Furthermore, Larson et al. found that park quantity and quality were positively associated with well-being [51]. Samavati et al. also demonstrated that the more satisfied people are with green/natural visibility and traffic connectivity, the happier they are, and that the closer they are to POI-intensive places such as shops and the city center, the less happy they are [52]. Together, these studies support our finding that green space density and road density were positively correlated with emotion scores while POI diversity was negatively correlated with them. However, research by Ram et al. suggested that park accessibility and transportation convenience have no overall impact on mental health and well-being, which indicates that changes to the built environment alone are insufficient to improve mental health and well-being [53].

Finally, this study provides a decision-making basis for planning management. Positive sentiments are concentrated in natural recreation cores and historical–cultural hubs. Planning efforts should prioritize the maintenance of high-quality facilities, effective crowd management, and the preservation of authentic cultural atmospheres in these areas to sustain visitor satisfaction. Negative sentiments are scattered around peripheral commercial entertainment zones and iconic landmarks. Recommendations include improving price transparency, reducing queuing times, enhancing wayfinding systems, and diversifying on-site experiences to mitigate visitor disappointment. Furthermore, the multi-model evaluation framework developed in this study can be embedded into a real-time or near-real-time dashboard. By periodically collecting and analyzing visitor reviews, destination managers can identify emerging negative hotspots and evaluate the effectiveness of implemented interventions, enabling a shift from reactive problem-solving to proactive, evidence-based planning.

5.2. Conclusions

This study systematically compares the overall performance of various emotion analysis models on tourism review data by constructing a multi-level, multidimensional model evaluation framework. The main findings are as follows:

(1): Comparison of Multi-Model Performance

The seven emotion analysis models show significant differences in performance, with ERNIE achieving the best overall performance in the comprehensive evaluation. However, each model exhibits distinct performance characteristics: RoBERTa and BERT demonstrate a clear advantage in negative emotion recognition, capturing negative emotional expressions with greater accuracy; SnowNLP plays a significant complementary role in evaluating positive emotion in city centers and is particularly sensitive to positive emotional responses in areas such as cultural districts; RoBERTa performs more accurately in emotion analysis in outlying areas and can effectively identify distinct emotional patterns in entertainment districts. This suggests that a single model is unlikely to fully address the complex task of evaluating the urban emotion map. To achieve a high-precision depiction of urban emotion spaces, it is necessary to develop a multi-model collaborative strategy that accounts for the differences in spatial location and emotion polarity.

(2): Distinctive Features of Emotion Maps

The study found that the predictions from pre-trained models such as ERNIE and BERT corroborate those from BiLSTM, revealing the typical spatial patterns of tourists’ emotion in Harbin. All the clusters are located on the north bank of the Songhua River, alternating between east and west along the Sun Island area. Positive emotion is highly concentrated in the two core areas: the Sun Island Natural Recreation Core and the Central Street historical and cultural core; negative emotion, on the other hand, is scattered across the modern commercial and entertainment zone on the west side (Sunac Cultural Tourism City, Ice and Snow World), the extended experience zone surrounding Sun Island (Russian-style town, Snow Expo, sightseeing cable car), and the Longta City Landmark Zone on the east side. In addition, two areas with relatively low negative emotion have emerged in distinctive commercial districts such as Gogol Street and Shanhetun, as well as in the vicinity of the Northeast Tiger Park. This “core–periphery, east–west alternating” distribution pattern reveals the intrinsic connection between tourists’ emotional responses and the type, location, and quality of the tourism resources, thereby confirming the critical role of model selection in ensuring the reliability of findings in the field of emotional geography.

(3): The Mechanism of Association with “Elements of the Built Environment”

Road density, green space density, and accommodation facility density show a significant positive correlation with emotion scores, while POI diversity shows a weak negative correlation; in contrast, building density and floor area ratio show no significant correlation with emotion scores. These findings reveal the mechanisms through which elements of the built environment influence the differentiation of emotional spaces in cities, providing empirical evidence for understanding the varying emotional responses of different spatial actors, and offering valuable insights for enhancing urban well-being.

5.3. Theoretical Contributions, Practical Implications and Limitations

This study makes theoretical contributions in three main aspects. First, it constructs a multidimensional model evaluation and selection framework for analyzing urban affective spaces. This provides a systematic methodological tool for evaluating and selecting emotion analysis models suitable for urban research scenarios. Second, integrating emotion analysis technology with spatial visualization bridges the methodological gap between spatial parameter research and big data emotion analysis, advancing urban studies from an “object-centered” to a “people-centered” paradigm. Third, through the fine-grained evaluation, this study validates the superiority of pre-trained models in urban vertical applications, revealing the underlying mechanisms of their semantic prior and noise resilience. This provides a theoretical foundation for their subsequent deep application in fields such as urban computing.

The practical significance of this study lies in its exploration of how emotion analysis techniques—integrating sentiment classification, spatial aggregation, and correlation analysis—can be applied to people-centered urban planning and tourism management. First, by generating emotional spatial distribution maps, the study provides a visual reference for identifying potential emotional hotspots and problem areas within cities. Second, attribution analysis based on emotional patterns offers insights for spatial quality optimization, shifting from holistic enhancement to targeted interventions. Finally, the study proposes a preliminary framework for routine emotion monitoring, which could contribute to a “perception–evaluation–optimization” loop, thereby supporting efforts to enhance the urban well-being of both residents and tourists.

This study still has several limitations. First, the study primarily compared the performance of standalone models and has not yet explored the potential of ensemble models, which may offer room for further improvement in classification performance and robustness. Second, the data used in the study is concentrated solely on Harbin-based travel reviews. Although the results are reliable for this particular context, caution should be exercised when extrapolating the findings to other cities or cultural backgrounds, as differences in local characteristics may influence emotional patterns and model performance. Furthermore, the study fails to deconstruct and analyze more granular emotional orientations. These limitations also point to potential directions for future research, including developing lightweight fusion models tailored to vertical domains, expanding validation across multiple cities and scenarios, and advancing fine-grained emotion spatial analysis frameworks.

Author Contributions

Conceptualization, X.L.; methodology, X.L.; software, J.L.; validation, J.L.; formal analysis, S.H.; investigation, S.H.; resources, X.L.; data curation, J.L.; writing—original draft preparation, S.H.; writing—review and editing, J.L.; visualization, J.L.; supervision, X.L.; project administration, X.L.; funding acquisition, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

Basic Research Projects for Higher Education Institutions by the Education Department of Liaoning Province (Grant No. LJ222510153004). Doctoral Research Startup Project under the 2025 Natural Science Foundation Program of Liaoning Province (Grant No. 2025-BS-0881). Liaoning Provincial Department of Education Service for Local Development Project (Grant No. LNSJSKJ-2025-028).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Abbreviations

HWB	Human Well-Being
UGC	User-Generated Content
AHP	Analytic Hierarchy Process
MLM	Masked Language Modeling
NIR	Noise-Interference Robustness
DAR	Domain Adaptation Robustness
MPC	Model Parameter Count
ATTPE	Average Training Time Per Epoch
CE	Convergence Epochs

References

Chai, Y.; Lin, Z.; Chen, X.; Li, Q.; Li, C. Identifying urban agglomeration’s range: Integrating clustering and accessibility analysis. Cities 2026, 171, 106708. [Google Scholar]
Mikhaeil, E.; Okulicz-Kozaryn, A.; Valente, R.R. Subjective well-being and urbanization in Egypt. Cities 2024, 147, 104804. [Google Scholar] [CrossRef]
Xia, C.; Yeh, A.G.-O.; Zhang, A. Analyzing spatial relationships between urban land use intensity and urban vitality at street block level: A case study of five Chinese megacities. Landsc. Urban Plan. 2020, 193, 103669. [Google Scholar] [CrossRef]
Wu, C.; Ye, Y.; Gao, F.; Ye, X. Using street view images to examine the association between human perceptions of locale and urban vitality in Shenzhen, China. Sustain. Cities Soc. 2023, 88, 104291. [Google Scholar] [CrossRef]
Meenar, M.R.; Mandarano, L.A. Using photovoice and emotional maps to understand transitional urban neighborhoods. Cities 2021, 118, 103353. [Google Scholar] [CrossRef]
Agustí, D.P.; Rutllant, J.; Fortea, J.L. Differences in the perception of urban space via mental maps and heart rate variation (HRV). Appl. Geogr. 2019, 112, 102084. [Google Scholar] [CrossRef]
Rani, S.; Kumar, P. Deep learning based sentiment analysis using convolution neural network. Arab. J. Sci. Eng. 2019, 44, 3305–3314. [Google Scholar] [CrossRef]
Wang, C.S.; Chen, Y.F.; Zheng, S.L.; Yuan, Y.; Wang, S. Research on Generating an Indoor Landmark Salience Model for Self-location and Spatial Orientation from Eye-Tracking Data. ISPRS Int. J. Geo-Inf. 2020, 9, 97. [Google Scholar] [CrossRef]
Lo, I.S.; McKercher, B. Ideal image in process: Online tourist photography and impression management. Ann. Tour. Res. 2015, 52, 104–116. [Google Scholar] [CrossRef]
Iwanczak, B.; Lewicka, M. Affective map of Warsaw: Testing Alexander’s pattern language theory in an urban landscape. Landsc. Urban Plan. 2020, 204, 103910. [Google Scholar] [CrossRef]
Ye, H.; Tussyadiah, I.P. Destination visual image and expectation of experiences. J. Travel Tour. Mark. 2011, 28, 129–144. [Google Scholar] [CrossRef]
Wang, F.; Yan, L.; Xiong, X.; Wu, B. A study on tourist cognition of urban memory in historic sites: A case study of Alley Nanluogu Historic Site in Beijing. Acta Geogr. Sin. 2012, 67, 545–556. [Google Scholar]
Li, X.; Jia, T.; Lusk, A.; Larkham, P. Rethinking place-making: Aligning placeness factors with perceived urban design qualities (PUDQs) to improve the built environment in historical districts. Urban Des. Int. 2020, 25, 338–356. [Google Scholar] [CrossRef]
Arshed, M.A.; Mumtaz, S.; Liaqat, M.S.; Haq, I.U.; Hussain, M. Lstm based sentiment analysis model to monitor covid-19 emotion. VFAST Trans. Softw. Eng. 2022, 10, 70–78. [Google Scholar] [CrossRef]
Zhao, Y.; Qin, B.; Shi, Q.; Liu, T.; Sui, M.S. Large-scale emotion lexicon collection and its application in emotion classification. J. Chin. Inf. Process. 2017, 31, 187–193. [Google Scholar]
Dolan, P.; Layard, R.; Metcalfe, R. Measuring Subjective Well-Being for Public Policy; London School of Economics: London, UK, 2011. [Google Scholar]
World Health Organization. Regional Office for Europe. Urban Green Spaces and Health—A Review of Evidence. 2016. Available online: https://iris.who.int/handle/10665/345751 (accessed on 12 March 2026).
leBrasseur, R. Linking human wellbeing and urban greenspaces: Applying the SoftGIS tool for analyzing human wellbeing interaction in Helsinki, Finland. Front. Environ. Sci. 2022, 10, 950894. [Google Scholar] [CrossRef]
Wu, W.; Liu, Y.; Gou, Z. Green infrastructure and urban wellbeing. Urban For. Urban Green. 2022, 68, 127485. [Google Scholar] [CrossRef]
Wang, R.; Browning, M.H.E.M.; Kee, F.; Hunter, R.F. Exploring mechanistic pathways linking urban green and blue space to mental wellbeing before and after urban regeneration of a greenway: Evidence from the Connswater Community Greenway, Belfast, UK. Landsc. Urban Plan. 2023, 235, 104739. [Google Scholar] [CrossRef]
Twohig-Bennett, C.; Jones, A. The health benefits of the great outdoors: A systematic review and meta-analysis of greenspace exposure and health outcomes. Environ. Res. 2018, 166, 628–637. [Google Scholar] [CrossRef] [PubMed]
Bratman, G.N.; Anderson, C.B.; Berman, M.G.; Cochran, B.; de Vries, S.; Flanders, J.; Folke, C.; Frumkin, H.; Gross, J.J.; Hartig, T.; et al. Nature and mental health: An ecosystem service perspective. Sci. Adv. 2019, 5, eaax0903. [Google Scholar] [CrossRef]
Jennings, V.; Gaither, C.J.; Gragg, R.S. Promoting environmental justice through urban green space access: A synopsis. Environ. Justice 2019, 12, 129–143. [Google Scholar] [CrossRef]
Millennium Ecosystem Assessment. Ecosystems and Human Well-Being: Wetlands and Water; World Resources Institute: Washington, DC, USA, 2005. [Google Scholar]
Thompson, S. Introduction to happiness and society. In The Oxford Handbook of Happiness; David, S.A., Boniwell, I., Conley Ayers, A., Eds.; Oxford University Press: Oxford, UK, 2013. [Google Scholar] [CrossRef]
Bondi, L.; Davidson, J.; Smith, M. Introduction: Geography’s ‘emotional turn’. In Emotional Geographies; Routledge: London, UK, 2016; pp. 1–16. [Google Scholar] [CrossRef]
Perkins, C. Performative and embodied mapping. In International Encyclopedia of Human Geography; Kitchin, R., Thrift, N., Eds.; Elsevier: Oxford, UK, 2009; pp. 126–132. [Google Scholar] [CrossRef]
Kim, J.H. The impact of memorable tourism experiences on loyalty behaviors: The mediating effects of destination image and satisfaction. J. Travel Res. 2018, 57, 856–870. [Google Scholar] [CrossRef]
Huang, S.; Lu, X.; Lu, J.; Zhang, J. Mining the Tourism Destination Image and Analyzing Influence Mechanisms. ISPRS Int. J. Geo-Inf. 2026, 15, 74. [Google Scholar] [CrossRef]
Das, S.; Chen, M. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA), Bangkok, Thailand, 22–25 July 2001; Volume 35, p. 43. [Google Scholar]
Morinaga, S.; Yamanishi, K.; Tateishi, K.; Fukushima, T. Mining product reputations on the web. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 341–349. [Google Scholar] [CrossRef]
Wang, X.; Mou, N.; Zhu, S.; Yang, T.; Zhang, X.; Zhang, Y. How to perceive tourism destination image? A visual content analysis based on inbound tourists’ photos. J. Destin. Mark. Manag. 2024, 33, 100923. [Google Scholar] [CrossRef]
Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019. [Google Scholar] [CrossRef]
He, Z.; Deng, N.; Li, X.; Gu, H. How to “read” a destination from images? Machine learning and network methods for DMOs’ image projection and photo evaluation. J. Travel Res. 2022, 61, 597–619. [Google Scholar] [CrossRef]
Resch, B.; Summa, A.; Zeile, P.; Strube, M. Citizen-centric urban planning through extracting emotion information from Twitter in an interdisciplinary space-time-linguistics algorithm. Urban Plan. 2016, 1, 114–127. [Google Scholar] [CrossRef]
Tas, D.; Priyadarshi Sanatani, R. Geo-located Aspect Based Sentiment Analysis (ABSA) for Crowdsourced Evaluation of Urban Environments. arXiv 2023, arXiv:2312.12253. [Google Scholar] [CrossRef]
Zhang, B.; Lin, J.; Luo, M.; Zeng, C.; Feng, J.; Zhou, M.; Deng, F. Changes in public sentiment under the background of major emergencies—Taking the Shanghai epidemic as an example. Int. J. Environ. Res. Public Health 2022, 19, 12594. [Google Scholar] [CrossRef]
Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), Philadelphia, PA, USA, 6–7 July 2002; pp. 79–86. [Google Scholar]
Liu, B. Sentiment Analysis and Opinion Mining; Morgan & Claypool Publishers: San Rafael, CA, USA, 2012. [Google Scholar]
SnowNLP. SnowNLP: A Python Library for Processing Chinese Text. GitHub. Available online: https://github.com/isnowfy/snownlp (accessed on 12 February 2026).
Kim, Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1746–1751. [Google Scholar] [CrossRef]
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
Kaur, A.; Bhambri, P.; Singla, S.K. Emotional Analysis Using RNN, CNN and LSTM: A Comparative Study of Accuracy and Computational Efficiency. Libr. Prog.-Libr. Sci. Inf. Technol. Comput. 2024, 44, 4424. [Google Scholar]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
Harbin Municipal People’s Government. The ‘Change’ and ‘Unchanged’ of Harbin Cultural Tourism. 18 December 2025. Available online: https://www.harbin.gov.cn/haerbin/c104696/202601/c01_1101853.shtml (accessed on 12 February 2026).
Zhang, F.; Chen, J.; Tang, Q.; Tian, Y. Evaluation of emotion classification schemes in social media text: An annotation-based approach. BMC Psychol. 2024, 12, 503. [Google Scholar] [CrossRef]
Lu, X.; Huang, S.; Xie, W.; Sun, Y. The Impact of Built Environment on Urban Vitality—A Multi-Scale Geographically Weighted Regression Analysis in the Case of Shenyang, China. Buildings 2025, 15, 2989. [Google Scholar] [CrossRef]
Rehman, A.U.; Malik, A.K.; Raza, B.; Ali, W. A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis. Multimed. Tools Appl. 2019, 78, 26597–26613. [Google Scholar] [CrossRef]
Mouratidis, K. Urban planning and quality of life: A review of pathways linking the built environment to subjective well-being. Cities 2021, 115, 103229. [Google Scholar] [CrossRef]
Frey, V.N.; Langthaler, P.B.; Huf, M.J.; Gruber, G.; Prinz, T.; Kedenko, L.; Iglseder, B.; Paulweber, B.; Trinka, E. Stress and the City: Mental Health in Urbanized vs. Rural Areas in Salzburg, Austria. Int. J. Environ. Res. Public Health 2024, 21, 1459. [Google Scholar] [CrossRef] [PubMed]
Larson, L.R.; Jennings, V.; Cloutier, S.A. Public parks and wellbeing in urban areas of the United States. PLoS ONE 2016, 11, e0153211. [Google Scholar] [CrossRef] [PubMed]
Samavati, S.; Veenhoven, R. Happiness in urban environments: What we know and don’t know yet. J. Hous. Built Environ. 2024, 39, 1649–1707. [Google Scholar] [CrossRef]
Ram, B.; Limb, E.S.; Shankar, A.; Nightingale, C.M.; Rudnicka, A.R.; Cummins, S.; Clary, C.; Lewis, D.; Cooper, A.R.; Page, A.S.; et al. Evaluating the effect of change in the built environment on mental health and subjective well-being: A natural experiment. J. Epidemiol. Community Health 2020, 74, 631–638. [Google Scholar] [CrossRef]

Figure 1. Schematic diagram of research scope.

Figure 2. Technical approach.

Figure 3. Radar chart of the evaluation indices for each model.

Figure 4. Distribution of emotional value and review volume across attractions.

Figure 5. Model performance across emotion polarities.

Figure 6. Model accuracy across spatial rings.

Figure 7. Optimal model selection and integrated emotion map.

Figure 8. Correlation between affective value and various built environment indicators.

Table 1. Categorization of emotion analysis models.

Category	Representative Model/Tool	Implementation Principle	Characteristics
Traditional machine learning tools	SnowNLP	A pre-packaged Chinese emotion analysis library built upon traditional algorithms such as Naive Bayes. It relies on manual feature engineering to vectorize text, then employs statistical classification models for training and prediction.	High modeling efficiency and strong interpretability; however, it heavily relies on feature engineering, exhibits weak generalization capabilities, and struggles with issues such as semantic ambiguity.
Classical model of deep learning	LSTM, BiLSTM, CNN, RNN	Utilizes neural networks for end-to-end feature learning on text, automatically capturing local patterns or sequence dependencies.	It can automatically learn semantic features, significantly enhancing contextual modeling capabilities; however, it typically requires a large amount of labeled data and consumes substantial training resources.
Pre-trained model	BERT, RoBERTa, ERNIE	Based on the Transformer architecture, it is pre-trained on massive unlabeled data through self-supervised tasks and then fine-tuned for specific downstream tasks.	Possessing robust general semantic understanding and contextual representation capabilities, with outstanding generalization performance, it has become the mainstream paradigm in natural language processing, particularly for Chinese emotion analysis.

Table 2. Table of data sources.

Platform	Data Type	URL	Temporal Coverage
Ctrip	Scenic spot coordinates, UGC reviews, UGC images	https://www.ctrip.com/webapp/tripmap/travel?entranceId=Ctriponlinehomeside (accessed on 12 March 2026)	January 2016–May 2025

Table 3. Indicators in the multidimensional evaluation system.

Indicator			Calculation Method	Measured Content
Overall performance	Accuracy		The proportion of samples correctly predicted by the model overall	Comprehensive classification capability
Category-level performance	Macro-F1		The unweighted arithmetic mean of F1 scores across all categories	Avoid overlooking minority categories and evaluate classification balance
		Precision	The proportion of samples predicted by the model as a given class that actually belong to that class	Credibility of prediction results
		Recall	The proportion of samples correctly identified by the model within a given category relative to the total number of genuine samples in that category	Coverage capability for specific categories
Robustness	NIR		Model performance on the noise-free reference subset	Stability of performance against noise, relative to a noise-free reference
Robustness	DAR		Model performance in the tourism domain subset	Understanding of specialized knowledge and specific contexts within the tourism sector
Efficiency	MPC		Total number of trainable parameters in the model	Model complexity and storage footprint
	ATTPE		The average time required for the model to complete one training epoch	Computational cost of a single training epoch
	CE		The number of training epochs required for the model to achieve optimal performance on the validation set	Speed at which the model learns from the data and reaches stable performance
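
As a hedged illustration of the accuracy, precision, recall, and Macro-F1 definitions in Table 3, the following minimal sketch computes them in plain Python. The label names, function names, and toy data are assumptions made for illustration and are not the authors' implementation.

```python
from collections import Counter

# Assumed label set for three-class emotion polarity.
LABELS = ("positive", "neutral", "negative")


def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for every class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample is not p
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted arithmetic mean of the per-class F1 scores."""
    scores = per_class_scores(y_true, y_pred, labels)
    return sum(f1 for _, _, f1 in scores.values()) / len(labels)


# Hypothetical toy data: an imbalanced sample and a model that
# always predicts the majority class.
y_true = ["positive"] * 8 + ["neutral", "negative"]
y_pred = ["positive"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}, macro-F1={mf1:.4f}")
```

In this toy case a model that always predicts the majority class reaches an accuracy of 0.80 but a Macro-F1 of only about 0.30, which is the same pattern that Tables 5 and 6 report for LSTM.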

Table 4. Index weight.

Primary Indicator	Secondary Indicator	Weight
Overall performance	Accuracy	0.13
Category-level performance	Macro-F1	0.50
Robustness	NIR, DAR	0.28
Efficiency	MPC, ATTPE, CE	0.09

Table 5. Each index score of each model.

Model		Accuracy	Macro-F1	Robustness	Efficiency	Comprehensive Score
Pre-trained Model	ERNIE	0.9092	0.7548	0.9220	0.0427	0.7612
	RoBERTa	0.9173	0.7531	0.9080	0.0296	0.7563
	BERT	0.9082	0.7111	0.9147	0.0234	0.7354
Classical Model of Deep Learning	BiLSTM	0.8888	0.6587	0.8799	0.0703	0.7011
	LSTM	0.8388	0.3138	0.8518	0.6631	0.5676
	R-CNN	0.8429	0.3049	0.8518	0.1709	0.5194
Machine Learning Tools	SnowNLP	0.8112	0.5209	0.7378	-	0.5757

Table 6. Emotion prediction distribution of each model.

Emotion Category	Pre-Trained Model			Classical Model of Deep Learning			Machine Learning Tool
	ERNIE	RoBERTa	BERT	BiLSTM	LSTM	RCNN	SnowNLP
Positive	41,094	40,638	41,823	42,151	49,010	48,934	37,156
Neutral	3668	3945	3404	2480	0	55	1946
Negative	4248	4427	3783	4379	0	21	9908
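
A quick consistency check on Table 6 can be sketched as follows: each model's predictions should sum to the same number of reviews, and converting counts to shares makes the class collapse of LSTM and RCNN easy to see. The counts are copied from Table 6; the dictionary layout and function name are illustrative assumptions.

```python
# Predicted review counts from Table 6: (positive, neutral, negative).
COUNTS = {
    "ERNIE":   (41094, 3668, 4248),
    "RoBERTa": (40638, 3945, 4427),
    "BERT":    (41823, 3404, 3783),
    "BiLSTM":  (42151, 2480, 4379),
    "LSTM":    (49010, 0, 0),
    "RCNN":    (48934, 55, 21),
    "SnowNLP": (37156, 1946, 9908),
}


def shares(counts):
    """Convert raw class counts into proportions of the total."""
    total = sum(counts)
    return tuple(c / total for c in counts)


totals = {model: sum(c) for model, c in COUNTS.items()}
distribution = {model: shares(c) for model, c in COUNTS.items()}

for model, (pos, neu, neg) in distribution.items():
    print(f"{model:8s} total={totals[model]}  "
          f"positive={pos:.1%}  neutral={neu:.1%}  negative={neg:.1%}")
```

Every column sums to 49,010 reviews. LSTM assigns all of them to the positive class and RCNN nearly all, whereas SnowNLP labels about one fifth of the reviews as negative.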

Table 7. Emotion maps predicted by each model.

Model		Neutral	Negative
Pre-trained Model	ERNIE	[map image]	[map image]
	RoBERTa	[map image]	[map image]
	BERT	[map image]	[map image]
Classical Model of Deep Learning	BiLSTM	[map image]	[map image]
	LSTM	--	--
	RCNN	[map image]	[map image]
Machine Learning Tool	SnowNLP	[map image]	[map image]

Note: ‘--’ indicates that the model predicted too few neutral and negative reviews to form an emotion map.
