Article

A Deep Learning Framework for Emotion Recognition and Semantic Interpretation of Social Media Images in Urban Parks: The ULEAF Approach

1 Department of Landscape Architecture, Kyungpook National University, Daegu 41566, Republic of Korea
2 College of Ecology and Environment, Inner Mongolia University, Hohhot 010010, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 127; https://doi.org/10.3390/app16010127
Submission received: 1 December 2025 / Revised: 16 December 2025 / Accepted: 19 December 2025 / Published: 22 December 2025

Abstract

This study proposes the Urban Landscape Emotion Analysis Framework (ULEAF) based on images of urban parks shared on social media. This framework integrates an emotion recognition module driven by a convolutional neural network (ConvNeXt Tiny) with a semantic extraction module supported by multimodal semantic matching models (CLIP and DeepSentiBank ANP lexicon). It constructs a systematic analysis pathway from semantic understanding to emotional perception, effectively overcoming the limitations of traditional research methods. Results indicate that positive emotion images predominantly correlate with nature, health, and openness, while negative emotion images are closely associated with the characteristics of decay, abandonment, and oppression, as well as loneliness and calmness, estrangement and disharmony, and gloom and bleakness. Findings reveal trends consistent with prior research, further validating the stable association between urban landscape visual features and emotional perception. The analytical framework developed in this study facilitates the systematic revelation of semantic characteristics and affective perception mechanisms in large-scale urban park imagery, providing scientific reference for optimizing urban park landscapes and implementing emotion-oriented design.

1. Introduction

With the rapid growth of cities, urban parks have become increasingly important in modern urban life. From a large-scale perspective, urban parks increase green space, help maintain the stability of urban ecosystems, reduce the urban heat island effect [1], and improve air quality and local climate [2]. From a small-scale perspective, urban parks provide spaces for recreation, exercise, and social activities [3]. They also hold cultural and historical value and help build social connections and community cohesion [4]. More importantly, urban parks are open spaces that can reduce stress, regulate emotions, and improve well-being, benefiting both individual health and society [5,6]. Therefore, urban parks are key components of cities and are important for human health and social development.
Recently, research on emotions in urban parks has grown. With the development of neural networks and deep learning, research methods have become more varied. Early studies mainly used questionnaires in which participants rated images of landscapes to express their emotions. More recent studies use social media text analysis, which can involve calculating emotion scores with dictionaries or predicting emotion types with machine learning or deep learning models trained on labeled data. For example, Zhang et al. [7] used questionnaires to build a model linking park landscape features to user feelings. In another study, Kong et al. [8] used social media text to study how different park types and features affect visitors’ positive emotions. Social media text avoids the limitations of questionnaires, such as small sample sizes and lengthy data collection, but text alone cannot clearly show which environmental elements cause different emotions [9]. Social media images can show emotions directly through what people see, offering a more practical way to study the link between emotions and landscape features [10]. Using social media images thus provides a new way to study park emotions and helps clarify how park landscapes affect feelings.
In landscape research, the Semantic Differential (SD) method is often used to identify what affects emotions in images. Experts construct pairs of opposite adjectives, and surveys collect people’s ratings of landscapes. Factor analysis can then reduce the data to find the main factors affecting emotions [11]. Another approach uses principal component analysis (PCA) to simplify survey data and reveal key emotional dimensions [12]. Questionnaires are useful because they are rigorous and objective, but they also have limitations. First, small sample sizes may not represent all people. Second, data collection is slow and requires substantial time and resources [13]. Third, fixed questions may not capture real experiences well [14].
Social media is an important data source and has great potential in research [15]. In urban park studies, social media data is widely used [16,17]. This data comes as text and images [18]. Images show scenes clearly, create context, and quickly evoke emotions [19]. Compared to text, images can show emotions without language, cross cultures, and provide more objective visual information [10]. Photos shared by social media users are useful for showing public visual preferences [20,21]. Some studies use images and text to explore public emotions and visual perception [9] or to improve landscape quality evaluation [22]. Social media images are useful, but more work is needed to overcome challenges.
Image classification is a key task in computer vision. It assigns images to categories based on main visual features [23]. With deep learning and Convolutional Neural Networks (CNNs), classification accuracy has improved. It is used in object recognition, scene analysis, and medical imaging [24]. Recently, it has also been applied in emotion recognition [25]. Models can find objects, scenes, and compositions in images to predict emotions [26]. In urban research, image classification is used to find urban elements from street view, aerial, and land cover images to assess urban quality [27]. For example, Law et al. [28] use CNNs to evaluate street frontage quality. These studies show that image classification helps with visual information processing, emotion analysis, and urban research. This paper uses image classification to find emotions in social media images, which helps link park features with public feelings and gives evidence for planning urban spaces.
Multimodal models process text and images together, mapping both to a shared space for semantic alignment. Many models, such as LLaVA, BLIP-2 [29], and GPT-4, use the CLIP framework [30]. Recently, these models have been used in urban research. For example, in Perez et al. [31], a SAGAI framework was developed using LLaVA to generate spatial indicators from text prompts to evaluate safety and walkability. Similarly, in Blečić et al. [32], a multimodal LLM framework evaluates walkability from streetscape images and adds text explanations to improve interpretability. However, large multimodal models such as LLaVA and BLIP-2 are characterized by substantial parameter sizes and high computational and inference costs, which limit their practicality for large-scale and widely deployable applications in empirical research [29]. DeepSentiBank is still widely used for image emotion classification and landscape feature extraction [33]. DeepSentiBank uses CNNs to classify images into adjective–noun pairs (ANPs). It was trained on over 1 million geotagged photos with 2089 ANP combinations (231 adjectives, 424 nouns) [34] and is widely used in landscape emotion studies [35,36]. Still, DeepSentiBank has limitations in semantic flexibility and generalization for specific cases like urban parks [37]. This study uses CLIP as the base, adds the DeepSentiBank ANP lexicon, and filters for urban park features to build a “feature-sentiment” recognition system. It extracts and analyzes park elements and their emotions quantitatively.
In summary, to address the requirements of stability, interpretability, and scalability in emotion recognition and landscape semantic analysis of social media images of urban parks, this study proposes a multimodal emotion–semantic integrated analytical framework, termed the Urban Landscape Emotion Analysis Framework (ULEAF). Rather than relying on a single end-to-end large multimodal model, the framework adopts a lightweight and extensible modular design tailored to the research objectives and methodological considerations. Specifically, a ConvNeXt Tiny convolutional neural network, optimized through transfer learning and fine-tuning, is employed for robust image-level emotion classification (positive/negative) and is integrated with the CLIP-based multimodal semantic matching model in conjunction with the DeepSentiBank ANP lexicon. Compared with large-scale end-to-end multimodal models characterized by high parameter complexity and inference costs, this combined framework substantially reduces computational overhead while maintaining semantic representation accuracy, making it well-suited for large-scale batch analysis of social media images and enhancing the reproducibility and cross-context applicability of the results.
Building on this framework, the study further incorporates statistical techniques, including factor analysis and regression analysis, to systematically model the outputs of the proposed system. This approach enables the direction and magnitude of the effects of different semantic elements on emotional tendencies to be quantified and allows the relationships among visual features, semantic representations, and emotional perception to be examined and validated at the statistical level. Collectively, this framework provides a research pathway that balances lightweight implementation, technical feasibility, and theoretical interpretability, offering new insights into the underlying connections between urban park landscape characteristics and public emotional perception.

2. Materials and Methods

2.1. Study Area and Data Collection

Guangzhou is a representative Chinese city in terms of economic and urban development over the past two decades, and it has often been selected as an important case study in many research projects [38]. As a city with high population density, Guangzhou shows the important role of urban parks in improving residents’ quality of life within a limited urban space [39]. At the same time, Guangzhou has an active social media user base and rich online data, which provides sufficient data support for research on urban parks using social media [40]. In addition, the Guangzhou Green Space System Plan (2021–2035) clearly states that the city continues to improve the quality of urban green spaces, providing a solid foundation for policy-oriented empirical research.
For data, this study used urban park images from the Dianping platform as social media image data. As one of China’s main local lifestyle social platforms, Dianping has been widely used in urban studies, tourism behavior analysis, and public space perception research [15,41,42]. Users can upload photos and comments related to urban parks, providing reliable data to study public visual perception and emotional responses [43].
For data collection, this study used Python 3.11 (Python Software Foundation (PSF), Wilmington, DE, USA) to access the Dianping API and retrieve image data. Only urban parks with at least 30 photos were kept for analysis. Data collection ended on 1 August 2023. In total, 111,323 images from 82 urban parks in Guangzhou were collected, as shown in Figure 1. From these, 20,000 images were randomly selected for this study. To ensure quality, samples were manually screened to remove blurry, overexposed, or irrelevant images (e.g., ads, food). Any removed images were replaced with other qualified images. Finally, a dataset of 20,000 high-quality urban park images was formed.
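The park-level filtering rule (keeping only parks with at least 30 photos) can be sketched as follows; the park names and photo counts here are hypothetical, for illustration only:

```python
def filter_parks(park_photo_counts, min_photos=30):
    """Keep only parks that have at least `min_photos` photos."""
    return {park: n for park, n in park_photo_counts.items() if n >= min_photos}

# Hypothetical photo counts per park
counts = {"Yuexiu Park": 412, "Small Garden": 12, "Liwan Lake Park": 98}
kept = filter_parks(counts)
print(sorted(kept))  # ['Liwan Lake Park', 'Yuexiu Park']
```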

2.2. Manual Annotation and Image Classification Model Training

To study emotion recognition in social media images of urban parks, this research randomly selected 5000 samples from 20,000 high-quality urban park landscape photos collected in the early stage. Three experts with relevant professional backgrounds were invited to manually label the emotional categories. The labeling system used a three-class scheme: positive (1), negative (0), and irrelevant (2) (the “irrelevant” category primarily refers to content for which emotional interpretation is highly subjective or for which a stable emotional judgment cannot be reliably established). For examples with clear disagreement among experts, group discussions were held, and final labels were decided through consensus to ensure accuracy and consistency. The labeled data were divided into a training set and a validation set in an 8:2 ratio, forming the core dataset for model training and evaluation. During the training stage, data augmentation methods such as random cropping and horizontal flipping were applied to improve the model’s robustness and generalization. These operations helped the model adapt to the diversity and variation in social media images.
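The 8:2 train/validation split described above can be sketched as a minimal illustration (file names, labels, and the random seed are hypothetical):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle labeled samples and split them into train/validation sets (8:2)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# 5000 labeled (path, label) pairs; labels: 1 positive, 0 negative, 2 irrelevant
labeled = [(f"img_{i}.jpg", i % 3) for i in range(5000)]
train, val = split_dataset(labeled)
print(len(train), len(val))  # 4000 1000
```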
The emotion recognition model in this study was built on the ConvNeXt Tiny architecture. This model is a lightweight convolutional neural network that balances computational efficiency and recognition accuracy. It uses a compound scaling strategy to reduce computational cost while keeping strong feature extraction ability [44,45]. Because the number of labeled samples was limited, transfer learning was applied to improve model performance under small-sample conditions. Specifically, the ConvNeXt Tiny model pre-trained on the ImageNet dataset was used as the base model, and its output layer was modified for the three-class task (positive, negative, irrelevant). During training, a progressive layer unfreezing strategy was used. The process started with training only the classification layer, then gradually unfroze the middle and feature extraction layers, achieving step-by-step optimization from the classification head to the full network (Figure 2). To improve generalization ability, additional data augmentation such as brightness, contrast, rotation, and translation changes was used during training. To balance stability and learning speed, a layer-wise learning rate was applied: the feature layers used a lower rate (1 × 10−5), and the classification layer used a higher rate (1 × 10−3). The AdamW optimizer and the CosineAnnealingWarmRestarts learning rate scheduler were combined to ensure stable convergence. The loss function was a cross-entropy loss with class weights and label smoothing (label smoothing cross-entropy), which reduced bias caused by unbalanced samples.
In addition, pseudo-label learning was introduced to reduce the problem of limited labeled data. This method used high-confidence predictions on unlabeled data as “pseudo-labels,” expanding the training data without extra labeling cost [46]. Previous studies have shown that pseudo-labeling can effectively enlarge the training dataset and improve generalization performance [47]. In this study, the model fine-tuned on the 5000 labeled samples was first used to predict the remaining 15,000 unlabeled images. Samples with prediction confidence higher than 0.8 were then added to the training set for retraining, which improved the model’s overall performance.
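The pseudo-label selection rule (keeping predictions with confidence above 0.8) can be sketched with NumPy; the softmax outputs below are toy values:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.8):
    """Keep unlabeled samples whose maximum predicted probability exceeds
    `threshold`; return their indices and pseudo-labels."""
    conf = probs.max(axis=1)
    keep = np.where(conf > threshold)[0]
    return keep, probs[keep].argmax(axis=1)

probs = np.array([[0.90, 0.05, 0.05],   # confident -> kept, pseudo-label 0
                  [0.50, 0.30, 0.20],   # ambiguous -> discarded
                  [0.10, 0.85, 0.05]])  # confident -> kept, pseudo-label 1
idx, labels = select_pseudo_labels(probs)
print(idx.tolist(), labels.tolist())  # [0, 2] [0, 1]
```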
To further test the model’s effectiveness and ability to classify emotional tendencies, three widely used image classification models—EfficientNet-B0, ResNet-18, and ResNet-50—were compared with ConvNeXt Tiny. The training process for all models was kept the same to ensure that the evaluation results were comparable.
Figure 2. Framework of the Urban Park Image Emotion Recognition Model Based on ConvNeXt Tiny [48].

2.3. Multimodal Model Fine-Tuning and Manual Validation

In this study, to build the candidate semantic units, the DeepSentiBank Adjective–Noun Pair (ANP) lexicon was introduced based on the CLIP model. Since this research focuses on urban park landscapes, the original 424 nouns and 231 adjectives in the DeepSentiBank model were further filtered. Only 55 common landscape elements (nouns) closely related to urban parks (such as trees, flowers, and water bodies) and their corresponding 124 adjectives were kept. This selection improved the relevance of semantic extraction and reduced unrelated noise.
During processing, the CLIP model encoded both the input image and all ANP phrases and calculated the cosine similarity between them in the embedding space. This step identified the most semantically related results for each image. To keep the results interpretable and focused, only the top five ANPs most relevant to each image were retained. These were presented in the form of “landscape element–adjective–correlation” triplets, which were used for the following semantic and emotion analyses.
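The matching step can be illustrated as follows: image and ANP-phrase embeddings are L2-normalized and the phrases ranked by cosine similarity, keeping the top matches. The 4-dimensional vectors here are toy stand-ins for real CLIP encoder outputs:

```python
import numpy as np

def top_k_anps(image_emb, anp_embs, anp_phrases, k=5):
    """Rank ANP phrases by cosine similarity to an image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = anp_embs / np.linalg.norm(anp_embs, axis=1, keepdims=True)
    sims = txt @ img
    order = np.argsort(-sims)[:k]
    return [anp_phrases[i] for i in order]

phrases = ["calm lake", "broken bench", "beautiful flowers"]
anp_embs = np.array([[0.9, 0.1, 0.0, 0.1],
                     [-0.8, 0.5, 0.2, 0.0],
                     [0.4, 0.8, 0.1, 0.3]])
image_emb = np.array([0.85, 0.2, 0.05, 0.1])  # e.g., a photo of a quiet lake
print(top_k_anps(image_emb, anp_embs, phrases, k=2))
# ['calm lake', 'beautiful flowers']
```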
To evaluate the accuracy of the model outputs, a manual assessment process was conducted following previous research [33]. A total of 200 images and their corresponding 1000 ANPs were randomly selected from the model outputs. Three experts with relevant professional backgrounds independently judged the matching degree using a three-level scoring system (0 = irrelevant, 1 = unclear, 2 = relevant). The final score for each ANP was the average of the three experts’ ratings. When the average score was greater than 1, the ANP was considered a valid match. Finally, the proportion of valid ANPs among all ANPs was used to measure the overall model accuracy, which reached 89.2%, quantitatively verifying the model’s recognition performance.
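The expert scoring rule (an ANP is a valid match when its mean rating exceeds 1) can be sketched directly; the ratings below are hypothetical:

```python
def anp_validity_rate(scores):
    """scores: list of per-ANP expert rating tuples (0 = irrelevant,
    1 = unclear, 2 = relevant). An ANP is valid when its mean rating
    exceeds 1; return the proportion of valid ANPs."""
    means = [sum(s) / len(s) for s in scores]
    valid = sum(m > 1 for m in means)
    return valid / len(scores)

ratings = [(2, 2, 1), (0, 1, 0), (2, 1, 2), (1, 1, 1)]
print(anp_validity_rate(ratings))  # 0.5 (means 1.67, 0.33, 1.67, 1.0)
```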
In summary, this study builds an Urban Landscape Emotion Analysis Framework (ULEAF) based on a CNN emotion recognition module (ConvNeXt Tiny) and a multimodal semantic matching model (CLIP and DeepSentiBank ANP), as shown in Figure 3. This framework can output emotion labels and landscape semantic elements at the same time and provides new technical support for the combined analysis of emotion and semantics in urban park images.

2.4. Adjective Processing and Factor Analysis

Since this study aims to explore the influence of semantic elements on emotions in urban park images, only the adjectives in the generated ANPs were analyzed in depth. First, given the sample size available in this study, static pre-trained GloVe (Global Vectors for Word Representation) embeddings were employed to represent the extracted adjectives as high-dimensional semantic vectors in order to avoid introducing additional model bias and to capture semantic similarity relationships among different semantic elements. GloVe is trained on global statistical information from a large corpus, allowing semantically similar words to be closer in the vector space [49]. In this model, the objective function learns the word vectors by minimizing the difference between the predicted and actual co-occurrence frequencies, as shown below:
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{T} \bar{w}_j + b_i + \bar{b}_j - \log X_{ij} \right)^2
In this formula, $w_i$ and $\bar{w}_j$ are the word vectors, $b_i$ and $\bar{b}_j$ are the bias terms, $X_{ij}$ is the co-occurrence count of words $i$ and $j$, and $f(X)$ is a weighting function that reduces noise from low-frequency co-occurrences.
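The way GloVe vectors support similarity judgments between adjectives can be sketched as follows; the 4-dimensional vectors are illustrative stand-ins for the 300-dimensional pretrained embeddings:

```python
import numpy as np

# Toy vectors standing in for pretrained GloVe embeddings (illustrative values)
glove = {
    "calm":     np.array([0.9, 0.1, 0.0, 0.2]),
    "peaceful": np.array([0.8, 0.2, 0.1, 0.2]),
    "broken":   np.array([-0.7, 0.6, 0.3, 0.0]),
}

def cosine(a, b):
    """Cosine similarity between two word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Adjectives above a similarity threshold can be merged into one entry
print(cosine(glove["calm"], glove["peaceful"]) > 0.9)  # True  (near-synonyms)
print(cosine(glove["calm"], glove["broken"]) > 0.9)    # False (unrelated)
```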
Next, to explore the latent structure of adjective correlations related to emotions in social media images, factor analysis was conducted on the organized adjective correlation data. Unlike cluster analysis, factor analysis extracts potential common dimensions based on the relationships among variables, revealing the structural features of adjective semantics in emotional expression [50]. To improve interpretability, the Varimax orthogonal rotation method was used to extract five main factors. Before the analysis, adjectives with very small variance were removed to ensure the stability and validity of the results [51]. Through this process, the internal structure of adjective semantics in emotional dimensions was systematically revealed, providing theoretical support for subsequent emotional and semantic analyses.
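The variance filtering and varimax-rotated five-factor extraction can be sketched with scikit-learn; the random matrix below stands in for the real image-by-adjective correlation data:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Toy data: rows = images, columns = adjective correlation scores
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))

# Drop near-constant adjectives before factor extraction
variances = X.var(axis=0)
X = X[:, variances > 0.01]

# Five latent factors with Varimax orthogonal rotation
fa = FactorAnalysis(n_components=5, rotation="varimax", random_state=0)
scores = fa.fit_transform(X)   # per-image factor scores
loadings = fa.components_.T    # adjective-by-factor loadings
print(scores.shape, loadings.shape)  # (200, 5) (12, 5)
```

Loadings above 0.4 in absolute value would then mark the adjectives that define each factor, as in Section 3.3.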

2.5. Logistic Regression Analysis of the Impact of Semantic Factors on Emotional Tendency

Logistic regression is a statistical method widely used for binary classification problems. It can be applied to measure the direction and strength of the effect of independent variables on a dependent variable [52]. In this study, to investigate the influence of latent semantic factors on the emotional tendencies of social media images, factor scores derived from factor analysis were used as independent variables, while image-level emotion category labels served as the dependent variable. To ensure clear interpretation of the modeling results and to minimize interference from emotionally ambiguous samples, the regression analysis was restricted to images labeled as positive or negative, and samples annotated as “irrelevant” were excluded. The logistic regression model is expressed as follows:
\mathrm{logit}\left( P(Y=1) \right) = \beta_0 + \sum_{i=1}^{n} \beta_i F_i
In this formula, $Y$ is the binary emotion label (0 = negative, 1 = positive), $F_i$ is the score of the $i$-th latent factor, $\beta_0$ is the intercept term, $\beta_i$ is the coefficient of factor $F_i$, which shows its direction and strength of influence, and $n$ is the total number of latent factors.
Through this logistic regression model, we can estimate the contribution and significance of each latent factor to emotional expression and identify which semantic dimensions have the most important influence on emotional tendency.
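The regression step can be sketched with scikit-learn on synthetic data; the "true" coefficients below are illustrative values whose signs mirror the reported results, not the study's actual estimates:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data: 1000 images with 5 factor scores each
rng = np.random.default_rng(1)
F = rng.normal(size=(1000, 5))
true_beta = np.array([-1.0, 0.5, 0.0, -0.8, -0.6])  # illustrative signs only
p = 1 / (1 + np.exp(-(0.3 + F @ true_beta)))        # logistic link
y = (rng.random(1000) < p).astype(int)              # 1 = positive, 0 = negative

model = LogisticRegression().fit(F, y)
print(np.round(model.coef_[0], 2))  # estimated effect of each factor
```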

2.6. Research Framework

This study proposes an integrated analytical framework for emotion recognition and semantic interpretation of social media images of urban parks, referred to as the Urban Landscape Emotion Analysis Framework (ULEAF). The framework is designed to support large-scale and interpretable analysis by integrating emotion recognition and semantic extraction, followed by statistical modeling based on the outputs of these two core components.
Within the ULEAF, an emotion recognition module is first employed to identify the overall emotional tendency expressed in urban park images, thereby providing reliable emotion labels for subsequent analyses. In parallel, a semantic extraction module applies multimodal semantic matching to extract landscape-related adjectives from images. The reliability of these extracted semantic adjectives is further examined through expert-based evaluation to ensure the validity of the semantic information used in subsequent analyses.
Building upon the outputs of the emotion recognition and semantic extraction modules, statistical analyses are subsequently conducted to investigate the relationships between semantic characteristics and emotional perception. Specifically, semantic representations are quantified, reduced into latent semantic factors, and incorporated into regression-based models to estimate the direction and magnitude of their influence on emotional tendencies. Through this integrated framework, the study provides an interpretable analytical pathway for understanding how adjective-based semantic factors embedded in social media images of urban parks are associated with emotional perception, thereby elucidating the interplay between landscape semantic elements and emotional expression.

3. Results

3.1. Comparison of Image Emotion Classification Models

After training on the dataset, we recorded the accuracy of each model across different epochs. The final results (Table 1 and Figure 4) show that the validation accuracy of ConvNeXt Tiny reached 0.8510, outperforming EfficientNet-B0 (0.8231), ResNet-18 (0.8340), and ResNet-50 (0.8279). In addition, a comparison of the loss values (Table 1 and Figure 5) indicates that ConvNeXt Tiny achieved the lowest final loss at 0.167, whereas EfficientNet-B0 (0.267), ResNet-50 (0.288), and ResNet-18 (0.243) all exhibited substantially higher loss values. A lower loss suggests smaller prediction errors, better convergence, and stronger generalization capability. Accuracy and loss are two key metrics for evaluating and comparing model performance, providing a comprehensive view of how well each model learns and generalizes. Based on these metrics, the results clearly demonstrate that ConvNeXt Tiny outperforms the other models in the emotion recognition task for social media images of urban parks.
To further analyze the classification performance of ConvNeXt Tiny, we used its best weights to create and visualize the confusion matrix. The confusion matrix (Figure 6) clearly shows the model’s performance for each category: the accuracy for positive emotion is 91.1%, for irrelevant emotion is 86.5%, and for negative emotion is 72.9%. These results indicate that the model performs well in recognizing positive and irrelevant emotions, but its performance on negative samples is relatively lower. This outcome is closely related to the widely observed positivity bias in social media content sharing, which results in a relatively limited number of negative samples available for model learning [53,54]. Consequently, although the overall classification performance remains reliable, there is still room for improvement in the recognition of negative emotions.
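The per-class accuracies reported above correspond to the row-normalized diagonal of the confusion matrix; a minimal sketch (the counts are toy values chosen to echo the reported rates, not the study's actual matrix):

```python
import numpy as np

def per_class_recall(cm):
    """Row-normalized diagonal of a confusion matrix (per-class accuracy)."""
    cm = np.asarray(cm, dtype=float)
    return cm.diagonal() / cm.sum(axis=1)

# Toy counts; rows = true class (positive, irrelevant, negative)
cm = [[91, 5, 4],
      [8, 86, 6],
      [15, 12, 73]]
recalls = per_class_recall(cm)
print(np.round(recalls, 3))  # [0.91 0.86 0.73]
```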
Next, we used the trained ConvNeXt Tiny model to predict the emotions of 20,000 social media images of urban parks. The results, as shown in Figure 7, indicate that there are 10,324 positive images (51.62%), which is the largest group; 6459 negative images (32.29%), the second largest; and 3217 irrelevant images (16.08%), the smallest group. These results indicate a clear difference in the distribution of emotional categories in the dataset, providing a solid data foundation for later analysis of the relationship between park semantics and emotions.

3.2. Adjective Frequency Analysis and Word Cloud Visualization by Emotion Category

First, this study used the CLIP model and introduced the DeepSentiBank Adjective–Noun Pair (ANP) vocabulary to extract adjectives from urban park images labeled as positive and negative emotions. A total of 16,783 images were analyzed. Next, we counted how often each adjective appeared in positive and negative emotion images and kept only the top 20 adjectives in each group.
The results shown in Table 2 indicate that for positive emotion images, the most frequent adjectives are calm (9185 times), natural (5373 times), beautiful (4323 times), attractive (3337 times), and scenic (3226 times). These words are mostly related to nature, beauty, and a peaceful atmosphere. In contrast, in negative emotion images, frequent adjectives include broken (6049 times), calm (3323 times), natural (1554 times), scenic (1276 times), and dead (979 times). These words often describe decay, silence, and loneliness, suggesting that negative emotions are linked to images showing ruin, depression, or gloomy environments.
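The frequency counting step can be sketched with a `Counter` over per-image ANP outputs; the data below are toy examples:

```python
from collections import Counter

def top_adjectives(anp_lists, k=3):
    """Count adjective occurrences across images' ANP outputs; return top-k."""
    counts = Counter(adj for anps in anp_lists for adj, _noun in anps)
    return counts.most_common(k)

# Toy per-image ANP outputs as (adjective, noun) pairs
images = [
    [("calm", "lake"), ("natural", "trees")],
    [("calm", "path"), ("beautiful", "flowers")],
    [("calm", "garden"), ("natural", "grass")],
]
print(top_adjectives(images))  # [('calm', 3), ('natural', 2), ('beautiful', 1)]
```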
Based on these findings, word clouds were created for both positive and negative emotion categories to visually show the differences in adjective use under different emotional contexts (Figure 8).

3.3. Adjective Processing and Factor Extraction

First, this study merged adjectives with similar meanings using the GloVe word vector semantic similarity, resulting in 104 entries. Then, 30 variables with very small variance were removed. Finally, factor analysis was applied to the remaining 74 adjectives to reveal the hidden structure among adjectives and their influence on emotional tendency.
The results of the factor analysis are illustrated in Figure 9, showing the distribution and clustering of adjectives across the extracted latent dimensions. The factor analysis results show that adjectives with loadings above 0.4 can be considered as having a significant contribution to the corresponding factor. Overall, five latent factors were extracted: Factor 1 (crying (No.30), rough (No.82), damaged (No.31), creepy (No.27), broken (No.3), abandoned (No.20)) represents the characteristics of decay, abandonment, and oppression. Factor 2 (classic (No.101), outdoor (No.73), healthy (No.59), clean (No.39)) reflects nature, health, and openness. Factor 3 (empty (No.55), peaceful (No.5), amazing (No.1), stunning (No.92)) shows loneliness and calmness. Factor 4 (broken (No.3), weird (No.57), lonely (No.43)) indicates estrangement and disharmony. Factor 5 (dark (No.15), dead (No.2)) represents gloom and bleakness.

3.4. Impact of Semantic Factors on Emotional Tendency

The extracted factor scores were then used as independent variables, and the emotion labels were used as the dependent variable, to build a logistic regression model examining the influence of each factor on emotional tendency. The regression results show that different types of landscape features exhibit significant differences in emotional perception (Table 3). Factor 1 (decay, abandonment, and oppression) has a significant negative correlation with emotional tendency (β = −1.083, p < 0.001), indicating that decayed or oppressive visual elements in the landscape can significantly reduce the perception of positive emotions. Factor 2 (nature, health, and openness) has a significant positive correlation (β = 0.488, p < 0.001), showing that natural, clean, and open spaces help enhance positive emotional experiences. Factor 3 (loneliness and calmness) has a small regression coefficient (β = −0.001, p = 0.951) and is not statistically significant, suggesting that quiet or low-interference spaces have a limited effect on emotional tendency. Factor 4 (estrangement and disharmony) shows a significant negative correlation (β = −0.773, p < 0.001), reflecting that alienation and a sense of distance in the space may suppress positive emotions. Factor 5 (gloom and bleakness) also has a significant negative effect (β = −0.642, p < 0.001), indicating that dark tones and low-energy atmospheres are more likely to induce negative emotional experiences. Overall, except for Factor 3, all factors significantly influence emotional tendency, indicating that the visual features of landscape spaces play a key role in shaping individual emotional perception.

4. Discussion

4.1. Methodological Contributions and Innovations

As Zhao et al. [55] and Popelka et al. [56] point out, the deep impact of artificial intelligence on urban development lies not in technical innovation but in its effective application and implementation in urban planning and design. Artificial intelligence is not only changing how cities are understood and how spaces are organized but also providing planners with new data insights and decision support. This study builds on this idea and expands it by combining multimodal models and computer vision in artificial intelligence to systematically analyze urban park images shared by the public on social media. By analyzing the relationships between semantic elements of landscapes and emotional features in the images, this study reveals how different semantic characteristics contribute to shaping public emotional perception. Based on this, it proposes several strategies and design suggestions to enhance the emotional impact of urban parks, aiming to provide empirical evidence and theoretical guidance for future park renovation and landscape design.
In addition, the analysis framework proposed in this study overcomes the limitations of traditional methods like surveys and interviews. Unlike previous studies that often use researcher-taken photos for participants to judge emotions or semantics, which can be influenced by the photographer’s gender, education, or age [57], this study directly uses urban park images spontaneously shared by diverse social media users. This data source is more diverse and representative and better reflects how the public perceives and feels about urban parks in daily life.
Looking ahead, the proposed framework offers good scalability and generalizability. It can serve as a modular component in other urban-space or environmental-perception models and can be combined with other big-data models, enabling multidimensional, intelligent exploration and evaluation of urban parks and wider city landscapes, and providing deeper and broader insights for urban park design and management.
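As a modular component, the framework's semantic-matching step reduces to a small, reusable interface: rank candidate Adjective–Noun Pairs (ANPs) against an image by cosine similarity in a shared embedding space. The sketch below assumes CLIP-style image and text embeddings have already been computed; the toy vectors and the three ANPs are placeholders, not the DeepSentiBank lexicon.

```python
# Minimal sketch of an ANP-matching module, assuming precomputed CLIP-style
# embeddings. Vectors and ANP strings below are toy placeholders.
import numpy as np

def best_anp(image_emb, anp_embs, anps):
    """Return the ANP whose text embedding is closest to the image
    embedding under cosine similarity."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = anp_embs / np.linalg.norm(anp_embs, axis=1, keepdims=True)
    return anps[int(np.argmax(txt @ img))]

anps = ["lush garden", "derelict bench", "sunny lawn"]
anp_embs = np.array([[0.9, 0.1, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.7, 0.0, 0.7]])
print(best_anp(np.array([0.8, 0.0, 0.6]), anp_embs, anps))  # -> "sunny lawn"
```

Because the function depends only on embedding arrays, the same interface works unchanged when another study swaps in a different encoder or a different concept lexicon.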

4.2. Impact of Semantic Elements on Emotions in Urban Parks

The study found that Factor 1, Factor 4, and Factor 5 have significant negative effects on emotional tendencies in urban park landscapes. This result is supported by other studies on urban park landscapes. First, regarding Factor 1 (decay, abandonment, and oppression), previous research [58] shows that when park landscapes are neglected or poorly maintained, users tend to feel “desolate” or “worn out,” which reduces psychological comfort and enjoyment. At the same time, Roberts et al. [59] also point out that decayed landscapes can trigger negative emotions.
Second, regarding Factor 4 (estrangement and disharmony), previous studies [60] indicate that unreasonable spatial planning or landscape structures can negatively affect user satisfaction and create a sense of alienation. Another study [61] also shows that poor spatial patterns (e.g., non-permeable patches, high patch density, scattered layouts) significantly reduce park satisfaction, suggesting that chaotic structures may increase user estrangement.
Finally, for Factor 5 (gloom and bleakness), research [62] finds that when landscapes lack visual appeal, users tend to feel cold, empty, and depressed. The same study also shows that, compared with green and well-designed park spaces, empty or neglected landscapes are more likely to elicit negative emotions and to be perceived as depressing.
In contrast, Factor 2 (nature, health, and openness) has a positive effect on emotional tendency, which aligns with previous studies. Research shows that natural and open environments can promote positive emotions and mental health. For example, Knez et al. [63] find that higher perceived naturalness not only increases user well-being but also strengthens their sense of place and attachment to the park or urban green space. It also helps improve psychological comfort, reduce stress, and promote overall emotional recovery and positive experiences. In addition, research [64] shows that the more natural a green space is, the stronger the negative correlation with stress, depression, and anxiety, indicating that natural green spaces are more effective in reducing negative emotions and promoting relaxation.
From the above discussion, it is clear that the results of this study are supported by previous research. This not only confirms the relationship between the factors and user emotional responses but also indirectly validates that the proposed framework effectively and objectively captures how semantic elements in urban parks affect emotions. Overall, the results provide a new perspective for understanding how different semantic elements in urban park landscapes influence emotional tendencies.

4.3. Implications for Future Urban Park Planning and Design

Based on the results of this study, the semantic elements that affect emotional tendencies in urban parks include both positive and negative factors. Avoiding negative semantic elements while strengthening positive ones is therefore a key consideration for future urban park planning and design. Specifically, park design should not focus only on visual beauty but also consider users' emotional responses and psychological experiences, exploring how landscape space and visual elements can promote positive emotions and reduce negative ones. Users' emotional experiences strongly affect their satisfaction with a park [65], and should therefore be taken seriously.
In this context, the visual quality of physical park landscapes is an important factor for achieving emotion-friendly parks. Research shows that about 80% of human perception comes from vision [65]. Therefore, park physical landscapes have a strong influence on users’ emotions. From the perspective of landscape elements, plants play a key role in improving users’ psychological environment. For example, flowers and trees can increase comfort and enjoyment in the space [66]. At the same time, the color and condition of the physical landscape also affect emotions. Flowering plants and autumn leaves in red or yellow can enhance users’ restorative potential [67]. However, users’ feelings are not only determined by physical features. They result from the interaction of physical landscapes and psychological responses [68]. Therefore, maintaining and managing landscapes in good condition is also important to improve comfort and positive emotions.
In addition, previous research shows that rich recreational facilities can improve public satisfaction with urban parks [65]. Recreational facilities give users opportunities for exercise, entertainment, and social interaction. They also meet users’ leisure needs and increase comfort and enjoyment. Moreover, research [69] points out that how to improve user experience is a key question for park managers. Besides physical facilities, cultural activities and social experiences in parks are also important for emotional satisfaction. Research [70] shows that compared with non-urban green spaces, urban parks are more inclusive and promote social cohesion. Similarly, another study [71] shows that users’ sense of belonging in parks is closely related to social interaction, indicating that social interaction in parks plays an important role in shaping emotional experience. Therefore, to enhance the emotional value of urban parks, future planning and management should design and place recreational facilities while integrating cultural activities and social spaces. Physical, cultural, and social elements should work together to improve users’ overall emotional experience and satisfaction.
Building on this, it is necessary to further consider how users subjectively articulate their park experiences through language. Adjective-based evaluations commonly found on social media often represent condensed expressions of users’ overall perceptions of parks. Such descriptions reflect not only perceptions of physical facilities and environmental quality but also, to some extent, evaluations of social atmosphere and usage experience. Importantly, adjectives such as “calm,” “natural,” “lively,” or “lonely” do not inherently convey fixed or singular emotional meanings; rather, their interpretation is highly dependent on specific landscape conditions and usage contexts. For instance, in environments rich in natural elements and characterized by moderate activity levels, “calm” and “natural” are typically associated with positive experiences such as relaxation and comfort [72]. In contrast, in settings marked by spatial emptiness, inadequate maintenance, or low usage intensity, the same terms may be interpreted as silence or even loneliness. This suggests that identical semantic labels may correspond to substantially different emotional interpretations across varying spatial and contextual conditions [73,74].
From this perspective, examining emotional expressions of park landscapes at the semantic level enables a more nuanced understanding of how users integrate spatial experiences and emotional perceptions in language, while also offering a new lens for urban park planning and management. Accordingly, future practice should not only attend to facility provision and activity organization but also incorporate semantic perception analysis and remain attentive to contextual variations reflected in public evaluations, thereby allowing for a more accurate interpretation of emotional feedback under different patterns of park use [75].
Finally, this study is based on analyses of urban park photographs shared on social media platforms, and the findings provide valuable insights for future urban park landscape management and planning. However, it is important to acknowledge the inherent limitations of social media data with respect to user composition and demographic representation. These limitations should be carefully considered when translating the results into practical applications. Accordingly, future urban park management and decision-making processes would benefit from integrating the findings of this study with complementary data sources—such as on-site observations, questionnaire surveys, or administrative and management records—to mitigate potential biases associated with platform-specific user structures. Such an integrative approach can help ensure that park management strategies more comprehensively and objectively reflect the needs of diverse user groups.

4.4. Limitations and Future Research Directions

In this study, we built a framework that jointly outputs adjective semantics and emotional tendencies for urban park images shared on social media, and used it to explore the relationship between the two. However, the study still has some limitations.
First, although this study examined how adjective-level semantic elements in social media urban park images influence emotional tendencies, emotion is usually also shaped by many objective environmental factors: park accessibility, air quality, noise level, and ticket price may all affect users' emotional experiences. Because this study analyzed social media images, such non-visual environmental factors are difficult to capture or measure. Future studies could therefore incorporate multimodal data such as surveys, geospatial information, or sensor data to compensate for the limitations of image-only analysis in understanding how emotions arise.
Second, because social media users are mostly young people [76], the images used in this study mainly reflect young users’ perceptions and emotional reactions. However, urban park users are highly diverse, including many older people and those who do not often use social media. These groups are less represented on social media, so their emotions and perceptions are not fully captured in the study. To address this, future research can combine surveys, field interviews, or other traditional data collection methods to supplement social media data. This will provide a more complete picture of the emotional impact and experiences of urban parks across different age groups and diverse users.
In addition, beyond overcoming these limitations, future studies can further expand fine-grained modeling of image emotions based on the framework proposed in this study. This study used a three-class method based on overall emotional tendency (positive/irrelevant/negative). However, related studies on urban streetscapes show that emotional perception has richer layers. For example, subjective feelings such as “beautiful,” “boring,” “dull,” “lively,” “safe,” and “wealthy” not only indicate the direction of emotions but also reflect their content and nature [77]. Therefore, future studies can use multidimensional emotion labels and build multi-label classification or multiple regression models to describe public emotional perception of urban parks more fully. This can deepen the understanding of how different adjective semantic elements trigger diverse emotional reactions and provide more precise theoretical and practical guidance for park design optimization and emotion-oriented management.

5. Conclusions

This study built the Urban Landscape Emotion Analysis Framework (ULEAF), which combines a convolutional neural network (ConvNeXt Tiny) for emotion recognition with a semantic extraction module based on multimodal semantic matching (CLIP together with part of the DeepSentiBank ANP lexicon). The framework analyzed social media images of urban parks to explore how users perceive these spaces. The main findings are as follows:
  • Among emotion classification models, ConvNeXt Tiny performed best, reaching an accuracy of 85.1%, showing strong performance in urban park image emotion recognition.
  • Quantitative validation showed that combining the CLIP model with DeepSentiBank's Adjective–Noun Pair (ANP) method reached an overall accuracy of 89.2%, demonstrating its effectiveness in semantic extraction.
  • Four semantic factors significantly affected image emotional tendency, with three having negative effects and one having a positive effect.
  • Positive emotion images were mainly associated with nature, health, and openness, while negative emotion images were closely related to decay, abandonment, and oppression, as well as estrangement, disharmony, gloom, and bleakness.
  • These results demonstrate a systematic relationship between urban park visual features and emotional perception.
In summary, ULEAF effectively integrates emotion recognition and semantic extraction, overcoming the limitations of traditional single-method approaches and enabling systematic analysis of emotions and semantic features in urban park images. The model achieved a satisfactory level of accuracy, and its outputs were strongly consistent with findings from previous studies, suggesting that the proposed framework is both reliable and practically feasible. It can thus serve as a solid foundation for future research and, when combined with other models or multi-source data, holds significant potential for broader applications in urban park studies.

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z.; software, Y.Z.; validation, G.Y., L.Z. and H.X.; formal analysis, Y.Z. and G.Y.; investigation, Y.Z. and H.X.; resources, Y.Z.; data curation, Y.Z., G.Y. and L.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, L.Z., H.X. and T.J.; visualization, Y.Z.; supervision, L.Z., H.X. and T.J.; project administration, Y.Z., G.Y. and T.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by 2025 Kyungpook National University BK21 FOUR Graduate Innovation Project (International Joint Research Project for Graduate Students).

Data Availability Statement

The data are not publicly available due to ongoing plans for further analysis but are available from the corresponding author upon reasonable request.

Acknowledgments

The authors sincerely thank the 2025 Kyungpook National University BK21 FOUR Graduate Innovation Project (International Joint Research Project for Graduate Students) for funding this study, and the editors and expert reviewers for their valuable comments and suggestions on the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

As shown in Table A1, this study clustered adjectives semantically based on the cosine similarity of their GloVe word embeddings. The analysis indicated that 29 adjectives could reasonably be grouped into nine categories within the semantic space; this merging reduced the original 124 adjectives to 104, which were then subjected to factor analysis. In the factor analysis results, the Adjective ID corresponds to the column "After GloVe Embedding: Adjective Indexing" in Table A2.
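The similarity-based merging summarized in Table A1 can be sketched as follows. The toy 3-dimensional vectors and the 0.7 threshold are illustrative assumptions standing in for real 300-dimensional GloVe embeddings and the pairwise values (0.72–0.90) reported in the table.

```python
# Sketch of the Table A1 merging rule: two adjectives are grouped when the
# cosine similarity of their GloVe vectors exceeds a threshold.
# Vectors and threshold below are toy assumptions.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vecs = {
    "pretty": np.array([0.9, 0.3, 0.1]),
    "lovely": np.array([0.8, 0.4, 0.1]),
    "muddy":  np.array([-0.2, 0.1, 0.9]),
}
THRESHOLD = 0.7  # assumption; real pairs in Table A1 score 0.72-0.90

# "pretty" and "lovely" exceed the threshold and merge (Attractive);
# "pretty" and "muddy" do not.
merged = cosine(vecs["pretty"], vecs["lovely"]) >= THRESHOLD
print(merged)
```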
Table A1. Aggregation of Adjectives Based on GloVe Embedding Semantic Similarity.
| Original Adjectives | Merged Category | Cosine Similarity Evidence (GloVe) | Notes |
|---|---|---|---|
| pretty, lovely | Attractive | 0.79–0.88 | These adjectives express general visual appeal and charm, emphasizing aesthetic pleasantness. |
| amazing, awesome, incredible | Impressive | 0.77–0.86 | Denote strong positive evaluation, highlighting extraordinary or notable qualities. |
| fantastic, splendid, outstanding, great, famous, magnificent | Fantastic | 0.84–0.90 | Represent high-intensity positive appraisal, indicating excellence or grandeur in perception. |
| funny, crazy | Playful | 0.78–0.85 | Convey unconventional, humorous, or lively characteristics. |
| calm, serene, peaceful, relaxing, tranquil | Tranquil | 0.80–0.87 | Reflect low-arousal, soothing emotional states, and perceptual serenity. |
| dirty, muddy, dusty, rotten | Unclean | 0.72–0.86 | Indicate lack of cleanliness or presence of decay, evoking negative environmental perception. |
| strong, powerful | Strong | 0.83–0.89 | Emphasize physical, structural, or metaphorical strength and robustness. |
| shiny, sparkling, bright | Shiny | 0.81–0.88 | Capture visual brilliance or high luminance, indicating noticeable visual prominence. |
| hot, sunny | Warm | 0.74–0.87 | Represent high temperature or bright environmental conditions, often associated with positive warmth. |
Table A2. Changes in Adjective Indexing Before and After GloVe Embedding.
| Adjective | Before GloVe Embedding: Adjective Indexing | After GloVe Embedding: Adjective Indexing |
|---|---|---|
| abandoned | 1 | 20 |
| aggressive | 2 | 11 |
| amazing | 3 | 1 |
| ancient | 4 | 12 |
| attractive | 5 | 13 |
| awesome | 6 | 1 |
| bad | 7 | 14 |
| beautiful | 8 | 33 |
| bright | 9 | 8 |
| broken | 10 | 3 |
| busy | 11 | 17 |
| calm | 12 | 5 |
| charming | 13 | 18 |
| cheerful | 14 | 19 |
| christian | 15 | 10 |
| classic | 16 | 101 |
| clean | 17 | 39 |
| clear | 18 | 23 |
| cloudy | 19 | 24 |
| colorful | 20 | 25 |
| comfortable | 21 | 26 |
| crazy | 22 | 4 |
| creepy | 23 | 27 |
| crowded | 24 | 28 |
| cruel | 25 | 29 |
| crying | 26 | 30 |
| damaged | 27 | 31 |
| dangerous | 28 | 32 |
| dark | 29 | 15 |
| dead | 30 | 2 |
| delightful | 31 | 35 |
| derelict | 32 | 36 |
| dirty | 33 | 6 |
| divine | 34 | 37 |
| dry | 35 | 38 |
| dusty | 36 | 6 |
| dying | 37 | 22 |
| elegant | 38 | 40 |
| empty | 39 | 55 |
| excellent | 40 | 42 |
| falling | 41 | 65 |
| famous | 42 | 16 |
| fancy | 43 | 44 |
| fantastic | 44 | 16 |
| fascinating | 45 | 45 |
| favorite | 46 | 46 |
| fluffy | 47 | 47 |
| fragile | 48 | 48 |
| fresh | 49 | 49 |
| friendly | 50 | 50 |
| funerary | 51 | 51 |
| funny | 52 | 4 |
| gentle | 53 | 52 |
| golden | 54 | 53 |
| gorgeous | 55 | 54 |
| graceful | 56 | 41 |
| great | 57 | 16 |
| harsh | 58 | 56 |
| haunted | 59 | 21 |
| healing | 60 | 58 |
| healthy | 61 | 59 |
| heavy | 62 | 60 |
| holy | 63 | 61 |
| horrible | 64 | 62 |
| hot | 65 | 9 |
| icy | 66 | 63 |
| incredible | 67 | 1 |
| little | 68 | 64 |
| lonely | 69 | 43 |
| lost | 70 | 66 |
| loud | 71 | 67 |
| lovely | 72 | 34 |
| magical | 73 | 68 |
| magnificent | 74 | 16 |
| misty | 75 | 69 |
| muddy | 76 | 6 |
| natural | 77 | 70 |
| nice | 78 | 71 |
| noisy | 79 | 72 |
| outdoor | 80 | 73 |
| outstanding | 81 | 16 |
| peaceful | 82 | 5 |
| pleasant | 83 | 74 |
| poor | 84 | 75 |
| powerful | 85 | 7 |
| precious | 86 | 76 |
| pretty | 87 | 34 |
| prickly | 88 | 77 |
| quaint | 89 | 78 |
| quiet | 90 | 79 |
| rainy | 91 | 80 |
| relaxing | 92 | 5 |
| rotten | 93 | 6 |
| rough | 94 | 82 |
| sad | 95 | 81 |
| safe | 96 | 83 |
| scary | 97 | 84 |
| scenic | 98 | 85 |
| serene | 99 | 5 |
| shiny | 100 | 8 |
| slender | 101 | 86 |
| slippery | 102 | 87 |
| smelly | 103 | 88 |
| smooth | 104 | 89 |
| sparkling | 105 | 8 |
| splendid | 106 | 16 |
| stormy | 107 | 90 |
| strange | 108 | 91 |
| strong | 109 | 7 |
| stunning | 110 | 92 |
| stupid | 111 | 93 |
| sunny | 112 | 9 |
| super | 113 | 94 |
| sweet | 114 | 95 |
| tasty | 115 | 96 |
| tiny | 116 | 97 |
| traditional | 117 | 98 |
| tranquil | 118 | 5 |
| ugly | 119 | 99 |
| warm | 120 | 100 |
| weird | 121 | 57 |
| wet | 122 | 102 |
| wild | 123 | 103 |
| young | 124 | 104 |

References

  1. Feyisa, G.L.; Dons, K.; Meilby, H. Efficiency of parks in mitigating urban heat island effect: An example from Addis Ababa. Landsc. Urban Plan. 2014, 123, 87–95.
  2. Brown, R.D.; Vanos, J.; Kenny, N.; Lenzholzer, S. Designing urban parks that ameliorate the effects of climate change. Landsc. Urban Plan. 2015, 138, 118–131.
  3. Taylor, D.E. Central Park as a model for social control: Urban parks, social class and leisure behavior in nineteenth-century America. J. Leis. Res. 1999, 31, 420–477.
  4. Chiesura, A. The role of urban parks for the sustainable city. Landsc. Urban Plan. 2004, 68, 129–138.
  5. Song, C.; Ikei, H.; Igarashi, M.; Miwa, M.; Takagaki, M.; Miyazaki, Y. Physiological and psychological responses of young males during spring-time walks in urban parks. J. Physiol. Anthropol. 2014, 33, 8.
  6. Rahnema, S.; Sedaghathoor, S.; Allahyari, M.S.; Damalas, C.A.; El Bilali, H. Preferences and emotion perceptions of ornamental plant species for green space designing among urban park users in Iran. Urban For. Urban Green. 2019, 39, 98–108.
  7. Zhang, L.; Liu, S.; Liu, S. Mechanisms underlying the effects of landscape features of urban community parks on health-related feelings of users. Int. J. Environ. Res. Public Health 2021, 18, 7888.
  8. Kong, L.; Liu, Z.; Pan, X.; Wang, Y.; Guo, X.; Wu, J. How do different types and landscape attributes of urban parks affect visitors’ positive emotions? Landsc. Urban Plan. 2022, 226, 104482.
  9. Yang, C.; Zhang, Y. Public emotions and visual perception of the East Coast Park in Singapore: A deep learning method using social media data. Urban For. Urban Green. 2024, 94, 128285.
  10. Wang, X.; Jia, J.; Tang, J.; Wu, B.; Cai, L.; Xie, L. Modeling emotion influence in image social networks. IEEE Trans. Affect. Comput. 2015, 6, 286–297.
  11. Nguyen-Dinh, N.; Zhang, H. How Landscape Preferences and Emotions Shape Environmental Awareness: Perspectives from University Experiences. Sustainability 2025, 17, 3161.
  12. Wen, B.; Burley, J.B. Expert opinion dimensions of rural landscape quality in Xiangxi, Hunan, China: Principal component analysis and factor analysis. Sustainability 2020, 12, 1316.
  13. Coughlan, M.; Cronin, P.; Ryan, F. Survey research: Process and limitations. Int. J. Ther. Rehabil. 2009, 16, 9–15.
  14. Wang, Z.; Jin, Y.; Liu, Y.; Li, D.; Zhang, B. Comparing social media data and survey data in assessing the attractiveness of Beijing Olympic Forest Park. Sustainability 2018, 10, 382.
  15. Huai, S.; Liu, S.; Zheng, T.; Van de Voorde, T. Are social media data and survey data consistent in measuring park visitation, park satisfaction, and their influencing factors? A case study in Shanghai. Urban For. Urban Green. 2023, 81, 127869.
  16. Huang, W.; Zhao, X.; Lin, G.; Wang, Z.; Chen, M. How to quantify multidimensional perception of urban parks? Integrating deep learning-based social media data analysis with questionnaire survey methods. Urban For. Urban Green. 2025, 107, 128754.
  17. Zhao, X.; Lu, Y.; Huang, W.; Lin, G. Assessing and interpreting perceived park accessibility, usability and attractiveness through texts and images from social media. Sustain. Cities Soc. 2024, 112, 105619.
  18. Huang, Y.; Zheng, B. Social media users’ visual and emotional preferences of internet-famous sites in urban riverfront public spaces: A case study in Changsha, China. Land 2024, 13, 930.
  19. Jia, J.; Wu, S.; Wang, X.; Hu, P.; Cai, L.; Tang, J. Can we understand van Gogh’s mood? Learning to infer affects from images in social networks. In Proceedings of the 20th ACM International Conference on Multimedia, Nara, Japan, 29 October–2 November 2012; pp. 857–860.
  20. Chen, M.; Arribas-Bel, D.; Singleton, A. Quantifying the characteristics of the local urban environment through geotagged flickr photographs and image recognition. ISPRS Int. J. Geo-Inf. 2020, 9, 264.
  21. Huang, J.; Obracht-Prondzynska, H.; Kamrowska-Zaluska, D.; Sun, Y.; Li, L. The image of the City on social media: A comparative study using “Big Data” and “Small Data” methods in the Tri-City Region in Poland. Landsc. Urban Plan. 2021, 206, 103977.
  22. Zhang, X.; Xu, D.; Zhang, N. Research on landscape perception and visual attributes based on social media data—A case study on Wuhan University. Appl. Sci. 2022, 12, 8346.
  23. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25. Available online: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf (accessed on 18 December 2025).
  24. Rawat, W.; Wang, Z. Deep convolutional neural networks for image classification: A comprehensive review. Neural Comput. 2017, 29, 2352–2449.
  25. Rao, T.; Li, X.; Zhang, H.; Xu, M. Multi-level region-based convolutional neural network for image emotion classification. Neurocomputing 2019, 333, 429–439.
  26. Yang, J.; She, D.; Sun, M. Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 3266–3272.
  27. Chen, W.; Wu, A.N.; Biljecki, F. Classification of urban morphology with deep learning: Application on urban vitality. Comput. Environ. Urban Syst. 2021, 90, 101706.
  28. Law, S.; Seresinhe, C.I.; Shen, Y.; Gutierrez-Roig, M. Street-Frontage-Net: Urban image classification using deep convolutional neural networks. Int. J. Geogr. Inf. Sci. 2020, 34, 681–707.
  29. Li, J.; Li, D.; Savarese, S.; Hoi, S. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 19730–19742.
  30. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 8748–8763.
  31. Perez, J.; Fusco, G. Streetscape Analysis with Generative AI (SAGAI): Vision-Language Assessment and Mapping of Urban Scenes. arXiv 2025, arXiv:2504.16538.
  32. Blečić, I.; Saiu, V.; Trunfio, G.A. Enhancing urban walkability assessment with multimodal Large Language models. In Proceedings of the International Conference on Computational Science and Its Applications, Hanoi, Vietnam, 1–4 July 2024; pp. 394–411.
  33. Qian, L.; Guo, J.; Qiu, H.; Zheng, C.; Ren, L. Exploring destination image of dark tourism via analyzing user generated photos: A deep learning approach. Tour. Manag. Perspect. 2023, 48, 101147.
  34. Chen, T.; Borth, D.; Darrell, T.; Chang, S.-F. DeepSentiBank: Visual sentiment concept classification with deep convolutional neural networks. arXiv 2014, arXiv:1410.8586.
  35. Zeng, X.; Zhong, Y.; Yang, L.; Wei, J.; Tang, X. Analysis of forest landscape preferences and emotional features of Chinese forest recreationists based on deep learning of geotagged photos. Forests 2022, 13, 892.
  36. Yan, J.; Yue, J.; Zhang, J.; Qin, P. Research on spatio-temporal characteristics of tourists’ landscape perception and emotional experience by using photo data mining. Int. J. Environ. Res. Public Health 2023, 20, 3843.
  37. Zubić, N.; Soldá, F.; Sulser, A.; Scaramuzza, D. Limits of deep learning: Sequence modeling through the lens of complexity theory. arXiv 2024, arXiv:2405.16674.
  38. Gao, F.; Liao, S.; Wang, Z.; Cai, G.; Feng, L.; Yang, Z.; Chen, W.; Chen, X.; Li, G. Revealing disparities in different types of park visits based on cellphone signaling data in Guangzhou, China. J. Environ. Manag. 2024, 351, 119969.
  39. Yang, W.; Li, X.; Feng, X. Examining the scale effect of nearby residential green space on residents’ BMI: A case study of Guangzhou, China. Urban For. Urban Green. 2024, 95, 128329.
  40. Muhammad, R.; Zhao, Y.; Liu, F. Spatiotemporal analysis to observe gender based check-in behavior by using social media big data: A case study of Guangzhou, China. Sustainability 2019, 11, 2822.
  41. Wang, Z.; Zhu, Z.; Xu, M.; Qureshi, S. Fine-grained assessment of greenspace satisfaction at regional scale using content analysis of social media and machine learning. Sci. Total Environ. 2021, 776, 145908.
  42. Liu, W.; Hu, X.; Song, Z.; Yuan, X. Identifying the integrated visual characteristics of greenway landscape: A focus on human perception. Sustain. Cities Soc. 2023, 99, 104937.
  43. Zhao, X.; Huang, H.; Lin, G.; Lu, Y. Exploring temporal and spatial patterns and nonlinear driving mechanism of park perceptions: A multi-source big data study. Sustain. Cities Soc. 2025, 119, 106083.
  44. Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142.
  45. Yu, W.; Zhou, P.; Yan, S.; Wang, X. InceptionNeXt: When Inception meets ConvNeXt. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5672–5683.
  46. Min, Z.; Ge, Q.; Tai, C. Why the pseudo label based semi-supervised learning algorithm is effective? arXiv 2022, arXiv:2211.10039.
  47. Lee, D.-H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proceedings of the Workshop on Challenges in Representation Learning, ICML, Atlanta, GA, USA, 28 June 2013; p. 896.
  48. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
  49. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
  50. Ashley, R.; Lloyd, J. An example of the use of factor analysis and cluster analysis in groundwater chemistry interpretation. J. Hydrol. 1978, 39, 355–364.
  51. Yong, A.G.; Pearce, S. A beginner’s guide to factor analysis: Focusing on exploratory factor analysis. Tutor. Quant. Methods Psychol. 2013, 9, 79–94.
  52. LaValley, M.P. Logistic regression. Circulation 2008, 117, 2395–2399.
  53. Masciantonio, A.; Heiser, N.; Cherbonnier, A. Unveiling the Positivity Bias on Social Media: A Registered Experimental Study on Facebook, Instagram, and X. Collabra Psychol. 2025, 11, 132410.
  54. Schreurs, L.; Meier, A.; Vandenbosch, L. Exposure to the positivity bias and adolescents’ differential longitudinal links with social comparison, inspiration and envy depending on social media literacy. Curr. Psychol. 2023, 42, 28221–28241.
  55. Zhao, X.; Huang, H.; Yang, T.; Lu, Y.; Zhang, L.; Wang, R.; Liu, Z.; Zhong, T.; Liu, T. Urban planning in the age of large language models: Assessing OpenAI o1’s performance and capabilities across 556 tasks. Comput. Environ. Urban Syst. 2025, 121, 102332.
  56. Luusua, A.; Ylipulli, J.; Foth, M.; Aurigi, A. Urban AI: Understanding the emerging role of artificial intelligence in smart cities. AI Soc. 2023, 38, 1039–1044.
  57. Wang, R.; Zhao, J. Demographic groups’ differences in visual preference for vegetated landscapes in urban green space. Sustain. Cities Soc. 2017, 28, 350–357.
  58. Hofmann, M.; Westermann, J.R.; Kowarik, I.; Van der Meer, E. Perceptions of parks and urban derelict land by landscape planners and residents. Urban For. Urban Green. 2012, 11, 303–312.
  59. Roberts, H.; Kellar, I.; Conner, M.; Gidlow, C.; Kelly, B.; Nieuwenhuijsen, M.; McEachan, R. Associations between park features, park satisfaction and park use in a multi-ethnic deprived urban area. Urban For. Urban Green. 2019, 46, 126485.
  60. Deng, L.; Li, X.; Luo, H.; Fu, E.-K.; Ma, J.; Sun, L.-X.; Huang, Z.; Cai, S.-Z.; Jia, Y. Empirical study of landscape types, landscape elements and landscape components of the urban park promoting physiological and psychological restoration. Urban For. Urban Green. 2020, 48, 126488.
  61. Yang, L.; Wu, Q.; Lyu, J. Which affects park satisfaction more, environmental features or spatial pattern? Landsc. Ecol. 2025, 40, 1–24.
  62. Zhu, X.; Gao, M.; Zhang, R.; Zhang, B. Quantifying emotional differences in urban green spaces extracted from photos on social networking sites: A study of 34 parks in three cities in northern China. Urban For. Urban Green. 2021, 62, 127133.
  63. Knez, I.; Ode Sang, Å.; Gunnarsson, B.; Hedblom, M. Wellbeing in urban greenery: The role of naturalness and place identity. Front. Psychol. 2018, 9, 491.
  64. Bressane, A.; Silva, M.B.; Goulart, A.P.G.; Medeiros, L.C.d.C. Understanding how green space naturalness impacts public well-being: Prospects for designing healthier cities. Int. J. Environ. Res. Public Health 2024, 21, 585.
  65. Liu, R.; Xiao, J. Factors affecting users’ satisfaction with urban parks through online comments data: Evidence from Shenzhen, China. Int. J. Environ. Res. Public Health 2021, 18, 253.
  66. Wan, C.; Shen, G.Q.; Choi, S. Eliciting users’ preferences and values in urban parks: Evidence from analyzing social media data from Hong Kong. Urban For. Urban Green. 2021, 62, 127172.
  67. Kuper, R. Effects of flowering, foliation, and autumn colors on preference and restorative potential for designed digital landscape models. Environ. Behav. 2020, 52, 544–576.
  68. Wang, R.; Zhao, J.; Liu, Z. Consensus in visual preferences: The effects of aesthetic quality and landscape types. Urban For. Urban Green. 2016, 20, 210–217.
  69. Berto, R. Exposure to restorative environments helps restore attentional capacity. J. Environ. Psychol. 2005, 25, 249–259.
  70. Peters, K.; Elands, B.; Buijs, A. Social interactions in urban parks: Stimulating social cohesion? Urban For. Urban Green. 2010, 9, 93–100.
  71. Mullenbach, L.E.; Stanis, S.A.W.; Piontek, E. Interracial interaction, park ownership, belonging, community asset, and perceived provision of cultural ecosystem services. Urban For. Urban Green. 2024, 101, 128551. [Google Scholar] [CrossRef]
  72. Welch, D.; Shepherd, D.; Dirks, K.; Tan, M.Y.; Coad, G. Use of creative writing to develop a semantic differential tool for assessing soundscapes. Front. Psychol. 2019, 9, 2698. [Google Scholar] [CrossRef]
  73. Herranz-Pascual, K.; Aspuru, I.; Iraurgi, I.; Santander, Á.; Eguiguren, J.L.; García, I. Going beyond quietness: Determining the emotionally restorative effect of acoustic environments in urban open public spaces. Int. J. Environ. Res. Public Health 2019, 16, 1284. [Google Scholar] [CrossRef]
  74. Hou, J.; Wang, Y.; Zhang, X.; Qiu, L.; Gao, T. The effect of visibility on green space recovery, perception and preference. Trees For. People 2024, 16, 100538. [Google Scholar] [CrossRef]
  75. Song, Y.; Zhang, B. Using social media data in understanding site-scale landscape architecture design: Taking Seattle Freeway Park as an example. Landsc. Res. 2020, 45, 627–648. [Google Scholar] [CrossRef]
  76. Kaplan, A.M.; Haenlein, M. Users of the world, unite! The challenges and opportunities of Social Media. Bus. Horiz. 2010, 53, 59–68. [Google Scholar] [CrossRef]
  77. Zhao, X.; Lu, Y.; Lin, G. An integrated deep learning approach for assessing the visual qualities of built environments utilizing street view images. Eng. Appl. Artif. Intell. 2024, 130, 107805. [Google Scholar] [CrossRef]
Figure 1. Location map of Guangzhou and the 82 urban parks considered in this study.
Figure 3. Framework Architecture of the Urban Landscape Emotion Analysis Framework (ULEAF).
Figure 4. Training Curves of Test Set Accuracy for Different Models.
Figure 5. Training Curves of Test Set Loss for Different Models.
Figure 6. Predicted vs. True Labels.
Figure 7. Results of Sentiment Analysis on 20,000 Images.
Figure 8. Word Cloud Visualization of Adjectives Based on Frequency Analysis.
Figure 9. Factor Analysis of Semantic Elements of Adjectives (The Adjective ID corresponds to “After GloVe Embedding: Adjective Indexing” in Appendix Table A2).
Table 1. Accuracy (ACC) and Loss of Different CNN Image Classification Models on the Test Set.
Model Name       Accuracy  Loss
ConvNeXt Tiny    0.8510    0.1670
EfficientNet-B0  0.8231    0.2670
ResNet-18        0.8340    0.2430
ResNet-50        0.8279    0.2880
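For reference, the accuracy and loss reported in Table 1 are the standard top-1 accuracy and mean cross-entropy loss for image classification. A minimal pure-Python sketch of how these metrics are computed from model logits; the toy logits and labels below are illustrative only, not data from the study:

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def accuracy_and_ce_loss(logit_rows, labels):
    """Top-1 accuracy and mean cross-entropy loss over a test set."""
    correct = 0
    total_loss = 0.0
    for logits, y in zip(logit_rows, labels):
        probs = softmax(logits)
        # Prediction is the class with the highest probability.
        if max(range(len(probs)), key=probs.__getitem__) == y:
            correct += 1
        # Cross-entropy penalizes low probability on the true class.
        total_loss += -math.log(probs[y])
    n = len(labels)
    return correct / n, total_loss / n

# Toy batch: three images, binary sentiment logits (hypothetical values).
logits = [[2.0, -1.0], [0.5, 1.5], [-0.2, 0.3]]
labels = [0, 1, 0]  # ground-truth sentiment classes
acc, loss = accuracy_and_ce_loss(logits, labels)
```

In a framework such as PyTorch these quantities would come from the built-in cross-entropy criterion; the sketch only makes the arithmetic behind Table 1 explicit.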
Table 2. Top 20 Adjectives by Frequency for Positive and Negative Sentiment Labels.
No.  Positive Adjective  Frequency  Negative Adjective  Frequency
1    calm                9185       broken              6049
2    natural             5373       calm                3323
3    beautiful           4323       natural             1554
4    attractive          3337       scenic              1276
5    scenic              3226       dead                979
6    sunny               2159       busy                906
7    outdoor             1893       ancient             836
8    clean               1344       sunny               807
9    wild                1187       crying              773
10   colorful            1145       rough               718
11   amazing             1096       lonely              655
12   empty               950        damaged             649
13   healthy             936        dry                 644
14   quiet               934        strange             615
15   charming            829        nice                595
16   dark                762        haunted             572
17   haunted             719        amazing             559
18   dry                 703        favorite            506
19   young               660        traditional         480
20   golden              604        creepy              475
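A frequency ranking like Table 2 reduces to a simple counting step over per-image adjective tags grouped by predicted sentiment. A minimal sketch; the tag lists below are hypothetical stand-ins for the output of the CLIP/ANP matching stage:

```python
from collections import Counter

# Hypothetical per-image adjective tags grouped by predicted sentiment;
# in the study these would come from the semantic extraction module.
tags_by_sentiment = {
    "positive": ["calm", "natural", "calm", "beautiful", "calm"],
    "negative": ["broken", "calm", "broken", "dead"],
}

# Top-20 adjectives per sentiment label, ranked by frequency.
top_adjectives = {
    label: Counter(tags).most_common(20)
    for label, tags in tags_by_sentiment.items()
}
```

Note that an adjective such as "calm" can appear under both labels, exactly as in Table 2, because the count is conditioned on each image's predicted sentiment.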
Table 3. Effect of Various Semantic Factors on Sentiment Polarity.
Factor Number  Feature                              β       p
1              Decay, abandonment, and oppression   −1.083  0.000
2              Nature, health, and openness         0.488   0.000
3              Loneliness and calmness              −0.001  0.951 *
4              Estrangement and disharmony          −0.773  0.000
5              Gloom and bleakness                  −0.642  0.000
* The p-value exceeds the 0.001 significance threshold; the effect of this factor on sentiment polarity is not statistically significant.
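Because the β values in Table 3 are logistic-regression coefficients on the log-odds scale, exponentiating them yields odds ratios, which are often easier to interpret: exp(β) is the multiplicative change in the odds of a positive sentiment label per one-unit increase in the factor score. A minimal sketch using the coefficients from Table 3; the factor names are our shorthand:

```python
import math

# Log-odds coefficients from Table 3 (positive sentiment as the outcome).
betas = {
    "decay_abandonment_oppression": -1.083,
    "nature_health_openness": 0.488,
    "loneliness_calmness": -0.001,   # not significant (p = 0.951)
    "estrangement_disharmony": -0.773,
    "gloom_bleakness": -0.642,
}

# exp(beta) > 1 means the factor raises the odds of a positive label;
# exp(beta) < 1 means it lowers them.
odds_ratios = {name: math.exp(b) for name, b in betas.items()}
```

For example, exp(0.488) ≈ 1.63, so a one-unit increase in the nature/health/openness factor roughly multiplies the odds of a positive label by 1.63, while the decay/abandonment/oppression factor cuts those odds to about a third.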

Zhang, Y.; Yu, G.; Zhang, L.; Jung, T.; Xu, H. A Deep Learning Framework for Emotion Recognition and Semantic Interpretation of Social Media Images in Urban Parks: The ULEAF Approach. Appl. Sci. 2026, 16, 127. https://doi.org/10.3390/app16010127
