Public Perception of Urban Recreational Spaces Based on Large Vision–Language Models: A Case Study of Beijing’s Third Ring Area
Abstract
1. Introduction
- How can advanced LVLMs be optimized for precise multidimensional evaluation of URS perceptions?
- What typologies characterize URSs, and how do key spatial elements relate to specific affective responses (e.g., pleasure, interest, comfort, nostalgia) across different perceptual dimensions?
- What nonlinear relationships and threshold effects exist between built environment features and URS perceptions?
2. Materials and Methods
2.1. Study Area
2.2. Research Framework
2.3. Data Acquisition
2.4. Development of Interpretable LVLMs
2.4.1. Manually Annotated Dataset
2.4.2. Model Structure
2.4.3. Model Evaluation
2.4.4. Topic Analysis Based on the Interpretation Layer
- (1) Synonym consolidation: High-frequency keywords were merged using a predefined synonym dictionary to eliminate terminological inconsistencies. For instance, the variants “coffee shop”, “café”, and “coffee bar” were standardized to the canonical form “coffee shop”.
- (2) Keyword vectorization: We used Tencent AI Lab’s Chinese Word and Sentence Embedding Corpus to convert both keyword categories (thematic terms and modifier terms) into 200-dimensional semantic vectors. The corpus provides vector representations for over 8 million Chinese lexical items, giving a shared semantic space for the subsequent cluster analysis.
- (3) Thematic term clustering: K-means clustering was applied to the vectorized thematic terms, with the number of clusters chosen by jointly evaluating the elbow method and the silhouette coefficient. For visualization, UMAP reduced the 200-dimensional vectors to two dimensions, producing scatter plots in which point diameter encodes term frequency and color encodes thematic cluster.
- (4) Modifier word cloud generation: Within each perceptual cluster, we generated word clouds weighted by modifier term frequency, using cluster-specific color schemes. This links the spatial elements represented by thematic terms to the emotional qualities captured by their associated modifier terms.
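
Steps (1) and (3) can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the synonym dictionary entries, the toy 2-D points (standing in for the 200-dimensional embeddings), and the function names `consolidate` and `silhouette` are assumptions made for illustration. The silhouette function scores one candidate labeling; in practice it would be evaluated for each candidate number of clusters produced by K-means.

```python
from collections import Counter
from math import dist

# Hypothetical synonym dictionary (variant -> canonical form); illustrative only.
SYNONYMS = {"café": "coffee shop", "coffee bar": "coffee shop"}


def consolidate(keywords):
    """Map each keyword to its canonical form and count term frequencies."""
    cleaned = (k.strip().lower() for k in keywords)
    return Counter(SYNONYMS.get(k, k) for k in cleaned)


def silhouette(points, labels):
    """Mean silhouette coefficient of a labeling (Euclidean distance).

    Requires at least two clusters. Points in singleton clusters score 0.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


counts = consolidate(["Coffee Bar", "coffee shop", "coffee bar", "hutong"])

# Toy vectors: two well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette(toy_points, [0, 0, 1, 1])  # labeling that matches the groups
bad = silhouette(toy_points, [0, 1, 0, 1])   # labeling that cuts across them
```

A labeling that respects the two groups scores close to 1, while one that cuts across them scores below 0, which is the signal used when comparing candidate cluster numbers.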
2.5. Exploring the Nonlinear Influencing Mechanisms of the Built Environment on URS Perception
2.5.1. Variable Selection
2.5.2. Machine Learning Modeling
3. Results
3.1. Training and Evaluating the Qwen2.5-VL-7B-SFT
3.2. Thematic Analysis of URS Perception
3.3. Analysis of Influencing Factors on URS Perception
4. Discussion
4.1. Methodological Contribution
4.2. Thematic Composition of URS Perception
4.3. Effects of Built Environmental Characteristics on URS Perceptions
4.4. Research Limitations and Future Directions
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| URSs | Urban recreational spaces |
| LVLMs | Large vision-language models |
| SMD | Social media data |
| SFT | Supervised fine-tuning |
| LoRA | Low-Rank Adaptation |
| CoT | Chain-of-Thought |
| SHAP | SHapley Additive exPlanations |
| RFE | Recursive feature elimination |
| XGBoost | eXtreme Gradient Boosting |
| GBDT | Gradient Boosted Decision Trees |
| LightGBM | Light Gradient Boosting Machine |
| RF | Random Forest |
| MAE | Mean Absolute Error |
| RMSE | Root Mean Square Error |
| R² | R-squared |
| ACC | Accuracy |
Appendix A. Urban Recreational Spaces Perception Dimensions Description
| Dimension | Polarity | Description | Example |
|---|---|---|---|
| Esthetics | Positive | Favorable visual experience and appreciation of spatial quality. Emphasizes explicit expressions of admiration for visual beauty. | Beihai Park · White Pagoda Peach Blossoms and Willow Banks are called Spring and Jingming, still remembered in the children’s song: “Let’s swing our sculls, and the boat pushes away the waves.” The sea surface reflects the beautiful White Pagoda, surrounded by green trees and red walls…… |
| Esthetics | Negative | Unfavorable visual experience and criticism of spatial quality. Emphasizes explicit expressions of dissatisfaction with visual defects and environmental discordance. | Wintersweet near Qingyin Pavilion has bloomed late, and recently it has not been the flowering period of Taoranting. The slightly overcast sky, the dead wood, and the residual lotus are somewhat dark and windy. The Chinese National Garden reproduces many famous gardens in the south. It is estimated that it will look good when the north winter is gone and there are green trees and flowing water…… |
| Attractiveness | Positive | Spatial characteristics that inspire interest and participation. Emphasizes environments with captivating, worth-exploring qualities. | Huguo Temple: The online celebrity ladder in the courtyard was evaluated some days ago, and grass has been planted. It is located in the courtyard of Huguo Temple, where there is a scenic courtyard and Z03 coffee…… |
| Attractiveness | Negative | Spatial characteristics lacking appeal or participatory value. Emphasizes environments perceived as dull or not worth exploring. | Is this the online celebrity punch-in place? It is really not so good; it is too ordinary. It is recommended to go all the way north to Wangfujing. |
| Culturality | Positive | Successful preservation and presentation of cultural heritage. Emphasizes well-conserved cultural traditions or historical features. | Shijia Hutong is said to be “one hutong, half of China”. Many celebrities and scholars once lived in this hutong, and Prince William of England also visited the first Hutong museum in China. Shu Yi, the signature on the plaque, is the son of writer Lao She. |
| Culturality | Negative | Loss or compromise of cultural-historical values. Emphasizes damaged or inappropriately modified cultural elements. | For many years, this street recorded many memories of my life in the hutong: joys, sorrows, and further joys. Now it is full of commerce and no fireworks and no smell. |
| Restorativeness | Positive | Positive experiences of physical or mental restoration. Emphasizes environments conducive to stress relief and mental recovery. | Beijing Hutong! This is a small shop overlooking the Lama Temple. It is so good to drink coffee in the alley in autumn. I met a little stray cat by chance. Cat lovers are friendly. When the weather is fine, I come to the terrace with my friends to drink coffee and bask in the sun. |
| Restorativeness | Negative | Negative experiences causing physical or mental discomfort. Emphasizes environments with adverse psychological or physiological impacts. | 11 May 2024: The last day of traveling to Beijing with my roommate. Let me be the first to say that the conclusion is very bad. It is terrible. In particular, when I walked in the Guomao Building myself, I found that I felt depression instead of prosperity. Maybe I should not have come…… |
Appendix A.1. Prompt for Qwen2.5-VL-7B-SFT
- [Task Description]
- [URS Perception Description]
- Esthetics:
- Attractiveness:
- Culturality:
- Restorativeness:
Appendix B. Machine Learning Model Performance Enhancement Technology
Appendix B.1. Feature Screening
Appendix B.2. Optuna Hyperparameter Tuning
| Perception Dimension | Feature Selection | Feature Subset |
|---|---|---|
| Esthetics | Boruta | BD, NTAD, NBSD, WI, W, AHH, E, SVF, GVI, SHDI, PFD, RND, HSD |
| Esthetics | RFE | AHH, PFD, NTAD, E, SVF, RND, NMSD, SHDI, GVI, WI, AOS |
| Attractiveness | Boruta | BD, NTAD, NMBD, WI, W, AHH, E, GVI, SHDI, PR, PFD, RND, HSD |
| Attractiveness | RFE | PFD, AHH, RND, PR, GVI, NTAD, NMSD, E, SHDI, BD, W |
| Culturality | Boruta | BD, BC, NTAD, NMBD, SSI, SSC, WI, W, AHH, GVI, RND, SHDI, PR, PFD, E, HSD |
| Culturality | RFE | AHH, PFD, RND, PR, NMSD, SHDI, NTAD, E, GVI, WI, HSD |
| Restorativeness | Boruta | BD, NTAD, NMBD, WI, W, AHH, E, GVI, MSD, SHDI, PFD, RND, HSD |
| Restorativeness | RFE | PFD, AHH, RND, E, NTAD, SHDI, NMSD, GVI, W, WI, HSD |
Appendix B.3. Comparison of Machine Learning Models
| Perception Dimension | Metric | Feature Screening | XGBoost | RF | GBDT | LightGBM |
|---|---|---|---|---|---|---|
| Esthetics | R² | Boruta | 0.5487 | 0.4942 | 0.5358 | 0.5436 |
| Esthetics | R² | RFE | 0.5593 | 0.5066 | 0.5575 | 0.5959 |
| Esthetics | RMSE | Boruta | 1700.5394 | 1800.2408 | 1724.6631 | 1710.006 |
| Esthetics | RMSE | RFE | 1680.3977 | 1777.8764 | 1683.7445 | 1609.1301 |
| Esthetics | MAE | Boruta | 841.5249 | 890.3494 | 841.0683 | 860.2127 |
| Esthetics | MAE | RFE | 839.9607 | 894.5046 | 825.5100 | 792.0527 |
| Attractiveness | R² | Boruta | 0.7000 | 0.5584 | 0.6444 | 0.7089 |
| Attractiveness | R² | RFE | 0.6951 | 0.51628 | 0.6382 | 0.7239 |
| Attractiveness | RMSE | Boruta | 6278.7616 | 7618.2421 | 6836.3824 | 6184.9220 |
| Attractiveness | RMSE | RFE | 6330.2019 | 7973.0699 | 6895.2056 | 6023.8856 |
| Attractiveness | MAE | Boruta | 2902.8784 | 3381.3959 | 3104.4383 | 2839.2208 |
| Attractiveness | MAE | RFE | 3007.9538 | 3514.4672 | 3244.3208 | 2868.1066 |
| Culturality | R² | Boruta | 0.5785 | 0.5692 | 0.5769 | 0.5661 |
| Culturality | R² | RFE | 0.6311 | 0.5768 | 0.6154 | 0.6319 |
| Culturality | RMSE | Boruta | 2541.9314 | 2569.6580 | 2546.5204 | 2578.9323 |
| Culturality | RMSE | RFE | 2378.0264 | 2547.0606 | 2427.8537 | 2375.5058 |
| Culturality | MAE | Boruta | 1271.4136 | 1290.9579 | 1277.9152 | 1319.6692 |
| Culturality | MAE | RFE | 1167.6808 | 1260.8989 | 1175.0645 | 1166.0874 |
| Restorativeness | R² | Boruta | 0.620294 | 0.54867 | 0.6084 | 0.60921 |
| Restorativeness | R² | RFE | 0.64803 | 0.55655 | 0.6275 | 0.63753 |
| Restorativeness | RMSE | Boruta | 7813.3988 | 8518.4787 | 7935.1976 | 7926.6090 |
| Restorativeness | RMSE | RFE | 7522.5223 | 8443.7944 | 7739.2791 | 7633.9517 |
| Restorativeness | MAE | Boruta | 3414.1766 | 3584.0825 | 3445.5327 | 3425.2580 |
| Restorativeness | MAE | RFE | 3361.6974 | 3579.3412 | 3407.0083 | 3330.4188 |
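
For readers comparing the three metrics in the table above, the following sketch shows how R², RMSE, and MAE are conventionally computed from observed and predicted values. The function name `regression_metrics` and the toy numbers are assumptions for illustration; they are not taken from the study's data. RMSE and MAE are expressed in the units of the target variable, which would explain why their magnitudes differ so much between perception dimensions while R² stays on a common scale.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R² for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = sqrt(ss_res / n)

    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Toy values (hypothetical), not from the paper.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
```

Because RMSE squares the errors before averaging, it penalizes large misses more than MAE does; a model can rank differently on the two, as LightGBM and GBDT do for Esthetics under Boruta screening.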
Appendix B.4. Shapley Additive Explanation (SHAP)
Appendix C. Negative Comments in Urban Recreational Space
| Metric | Model | Esthetics | Attractiveness | Culturality | Restorativeness |
|---|---|---|---|---|---|
| ACC | Qwen2.5-VL-7B-SFT | 0.9226 | 0.8607 | 0.9691 | 0.8994 |
| Precision | Qwen2.5-VL-7B-SFT | 0.9265 | 0.8604 | 0.9684 | 0.9013 |
| Recall | Qwen2.5-VL-7B-SFT | 0.9226 | 0.8607 | 0.9691 | 0.8994 |
| F1 score | Qwen2.5-VL-7B-SFT | 0.9243 | 0.8591 | 0.9669 | 0.9002 |










| Platform | Geographic Coordinate Data of the Entire City | Third Ring Geographic Coordinate Data | Third Ring Picture Data |
|---|---|---|---|
| Xiaohongshu | 70,807 | 23,046 | 45,571 |
| | 154,717 | 77,343 | 126,985 |
| Total | 225,524 | 100,389 | 172,556 |
| Label | Esthetics | Attractiveness | Culturality | Restorativeness |
|---|---|---|---|---|
| 0 | 107 | 280 | 125 | 267 |
| 1 | 880 | 1,607 | 953 | 1,298 |
| 2 | 4,065 | 3,165 | 3,974 | 3,497 |
| Category | Parameter | Value | Description/Function |
|---|---|---|---|
| LoRA Configuration | rank | 8 | Dimension of low-rank matrices |
| LoRA Configuration | alpha | 32 | Scaling factor for weight adjustment |
| LoRA Configuration | dropout | 0.05 | Dropout rate to prevent overfitting |
| Training | learning rate | 0.0001 | Initial learning rate for optimization |
| Training | train_val_split_ratio | 9:1 | Ratio of training set to validation set |
| Training | epochs | 4 | Number of passes over the training set |
| Training | batch size | 16 | Samples processed per GPU per iteration |
| Hardware | gradient checkpointing | Enabled | Reduces GPU memory usage by recomputing activations during the backward pass |
| Hardware | mixed precision (BF16) | Enabled | Accelerates training with bfloat16 precision |
| Vision | max pixels | 602,112 | Maximum image pixels (448 × 1344) |
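
The hyperparameters in the table above imply a few derived quantities that are easy to check. The sketch below collects the listed values in a small dataclass and computes the standard LoRA scaling factor (alpha divided by rank) and the number of trainable parameters LoRA adds to one weight matrix. The class name `FineTuneConfig`, the helper `lora_trainable_params`, and the 1024 × 1024 layer shape are hypothetical; the source does not state which layers were adapted or their dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Values copied from the hyperparameter table; field names are illustrative."""
    lora_rank: int = 8
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    learning_rate: float = 1e-4
    epochs: int = 4
    per_gpu_batch_size: int = 16
    max_pixels: int = 602_112

    @property
    def lora_scaling(self) -> float:
        # LoRA scales the low-rank update B @ A by alpha / rank.
        return self.lora_alpha / self.lora_rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


cfg = FineTuneConfig()

# Hypothetical 1024 x 1024 projection layer, for scale only.
full_params = 1024 * 1024
added_params = lora_trainable_params(1024, 1024, cfg.lora_rank)
```

With rank 8 and alpha 32 the update is scaled by 4, and for the hypothetical layer the adapter trains 16,384 parameters instead of 1,048,576. The listed pixel budget also matches the stated image size, since 448 × 1344 = 602,112.
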
| Domains | Variables | Formula Description |
|---|---|---|
| Density | BD (Building Density) | j indexes the buildings in the block, mj is the base area of the j-th building, and Si is the area of the block (km²). |
| | PFD (POI Functional Density) | Ci is the sum of the values of the POI points within block i, and Si is the area of the block (km²). |
| | PR (Plot Ratio) | j indexes the buildings in the block, mj is the base area of the j-th building, fj is the number of floors of the building, and Si is the area of the block (km²). |
| | HHD (Historical Heritage Density) | Hi is the number of historical heritage sites in block i, and Si is the area of the block (km²). |
| | PFM (POI Functional Mixture) | i indexes the POI types in the block, and Pi is the ratio of the number of POIs of the i-th type in the block to the total number of POIs. |
| | SHDI (Shannon’s Diversity Index) | i indexes the visual feature types identified by semantic segmentation within the block, and pi is the proportion of total pixels belonging to visual feature i. |
| Distance to transit | BSD (Bus Stop Density) | Bi is the number of bus stops in block i, and Si is the area of the block (km²). |
| | MSD (Metro Station Density) | Mi is the number of metro stations in block i, and Si is the area of the block (km²). |
| | NBSD (Nearest Bus Stop Distance) | Straight-line distance from the centroid of block i to the nearest bus stop (km). |
| | RND (Road Network Density) | Rl is the total length of all types of roads in the block, and Si is the area of the block (km²). |
| Design | SVF (Sky View Factor) | SVF represents the proportion of the sky visible in the field of vision, sjk is the pixel ratio of the sky in the j-th street view picture of the k-th sample point in block i, and m is the number of street view sample points in block i (the same below). |
| | EI (Enclosure Index) | bjk/wjk/tjk/pjk/rjk is the pixel ratio of buildings/walls/trees/sidewalks/roadways in the j-th street view picture of the k-th sample point in block i. |
| | GVI (Green View Index) | gjk is the proportion of green plant pixels in the j-th street view picture of the k-th sample point in block i. |
| | WAI (Walkability Index) | pjk/rjk is the proportion of sidewalk/roadway pixels in the j-th street view picture of the k-th sample point in block i. |
| | BC (Building Continuity) | BC represents the standard deviation of the proportion of buildings in block i, and bjk is the proportion of building pixels in the j-th street view image of the k-th sample point in block i. |
| | GSR (Green Space Ratio) | Gr is the total area of the various types of green spaces in the block, and Si is the area of the block (km²). |
| | WI (Water Index) | K is the search radius (3000 m), and NEAR_DIST is the distance to the nearest water body. |
| Destination accessibility | NTAD (Nearest Trade Area Distance) | Straight-line distance from the centroid of block i to the nearest business district (km). |
| | AHH (Accessibility of Historical Heritage) | Number of historical heritage sites within 500 m walking distance of block i, including temples, memorials, churches, etc. (units/500 m). |
| | AOS (Accessibility of Open Space) | Number of open spaces within 500 m walking distance of block i, including green spaces, city parks, squares, etc. (units/500 m). |
| Destination accessibility | SSC (Space Syntax Choice) | djk is the shortest path between line segment j and line segment k, and djk(i) is the shortest path between line segment j and line segment k that contains line segment i. r = 1000 m. |
| | SSI (Space Syntax Integration) | n is the number of units in the street network and MDi is the average depth of segment i. r = 1000 m. |
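
The formulas themselves did not survive in the table above, only their symbol descriptions. As a rough guide, the sketch below implements the conventional forms that those descriptions suggest: a density indicator as a block total divided by block area, and Shannon's diversity index as the negative sum of p·ln(p) over category proportions. Both forms, and the function names `density` and `shannon_diversity`, are assumptions; the authors' exact formulas may differ (for example in the logarithm base or in normalization).

```python
from math import log


def density(block_total: float, block_area_km2: float) -> float:
    """Block-level density: a count, length, or summed area divided by S_i (km²)."""
    if block_area_km2 <= 0:
        raise ValueError("block area must be positive")
    return block_total / block_area_km2


def shannon_diversity(amounts) -> float:
    """Shannon's diversity index, -sum(p * ln p), over non-zero categories.

    `amounts` may be raw counts or pixel totals; they are normalized to
    proportions before the index is computed.
    """
    total = sum(amounts)
    if total <= 0:
        raise ValueError("at least one category must be non-zero")
    proportions = [a / total for a in amounts if a > 0]
    return -sum(p * log(p) for p in proportions)


# Hypothetical block: 12 bus stops in 0.5 km².
bus_stop_density = density(12, 0.5)

# Hypothetical pixel totals for four visual feature types.
even_mix = shannon_diversity([250, 250, 250, 250])
single_type = shannon_diversity([1000, 0, 0, 0])
```

Under these assumptions an even mix of four feature types gives the maximum value ln(4), while a block dominated by a single type gives 0.
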
| Metric | Model | Esthetics | Attractiveness | Culturality | Restorativeness |
|---|---|---|---|---|---|
| ACC | Qwen2.5-VL-7B-SFT | 0.9348 | 0.8715 | 0.9368 | 0.9150 |
| ACC | InternVL3-8B-SFT | 0.9187 | 0.8515 | 0.9146 | 0.8972 |
| ACC | Qwen2.5-VL-7B | 0.6653 | 0.5257 | 0.6858 | 0.6735 |
| Precision | Qwen2.5-VL-7B-SFT | 0.9358 | 0.8703 | 0.9382 | 0.9169 |
| Precision | InternVL3-8B-SFT | 0.9187 | 0.8518 | 0.9148 | 0.8978 |
| Precision | Qwen2.5-VL-7B | 0.6663 | 0.5244 | 0.6868 | 0.6764 |
| Recall | Qwen2.5-VL-7B-SFT | 0.9348 | 0.8715 | 0.9368 | 0.9150 |
| Recall | InternVL3-8B-SFT | 0.9187 | 0.8515 | 0.9146 | 0.8972 |
| Recall | Qwen2.5-VL-7B | 0.6653 | 0.5257 | 0.6858 | 0.6735 |
| F1 score | Qwen2.5-VL-7B-SFT | 0.9348 | 0.8701 | 0.9365 | 0.9155 |
| F1 score | InternVL3-8B-SFT | 0.9183 | 0.8509 | 0.9144 | 0.8973 |
| F1 score | Qwen2.5-VL-7B | 0.6653 | 0.5245 | 0.6854 | 0.6739 |
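
In every row of the table above, recall is identical to accuracy. That pattern is what support-weighted averaging of per-class scores produces, so the sketch below assumes weighted averaging over the three labels (0, 1, 2); the source does not state the averaging scheme, and the function name `weighted_metrics` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        actual = support[cls]
        p_cls = tp / predicted if predicted else 0.0
        r_cls = tp / actual if actual else 0.0
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if (p_cls + r_cls) else 0.0
        weight = actual / n  # class weight = share of true samples
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls

    return {"ACC": accuracy, "Precision": precision, "Recall": recall, "F1": f1}


# Toy labels (hypothetical), imbalanced toward class 2 as in the annotated data.
toy_true = [2, 2, 2, 2, 1, 1, 0, 2]
toy_pred = [2, 2, 2, 1, 1, 2, 0, 2]
scores = weighted_metrics(toy_true, toy_pred)
```

Weighted recall sums each class's true positives divided by the total sample count, which is exactly accuracy, so the two columns carry the same information under this assumption. With classes as imbalanced as those in the label tables, macro-averaged scores would be a more sensitive check on the rare label 0.
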
| Label | Esthetics | Attractiveness | Culturality | Restorativeness |
|---|---|---|---|---|
| 0 | 227 (0.4%) | 1,175 (2.1%) | 450 (0.8%) | 2,128 (3.9%) |
| 1 | 10,112 (18.4%) | 24,390 (44.3%) | 9,703 (17.6%) | 20,978 (38.1%) |
| 2 | 44,742 (81.2%) | 29,516 (53.6%) | 44,928 (81.6%) | 31,975 (58.1%) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Hou, X.; Wang, X.; Fan, W. Public Perception of Urban Recreational Spaces Based on Large Vision–Language Models: A Case Study of Beijing’s Third Ring Area. Land 2025, 14, 2155. https://doi.org/10.3390/land14112155









