Open Access Article

Peer-Review Record

Evaluation of Thermal Comfort in Urban Commercial Space with Vision–Language-Model-Based Agent Model

Land 2025, 14(4), 786; https://doi.org/10.3390/land14040786

by Dongyi Zhang¹, Zihao Xiong² and Xun Zhu³,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 26 February 2025 / Revised: 22 March 2025 / Accepted: 4 April 2025 / Published: 6 April 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study presents an innovative VLM-based approach to urban thermal comfort assessment, demonstrating significant potential. However, two key limitations should be addressed to enhance its validity and generalizability.

1. The current methodology relies solely on static street view images, which cannot capture dynamic environmental factors (e.g., wind speed, humidity, temporal shading changes). These factors are critical for a comprehensive understanding of thermal comfort. The authors should provide additional supporting literature or evidence demonstrating the feasibility of VLMs or similar technologies for thermal comfort evaluation. Alternatively, future research could explore the use of video data or time-series images to capture dynamic environmental changes, such as variations in shading and wind patterns.
2. While the study employs eight AI agents to represent diverse social groups, the 200 professional volunteers used for validation do not appear to be stratified by demographic characteristics. This raises concerns about the representativeness of the validation process.

Author Response

Comments 1: The current methodology relies solely on static street view images, which cannot capture dynamic environmental factors (e.g., wind speed, humidity, temporal shading changes). These factors are critical for a comprehensive understanding of thermal comfort. The authors should provide additional supporting literature or evidence demonstrating the feasibility of VLMs or similar technologies for thermal comfort evaluation. Alternatively, future research could explore the use of video data or time-series images to capture dynamic environmental changes, such as variations in shading and wind patterns.

Response 1: Thank you for your comments. First, we have added more evidence in the literature review section to support the feasibility of using static street view images to evaluate thermal comfort. Second, we have elaborated on the limitations of our study in the discussion section and explored the potential of integrating multi-source data for more accurate assessments in the future.

Comments 2: While the study employs eight AI agents to represent diverse social groups, the 200 professional volunteers used for validation do not appear to be stratified by demographic characteristics. This raises concerns about the representativeness of the validation process.

Response 2: Thank you for your comments. We have reorganized this section and recruited 50 volunteers to score 30% of the sample images (167 images) using the ASHRAE scale. The results were then compared with the scores generated by VLMs for validation. Note that this phase of validation does not consider the demographic characteristics of the participants, but instead focuses on the stability of using VLMs to evaluate thermal comfort.

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript presents an innovative approach to assessing urban thermal comfort using Vision-Language Models (VLMs) for human perception. The study is relevant to urban planning and environmental design, offering a novel, AI-driven alternative to traditional field surveys. The proposed multi-agent framework, which simulates diverse demographic perspectives, is a significant methodological contribution.

1. Specify VLM technical details in the abstract.
2. More explanation should be provided on the relationship between VLMs’ multimodal capabilities and thermal comfort assessment in the Introduction.
3. The study focuses only on a commercial district in Harbin during summer, limiting its applicability to other climates, seasons, and urban forms. Expanding the case studies or discussing adaptability would strengthen the findings.
4. The AI evaluates comfort based on images alone, lacking real environmental data like temperature and wind. Integrating meteorological inputs or discussing multimodal approaches could improve accuracy.
5. While the eight agent roles cover different demographics, their definitions lack empirical validation. Discussing whether additional groups (e.g., children, tourists) would alter results would improve robustness.
6. Explicitly address the limitations of image-based methods and propose solutions and policy implications in the Discussion.
7. The ChatGPT-4 model should be explicitly defined, and the methodology should clarify whether AI prompts were fixed or tailored for different images.

Author Response

Comments 1: Specify VLM technical details in the abstract.

Response 1: Thank you for your suggestion. We have added the technical details of the VLM to the abstract. Please refer to Lines 13-16.

Comments 2: More explanation should be provided on the relationship between VLMs’ multimodal capabilities and thermal comfort assessment in the Introduction.

Response 2: Thank you for your suggestion. We have added more explanations in the introduction regarding the relationship between the multimodal capabilities of VLMs and thermal comfort.

Comments 3: The study focuses only on a commercial district in Harbin during summer, limiting its applicability to other climates, seasons, and urban forms. Expanding the case studies or discussing adaptability would strengthen the findings.

Response 3: Thank you for your suggestion. We have described the limitations of this study in the discussion section.

Comments 4: The AI evaluates comfort based on images alone, lacking real environmental data like temperature and wind. Integrating meteorological inputs or discussing multimodal approaches could improve accuracy.

Response 4: Thank you for your comment. Integrating multi-dimensional environmental data requires a more complex system. In this study, we used street view images as the foundational data for several reasons: first, they are relatively easy to obtain; second, there is a strong correlation between images and thermal comfort, which is supported by our introduction section. Additionally, we attempted to incorporate more dimensions of data, but found that it led to limited improvements in the VLMs' understanding capabilities. We acknowledge the limitations of our study and have discussed these in the discussion section.

Comments 5: While the eight agent roles cover different demographics, their definitions lack empirical validation. Discussing whether additional groups (e.g., children, tourists) would alter results would improve robustness.

Response 5: Thank you for your suggestion. Our study focuses primarily on tourists' thermal comfort perception. In future research, we will further explore and discuss the differences in thermal comfort perception among specific groups.

Comments 6: Explicitly address the limitations of image-based methods and propose solutions and policy implications in the Discussion.

Response 6: Thank you for your suggestion. In response to your feedback, we have expanded the discussion section to include a thorough examination of the limitations inherent in our study. This addition aims to provide a more balanced and comprehensive understanding of our findings, highlighting areas where further research could address current constraints and improve upon our initial results.

Comments 7: The ChatGPT-4 model should be explicitly defined, and the methodology should clarify whether AI prompts were fixed or tailored for different images.

Response 7: Thank you for your suggestion. We have clarified the issue you mentioned in the methods section. To put it simply, our study is based on a fixed prompt engineering approach, but we differentiated the socio-demographic information of the simulated roles. This differentiation allows for a better capture of the differences in thermal comfort perception among different groups.
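The fixed-prompt approach described in Response 7 can be illustrated with a minimal Python sketch. It is not the authors' actual prompt or code: the `AgentRole` fields, the template wording, and the two example roles are all hypothetical, and the use of the ASHRAE seven-point scale in the prompt is an assumption. The point is only that the instruction text stays constant while the socio-demographic description varies per simulated role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    """Hypothetical socio-demographic profile of one simulated agent."""
    age_group: str
    gender: str
    occupation: str


# Fixed instruction text (assumed wording); only the role fields change.
PROMPT_TEMPLATE = (
    "Role: {age_group} {gender} {occupation}. "
    "Imagine you are this person walking along the street shown in the image. "
    "Rate your thermal sensation on the ASHRAE seven-point scale "
    "from -3 (cold) to +3 (hot) and reply with the number only."
)


def build_prompt(role: AgentRole) -> str:
    """Fill the fixed template with one role's socio-demographic details."""
    return PROMPT_TEMPLATE.format(
        age_group=role.age_group,
        gender=role.gender,
        occupation=role.occupation,
    )


if __name__ == "__main__":
    # Two illustrative roles; the study's eight roles are not reproduced here.
    roles = [
        AgentRole("elderly", "female", "retiree"),
        AgentRole("young", "male", "office worker"),
    ]
    for role in roles:
        print(build_prompt(role))
```

In a real pipeline, each generated prompt would be sent to the vision-language model together with the same street view image, so that differences in the returned scores can be attributed to the role description alone.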

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

I would like to share some comments regarding your manuscript.

First, outdoor thermal comfort cannot be accurately evaluated using the PMV index, as it was specifically developed for assessing thermal sensation in indoor environments equipped with HVAC systems. Many researchers mistakenly apply this index to outdoor conditions, but this does not justify its use as standard practice.

Subjective evaluation of thermal comfort should utilize the standardized scales recommended by ISO 10551, which include perception, evaluation, preference, and acceptability. Unfortunately, this does not appear to be the case in the manuscript.

The most significant finding of this investigation is the correlation between VLM (Vision Language Models) and subjective assessments. However, based on the standard deviation values presented in Figure 4, a correlation coefficient of r = 0.602 does not necessarily indicate a strong correlation. Additionally, the comparison should encompass ISO and ASHRAE subjective scales rather than the rating scale employed by the authors.

In summary, while using AI could add value to the evaluation process, it is crucial to prevent biases. Therefore, the assessment of thermal comfort conditions should be conducted consistently with established subjective measures and include feedback from occupants, rather than relying solely on experts as is the case here.
Best regards.

Author Response

Comments 1: First, outdoor thermal comfort cannot be accurately evaluated using the PMV index, as it was specifically developed for assessing thermal sensation in indoor environments equipped with HVAC systems. Many researchers mistakenly apply this index to outdoor conditions, but this does not justify its use as standard practice.

Response 1: Thank you for your suggestion. We have corrected this part of the content.

Comments 2: Subjective evaluation of thermal comfort should utilize the standardized scales recommended by ISO 10551, which include perception, evaluation, preference, and acceptability. Unfortunately, this does not appear to be the case in the manuscript.

Response 2: Thank you for your comment. We have used the ASHRAE scale for evaluation in the latest version.

Comments 3: The most significant finding of this investigation is the correlation between VLM (Vision Language Models) and subjective assessments. However, based on the standard deviation values presented in Figure 4, a correlation coefficient of r = 0.602 does not necessarily indicate a strong correlation. Additionally, the comparison should encompass ISO and ASHRAE subjective scales rather than the rating scale employed by the authors.

Comments 4: In summary, while using AI could add value to the evaluation process, it is crucial to prevent biases. Therefore, the assessment of thermal comfort conditions should be conducted consistently with established subjective measures and include feedback from occupants, rather than relying solely on experts as is the case here.

Responses 3 and 4: Thank you for your comments. We have reorganized this section and recruited 50 volunteers to score 30% of the sample images (167 images) using the ASHRAE scale. The results were then compared with the scores generated by VLMs for validation. Note that this phase of validation does not consider the demographic characteristics of the participants, but instead focuses on the stability of using VLMs to evaluate thermal comfort. Our latest results show that the evaluation results from VLMs are significantly consistent with expert scores (r = 0.815, p < 0.001), confirming the stability of using VLMs to assess the thermal comfort of urban commercial streets.
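For readers unfamiliar with the validation statistic, the comparison between VLM scores and human scores can be sketched as a Pearson correlation over per-image ratings. The sketch below is illustrative only: the function name `pearson_r` and the score lists are hypothetical, the values are not data from the study, and the record does not state which correlation coefficient or software the authors used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-image scores on the ASHRAE seven-point scale
    # (-3 cold ... +3 hot); NOT values from the study.
    vlm_scores = [1, 2, 0, 3, 2, 1]
    volunteer_means = [1.2, 1.8, 0.4, 2.6, 2.1, 0.9]
    print(f"r = {pearson_r(vlm_scores, volunteer_means):.3f}")
```

A value near 1 means the two sets of ratings rise and fall together across images; it does not by itself show that the absolute scores agree, which is why the reviewers also ask about the scale used.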

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have carefully and effectively addressed the two major concerns raised in the previous round of review. The paper is now suitable for publication in its current form.

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors, thank you for your punctual replies.

Cross fingers.
