Peer-Review Record

Explaining Urban Vitality Through Interpretable Machine Learning: A Big Data Approach Using Street View Images and Environmental Factors

Sustainability 2025, 17(11), 4926; https://doi.org/10.3390/su17114926

by Dong Li^1,2, Houzeng Han^1,*, Jian Wang^1,2 and Xingxing Xiao^1,2

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 18 March 2025 / Revised: 22 May 2025 / Accepted: 23 May 2025 / Published: 27 May 2025

(This article belongs to the Special Issue Socially Sustainable Urban and Architectural Design)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The title of the article is unclear, please revise the title. The titles at all levels of the article are not academic enough and lack standard English expression.
Why does building density have a negative impact on urban vitality? Numerous studies have shown a positive impact. If it is a negative impact, it is also not in line with the actual situation.
There are two keywords related to machine learning, it is recommended to integrate them.
Figure 1 lacks a legend, such as the one on the right. Please add it.
Change 2.2 Data Source to 2.2 Data.
The full name and abbreviation should be used uniformly throughout the text. For example, in some places, the full name is used for built environment, while in other places, abbreviations are used.
The division of indicators in this article is chaotic. There are too many introductions to NDVI. Some studies convert NDVI into a built environment indicator. The NDVI in the urban area of a city is a green space, an artificial space, and belongs to the built environment. Business density is a built environment indicator, not a socio-economic indicator. How is land use diversity calculated? Most of the urban areas are construction land. How is the land data used for calculation? Please explain the calculation method and formula. What are socio-economic indicators and how are they defined? Can housing prices be considered a socio-economic indicator?
In lines 119-120, this article states that the selection of built environment indicators is relatively limited, but in reality, the selection in this article is mainly based on built environment indicators and is not comprehensive.
Why choose grid scale? It may sever the continuity of space. It may separate a school or company from the middle. Not considering the physical connectivity between research units.
The method for calculating urban vitality is unclear. Where did the data come from? How was it handled? The research selected in this article tends to have cultivated land on the periphery, but the cultivated land lacks vitality. How was this research result considered? Including these lifeless grids in the study will affect the accuracy of the research results. And some of the built environment indicators of these grids are also zero.
Figure 2 only shows 4 environmental indicators, what about the other indicators? Figure 2(d) shows AHA, which did not appear in the previous section.
The socio-economic data in Figure 3 includes AHA, but not in Table 1. The output part in Figure 3 has not been studied in the text and cannot be drawn in the flowchart. Figures 3 and 4 are not drawn up to standard, please express them in a standardized manner.
Why are there only NDVI, RND, and AHP SHAP values in Figure 9? Why aren't other indicators displayed?
The font size in Figure 10 is too small to see clearly. The impact of these environmental indicators on urban vitality varies by less than 0.1. From Figure 5, it can be seen that the range of urban vitality changes is between 6-83, and the significance of these indicators being less than 0.1 is too small, indicating that the research results are meaningless.
On which day is the urban vitality data available? On what day is the environmental data? Do the two match in time?
The English expression is not standardized and needs further improvement. The title of each chapter in the article lacks logical expression in English. It is recommended to revise it according to the English expression.

Author Response

Response to Comments on the Manuscript:

“Decoding Urban Vitality: Leveraging Big Data Analytics and Interpretable Machine Learning to Uncover Environmental Factors”

May 12, 2025

-------------------------------------------------------------------------------------------------------

The authors gratefully acknowledge the editors and the anonymous reviewers for their constructive comments. We have comprehensively revised our previous manuscript. Specifically, all revisions are highlighted using the "Track Changes" function in Microsoft Word. Please refer to the point-by-point response. Thank you for your time.

Response to comments by Reviewer #1:

We would like to gratefully thank the reviewer for his/her constructive comments and recommendations for improving the paper. A point-by-point response to the interesting comments raised by the reviewer follows.

Point 1. The title of the article is unclear, please revise the title. The titles at all levels of the article are not academic enough and lack standard English expression.

Response 1: Thank you for your suggestion. We agree with the reviewer. We revised the title of the article. We have also revised the titles at various levels in the article to make them more academic and prescriptive.

Below are the additions. Please refer to the revised manuscript.

Explaining Urban Vitality through Interpretable Machine Learning: A Big Data Approach Using Street View Imagery and Environmental Factors

Point 2. Why does building density have a negative impact on urban vitality? Numerous studies have shown a positive impact. If it is a negative impact, it is also not in line with the actual situation.

Response 2: Thank you for your suggestion. We agree with the reviewer. Thank you for your valuable comment. While previous studies have widely documented the positive effects of building density on urban vitality, our findings suggest a U-shaped nonlinear relationship between the two. Specifically, urban vitality initially increases with rising building density, then decreases after surpassing a certain threshold, and eventually rises again at very high density levels.

This phenomenon is particularly evident in our study area. According to our analysis, building densities between 0.4 and 0.8 are negatively correlated with urban vitality, whereas in areas with densities exceeding 0.9 (which account for over 85% of the regions within Beijing's Fifth Ring Road), building density once again shows a positive association with urban vitality. This indicates that in super-high-density environments like central Beijing, additional density supports more commercial activity, better infrastructure, and higher population mobility—factors that contribute positively to urban vitality.

Several previous studies have reported similar findings:

Liu et al. (2020) identified a nonlinear threshold effect between built environment and vitality, with both positive and negative turning points depending on spatial context.

Therefore, the observed nonlinear pattern aligns with existing literature and reflects complex urban dynamics that are sensitive to local conditions, especially in megacities like Beijing.

"Investigating the Nonlinear Relationship Between the Built Environment and Urban Vitality Based on Multi-Source Data and Interpretable Machine Learning"

"Impacts of Built Environment on Urban Vitality: Regression Analyses of Beijing and Chengdu, China"

Below are the additions. Please refer to the section 4.4 and 5 part of the revised manuscript.

Point 3. There are two keywords related to machine learning, it is recommended to integrate them.

Response 3: Thank you for your suggestion. We agree with the reviewer. We have removed a redundant keyword for machine learning.

Below are the additions. Please refer to the revised manuscript.

Keywords: Urban vitality; Interpretable machine learning; SHapley Additive exPlanation; Multi-source urban big data

Point 4. Figure 1 lacks a legend, such as the one on the right. Please add it.

Response 4: Thank you for your suggestion. We agree with the reviewer. We have added the legend on the right.

Below are the additions. Please refer to the section 2.1 part of the revised manuscript.

Figure 1

Point 5. Change 2.2 Data Source to 2.2 Data.

Response 5: Thank you for your suggestion. We agree with the editor.

Below are the additions. Please refer to the section 2.2 part of the revised manuscript.

Point 6. The full name and abbreviation should be used uniformly throughout the text. For example, in some places, the full name is used for built environment, while in other places, abbreviations are used.

Response 6: Thank you for your suggestion. We agree with the editor. We have adjusted the acronyms in the paper, especially the built environment and urban vitality.

Below are the additions. Please refer to the revised manuscript.

Point 7. The division of indicators in this article is chaotic. There are too many introductions to NDVI. Some studies convert NDVI into a built environment indicator. The NDVI in the urban area of a city is a green space, an artificial space, and belongs to the built environment. Business density is a built environment indicator, not a socio-economic indicator. How is land use diversity calculated? Most of the urban areas are construction land. How is the land data used for calculation? Please explain the calculation method and formula. What are socio-economic indicators and how are they defined? Can housing prices be considered a socio-economic indicator?

Response 7: Thank you for your suggestion. We agree with the reviewer. Thank you for your valuable comments regarding the classification and calculation of indicators. We sincerely acknowledge the need to clarify and standardize the classification of environmental and socioeconomic indicators in our study. In response to your concerns:

(1) NDVI Classification: Although NDVI is commonly used as a natural environment indicator, in urban settings—particularly within highly built-up districts—it largely reflects managed green spaces (e.g., parks, lawns, tree belts) that are part of the built environment. Following studies such as Huang et al. (2017) and Liu et al. (2021), we therefore categorize urban NDVI as a component of the built environment in this context.

(2) Business density: We agree with your observation and have reclassified commercial density as a built environment indicator rather than a socioeconomic one.

(3) Land Use Diversity: We use a Shannon Entropy Index to measure land use mix, which quantifies the distributional diversity of different land use types in each grid cell. The formula is:

$$H = -\sum_{i=1}^{n} p_i \ln p_i$$

where $n$ is the total number of land use categories and $p_i$ is the proportion of land use type $i$ in the grid cell to which it belongs. Only land-use categories relevant to urban function (e.g., residential, commercial, industrial, public service) are included. This method is widely applied in urban studies (e.g., Frank et al., 2005; Song et al., 2014).

(4) Socioeconomic Indicators: We define socioeconomic indicators as those reflecting economic activity, income level, social status, or living costs of residents. Housing prices, as a proxy for affordability and economic level, are classified as a socioeconomic indicator in line with previous works (e.g., Zheng et al., 2017).
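
As a minimal sketch of the land use mix measure described in item (3), the snippet below computes the Shannon entropy of land-use shares for a single grid cell, i.e. the negative sum of p·ln(p) over the categories present. The function name `land_use_entropy`, the choice of the natural logarithm, and the example areas are illustrative assumptions rather than the authors' implementation; some studies additionally divide by ln(n) to scale the index to [0, 1].

```python
import math

def land_use_entropy(areas):
    """Shannon entropy of land-use shares within one grid cell.

    `areas` maps a land-use category to its area in the cell (any unit).
    Categories with zero area are skipped, since p * ln(p) tends to 0.
    """
    total = sum(areas.values())
    if total <= 0:
        return 0.0  # no classified land in the cell
    shares = [a / total for a in areas.values() if a > 0]
    return -sum(p * math.log(p) for p in shares)

# Hypothetical cell, areas in square metres (illustrative values only)
cell = {
    "residential": 100_000,
    "commercial": 100_000,
    "industrial": 0,
    "public_service": 50_000,
}
print(round(land_use_entropy(cell), 4))
```

A cell dominated by a single category yields an entropy of zero, while an even split across n categories yields the maximum value ln(n).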

We have revised the manuscript accordingly to clarify indicator definitions, reclassify variables where necessary, and better explain the calculation methods. A revised table categorizing all indicators has been added for clarity.

Below are the additions. Please refer to the section 2.5.1 and 2.5.2 part of the revised manuscript.

Point 8. In lines 119-120, this article states that the selection of built environment indicators is relatively limited, but in reality, the selection in this article is mainly based on built environment indicators and is not comprehensive.

Response 8: Thank you for your suggestion. We agree with the reviewer. Thank you for pointing out this important issue. We acknowledge that while our study incorporates several built environment indicators—such as NDVI, road network density, and building density—the overall selection may appear limited in scope, especially in capturing the multi-dimensional aspects of the built environment. To address this, we have revised the manuscript to clarify the indicator framework and acknowledge the current limitation in the range of built environment metrics used. Specifically, indicators such as building height, form diversity, pedestrian infrastructure, or street canyon geometry were not included due to data availability constraints. We have now explicitly discussed this limitation in the revised Discussion section and suggested potential directions for future research that could incorporate a more comprehensive set of built environment indicators using emerging datasets (e.g., 3D urban morphology, street view imagery, etc.).

Below are the additions. Please refer to the section 5.4 part of the revised manuscript.

Point 9. Why choose grid scale? It may sever the continuity of space. It may separate a school or company from the middle. Not considering the physical connectivity between research units.

Response 9: Thank you for your suggestion. We agree with the reviewer. Thank you for your insightful comment regarding the use of a regular grid-based spatial unit. We fully acknowledge that this approach may not always reflect the physical continuity of urban elements and may result in the segmentation of functional units such as schools or workplaces.

However, the decision to adopt a 500 m × 500 m grid was based on several considerations:

(1) Comparability and precedence: The grid scale is widely adopted in previous urban vitality and spatial modeling studies, which ensures methodological comparability and facilitates benchmarking [e.g., Refs. (1) Uncovering spatial patterns of environmental influence on the paces of active leisure travel (2) Spatial differentiation characteristics and influencing factors of the green view index in urban areas based on street view images: A case study of Futian District, Shenzhen, China].

(2) Data harmonization and technical feasibility: Given the heterogeneity and varying resolution of our multi-source spatial datasets (e.g., POI, NDVI, road networks, etc.), the grid provides a standardized unit for statistical analysis and aggregation.

(3) Computational stability: A uniform spatial unit is essential for ensuring model convergence and interpretability, particularly when using machine learning methods such as GBDT and SHAP.

We have now added a clearer justification in the revised manuscript and explicitly acknowledged the limitations of the grid-based approach, suggesting that future studies could adopt parcel-level or functionally-informed boundaries (e.g., road-enclosed blocks, land-use parcels) to further refine the spatial unit of analysis.

Below are the additions. Please refer to the section 5.4 part of the revised manuscript.

Despite the strengths of this study in introducing a perceptual, interpretable, and data-driven framework for urban vitality assessment, several limitations should be acknowledged. First, the use of street-level imagery from the Place Pulse 2.0 dataset, while valuable in capturing visual perception, may not fully represent real-time urban dynamics, especially in rapidly changing environments. Second, although SHAP values provide interpretability, their global explanations may overlook subtle local patterns or interactions between variables. Third, this study employed a uniform 500 m × 500 m grid to aggregate and analyze multi-source spatial data. The use of fixed grids may disrupt the continuity of physical spaces, potentially splitting functionally cohesive entities such as schools, residential compounds, or commercial zones. Moreover, grids do not inherently account for the actual spatial interactions or built environment connectivity between adjacent units. Finally, indicators such as building height, form diversity, pedestrian infrastructure, or street canyon geometry were not included due to data availability constraints.

Future research could address these limitations in several ways. First, integrating multi-source, real-time data, such as mobility traces, urban sensors, and social media feeds, would enrich the measurement of urban vitality and better reflect its temporal evolution. Second, adopting multimodal models that combine imagery with textual and spatial datasets may provide a more holistic understanding of urban life. Third, future research may consider adopting spatial units defined by functional or morphological boundaries, such as road-enclosed parcels or administrative blocks, which better reflect the spatial structure of urban environments and enable a more nuanced understanding of localized vitality patterns. Additionally, advances in interpretable deep learning, such as attention-based spatial-temporal graph neural networks, could be leveraged to capture complex dependencies while maintaining transparency. Cross-city transfer learning techniques could also be employed to test model robustness and generalizability across different urban contexts. Ultimately, future research should aim for more comprehensive, dynamic, and inclusive assessments of urban vitality, combining human-centered perspectives with scalable machine learning tools.

Point 10. The method for calculating urban vitality is unclear. Where did the data come from? How was it handled? The research selected in this article tends to have cultivated land on the periphery, but the cultivated land lacks vitality. How was this research result considered? Including these lifeless grids in the study will affect the accuracy of the research results. And some of the built environment indicators of these grids are also zero.

Response 10: Thank you for your suggestion. We agree with the reviewer. Clarification on the Measurement of Urban Vitality and the Treatment of Low-Activity Areas

In this study, urban vitality is quantified using a perception-based method derived from street-level imagery. Specifically, we utilized the Place Pulse 2.0 dataset, which includes crowd-sourced perception scores on urban liveliness and other qualities of urban scenes. These scores were used to train a ResNet50 deep learning model, which was then applied to Baidu Street View images within the study area. The output features were aggregated at the grid level to generate an urban vitality score for each spatial unit.
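
The grid-level aggregation step can be sketched as follows. Each image-level score is assigned to a 500 m × 500 m cell by flooring its coordinates divided by the cell size, and the cell score is the mean of the scores that fall in it. The sketch assumes the coordinates are already in a projected, metre-based system and that the mean is the aggregation rule; the response specifies neither, and `grid_scores` is a hypothetical helper name.

```python
from collections import defaultdict
import math

CELL_SIZE_M = 500.0  # grid edge length stated in the response

def grid_scores(points, cell_size=CELL_SIZE_M):
    """Average image-level vitality scores per grid cell.

    `points` is an iterable of (x_m, y_m, score) tuples in projected metres.
    Returns {(col, row): mean_score}; cells without images are absent.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, score in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        sums[cell] += score
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Hypothetical model predictions (illustrative values only)
sample = [(120.0, 80.0, 40.0), (480.0, 499.0, 60.0), (510.0, 20.0, 30.0)]
print(grid_scores(sample))
```

Cells that contain no sampled images simply do not appear in the result, which makes it straightforward to exclude them from later modelling.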

Regarding the spatial extent of the study area: while it primarily covers urban and peri-urban zones within Beijing’s Fifth Ring Road, it is acknowledged that certain peripheral grids may contain agricultural land (e.g., farmland) or low-density transitional areas. These areas often lack urban vitality and may have zero or near-zero values in both vitality score and built environment indicators.

To address this, two measures were implemented: (1) The vitality score is computed solely based on street view imagery. In cases where no valid images are available (e.g., inaccessible or unpaved areas), the corresponding grids are excluded from vitality calculation. (2) A threshold was set to exclude grids with extremely low human presence or missing built environment data from the analysis, ensuring the robustness of the regression and SHAP interpretation results.

While the inclusion of low-vitality or sparsely populated grids could theoretically introduce noise, their influence is minimized through filtering, and they also provide a realistic representation of the spatial variation in vitality. Future studies may further refine this by adopting a multi-tiered spatial sampling strategy or integrating remote sensing-derived land cover data to pre-filter non-urban areas.

Below are the additions. Please refer to the section 2.4 part of the revised manuscript.

Point 11. Figure 2 only shows 4 environmental indicators, what about the other indicators? Figure 2(d) shows AHA, which did not appear in the previous section.

Response 11: Thank you for your suggestion. We agree with the editor. We thank the reviewers for their valuable suggestions. Considering the length and visual constraints of the presentation, we have selected the four most representative environmental indicators in Figure 2. These indicators are intended to demonstrate the diversity of indicator types involved in the study, while ensuring that the core data features are understandable to readers. This study focuses on the correlation between each environmental variable and urban vitality, and therefore adopts a specialised data visualisation strategy. Presenting all 12 indicators at the same time would not only result in overly lengthy graphics, but would also shrink the text to the point of affecting readability. We will consider a more comprehensive way of presenting the data in subsequent studies. AHA was an expression error and has been corrected.

Below are the additions. Please refer to the section 2.5.2 part of the revised manuscript.

Point 12. The socio-economic data in Figure 3 includes AHA, but not in Table 1. The output part in Figure 3 has not been studied in the text and cannot be drawn in the flowchart. Figures 3 and 4 are not drawn up to standard, please express them in a standardized manner.

Response 12: Thank you for your suggestion. AHA was an expression error and has been corrected. We agree with the editor. We have standardized Figures 3 and 4 in this revision.

Below are the additions. Please refer to the section 3.1 and 3.2 part of the revised manuscript.

Point 13. Why are there only NDVI, RND, and AHP SHAP values in Figure 9? Why aren't other indicators displayed?

Response 13: Thank you for your suggestion. We agree with the reviewer. Thank you for your valuable question. Figure 9 is intended to provide a spatial visual reference for interpreting the SHAP values of key environmental indicators. It is mainly intended to show which variables are more spatially prominent, with a focus on Figure 9(a). While Section 4.4 presents the numerical SHAP analysis of all variables, Figure 9 focuses on spatializing the SHAP values to illustrate how the contributions of selected features vary across the study area.

Due to visual clarity and layout constraints, we selected three representative variables (NDVI, RND, and AHP) for mapping. These indicators reflect different dimensions, and were chosen to illustrate the spatial heterogeneity in SHAP values. Other variables, although analyzed, are not displayed in the figure to avoid clutter and redundancy. We will clarify this rationale in the revised manuscript to avoid confusion.

Point 14. The font size in Figure 10 is too small to see clearly. The impact of these environmental indicators on urban vitality varies by less than 0.1. From Figure 5, it can be seen that the range of urban vitality changes is between 6-83, and the significance of these indicators being less than 0.1 is too small, indicating that the research results are meaningless.

Response 14: Thank you for your suggestion. We agree with the reviewer. Thank you for your constructive comments.

(1) We acknowledge that the font size in Figure 10 was too small for clear visibility. In the revised manuscript, we will enlarge the font and improve the graphical layout to enhance clarity and readability.

(2) Regarding the SHAP values, we would like to clarify that SHAP does not measure the absolute value of the indicator itself, but rather its marginal contribution to the model's prediction. Although individual SHAP values may appear small, they are meaningful in the context of machine learning models, especially when aggregated over a large number of samples. Furthermore, SHAP values are unitless and normalized; thus, their numerical magnitude cannot be directly compared with the scale of the target variable (urban vitality).

Previous studies using SHAP in urban analytics have also reported relatively small SHAP values that still reflect meaningful and interpretable relationships (e.g., Lundberg & Lee, 2017; Roscher et al., 2020). Therefore, the presented results are valid and informative for understanding the relative importance and effect direction of each variable.
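
To make the notion of a marginal contribution concrete, the following sketch computes exact Shapley values for one sample of a toy model by enumerating every feature ordering. The toy model, the all-zero baseline, and the function name `shapley_values` are assumptions for illustration; this is not the authors' GBDT, and SHAP libraries use much faster algorithms for tree ensembles. For a single sample, the Shapley values sum to the difference between the model's prediction and its prediction at the baseline (the efficiency property).

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by enumerating feature orderings.

    Features not yet in the coalition are held at their `baseline` value.
    The cost grows factorially with the number of features, so this is
    only practical for toy examples.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]      # add feature i to the coalition
            new = model(current)
            phi[i] += new - prev   # marginal contribution of feature i
            prev = new
    return [p / len(orderings) for p in phi]

# Hypothetical toy model with one interaction term
def toy_model(v):
    return 50.0 + 10.0 * v[0] - 4.0 * v[1] + 2.0 * v[0] * v[2]

x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

In this toy case the interaction between the first and third features is split equally between them, which is how Shapley values attribute joint effects.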

Point 15. On which day is the urban vitality data available? On what day is the environmental data? Do the two match in time?

Response 15: Thank you for your suggestion. We agree with the reviewer. Thank you for your insightful comment regarding the temporal alignment of the datasets.

In this study, we aimed to ensure temporal consistency as much as possible. The environmental variables—including NDVI, road networks, and land use data—were collected for the year 2020, which serves as the reference year for this study.

As for Street View imagery, we recognise that this type of data is captured at multiple points in time and may not correspond to a single uniform date. However, we have carefully selected images from the 2020 timeframe and used a large sample to minimise any individual point in time discrepancies.

While perfect temporal alignment is challenging due to data availability (especially for street view data), the selected datasets are generally representative of the same time frame. We will clarify this in the revised manuscript to strengthen transparency.

Below are the additions. Please refer to the section 2.2 part of the revised manuscript.

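Point 16. The English expression is not standardized and needs further improvement. The title of each chapter in the article lacks logical expression in English. It is recommended to revise it according to the English expression.

Response 16: Thank you for your suggestion. We agree with the reviewer. Thank you very much for your valuable comments on the language quality and the structure of the section headings.

We acknowledge that the manuscript needed further improvement in academic English expression and logical consistency, particularly in the wording of the section headings.

In the revised version, we have carefully reviewed and improved the English writing throughout the manuscript. Specifically, we have rewritten all section and subsection headings to ensure that they are clear, concise, and consistent with the standard conventions of academic English.

We have also corrected grammar, improved sentence structure, and strengthened the logical flow to ensure that the paper meets international academic writing standards.

We appreciate your suggestion, which has helped us greatly improve the overall presentation of the manuscript.

Below are the additions. Please refer to the revised manuscript.

Thank you for the valuable comments from the editor, I will work harder in the future. I wish you good health!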

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors,

Urban vitality (UV) is a fundamental concept in the analysis of sustainable urban development, reflecting socio-economic dynamics, land use efficiency and the quality of the built environment. Cities, as living entities, continuously evolve under the influence of variable factors, from population density and transport infrastructure, to the distribution of green spaces and economic accessibility. Studying UV is not just an academic exercise, but an imperative for the efficient management of urban resources and for the creation of sustainable cities, adaptable to the challenges of the future.

Traditional methods for assessing urban vitality have often been limited by the rigidity of statistical models, which fail to capture the complexity of the interactions between the determinants of UV. To overcome these obstacles, the present study proposes an advanced interpretable machine learning framework, based on the gradient boosting decision tree (GBDT) model in combination with the SHapley Additive exPlanation (SHAP) framework. This innovative methodology allows not only to quantify the influence of each factor on UV, but also to interpret the results in a manner accessible to decision-makers.

The analysis focused on the urban vitality of Beijing’s Fifth Ring Road area, a dynamic urban space characterized by high density, developed infrastructure, and a diverse mix of functionalities. The study used a multidimensional approach, combining spatial, economic, and environmental data to identify the patterns of UV distribution and the factors influencing it.

The GBDT model, which allows for a robust prediction of UV levels based on an extensive set of explanatory variables.

The SHAP framework, used to interpret the impact of each variable on UV, providing a clear view of the causal relationships in the urban system.

GIS spatial analysis, used to visualize the distribution and heterogeneity of UV in the study region.

UV shows significant spatial clustering, reflecting uneven development trends in different parts of the city.

The positive correlations identified suggest that areas with high UV tend to attract and sustain further development, creating competitive urban agglomerations.

Plot Ratio (PR): is a strong predictor of UV, suggesting that an optimal ratio of built-up area to available land contributes to positive urban dynamics.

Green Space Accessibility (GS): has a significant negative impact on UV, indicating that reduced access to green spaces can affect the vitality of urban communities.

Building Density (BD): has a strong negative correlation with UV, suggesting that urban overcrowding can reduce the quality of life.

Economic and infrastructural factors, such as NDVI, average house price (AHP), and road network density (RND), significantly influence UV, reflecting the complex interaction between economic development, infrastructure, and the natural environment.

These findings provide a solid basis for optimizing urban development strategies.

Recommendations:

Review urban densification policies to balance economic growth with the need for green spaces.

Integrate advanced urban analytics technologies, such as artificial intelligence, to facilitate data-driven decisions.

Optimize transport infrastructure to reduce congestion and increase accessibility to low-UV urban areas.

Develop a balance between built and green spaces, promoting sustainable architecture policies that encourage the integration of natural areas into the urban landscape.

Implement economic incentives for sustainable development, such as green building subsidies and congestion penalties.

Improve pedestrian connectivity and alternative transport infrastructure to reduce car dependency and improve accessibility within the city.

By using an innovative interpretable machine learning framework, this study contributes to a deeper understanding of the mechanisms governing urban vitality. The results obtained provide not only a quantitative assessment of UV, but also analytical tools for decision-makers, facilitating the implementation of sustainable urban policies. Thus, the cities of the future can become more balanced, more prosperous and more environmentally friendly, ensuring a high standard of living for their inhabitants.

I agree to the publication.

Author Response

Response to Comments on the Manuscript:

“Decoding Urban Vitality: Leveraging Big Data Analytics and Interpretable Machine Learning to Uncover Environmental Factors”

May 12, 2025

-------------------------------------------------------------------------------------------------------

Response to comments by Reviewer #3:

Thank you for the valuable comments from the editor, I will work harder in the future. I wish you good health!

Reviewer 3 Report

Comments and Suggestions for Authors

With the help of Baidu map street view images, 5D built environment data, and the integration of machine learning and other technologies, the authors portrayed the spatial characteristics of urban vitality and analyzed the impact of multi-dimensional built environment elements on urban vitality. The research data are basically credible and the research process is rigorous. However, there is still much room for optimization:

(1) Urban vitality includes economic vitality, social vitality, cultural vitality, and spatial vitality, etc. In a broad sense, it refers to the incentives for innovation and entrepreneurship, the economic system, the talent policy, and the inclusion of heterogeneous elements, etc., and in a narrow sense, it refers to the activities of interpersonal exchanges and the activities of urban life. How does the author define urban vitality in the article? What are the main manifestations of urban vitality? What is the theoretical basis and literature foundation for characterizing urban vitality based on Baidu Street View images and human environmental perception?

(2) The author has set up 29,940 sample points of street view images in the study area, which must be a huge amount of data processing work, but the presentation is not sufficient, whether it is the process of cleaning, organizing and classifying the data, or the presentation of the street view images of the typical sample points of high/low urban vitality is not particularly sufficient.

(3) The introduction is too long and the logical relationship is slightly confusing. The introduction describes related research on the dynamic capture of urban vitality, which is slightly disconnected from this paper, and the article does not portray the time dimension of urban vitality.

(4) The collection time and timeliness of all data should be described in detail.

(5) What is the basis and thinking behind the selection of selected environment variables in Figure 2?

(6) The logical relationship between chapters 2.5.1,2.5.2,2.5.3 is confusing, and the content covered in Table 1 is interspersed throughout the chapters.

(7) 8.1 In the analysis of the results, the spatial distribution characteristics of urban vitality are too shallow, staying only in the simple description of the data and the simple portrayal of the range of values, which need to be further refined and summarized. Meanwhile, the terms historic core area, heritage sites, dense historic residential neighborhoods, and regional commercial hubs lack a clear geographic point of reference and are difficult to understand for readers who are not familiar with Beijing. In addition, how can the results of the analysis of urban vitality be verified? Can the results of the analysis of typical high/low urban vitality areas be cross-validated by street view images, satellite imagery, or other means, to enhance the credibility of the results?

(8) The analysis of the impact mechanism and the argumentation of potential causes in Discussion lacks the support of literature, and the specific high and low value areas are not mentioned. It is better to analyze the potential impact mechanism and put forward optimization suggestions with the results of 4.4 analysis.

(9) The names of the charts in the whole text need to be proofread one by one, and some of the charts are too briefly named and inaccurately described.

Comments on the Quality of English Language

English language expression is basically clean, but there is an urgent need to summarize and refine the core ideas, use passive voice more often, and strengthen the logical coherence between sentences.

Author Response

Response to Comments on the Manuscript:

“Decoding Urban Vitality: Leveraging Big Data Analytics and Interpretable Machine Learning to Uncover Environmental Factors”

May 12, 2025

-------------------------------------------------------------------------------------------------------

Response to comments by Reviewer #3:

Point 1. Urban vitality includes economic vitality, social vitality, cultural vitality, and spatial vitality, etc. In a broad sense, it refers to the incentives for innovation and entrepreneurship, the economic system, the talent policy, and the inclusion of heterogeneous elements, etc., and in a narrow sense, it refers to the activities of interpersonal exchanges and the activities of urban life. How does the author define urban vitality in the article? What are the main manifestations of urban vitality? What is the theoretical basis and literature foundation for characterizing urban vitality based on Baidu Street View images and human environmental perception?

Response 1: We agree with the reviewer. Thank you for your thoughtful and constructive comment regarding the definition and theoretical foundation of urban vitality in our study.

In our manuscript, we define urban vitality in a relatively narrow sense, focusing on observable urban life activities, including pedestrian presence, human interactions, and street-level vibrancy. This approach aligns with prior literature that characterizes urban vitality through spatial and behavioral indicators (Jacobs, 1961; Gehl, 2010; Montgomery, 1998).

While we acknowledge that urban vitality can be interpreted more broadly to include economic, social, cultural, and institutional dimensions, our study emphasizes the spatial and perceptual expressions of vitality, which are more directly observable and measurable using spatial data and street-level imagery.

Specifically, we use Baidu street view imagery and machine-learning-based human perception metrics (e.g., safety, activity, wealth) to reflect how people perceive the built environment, following methods validated in previous works (e.g., Zhou et al., 2019; Naik et al., 2014). These perceptual variables have been shown to correlate strongly with pedestrian activity, neighborhood vibrancy, and urban experience.

The main forms of urban vitality represented in this study include:

Pedestrian movement intensity (captured through dynamic spatial data),

Perceived activity and safety (from street view images),

Built environment density and land use diversity,

all of which serve as spatial proxies for real-world urban vibrancy.

Reference Paper: (1)Mapping human perception of urban landscape from street-view images: A deep-learning approach

(2)Measuring human perceptions of a large-scale urban region using machine learning

(3) Multidimensional Urban Vitality on Streets: Spatial Patterns and Influence Factor Identification Using Multisource Urban Data

Below are the additions. Please refer to the revised manuscript.

As an example, in the chart below, city vitality rises in order from left to right.

Point 2. The author has set up 29,940 sample points of street view images in the study area, which must be a huge amount of data processing work, but the presentation is not sufficient, whether it is the process of cleaning, organizing and classifying the data, or the presentation of the street view images of the typical sample points of high/low urban vitality is not particularly sufficient.

Response 2: Thank you for pointing this out. We agree with the reviewer that processing 29,940 street view images involves a significant amount of data work, and we appreciate the opportunity to describe it more explicitly.

In the revised manuscript, we have added a detailed description of the data collection, cleaning, and preprocessing steps. Specifically:

We first extracted 29,940 panoramic images using Baidu Maps API at 500 m × 500 m grid centers.

Each image was cropped into four cardinal views (0°, 90°, 180°, 270°), and then classified using a pre-trained convolutional neural network model to estimate human perceptual features (e.g., safety, activity, and wealth).

Aggregated perception scores were then assigned to each grid cell by averaging the four directional values.
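The aggregation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the key names (`grid_id`, `heading`, and the feature keys), and the `aggregate_by_grid` helper are hypothetical, and the CNN scoring step is represented by already-computed per-view scores. Skipping cells that lack a complete set of four views is also an assumption.

```python
from collections import defaultdict
from statistics import mean

# The four cardinal views named in the response (degrees).
HEADINGS = (0, 90, 180, 270)


def aggregate_by_grid(view_scores):
    """Average per-view perception scores into one score set per grid cell.

    view_scores: iterable of dicts with the hypothetical keys 'grid_id' and
    'heading', plus one numeric key per perceptual feature (e.g. 'safety').
    """
    grouped = defaultdict(list)
    for row in view_scores:
        grouped[row["grid_id"]].append(row)

    result = {}
    for grid_id, rows in grouped.items():
        headings = sorted(r["heading"] for r in rows)
        if headings != list(HEADINGS):
            # Assumption: cells without exactly one view per heading are skipped.
            continue
        features = [k for k in rows[0] if k not in ("grid_id", "heading")]
        # Mean of the four directional values for each perceptual feature.
        result[grid_id] = {f: mean(r[f] for r in rows) for f in features}
    return result
```

Each returned entry maps a grid cell identifier to one averaged value per perceptual feature, which corresponds to the per-cell score the response describes.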

In addition, we have now included supplementary visualizations of typical sample points with high and low levels of urban vitality, showcasing their corresponding street view images and perceptual metrics. These visual examples help illustrate the visual and environmental differences between vibrant and less vibrant areas.

These additions aim to enhance the transparency and reproducibility of our methodology, and to better support the visual and perceptual basis of our vitality assessment.

Below are the additions. Please refer to the revised manuscript.

Part of the code for time acquisition

Part of the code for location acquisition

Street View Image Crawling Results

Urban Vitality Calculations

Data visualisation on ArcGIS

Data Acquisition Flowchart

Point 3. The introduction is too long and the logical relationship is slightly confusing. The introduction describes related research on the dynamic capture of urban vitality, which is slightly disconnected from this paper, and the article does not portray the time dimension of urban vitality.

Response 3: Thank you for your valuable feedback. We agree with the reviewer and acknowledge that the original introduction was overly long and contained discussions that were not closely aligned with the core focus of this study.

In the revised manuscript, we have significantly streamlined the introduction by:

Removing lengthy descriptions of temporal dynamics of urban vitality that are not directly addressed in this study.

Focusing more clearly on the spatial measurement and modeling of urban vitality using environmental indicators and interpretable machine learning.

Enhancing the logical structure to ensure a smoother flow from problem statement, literature gap, methodological approach, to research objectives.

As you rightly pointed out, our current study does not incorporate a temporal dimension of urban vitality. Therefore, we have removed discussions on dynamic capture methods to avoid any potential confusion or misalignment with the main content.

We appreciate your suggestion, which helped us clarify the paper’s scope and improve its coherence.

Below are the additions. Please refer to Section 1 of the revised manuscript.

Point 4. The collection time and timeliness of all data should be described in detail.

Response 4: We agree with the reviewer. Thank you for your insightful comment regarding the temporal alignment of the datasets.

Below are the additions. Please refer to Section 2.2 of the revised manuscript.

Point 5. What is the basis and thinking behind the selection of selected environment variables in Figure 2?

Response 5: We agree with the reviewer. Thank you for raising this important question. The selection of environmental variables in our study was based on the following considerations:

(1) Theoretical foundation: We referred to classic and recent literature on urban vitality and built environment research, which emphasizes that factors such as land use mix, greenness, building density, accessibility, and population density have significant effects on urban vitality.

(2) Data availability and spatial granularity: We prioritized variables that could be obtained at fine spatial resolution across the entire study area, such as NDVI, density, road network density, and housing prices, ensuring that all selected indicators are measurable at the grid level.

(3) Multidimensional representation: We aimed to capture different aspects of the urban environment, including the natural environment (e.g., NDVI), built environment (e.g., floor area ratio, building coverage, road density), and socioeconomic context (e.g., population density, housing price).

(4) Empirical significance: Many of the selected variables have been shown in prior studies to significantly correlate with human activity patterns or urban vitality (e.g., Li et al., 2021; Long & Huang, 2022).

A summary table of all variables, their definitions, and references has been added in the revised manuscript (see Table X) to clarify the rationale behind their selection.

Reference:

(1) Unraveling nonlinear effects of environment features on green view index using multiple data sources and explainable machine learning

(2) Spatial differentiation characteristics and influencing factors of the green view index in urban areas based on street view images: A case study of Futian District, Shenzhen, China

Below are the additions. Please refer to the revised manuscript.

Point 6. The logical relationship between chapters 2.5.1,2.5.2,2.5.3 is confusing, and the content covered in Table 1 is interspersed throughout the chapters.

Response 6: Thank you for your valuable comment. We agree with the reviewer and acknowledge that the logical flow among Sections 2.5.1, 2.5.2, and 2.5.3 was previously unclear. In the revised manuscript, we have restructured these subsections to ensure a clearer and more coherent presentation of environmental, socioeconomic, and perceptual variables.

Regarding Table 1, we would like to clarify that it provides an integrated summary of all variables used in our study, which are discussed across the three subsections. This unified table was designed to avoid redundancy and to offer readers a consolidated view of all indicators and their data sources.

These changes aim to improve the consistency between the table and the structure of the text, and to enhance the overall readability and logical coherence of the methodology section.

Below are the additions. Please refer to the revised manuscript.

Point 7. 8.1 In the analysis of the results, the spatial distribution characteristics of urban vitality are too shallow, staying only in the simple description of the data and the simple portrayal of the range of values, which need to be further refined and summarized. Meanwhile, the terms historic core area, heritage sites, dense historic residential neighborhoods, and regional commercial hubs lack a clear geographic point of reference and are difficult to understand for readers who are not familiar with Beijing. In addition, how can the results of the analysis of urban vitality be verified? Can the results of the analysis of typical high/low urban vitality areas be cross-validated by street view images, satellite imagery, or other means, to enhance the credibility of the results?

Response 7: We agree with the reviewer. Thank you for pointing out the need for a more in-depth interpretation of the spatial distribution of urban vitality. We have revised the manuscript to enhance this section in two ways:

Improved Spatial Interpretation: We moved beyond simple descriptive statistics to highlight meaningful spatial clustering patterns and regional contrasts. We now explicitly discuss areas of concentrated high and low vitality, linking them to urban functional zones and socio-economic characteristics.

Added Typical Case Examples for Cross-Validation: To enhance readers’ understanding and improve the credibility of our results, we have added six representative areas in the spatial distribution map (Figure X):

High vitality areas: Beijing CBD (Guomao), Zhongguancun, CapitaLand Mall, and Wangjing Community.

Low vitality areas: Nanyuan Forest Wetland Park and Haitang Park.

These locations were selected to provide geographically grounded examples of both ends of the vitality spectrum.

Additionally, we clarified references to “historical core areas,” “heritage sites,” and “commercial centers” by either adding geographical markers or more detailed location descriptions to assist international readers unfamiliar with Beijing.

Below are the additions. Please refer to Section 4.1 of the revised manuscript.

Point 8. The analysis of the impact mechanism and the argumentation of potential causes in Discussion lacks the support of literature, and the specific high and low value areas are not mentioned. It is better to analyze the potential impact mechanism and put forward optimization suggestions with the results of 4.4 analysis.

Response 8: Thank you for your suggestion. We agree with the reviewer. Figure 5 illustrates the spatial distribution of urban vitality within the study area enclosed by Beijing’s Fifth Ring Road, revealing a clear radial gradient. Urban vitality forms a high-value concentric belt around the historic core, gradually declining toward the periphery. This pattern is consistent with previous studies indicating central urban areas typically exhibit higher vitality due to the concentration of economic, cultural, and transportation resources.

Specifically, the central high-vitality zone (UV values 50.87–83.34), shown in dark orange and red, includes areas such as Guomao (CBD), Zhongguancun, and Financial Street. These districts are characterized by intensive economic activities, dense public transport networks, high building density, and vibrant street life — all positively correlated with vitality as shown in Section 4.4 (e.g., high AHP and PR values). This reflects the agglomeration effect of urban functions.

The intermediate zone (UV 24.91–50.87), depicted in yellow and light orange, includes historic neighborhoods like the Dashilan area and regional commercial centers such as Wangjing and Wudaokou. These areas maintain moderate vitality due to mixed land use and active pedestrian environments, though slightly constrained by limited space or aging infrastructure.

Peripheral areas (UV 6.06–24.91), shown in green, such as Nanyuan Wetland Park and Haitang Park, have relatively low urban vitality. These zones often consist of suburban ecological spaces and low-density residential areas, aligning with findings in Section 4.4 where DG and NDVI exert negative effects at certain thresholds. This suggests that excessive greening or sparse development might reduce population density and pedestrian flow, thereby lowering vitality.
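The three value ranges quoted above amount to a simple classification rule, sketched below. Only the break values (24.91 and 50.87) and the overall range (6.06 to 83.34) come from the response; the zone labels, the `classify_uv` helper, and the choice to assign a value that falls exactly on a break to the higher class are assumptions made for illustration.

```python
from bisect import bisect_right

# Class boundaries and overall range quoted in the response.
UV_BREAKS = (24.91, 50.87)
UV_MIN, UV_MAX = 6.06, 83.34

# Hypothetical labels for the three zones, ordered from low to high vitality.
UV_LABELS = ("peripheral (low)", "intermediate", "central (high)")


def classify_uv(value):
    """Return the zone label for a single urban vitality (UV) value."""
    if not UV_MIN <= value <= UV_MAX:
        raise ValueError(f"UV value {value} is outside the reported range")
    # Assumption: a value equal to a break is placed in the higher class.
    return UV_LABELS[bisect_right(UV_BREAKS, value)]
```

For example, `classify_uv(60.0)` falls in the central high-vitality zone and `classify_uv(10.0)` in the peripheral zone.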

Below are the additions. Please refer to the revised manuscript.

Point 9. The names of the charts in the whole text need to be proofread one by one, and some of the charts are too briefly named and inaccurately described.

Response 9: We agree with the reviewer. Thank you for pointing out the issue regarding the figure and table captions. We have carefully reviewed and revised all figure and table titles throughout the manuscript to ensure they are accurate, informative, and appropriately detailed.

Below are the additions. Please refer to the revised manuscript.

Thank you for the valuable comments from the editor, I will work harder in the future. I wish you good health!

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The author has made revisions according to the comments.

Author Response

Response to Comments on the Manuscript:

“Decoding Urban Vitality: Leveraging Big Data Analytics and Interpretable Machine Learning to Uncover Environmental Factors”

May 22, 2025

-------------------------------------------------------------------------------------------------------

Response to comments by Reviewer #1:

Dear Reviewer,

Thank you very much for your recognition and the opportunity to publish our work. We sincerely appreciate the time and effort you devoted to reviewing our manuscript, as well as the insightful comments and suggestions that helped us improve the quality of the paper. Wishing you continued success in your scholarly endeavors and all the best.

Warm regards!
