Revealing the Nonlinear Associations and Spatial Heterogeneity of Urban Environmental Indicators in Emotional Perception: A Machine Learning Perspective from Shanghai

Hu, Ziyu; Xu, Weizhen; Lu, Zekun; Sun, Tongyu; Liu, Yuxiang

doi:10.3390/buildings16101999


by Ziyu Hu ¹, Weizhen Xu ¹, Zekun Lu ², Tongyu Sun ¹,* and Yuxiang Liu ³

¹ College of Urban Planning and Architecture, Tongji University, Shanghai 200092, China
² College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou 350100, China
³ Faculty of Forestry & Environmental Stewardship, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
* Author to whom correspondence should be addressed.

Buildings 2026, 16(10), 1999; https://doi.org/10.3390/buildings16101999

Submission received: 9 April 2026 / Revised: 9 May 2026 / Accepted: 14 May 2026 / Published: 19 May 2026

(This article belongs to the Special Issue Urban Wellbeing: The Impact of Spatial Parameters—2nd Edition)


Abstract

Streets are major public spaces in high-density cities, and their visual environments play an important role in shaping emotional experience and wellbeing. However, existing studies often examine macro-scale urban form and pedestrian-level streetscape perception separately, while paying limited attention to nonlinear relationships and spatial heterogeneity. This limits the evidence available for fine-grained urban renewal in high-density contexts. Focusing on the area within Shanghai’s Outer Ring, this study develops a large-scale street-view dataset of 512,764 Baidu Street View images. Six perceptual dimensions (safety, lively, beautiful, wealthy, boring, and depressing) are estimated using a perception model trained on Place Pulse 2.0 and integrated into a composite Psychological and Emotional Index (PEI). XGBoost–SHAP is used to examine nonlinear relationships and threshold effects between perceptions and environmental indicators, while MGWR is employed to capture spatial nonstationarity and scale-dependent effects. The results show significant spatial heterogeneity and positive spatial autocorrelation across the six perceptual dimensions and the PEI. Compared with traditional morphological indicators, visual features show stronger explanatory power and clearer threshold effects. Population density acts as a globally stable negative factor, whereas visual entropy and mixture show strong local sensitivity. These findings provide a data-driven basis for identifying context-specific priorities in urban renewal and spatial governance in high-density cities.

Keywords:

nonlinear effects; spatial heterogeneity; street view imagery; streetscape perception; explainable machine learning

1. Introduction

Today, the world is undergoing intense urbanization, with over 50% of the global population and 70–80% of economic output concentrated in cities. While this concentration fosters developmental opportunities, it simultaneously exacerbates sustainability challenges [1,2]. Within this framework, the United Nations Sustainable Development Goals (SDGs), specifically SDG3 (Good Health and Wellbeing) and SDG11 (Sustainable Cities and Communities), have emerged as core priorities. These goals underscore a pivotal shift in urban governance: moving beyond the mere optimization of physical infrastructure to actively addressing residents’ subjective experiences and emotional wellbeing [3,4]. Compared with their rural counterparts, urban dwellers are exposed to heightened psychological stressors and emotional burdens [5]. Consequently, the built environment is more than a physical substrate for human activity; it operates as a dynamic field of environmental cues that can either trigger psychological stress or facilitate emotional restoration.

Extant literature typically elucidates the relationship between urban environments and mental health through dual lenses: social and physical. Social dimensions encompass factors such as socioeconomic status, social capital, and isolation [5,6]. Conversely, physical dimensions involve air pollution, noise exposure, and urban morphology [7,8]. Beyond these direct influences, the physical environment modulates emotional responses via individual perceptions of environmental cues, carrying long-term implications for mental health. In this context, the strategic utilization of urban public spaces and engagement in physical activities serve as vital protective mechanisms [7,8,9]. Attention Restoration Theory (ART) [10] and Stress Reduction Theory (SRT) [11] further suggest that specific visual elements in urban environments—such as vegetation and open spaces—possess substantial restorative potential. By offering soft fascination, these elements evoke positive emotional responses and mitigate cognitive fatigue caused by cumulative stress and sustained attentional demands [12]. As the most ubiquitous public spaces in daily life, streets continuously expose residents to high-frequency visual and spatial cues, thereby exerting a persistent influence on emotional perception, quality of life, and holistic wellbeing [13,14]. Consequently, examining how street environments shape emotional states from a human-scale pedestrian perspective is important for clarifying the complex links among urban environment, emotional perception, health, and wellbeing [15]. In this study, street-level emotional perception is conceptualized as a proximal experiential dimension associated with broader outcomes such as quality of life and urban wellbeing, while not being equivalent to those broader constructs.

Compared with traditional questionnaire-based surveys, which are frequently constrained by limited sample sizes, fragmented spatial coverage, and low reproducibility, the integration of large-scale Street View Imagery (SVI) and deep learning offers a scalable and reproducible paradigm for urban perception research [16]. Platforms such as Google Street View and Baidu Street View (BSV) provide extensive coverage across numerous cities, establishing a robust empirical foundation [17]. Existing studies have used such data and machine learning to examine restorative evaluation [18], urban playability [19], and residential visual perception [20], while seeking to balance global transferability with local sensitivity [21]. However, the influence of urban environments on emotion is inherently scale-dependent and perceptually hierarchical [22]. At the macro-scale, research typically focuses on the extent of blue-green spaces, population density, network connectivity, and functional accessibility, which collectively define the broader spatial context of emotional perception. At the micro-scale, attention has increasingly shifted to variables such as building density, interface continuity, and street furniture, which have been identified as pivotal determinants of emotional perception [23,24]. Recent advances in street view imagery and computer vision have further enabled the model-based estimation of emotional perception from urban scenes. In high-density Asian cities, studies have predicted emotional dimensions such as Safety, Beauty, Depression, and Liveliness, linking them to land use patterns, walkability, and public health outcomes [25,26]. Complementary evidence from European cities suggests that features such as street-level vegetation, sky view factor, and façade openness are effective predictors of perceived safety and esthetic satisfaction [27,28].

Despite these advances, several limitations remain. First, macro-spatial structures and micro-visual perceptions are often examined in isolation, limiting cross-scale interpretation of how different layers of the built environment relate to street experience. Second, many studies rely on linear assumptions, which may obscure threshold or saturation effects in high-density settings. Third, the common assumption of spatial stationarity hinders the interpretation of spatial heterogeneity. Finally, existing frameworks remain predominantly rooted in Western urban contexts; thus, their applicability to Chinese megacities warrants systematic validation.

Shanghai represents a prototypical high-density megacity, with a population density exceeding 23,000 persons per km² in its central districts. Guided by the Shanghai Urban Renewal Regulation (2021) [29], the city is undergoing a critical transition from incremental expansion to quality-oriented renewal. In such contexts, broad improvements in urban form are often difficult to implement uniformly, and planning attention has increasingly shifted toward fine-grained interventions in everyday public spaces. The area within Shanghai’s Outer Ring is therefore a useful setting for examining how different urban environmental indicators are associated with street-level emotional perception under conditions of high density, functional complexity, and strong intra-urban spatial differentiation. Against this background, this study addresses three research questions: (1) How can SVI and deep learning be used to develop an interpretable, human-scale Psychological and Emotional Index (PEI) that captures multidimensional perceptual semantics? (2) How are macro-scale morphology and micro-scale visual features differentially associated with emotional perception, and do these associations exhibit nonlinearities or perceptual thresholds? (3) How do scale effects and spatial nonstationarity shape the environment–emotion relationship, and which environmental indicators function as global drivers and which exhibit local heterogeneity? This study provides empirical evidence on how urban environmental indicators are associated with street-level emotional perception in Shanghai, contributing to ongoing discussions of urban regeneration, spatial governance, and urban wellbeing in high-density cities.

2. Methods

2.1. Study Area and Data Sources

2.1.1. Study Area

This study focuses on the area within Shanghai’s Outer Ring (31°07′ N–31°22′ N, 121°20′ E–121°38′ E), covering 662.19 km² (Figure 1). As Shanghai’s functionally mature urban core, the study area includes key central districts (e.g., Huangpu, Xuhui, and Jing’an) and hosts Shanghai’s highest density of residential, commercial, and public service facilities. The road network follows a typical “ring-and-radial” topology and supports highly heterogeneous urban morphologies. Because these streets act as primary conduits for human activity, their human-scale physical attributes strongly shape daily spatial experiences in this high-density metropolis. This region therefore provides an appropriate setting for examining the interactions between multidimensional street environments and human emotional perceptions.

2.1.2. Street View Data Collection

SVI was acquired from the BSV open platform. Road centerline data were first extracted from OpenStreetMap (OSM), retaining major roads, secondary roads, and local streets, while expressways, elevated highways, and ramps lacking pedestrian-accessible interfaces were excluded. Sampling points were generated at 50 m intervals along the retained road centerlines. For each sampling point, four street view images facing the cardinal directions (0°, 90°, 180°, and 270°) were automatically retrieved via the BSV API. These images were archived in JPEG format at a standardized resolution of 600 × 400 pixels, which provides sufficient detail for subsequent semantic segmentation and CNN-based feature extraction.
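The 50 m sampling step and the four-heading request pattern can be sketched as follows. This is a minimal illustration, assuming the road centerlines have already been projected to a metre-based coordinate system (the projection is not stated here) and are given as plain vertex lists; the function names and request-record fields are hypothetical and do not correspond to the BSV API.

```python
import math

def sample_along_line(coords, spacing=50.0):
    """Points every `spacing` metres along a polyline, starting at its first vertex.

    coords: list of (x, y) vertices in a projected, metre-based coordinate system.
    """
    samples = [coords[0]]
    next_at = spacing   # distance along the line at which the next sample falls
    walked = 0.0        # distance from the line start to the current segment's start
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        while seg > 0 and next_at <= walked + seg:
            t = (next_at - walked) / seg          # fraction of the way along this segment
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing
        walked += seg
    return samples

HEADINGS = (0, 90, 180, 270)  # four cardinal viewing directions, in degrees

def image_requests(points):
    """One hypothetical request record per (sampling point, heading) pair."""
    return [{"point_id": k, "x": x, "y": y, "heading": h}
            for k, (x, y) in enumerate(points) for h in HEADINGS]

# An L-shaped toy centerline: 100 m east, then 100 m north.
pts = sample_along_line([(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)])
# pts == [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0), (100.0, 50.0), (100.0, 100.0)]
requests = image_requests(pts)   # 5 points x 4 headings = 20 records
```

With four headings per point, the 128,191 sampling points correspond to 4 × 128,191 = 512,764 images, matching the dataset size reported below. A real pipeline would also need to merge duplicate points where centerlines meet at intersections, which this sketch does not do.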

The dataset was organized within a hierarchical “city–road–location–direction” architecture. This was complemented by extensive metadata—including unique identifiers, geographic coordinates, viewing orientations, timestamps, and API response status—to facilitate rigorous traceability. A stringent quality control protocol was implemented to eliminate images compromised by severe occlusion (e.g., promotional billboards, construction fencing), inadequate nocturnal illumination, or data corruption. In instances of failed API calls, missing data were addressed through neighboring-point interpolation or iterative retrieval strategies. Ultimately, the quality-controlled database comprised 512,764 valid images derived from 128,191 discrete sampling points, constituting a spatially continuous visual record of the study area. Detailed workflow for data collection is provided in Appendix A.1 and Figure A1.

2.1.3. Street-Level Perceptions Assessment

Human wellbeing and behavior are influenced by the surrounding environment and sense of place, and the perception of place is an important attribute affecting social sustainability [30,31]. Models for streetscape perception assessment were established using the MIT Place Pulse 2.0 dataset. This dataset covers six fundamental perceptual dimensions (Safety, Lively, Beautiful, Wealthy, Boring, and Depressing) and is grounded in a pairwise comparison framework from environmental psychology: through large-scale crowdsourcing, it collects public perceptual judgments of urban scenes worldwide. Empirical evidence suggests that human perception of urban landscapes shows high cross-cultural consistency, with factors such as age, gender, income, and ethnicity producing no statistically significant differences in perception scores [32]. Because the original Place Pulse 2.0 labels are pairwise preferences rather than direct numerical ratings, this study followed established procedures [33] to transform the pairwise comparison outcomes into continuous perceptual intensities (Q-scores). To enhance label reliability, ambiguous human evaluations were filtered out, and a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel was trained on semantic image features; its normalized probability outputs serve as continuous perception scores for each image. The four directional images at each sampling point were averaged to obtain point-level emotional perception scores. Detailed procedures for Q-score calculation, label filtering, and model training are provided in Appendix A.2.
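The pairwise-to-continuous transformation is described here only by reference to [33] and Appendix A.2. A common choice for Place Pulse data is the Q-score of Salesses et al. (2013), which rewards wins against frequently winning images and penalizes losses to frequently losing ones. The sketch below implements that formula under the assumption that [33] uses it; ties are ignored for simplicity, and the input format and function name are illustrative.

```python
from collections import defaultdict

def q_scores(comparisons):
    """Q-scores (0 to 10) for one perceptual attribute from (winner, loser) pairs.

    Q_i = 10/3 * (W_i + mean of W_j over images j that i beat
                      - mean of L_j over images j that beat i + 1),
    where W and L are each image's win and loss ratios. Ties are ignored here.
    """
    beat = defaultdict(list)     # image -> images it won against
    lost_to = defaultdict(list)  # image -> images it lost against
    for winner, loser in comparisons:
        beat[winner].append(loser)
        lost_to[loser].append(winner)
    images = set(beat) | set(lost_to)
    total = {i: len(beat[i]) + len(lost_to[i]) for i in images}
    win_ratio = {i: len(beat[i]) / total[i] for i in images}
    loss_ratio = {i: len(lost_to[i]) / total[i] for i in images}
    scores = {}
    for i in images:
        won, lost = beat[i], lost_to[i]
        bonus = sum(win_ratio[j] for j in won) / len(won) if won else 0.0
        penalty = sum(loss_ratio[j] for j in lost) / len(lost) if lost else 0.0
        scores[i] = 10.0 / 3.0 * (win_ratio[i] + bonus - penalty + 1.0)
    return scores

# Three toy images: a beats b and c, b beats c.
q = q_scores([("a", "b"), ("a", "c"), ("b", "c")])  # a ≈ 7.5, b ≈ 5.0, c ≈ 2.5
```

Because each of the three ratio terms lies in [0, 1], the bracketed sum lies in [0, 3], and the 10/3 factor maps it onto a 0 to 10 scale.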

To capture the overall perceptual quality of street environments, this study constructed a PEI based on the six normalized perceptual dimensions. Before aggregation, the negative dimensions, namely Boring and Depressing, were reverse-coded so that higher values consistently indicated more positive perception. The PEI was calculated as follows:

$$\mathrm{PEI}_{i} = \frac{P_{Safe}^{i} + P_{Lively}^{i} + P_{Beautiful}^{i} + P_{Wealthy}^{i} + (1 - P_{Boring}^{i}) + (1 - P_{Depressing}^{i})}{6} \qquad (1)$$

where $\mathrm{PEI}_{i}$ denotes the composite perceptual score of sampling point $i$; $P_{Attribute}^{i}$ represents the normalized probability score for a given perceptual attribute (e.g., Safe, Lively); and the terms $(1 - P)$ reverse the direction of negatively valenced perceptions. Here, $i = 1, 2, 3, \ldots, 128{,}191$. Equal-weighted linear aggregation was adopted because all six perceptual dimensions were derived from the same Place Pulse 2.0 evaluation framework and because there is currently insufficient theoretical evidence to justify assigning greater weight to any single dimension. Equal weighting also keeps the PEI construction transparent, interpretable, and reproducible. As a supplementary diagnostic check, we also explored PCA-based aggregation. However, the first two principal components showed comparable explanatory contributions, suggesting that the six perceptual dimensions could not be reduced to a single dominant latent dimension without substantial information loss. Therefore, PCA was not adopted as the primary aggregation strategy. To further examine whether the equal-weighted PEI was dominated by any single perceptual dimension, we conducted a leave-one-dimension-out sensitivity check. The alternative indices remained highly correlated with the original PEI, indicating that the composite pattern was not overly dependent on one specific attribute. Nevertheless, the PEI should be interpreted as a synthetic perceptual proxy rather than a fully validated psychometric scale.

2.2. Construction of Urban Environmental Indicators

To capture the multidimensional complexity of streetscapes in high-density megacities, this study established an integrative framework comprising six 2D morphological indicators (NDVI, NDWI, building density, population density, road density, and BtA500) and eight SVI-derived visual features (e.g., greenery, enclosure, and visual entropy). The 2D indicators describe the fundamental planar configurations of both natural and built environments and were computed directly at the grid level via zonal statistics or spatial overlay. The visual features, by contrast, were derived from SVI and represent semantic- and scene-level visual characteristics that approximate pedestrians’ visual experience. Detailed descriptions, data sources, and calculation formulas for all indicators are provided in Appendix A.3 and Table A1.

Visual features were extracted using Mask2Former (pre-trained on Mapillary Vistas) to identify 65 object categories. Previous research has shown that restorative quality and emotional perception are influenced by visual features operating at different hierarchical levels [34]. Based on prior studies [35], eight indicators were constructed across two levels: semantic-level indicators (pixel proportions of elements such as sky and vegetation) and scene-level indicators (overall complexity and spatial depth). Definitions and formulas are detailed in Table 1.
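The two levels of visual indicator can be illustrated with a small sketch. Semantic-level indicators reduce to pixel shares in a segmentation label map, and one common scene-level measure is the Shannon entropy of an image's grayscale intensity histogram. That definition of visual entropy is assumed here only for illustration (its upper bound of 8 bits for 8-bit images would be compatible with the values around 7.5 reported in Section 3.2.2); the study's actual formulas are those in Table 1, and the string class labels below are placeholders rather than Mapillary Vistas identifiers.

```python
import math
from collections import Counter

def class_proportions(label_map):
    """Share of pixels assigned to each semantic class in a 2D label map."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def shannon_entropy(values, base=2.0):
    """Shannon entropy (bits by default) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total, base) for n in counts.values())

# Toy 2 x 2 segmentation output with placeholder class labels.
seg = [["vegetation", "vegetation"], ["sky", "building"]]
props = class_proportions(seg)
green_share = props.get("vegetation", 0.0)  # semantic level: 0.5
sky_share = props.get("sky", 0.0)           # semantic level: 0.25

# Toy 8-bit grayscale image in which every intensity 0..255 appears once.
gray = [list(range(256))]
visual_entropy = shannon_entropy(p for row in gray for p in row)  # close to 8 bits
```

In practice the proportions would be computed per image, averaged over the four headings at each point, and then aggregated to the grid.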

To harmonize the spatial scale of the multi-source datasets, the study area was partitioned into 4219 regular hexagonal grid cells with a side length of 250 m. Hexagonal units were selected to mitigate the Modifiable Areal Unit Problem (MAUP) and directional bias [36].
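Assigning sampling points to 250 m hexagons can be sketched with axial hexagon coordinates and cube rounding, a standard technique for hexagonal grids. The sketch assumes pointy-top hexagons anchored at the origin of a projected, metre-based coordinate system; the actual grid orientation, origin, and clipping rule used in the study are not stated, and the function names are illustrative.

```python
import math
from collections import defaultdict

SIDE = 250.0  # hexagon side length in metres (equal to the circumradius)

def hex_area(side=SIDE):
    """Area of a regular hexagon: (3 * sqrt(3) / 2) * side^2."""
    return 3 * math.sqrt(3) / 2 * side ** 2

def point_to_hex(x, y, size=SIDE):
    """Axial (q, r) index of the pointy-top hexagon containing projected point (x, y)."""
    qf = (math.sqrt(3) / 3 * x - y / 3) / size
    rf = (2 / 3 * y) / size
    sf = -qf - rf
    q, r, s = round(qf), round(rf), round(sf)
    # Cube rounding: recompute the coordinate with the largest rounding error.
    dq, dr, ds = abs(q - qf), abs(r - rf), abs(s - sf)
    if dq > dr and dq > ds:
        q = -r - s
    elif dr > ds:
        r = -q - s
    return q, r

def hex_center(q, r, size=SIDE):
    """Projected coordinates of the centre of hexagon (q, r)."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r

def aggregate(points):
    """Mean value per hexagon for (x, y, value) tuples, e.g. point-level PEI."""
    bins = defaultdict(list)
    for x, y, v in points:
        bins[point_to_hex(x, y)].append(v)
    return {cell: sum(vs) / len(vs) for cell, vs in bins.items()}

means = aggregate([(0.0, 0.0, 0.4), (10.0, 10.0, 0.6), (433.0, 0.0, 1.0)])
# two cells: (0, 0) with mean 0.5 and (1, 0) with mean 1.0
```

A hexagon with a 250 m side covers about 0.162 km², so 4219 cells cover roughly 685 km², somewhat more than the 662.19 km² study area. That is what one would expect if cells along the Outer Ring boundary only partly overlap the study area.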

2.3. XGBoost–SHAP

To examine the nonlinear associations and interaction effects between multidimensional urban environmental indicators and the PEI, an interpretable analytical framework integrating Extreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP) was employed. The PEI and the six perceptual dimensions served as response variables, with the six 2D indicators and eight visual indicators as predictors. The models were implemented in Python v3.13.11, with 5-fold cross-validation for hyperparameter tuning. SHAP values quantified feature importance, while dependence and interaction analyses identified nonlinear patterns and moderating effects. Mathematical details are provided in Appendix A.4.
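To make the quantity behind the SHAP analysis concrete, the sketch below computes exact Shapley values by brute force for a toy function, treating features outside a coalition as set to a single baseline point. This is an illustrative simplification: the SHAP library's tree explainer computes these values efficiently for tree ensembles such as XGBoost and uses a background distribution or the tree structure rather than one baseline, and the toy model is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x, by enumerating all coalitions.

    A coalition S is evaluated as f(z), where z takes x's value for features in S
    and the baseline's value elsewhere. Runtime grows as 2^n, so this is only
    practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy model with two additive terms and one interaction (hypothetical).
def toy_model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]

x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)   # approximately [3.5, 6.0, 1.5]
assert abs(sum(phi) - (toy_model(x) - toy_model(base))) < 1e-9
```

The final assertion checks additivity: the contributions sum to the difference between the prediction and the baseline output. The interaction term z[0] * z[2] is split equally between features 0 and 2, which is why separate SHAP interaction values are needed to isolate joint effects such as those reported in Section 3.2.3.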

2.4. Multiscale Geographically Weighted Regression (MGWR)

Spatial autocorrelation analysis and multiscale geographically weighted regression (MGWR) were employed to examine the multiscale effects of urban environmental indicators on perception. First, global Moran’s I and Local Indicators of Spatial Association (LISA) were used to confirm overall spatial dependence and to identify local spatial clusters of perception, respectively. Subsequently, ordinary least squares (OLS), geographically weighted regression (GWR), and MGWR models were estimated sequentially. OLS served as a benchmark for global average effects and was used to screen for multicollinearity among indicators with the variance inflation factor (threshold VIF < 10). MGWR extends conventional GWR by allowing an adaptive, variable-specific optimal bandwidth (BW), thereby capturing spatially varying relationships and scale effects more flexibly [37]. Model estimation was implemented in MGWR v2.2.1, and the optimal BW for each predictor was selected automatically using the corrected Akaike Information Criterion (AICc). The detailed mathematical specification of the MGWR model and the definitions of the spatial autocorrelation statistics are provided in Appendix A.5.
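Global Moran's I can be sketched directly from its definition, $I = \frac{n}{S_0}\frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i$ are deviations from the mean and $S_0$ is the sum of all spatial weights. The sketch assumes a sparse dictionary of weights and omits significance testing (z-scores or permutation inference); the weighting scheme actually used for the hexagon grid is specified in Appendix A.5, not here.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values:  list of n observations (e.g. PEI per hexagon)
    weights: dict mapping (i, j) -> w_ij for neighbouring pairs with i != j
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    squares = sum(d * d for d in dev)
    return (n / s0) * (cross / squares)

# Four cells in a row, each adjacent to its immediate neighbours (binary weights).
chain = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1}
smooth = morans_i([1, 2, 3, 4], chain)       # 1/3: similar values sit together
alternating = morans_i([1, 4, 1, 4], chain)  # about -1: dissimilar values sit together
```

Under spatial randomness the expected value is −1/(n − 1), which is close to zero for a few thousand hexagons, so positive values such as the PEI's 0.459 indicate clustering of similar values.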

2.5. Research Workflow

The workflow consisted of six steps (Figure 2): (1) collecting four-direction BSV images at 50 m intervals; (2) constructing a multi-source indicator system, including 2D environmental indicators derived from remote sensing, GIS, and built-environment datasets and SVI-derived visual features extracted from BSV images, and aggregating these indicators into 250 m hexagonal grids; (3) estimating perception scores for the six perceptual dimensions using models trained on the MIT Place Pulse 2.0 dataset; (4) identifying nonlinear patterns using XGBoost–SHAP; (5) analyzing multiscale local effects via MGWR after confirming spatial heterogeneity; and (6) deriving exploratory urban design implications for context-sensitive micro-renewal based on the identified nonlinear associations and spatial heterogeneity.

3. Results

3.1. Spatial Distribution of Streetscape Emotional Perception

3.1.1. Spatial Distribution and Local Spatial Autocorrelation

Predicted perceptions exhibit distinct spatial structures (Figure 3a–f). Positive perceptions (Safety, Lively, Wealthy) cluster in the east and northeast as “high-value belts,” contrasting with low-value zones in the south and southwest. Lively forms a prominent corridor along the eastern bank of the Huangpu River and the Inner Ring, while central and southeastern zones exhibit fragmented low-value patches. High Wealthy scores also cluster east of the Huangpu River, with a more compact spatial form. Beautiful shows distinct hotspots in the northwest, southwest, and southeast subregions. In contrast, negative perceptions cluster in the central and western parts. Depressing forms a relatively continuous low-value belt in the southwest, while low Boring values are distributed sporadically across scattered patches.

Despite significant global spatial autocorrelation across all indicators (Table A2), local manifestations (LISA, Figure 3g–l) reveal distinct regional variations. Beautiful and Safety exhibit the most extensive spatial clustering, with their High–High (H–H) clusters largely concentrated in the northeastern region and along the east bank of the Huangpu River, while their Low–Low (L–L) clusters are widely and contiguously distributed across the southern region. Lively and Wealthy present similar distribution patterns, with their H–H clusters primarily concentrated outside the middle ring in the northeast. In contrast, the negative dimensions display a smaller clustering extent and a lack of spatial contiguity. Specifically, the H–H clusters for Depressing are scattered across the western and central areas within the inner and middle ring zones, while its L–L clusters are mainly located along the eastern and southeastern outer margins. Meanwhile, the L–L clusters for Boring are sporadically scattered across the northern areas outside the middle ring.

3.1.2. Spatial Distribution and LISA of PEI

Across the study area, the spatial distribution of the PEI reveals pronounced regional heterogeneity (Figure 4). High PEI values are predominantly clustered in the northeastern urban core and the southern peri-urban fringes, whereas low PEI values are concentrated mainly along the marginal belts of the central, western, and southwestern zones. Rather than following a simple concentric gradient, the PEI surface forms several relatively continuous high-value corridors, including one along the eastern bank of the Huangpu River and others near the southern Outer Ring and northern Middle Ring. Global spatial autocorrelation analysis confirms that PEI is significantly positively spatially autocorrelated (Global Moran’s I = 0.459, z = 47.24, p < 0.001), indicating spatial clustering of both high and low PEI values. LISA results further show that High–High (H–H) clusters are spatially contiguous, forming a north–south high-value axis east of the Huangpu River. Conversely, Low–Low (L–L) clusters are sporadically distributed along the peripheral boundaries of the central and southwestern zones. High–Low (H–L) and Low–High (L–H) spatial outliers also occur, mainly in transition zones between high-value and low-value clusters.
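The LISA categories can be sketched as follows: each unit's local Moran's I combines its own deviation from the mean with the average deviation of its neighbours (the spatial lag), and the signs of these two quantities give the H–H, L–L, H–L, and L–H labels. This minimal version assumes row-standardized contiguity weights and skips the permutation-based significance test, so it labels every unit, whereas LISA cluster maps conventionally label only statistically significant units.

```python
def lisa(values, neighbours):
    """Local Moran's I and quadrant label for each unit (row-standardized weights).

    values:     list of n observations
    neighbours: dict mapping unit index -> list of neighbouring unit indices
    Returns a list of (local_I, label) tuples; no significance test is applied.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n              # scaling term: mean squared deviation
    results = []
    for i in range(n):
        nb = neighbours[i]
        lag = sum(z[j] for j in nb) / len(nb)   # average neighbour deviation
        local_i = z[i] * lag / m2
        if z[i] >= 0:
            label = "H-H" if lag >= 0 else "H-L"
        else:
            label = "L-H" if lag >= 0 else "L-L"
        results.append((local_i, label))
    return results

# Five cells in a row with a single high value in the middle (toy data).
row = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
out = lisa([1, 1, 9, 1, 1], row)
labels = [label for _, label in out]   # ['L-L', 'L-H', 'H-L', 'L-H', 'L-L']
```

The middle cell is a High–Low outlier with a negative local I, and its neighbours become Low–High outliers, mirroring the transition-zone outliers described above.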

3.2. Results of XGBoost–SHAP

3.2.1. Relative Importance of Environmental Drivers

For the positive perception dimensions, visual features had greater relative importance than the 2D indicators in the SHAP analysis (Figure 5a–d). For Beautiful, Greenery was the most influential feature, whereas higher Color complexity and very high Greenery levels were associated with lower SHAP contributions; Visual entropy showed a generally positive association. For Safety, both Greenery and Visual entropy showed positive contributions, whereas NDVI showed a negative association with a distinct threshold effect. For Lively, Greenery and Color complexity were the main contributors, whereas NDVI consistently acted as a suppressive factor. For Wealthy, NDVI and Population density had the highest importance, both with overall negative contributions, while Visibility showed a positive association with perceived affluence. For the negative perception dimensions, Mixture was the strongest contributor to Boring, whereas Greenery and Visual entropy were associated with lower boredom scores (Figure 5e,f). For Depressing, Openness had the largest contribution and was negatively associated with the predicted score. Notably, Greenery showed a counterintuitive positive association with Depressing; given the proxy nature of the outcome and the possibility of scene-type and model-related bias, this pattern should be interpreted cautiously.

The SHAP results further showed that visual indicators had greater relative importance than the 2D indicators in PEI prediction (Figure 5g). Visual entropy had the highest overall importance and was generally associated with higher PEI contributions. In contrast, Color complexity, Mixture, and Visibility generally showed negative contribution tendencies. Among the 2D morphological indicators, NDVI and Population density had high relative importance, but their high values were predominantly associated with negative SHAP values, indicating that very dense vegetation or population concentration may be linked to lower PEI in high-density urban environments.

3.2.2. SHAP Partial Correlation Dependency Analysis

The SHAP dependence plots (Figure 6a–f) revealed pronounced nonlinear associations between several environmental indicators and PEI. SHAP values for Visual entropy rose sharply above approximately 7.50. Color complexity (threshold = 265,554.92) and Visibility (threshold = 0.18) showed relatively stable SHAP values below their respective thresholds, but their SHAP values turned negative and continued to decline once these thresholds were exceeded, consistent with an overload-induced inhibitory effect. Mixture displayed a non-monotonic pattern, with its dependence curve forming an approximately inverted U-shape that peaked around 27.00. Among the 2D morphological variables, NDVI showed a clear threshold attenuation effect at 0.42, beyond which its contribution declined. Population density was generally associated with lower PEI values, but its negative contribution flattened above roughly 7604.16 persons/km², suggesting a diminishing marginal stress effect.
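The text does not specify how these threshold values were located. One simple way to estimate a single breakpoint from (feature value, SHAP value) pairs is a hinge regression, y = b0 + b1·x + b2·max(0, x − t), with t chosen by grid search over observed feature values to minimize the residual sum of squares. The plain-Python sketch below illustrates that approach under stated assumptions; it is not a reconstruction of how thresholds such as 7.50 or 0.42 were obtained, and the data are synthetic.

```python
def _solve3(a, b):
    """Solve the 3 x 3 system a @ beta = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        beta[r] = (m[r][3] - sum(m[r][c] * beta[c] for c in range(r + 1, 3))) / m[r][r]
    return beta

def hinge_sse(xs, ys, t):
    """Residual sum of squares of the least-squares fit y = b0 + b1*x + b2*max(0, x - t)."""
    rows = [(1.0, x, max(0.0, x - t)) for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    b = _solve3(xtx, xty)
    return sum((y - (b[0] + b[1] * r[1] + b[2] * r[2])) ** 2 for r, y in zip(rows, ys))

def find_threshold(xs, ys, min_side=3):
    """Breakpoint t, searched over observed x values, that minimizes the hinge SSE.

    The min_side smallest and largest distinct x values are excluded as candidates
    so that each side of the breakpoint keeps enough points to fit a slope.
    """
    candidates = sorted(set(xs))[min_side:-min_side]
    return min(candidates, key=lambda t: hinge_sse(xs, ys, t))

# Synthetic dependence curve: flat up to x = 5, then declining (hypothetical data).
feature = [float(k) for k in range(11)]
shap_vals = [0.0 if x <= 5 else -(x - 5) for x in feature]
t_hat = find_threshold(feature, shap_vals)   # 5.0
```

Real SHAP dependence clouds are noisy and may contain several bends, so in practice one would also compare the hinge fit against a plain linear fit, or use smoothing, before reporting a threshold.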

3.2.3. Interaction Effects Among Environmental Drivers

The three strongest SHAP interaction pairs (Figure 6g–i) indicated nonlinear joint associations among environmental indicators, with Visual entropy acting as a key moderator. Color complexity had a negative marginal effect at low levels of Visual entropy (below approximately 7.4) but became a positive moderator at higher levels, strengthening the contribution of Visual entropy to PEI. Walkability showed the strongest positive contribution to PEI within the range of 0.10–0.15, and this effect was further amplified under high Visual entropy. The interaction between Population density and Visual entropy was concentrated mainly in low-density areas (<5000 persons/km²), where high Visual entropy substantially enhanced PEI; this moderating pattern weakened as density increased beyond about 7000 persons/km².

3.3. Results of MGWR

Building upon the XGBoost–SHAP findings, the six variables exhibiting the highest contribution to the PEI were integrated into the MGWR model to elucidate the spatial heterogeneity of key drivers. Compared with the global OLS model, MGWR substantially improves the explanatory power for PEI (Adj. R² = 0.542), while markedly reducing both the residual sum of squares and the AICc value (Table A3). The MGWR results indicate pronounced scale differentiation among explanatory variables. Visual entropy (BW = 74) and Mixture (BW = 123) operate at highly localized spatial scales, whereas Color complexity (BW = 257), Visibility (BW = 259), and NDVI (BW = 326) exhibit medium-scale spatial influences. In contrast, Population density shows a BW close to the full sample size (BW = 4012), indicating a predominantly global-scale effect (Table 2). The direction and magnitude of local coefficients vary substantially across space (Figure 7). Visual entropy and Mixture display a patchy pattern characterized by frequent sign reversals, with Visual entropy exerting strong positive effects in selected core street areas. The positive influence of Visibility is mainly concentrated in the southeastern and western margins of the study area but shifts to negative effects within highly compact central districts. NDVI shows significant positive associations in the southwestern fringe and localized northwestern areas, while its influence weakens or turns negative in the high-density inner-core zones. Population density consistently exhibits negative coefficients across the entire study area, with minimal spatial variation in effect magnitude.
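The integer bandwidths reported above are most naturally read as adaptive, nearest-neighbour bandwidths. Under that assumption, and assuming the bisquare kernel commonly used for adaptive GWR and MGWR (the kernel is not named in this section), the weight each hexagon receives in a local regression can be sketched as below. Conventions such as whether the focal unit counts toward the bandwidth differ between implementations, so the function is illustrative only.

```python
import math

def adaptive_bisquare_weights(coords, i, bw):
    """Bisquare kernel weights of every unit relative to focal unit i.

    The kernel radius is the distance to the bw-th nearest unit, counting unit i
    itself (distance 0). Units strictly inside the radius get
    (1 - (d / radius)^2)^2; units at or beyond it get 0.
    """
    if not 2 <= bw <= len(coords):
        raise ValueError("bw must be between 2 and the number of units")
    xi, yi = coords[i]
    dists = [math.hypot(x - xi, y - yi) for x, y in coords]
    radius = sorted(dists)[bw - 1]
    if radius == 0:
        raise ValueError("bandwidth radius is zero; check for duplicate coordinates")
    return [(1 - (d / radius) ** 2) ** 2 if d < radius else 0.0 for d in dists]

# Five hexagon centroids on a line, 1 unit apart (toy coordinates).
cells = [(float(k), 0.0) for k in range(5)]
w = adaptive_bisquare_weights(cells, i=0, bw=3)  # [1.0, 0.5625, 0.0, 0.0, 0.0]
```

Read this way, BW = 74 means each local Visual entropy coefficient draws on roughly the 74 nearest hexagons, whereas BW = 4012 for Population density spans almost all 4219 cells, which is consistent with its nearly uniform coefficient surface.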

4. Discussion

4.1. Nonlinear Driving Mechanisms of Environment Indicators on Emotional Perception

Consistent with Attention Restoration Theory (ART), our findings suggest that street-level emotional perception in Shanghai is associated with nonlinear responses to visual features, notably Visual entropy and Mixture, rather than linear accumulation [38]. It should be noted that the outcome variables in this study represent model-estimated comparative perception proxies derived from Street View Imagery and Place Pulse 2.0, rather than directly observed psychological states or real-time emotions. Within this analytical boundary, the results indicate that visual features show stronger explanatory power than 2D morphological structure indicators, reinforcing the importance of a human-scale perspective in evaluating perceived street environments in high-density urban contexts [39].

Perception dimensions respond differently to environmental attributes. Positive perception dimensions (Safety, Beautiful) are more strongly associated with visual richness and visibility, in line with research suggesting that visual complexity enhances environmental attractiveness and legibility [40], whereas negative perception dimensions are more closely linked to spatial structure. The role of Greenery is particularly dimension-dependent. It is positively associated with Safety and Lively and negatively associated with Boring, but it also shows a nonlinear or even counterintuitive association with Beautiful, Wealthy, and Depressing. This suggests that street-level greenery is not necessarily “the more, the better” [41]. This does not mean that vegetation itself produces negative emotional perception; rather, it highlights the distinction between green quantity and green quality: disorganized greenery can obscure building façades and urban activities, weakening perceived order and creating oppressive, enclosed visual experiences [42].

Visual entropy emerges as the most influential predictor of the PEI in the SHAP analysis, with its contribution increasing markedly above approximately 7.50. This pattern may reflect a preference for “ordered complexity” in high-density areas, and similar evidence from Fuzhou also identifies visual entropy as an important determinant of perceived street quality [35]. By contrast, Color complexity and Visibility show overall negative associations with PEI when they exceed certain thresholds, likely attributable to information overload in compact urban environments, which increases cognitive load and perceptual stress [43]. NDVI exhibits a threshold attenuation effect around 0.42, suggesting that excessive vegetation in compact spaces may reduce visual permeability and accessibility, thereby diminishing restorative capacity [44]. Consistent findings [45] highlight a critical threshold (NDVI = 0.40) beyond which health benefits diminish, supporting the view that the restorative benefits of green space are nonlinear and scale-dependent. Functional Mixture follows an inverted U-shape (optimal range 20–27), and excessive mixture may be associated with perceptual disorder, pedestrian conflicts, or overstimulation [46]. Population density is generally negatively associated with PEI, possibly by amplifying psychological stress and perceived crowding [47], but its negative association attenuates beyond 7604.16 persons/km². This finding is consistent with studies showing that high density does not necessarily translate into lower subjective wellbeing and may even coexist with positive quality-of-life outcomes under certain urban conditions [48]. These findings highlight the strong contextual dependency of density–emotion relationships. In Shanghai, residents may have developed psychological adaptation or perceptual desensitization to extreme density, shifting the focus of emotional improvement from macro-density reduction to micro-scale restorative interventions [49].

4.2. Spatial Nonstationarity and Multiscale Heterogeneity of Environmental Indicators

The MGWR results offer a way to interpret how environmental indicators are associated with emotional perception at different spatial scales. The variable-specific bandwidths suggest a hierarchical pattern, ranging from macro structural constraints to micro context-sensitive effects. This provides a quantitative foundation for formulating differentiated and scale-sensitive urban renewal strategies. Population density exhibits global stationarity with consistently negative coefficients across the study area, indicating that population concentration is broadly associated with lower PEI in Shanghai’s central urban area. This pattern may reflect that high population concentration generates persistent feelings of crowding and pressure on public resources, forming a structural negative baseline for emotional perception in high-density cities [50]. Conversely, Visual entropy and Mixture exhibit high local variability and micro-scale sensitivity. For instance, in constrained historic areas such as Huangpu, high visual complexity may be perceived as cluttered or overwhelming when it coincides with narrow streets, dense signage, fragmented interfaces, and heavy pedestrian activity [51]. In newly planned areas such as parts of Pudong, however, higher visual entropy may help reduce morphological monotony and enhance perceived richness. The small bandwidths of these variables signal high design sensitivity and higher governance costs, calling for fine-grained, context-specific renewal strategies.

NDVI operates at a regional scale (BW = 326), and its association with PEI weakens or becomes negative in parts of the dense urban core. In northern industrial or underdeveloped areas, high NDVI may coincide with vacant land, residual greenery, poor maintenance, or limited pedestrian accessibility, which could be perceived as abandonment rather than restoration. In dense central areas, vegetation may also obscure façades, reduce sightlines, or intensify perceived enclosure. Thus, meso-scale environmental governance should prioritize the alignment between land use attributes and landscape functions, emphasizing the quality, usability, and accessibility of green spaces rather than the sheer quantity of vegetative cover [52].

4.3. Implications for Urban Design

The findings have implications for human-scale urban design and for discussions of urban wellbeing. However, given the correlational nature of this study and its use of a model-estimated street-level perceptual proxy, these implications should be understood as exploratory and indicative rather than as prescriptive planning recommendations. The PEI captures model-estimated street-level emotional perception, which may be related to broader wellbeing and quality of life but should not be interpreted as a direct measurement of psychological states or wellbeing outcomes. The results are therefore better understood as preliminary perceptual evidence that can inform micro-scale urban renewal. Rather than proposing an operational workflow or direct intervention rules, this section discusses how the identified nonlinear associations and spatial heterogeneity may provide a reference for future context-sensitive design assessment, field validation, and participatory urban renewal. This analytical framework, linking mechanism identification with spatial targeting, provides an indicative reference for shifting governance in megacities from coarse, large-scale interventions toward fine-grained, context-sensitive micro-renewal. On this basis, three exploratory directions are proposed.

(1): Develop perception-oriented design considerations informed by nonlinear thresholds. Design evaluation can refer to identified critical inflection points to avoid inefficient resource allocation, while recognizing that these thresholds are model-derived and require further local validation. To enhance Beautiful and Lively perceptions, micro-renewal interventions may consider the role of Visual entropy, particularly the positive association observed beyond approximately 7.50, as a tentative indication of the perceptual value of ordered complexity. At the same time, excessive Color complexity may need to be carefully managed to prevent sensory overload. NDVI shows threshold attenuation around 0.42, indicating that simply increasing vegetation coverage does not necessarily enhance positive perception; counterintuitive effects may arise from scene type, disorganized greenery, occlusion, or enclosed spaces. Accordingly, ecological strategies may prioritize multi-layer planting, permeable layouts, and effective interaction between greenery and building interfaces. Maintaining Mixture within approximately 20–27 categories may be regarded as an indicative range associated with pedestrian vitality in the model, rather than a universal planning criterion. It should be noted that these street-level perceptions are indicative of short-term urban experience and correlate with, but are not equivalent to, broader wellbeing or quality of life.
(2): Apply differentiated, context-sensitive micro-renewal based on spatial heterogeneity. MGWR results indicate that environmental drivers vary across locations, suggesting that one-size-fits-all approaches may be suboptimal. These spatially varying associations can help identify where different streetscape factors may deserve closer attention in future renewal practice. In areas such as Pudong, where Visual entropy shows strong positive local associations with PEI, interventions may consider diversifying street frontage or introducing varied street furniture to enrich visual complexity. In densely built historic cores, such as Huangpu old town, reducing visual clutter may help moderate perceptual overload. In districts where NDVI shows negative associations, efforts could focus on converting fragmented or neglected green spaces into functional micro-spaces. These examples should be viewed as model-informed hypotheses that can inform further site investigation, resident evaluation, and design testing.
(3): Consider a tiered framework in which micro-scale environmental enhancements may help mitigate the perceptual effects of macro-scale density. Given that population density constitutes a global pressure source in the cores of megacities, governance strategies may need to acknowledge high density as a structural condition rather than an anomaly. Enhancing micro-scale environmental quality can potentially support positive urban perception under dense conditions. Granting greater flexibility to street-level micro-renewal initiatives and adopting “one street, one strategy” approaches may improve walking experience and visual order. By introducing locally adapted visual and functional improvements, cities may achieve perceptual benefits in high-density areas. Future applications could further combine model-based evidence with local perceptual validation, participatory assessment, and longitudinal evaluation.

4.4. Limitations and Future Prospects

Although street-view-based analyses reveal nonlinear patterns in urban perception within high-density cities, these findings are specific to the central urban area of Shanghai and should be interpreted cautiously. First, the PEI is a model-estimated perceptual proxy derived from a globally trained perception dataset (Place Pulse 2.0). The study did not include local Shanghai-based validation using residents, Chinese raters, on-site surveys, independent photo-rating experiments, or physiological data. Therefore, the dependent variable should be interpreted as a model-estimated street-level perceptual proxy rather than a directly observed psychological or behavioral response. This absence of local validation limits both the validity of the findings for Shanghai residents and their cultural transferability. Future work could also extend the analysis to peripheral, suburban, and peri-urban areas to assess whether the identified thresholds and spatial mechanisms remain consistent. In addition, the static Baidu Street View dataset captures a single time period without accounting for weather, lighting, or pedestrian flow, which may introduce selection bias and influence observed spatial patterns. Recent studies have demonstrated the potential of wearable devices, including electroencephalography (EEG), electrodermal activity (EDA), heart rate variability (HRV), functional near-infrared spectroscopy (fNIRS), and facial expression recognition, to measure real-time perceptual responses at the individual level [53]. Future work could integrate such multimodal biosensing with macro-scale predictive models and micro-scale urban walking experiments, thereby moving from population-averaged, image-based perception toward more locally validated and individual-level responses.

Second, the inconsistent nonlinear effects of Greenery and NDVI across different perceptual dimensions highlight the limitations of traditional quantity-based indicators in explaining complex perceptual experiences. This finding underscores the need for future data collection to shift from quantitative accumulation toward qualitative differentiation. More refined landscape indices should be introduced to explicitly quantify vegetation configuration, spatial form, and their interfaces with building façades, thereby enabling a more accurate assessment of how green quality, rather than sheer green quantity, shapes perceptual outcomes.

Finally, although the integration of machine learning models with SHAP analysis helps identify nonlinear mechanisms and threshold effects, this data-driven approach remains primarily correlational and cannot fully exclude confounding influences such as socioeconomic context or street-level governance quality. Moreover, several visual indicators are conceptually related, and although their VIF values are below the conventional threshold of 10, residual correlations may still affect the interpretation of MGWR coefficients, particularly at local scales. Robustness checks across alternative grid sizes, different outlier treatments, and alternative spatial model specifications were not performed in the current study. Together with the absence of local validation noted above, these omissions restrict the strength of inference. Accordingly, the identified nonlinear thresholds and spatially heterogeneous associations should be understood as indicative patterns rather than definitive or universally transferable findings. Future research could address these limitations by testing alternative model specifications, grid sizes, or outlier treatments, and by employing quasi-experimental, natural experiment, or longitudinal designs to evaluate robustness and transferability of identified thresholds across diverse urban contexts. Importantly, observed street-level perceptions relate to, but are not equivalent to, broader constructs of wellbeing or quality of life, highlighting the need for caution when extrapolating perceptions to outcomes.

5. Conclusions

This study examined street-level emotional perception within Shanghai’s Outer Ring using more than 500,000 Baidu SVIs, relating a composite PEI to 2D morphological indicators and visual features at a fine-grained hexagonal scale. Three main conclusions can be drawn. First, the PEI and the six perceptual dimensions showed significant spatial heterogeneity, with high-value areas forming corridor-like clusters and low-value areas appearing more fragmented in transitional and marginal urban zones. Second, compared with conventional 2D morphological indicators, visual features showed stronger explanatory power for street-level perception. In particular, visual entropy emerged as the most influential predictor, while color complexity, visibility, mixture, NDVI, and population density exhibited clear nonlinear or threshold-limited associations. These findings suggest that improving street-level perception in high-density areas depends less on the simple accumulation of greenness, density, or functional mixture than on maintaining an appropriate level of ordered visual complexity. Third, environmental indicators operated at different spatial scales. Population density functioned as a relatively global structural pressure, whereas visual entropy and mixture were more localized and context-sensitive. This scale differentiation indicates that urban renewal should move beyond one-size-fits-all interventions and instead adopt context-specific micro-renewal strategies. Overall, this study demonstrates the value of integrating explainable machine learning with spatial heterogeneity analysis to identify context-specific priorities for fine-grained urban renewal. Given that the PEI is a model-estimated perceptual proxy derived from static street view imagery, the findings should be interpreted as perceptual evidence rather than direct measures of psychological states or wellbeing.

Author Contributions

Conceptualization, Z.H., W.X. and Y.L.; Methodology, Z.H., W.X. and Z.L.; Software, Z.L.; Validation, Y.L.; Resources, Z.L.; Writing—original draft, Z.H.; Writing—review & editing, Z.H., W.X. and T.S.; Visualization, Z.H.; Supervision, W.X. and T.S.; Funding acquisition, T.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China [Grant No. 52378068].

Data Availability Statement

The datasets analyzed during the current study are available in the following public repositories: Street View imagery was acquired via the Baidu Maps Panorama Static API provided by the Baidu Map Open Platform (API documentation: https://lbsyun.baidu.com/index.php?title=viewstatic/api (accessed on 10 October 2025); platform homepage: https://lbsyun.baidu.com/). Road centerlines were extracted from OpenStreetMap (https://www.openstreetmap.org/). The MIT Place Pulse 2.0 dataset used for streetscape perception assessment is derived from the MIT Media Lab project (https://www.media.mit.edu/projects/place-pulse-new/overview/ (accessed on 18 October 2025)). The derived datasets and code used for analysis are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SVI	Street View Imagery
BSV	Baidu Street View
API	Application Programming Interface
PEI	Psychological and Emotional Index
XGBoost	Extreme Gradient Boosting
SHAP	SHapley Additive exPlanations
MGWR	Multiscale Geographically Weighted Regression
OLS	Ordinary Least Squares
GWR	Geographically Weighted Regression
BW	Bandwidth
LISA	Local Indicators of Spatial Association
NDVI	Normalized Difference Vegetation Index
NDWI	Normalized Difference Water Index

Appendix A

Appendix A.1. Workflow of BSV Data Collection

Figure A1. Workflow of BSV data collection. Source: Created by the authors. The red circle indicates an example sampling point, which corresponds to the API request and the four directional street-view images shown below. The example Baidu Street View API request is: http://api.map.baidu.com/panorama/v2?ak=YOUR_API_KEY&location=121.420538,31.217546&heading=0&pitch=0&fov=90&width=600&height=400 (accessed on 3 October 2025). Street-view images were acquired through the Baidu Maps Panorama Static API (documentation: https://lbsyun.baidu.com/index.php?title=viewstatic/api, accessed on 7 December 2025).
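To illustrate how the single request in Figure A1 extends to the four directional images per sampling point, the sketch below builds one request URL per viewing direction. It assumes headings of 0°, 90°, 180°, and 270° (the source shows only heading=0) and otherwise reuses the parameter values of the example request; the function name `build_panorama_requests` is hypothetical, `YOUR_API_KEY` remains a placeholder, and no network call is made.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.map.baidu.com/panorama/v2"


def build_panorama_requests(lng, lat, ak="YOUR_API_KEY",
                            headings=(0, 90, 180, 270),
                            pitch=0, fov=90, width=600, height=400):
    """Return one Panorama Static API request URL per viewing direction.

    The four cardinal headings are an assumption; the other defaults
    mirror the example request shown in Figure A1.
    """
    urls = []
    for heading in headings:
        params = {
            "ak": ak,
            # "longitude,latitude" order, as in the example request
            "location": f"{lng:.6f},{lat:.6f}",
            "heading": heading,
            "pitch": pitch,
            "fov": fov,
            "width": width,
            "height": height,
        }
        # Keep the comma in "location" unescaped, matching the example URL
        urls.append(f"{BASE_URL}?{urlencode(params, safe=',')}")
    return urls


for url in build_panorama_requests(121.420538, 31.217546):
    print(url)
```

With the default arguments, the first URL reproduces the example request exactly; only the `heading` value changes across the four URLs.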

Appendix A.2. Detailed Methodology for Street-Level Perception Assessment

Drawing upon established methodologies [32,33,54], the qualitative pairwise comparison outcomes were first converted into continuous perceptual intensities for individual images. For a given image $i$ under a specific perceptual dimension, let $p_{i}$, $n_{i}$, and $e_{i}$ denote the numbers of wins, losses, and ties, respectively. The probabilities of being selected and not selected are defined as:

P_{i} = \frac{p_{i}}{p_{i} + e_{i} + n_{i}}, \qquad N_{i} = \frac{n_{i}}{p_{i} + e_{i} + n_{i}}

(A1)

Based on visual evaluation theory [55], these probabilities were further mapped to a continuous Q-score ranging from 0 to 10, representing the perceived intensity of each image along the corresponding perceptual dimension.
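The source does not state the exact mapping from $P_{i}$ and $N_{i}$ to the 0–10 Q-score. A widely used formulation in the Place Pulse literature (Salesses et al. [32]) adjusts an image's win ratio by the average win ratio of the images it beat and the average loss ratio of the images that beat it, then rescales the result to 0–10: $Q_{i} = \frac{10}{3}\left(P_{i} + \frac{1}{p_{i}}\sum_{j_{1}} P_{j_{1}} - \frac{1}{n_{i}}\sum_{j_{2}} N_{j_{2}} + 1\right)$, where $j_{1}$ ranges over the images that $i$ beat and $j_{2}$ over the images that beat $i$. The sketch below implements that formulation under the assumption that it matches the study's procedure; the comparison-record format and function names are hypothetical.

```python
from collections import defaultdict


def win_loss_ratios(comparisons):
    """Compute P_i and N_i (Equation (A1)) from pairwise comparison records.

    Each record is (left_id, right_id, outcome) with outcome in
    {"left", "right", "tie"}; this record format is hypothetical.
    """
    wins, losses, ties = defaultdict(int), defaultdict(int), defaultdict(int)
    beat, lost_to = defaultdict(list), defaultdict(list)
    for left, right, outcome in comparisons:
        if outcome == "tie":
            ties[left] += 1
            ties[right] += 1
            continue
        if outcome not in ("left", "right"):
            raise ValueError(f"unknown outcome: {outcome!r}")
        winner, loser = (left, right) if outcome == "left" else (right, left)
        wins[winner] += 1
        losses[loser] += 1
        beat[winner].append(loser)     # images this image won against
        lost_to[loser].append(winner)  # images this image lost against
    images = set(wins) | set(losses) | set(ties)
    P, N = {}, {}
    for i in images:
        total = wins[i] + ties[i] + losses[i]
        P[i] = wins[i] / total
        N[i] = losses[i] / total
    return P, N, beat, lost_to


def q_scores(P, N, beat, lost_to):
    """Salesses-style Q-score on a 0-10 scale (assumed formulation)."""
    Q = {}
    for i in P:
        opp_win = sum(P[j] for j in beat[i]) / len(beat[i]) if beat[i] else 0.0
        opp_loss = sum(N[j] for j in lost_to[i]) / len(lost_to[i]) if lost_to[i] else 0.0
        Q[i] = 10.0 / 3.0 * (P[i] + opp_win - opp_loss + 1.0)
    return Q


comparisons = [("a", "b", "left"), ("a", "c", "left"),
               ("b", "c", "left"), ("c", "a", "tie")]
Q = q_scores(*win_loss_ratios(comparisons))
print(Q)
```

Because each of the three terms lies in [0, 1], the bracketed expression lies in [0, 3], which the factor 10/3 maps onto the 0–10 range described above.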

Because public evaluations of medium-quality street scenes are often ambiguous [33], directly modeling the full dataset may introduce noise. To enhance label reliability, the perception prediction task was reformulated as a binary classification problem, retaining only images with clear perceptual tendencies. Specifically, for each perceptual dimension $v$, the mean $\mu_{v}$ and standard deviation $\sigma_{v}$ of the Q-scores were computed, and a bandwidth parameter $\delta$ was used to define the ambiguous interval. Consistent with Zhang et al. [54], $\delta$ was selected to balance label stability and sample size; in this study, $\delta$ was set to 1.2. The labeling rule for image $i$ under dimension $v$ is defined as:

y_{i}^{v} = \begin{cases} 1, & Q_{iv} > \mu_{v} + \delta \sigma_{v} \\ -1, & Q_{iv} < \mu_{v} - \delta \sigma_{v} \end{cases}

(A2)

Samples falling within the interval $[\mu_{v} - \delta \sigma_{v}, \mu_{v} + \delta \sigma_{v}]$ were treated as ambiguous and excluded. This procedure ensured that model training relied only on scenes with high human consensus, which is intended to improve generalization performance.
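Equation (A2) and the exclusion rule can be expressed in a few lines of NumPy. This is a sketch: the source does not say whether the population or sample standard deviation was used, so the population form (`ddof=0`) is assumed, and `binarize_by_consensus` is a hypothetical name.

```python
import numpy as np


def binarize_by_consensus(q, delta=1.2):
    """Apply Equation (A2): label clear highs 1, clear lows -1, drop the rest.

    Returns the retained labels and a boolean mask of retained images.
    """
    q = np.asarray(q, dtype=float)
    mu, sigma = q.mean(), q.std()          # population std (ddof=0) assumed
    upper, lower = mu + delta * sigma, mu - delta * sigma
    labels = np.zeros(q.shape, dtype=int)  # 0 marks the ambiguous interval
    labels[q > upper] = 1
    labels[q < lower] = -1
    keep = labels != 0                     # values on the boundaries count as ambiguous
    return labels[keep], keep


labels, keep = binarize_by_consensus([0, 5, 5, 5, 5, 5, 5, 10])
print(labels, keep.sum())
```

In this toy input the mean is 5 and the standard deviation 2.5, so the ambiguous interval is [2, 8] and only the two extreme scores are retained.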

Using the filtered samples, a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel was trained for each perceptual dimension to predict high versus low emotion classes based on image features. To recover continuous perceptual intensities, the predicted positive-class confidence scores produced by the SVM were used as continuous perception values for each image.
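A minimal scikit-learn sketch of this step follows. The image features are not specified here, so the example uses synthetic feature vectors; feature standardization and the use of the signed RBF-SVM decision value (`SVC.decision_function`) as the continuous score are assumptions. If calibrated probabilities were used instead, `SVC(probability=True)` with `predict_proba` would be the analogous choice.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical training set: feature vectors of the retained (unambiguous)
# images for one perceptual dimension, labelled 1 (high) or -1 (low).
X_high = rng.normal(loc=1.0, scale=1.0, size=(100, 8))
X_low = rng.normal(loc=-1.0, scale=1.0, size=(100, 8))
X_train = np.vstack([X_high, X_low])
y_train = np.array([1] * 100 + [-1] * 100)

# One RBF-kernel SVM per perceptual dimension; scaling is an assumed step.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

# Continuous perception value for every image, including ambiguous ones:
# the signed decision value, positive toward the "high" class (label 1).
X_all = rng.normal(loc=0.0, scale=1.5, size=(50, 8))
scores = model.decision_function(X_all)
print(scores[:5])
```

For a binary `SVC`, positive decision values correspond to `classes_[1]`, which here is the label 1 because scikit-learn sorts class labels.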

Appendix A.3. Urban Environmental Indicator Data

Table A1. Data sources and descriptions of urban environmental indicators.

Category	Indicator	Source	Extraction Method/Model	Description
2D Morphological Structure	NDVI	http://www.gscloud.cn/	—	Reflects the condition of vegetation cover and natural landscape in urban areas [56]
	NDWI	http://www.gscloud.cn/	—	An index used in satellite image analysis to distinguish open water features using the near-infrared (NIR) and visible green (GREEN) spectral bands [57]
	Building density	https://earthengine.google.com//	—	The ratio of the area occupied by buildings to the total area of a specific region.
	Population density	https://hub.worldpop.org/	—	The total number of residents within each research unit divided by the total area of that region.
	Road density	https://download.geofabrik.de/asia/china.html (accessed on 3 October 2025)	—	The ratio of the total length of roads within a specific area to the total area of the region.
	BtA500	https://download.geofabrik.de/asia/china.html (accessed on 3 October 2025)	Spatial Design Network Analysis (sDNA)	Angular betweenness within a 500 m radius, used as a measure of accessibility [58]
Visual Features	Greenery	Street View Images (SVIs)	Mask2Former	Greenery represents green landscape elements such as grass, trees, vegetation, and green belts, capturing the distribution of vegetation along streets [36].
	Enclosure	Street View Images (SVIs)	Mask2Former	Enclosure represents the degree of human scale. The percentage of vertical elements to the overall pixel (sky excluded) is measured to express enclosure [59].
	Walkability	Street View Images (SVIs)	Mask2Former	Walkability measures the support of the outdoor environment for walking [60], expressed here as the ratio of sidewalk to roadway.
	Visibility	Street View Images (SVIs)	Mask2Former	Visibility reflects the richness of the built environment and street furniture, based on objects including signboards, sculptures, people, and benches [36].
	Mixture	Street View Images (SVIs)	Mask2Former	“Mixture” represents the degree of mixedness in each street view image.
	Openness	Street View Images (SVIs)	Mask2Former	Openness is the degree of sky visibility and determines the amount of perceived lightness [61].
	Visual entropy	Street View Images (SVIs)	MATLAB (R2024b)	The total amount of information generated by the complete visible object composed of n regions [62].
	Color complexity	Street View Images (SVIs)	MATLAB (R2024b)	An essential metric for capturing color characteristics in an image [63].
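Several indicators in Table A1 reduce to simple ratios once a semantic segmentation mask or satellite bands are available. The sketch below is illustrative only: the class IDs are hypothetical placeholders rather than the label map of any specific Mask2Former checkpoint, the set of “vertical elements” used for Enclosure is an assumption, NDWI follows the green/NIR form described in the table, and Visual entropy is assumed to be the Shannon entropy of an 8-bit grayscale histogram (the definition used by MATLAB’s `entropy` function), which the source does not confirm.

```python
import numpy as np

# Hypothetical class IDs for a segmentation label map; substitute the
# label definitions of the model actually used.
SKY, ROAD, SIDEWALK = 2, 6, 11
VEGETATION = [4, 9, 17]      # e.g., tree, grass, plant
VERTICAL = [1, 4, 12, 20]    # e.g., building, tree, wall, pole (assumed)


def greenery(mask):
    """Share of all pixels labelled as vegetation."""
    return float(np.isin(mask, VEGETATION).mean())


def openness(mask):
    """Share of all pixels labelled as sky."""
    return float((mask == SKY).mean())


def enclosure(mask):
    """Share of vertical-element pixels among non-sky pixels."""
    non_sky = mask[mask != SKY]
    return float(np.isin(non_sky, VERTICAL).mean()) if non_sky.size else 0.0


def walkability(mask):
    """Ratio of sidewalk pixels to road pixels (NaN if no road is visible)."""
    road = np.count_nonzero(mask == ROAD)
    return np.count_nonzero(mask == SIDEWALK) / road if road else float("nan")


def normalized_difference(a, b):
    """(a - b) / (a + b): NDVI = nd(NIR, Red), NDWI = nd(Green, NIR)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (a - b) / denom, np.nan)


def gray_entropy(gray_uint8):
    """Shannon entropy (bits) of a 256-bin grayscale histogram."""
    counts = np.bincount(np.asarray(gray_uint8, np.uint8).ravel(), minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


mask = np.array([[2, 2], [4, 6]])  # sky, sky / tree, road
print(greenery(mask), openness(mask), enclosure(mask), walkability(mask))
```

Under the histogram-entropy assumption, values are bounded by 8 bits for 8-bit images, so the threshold of approximately 7.50 discussed in Section 4 would correspond to highly varied grayscale content.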

Appendix A.4. Detailed Methodology for XGBoost-SHAP Framework

XGBoost is an ensemble learning algorithm based on gradient boosted trees, which iteratively trains multiple regression trees and aggregates them through weighted integration, enabling effective modeling of complex nonlinear relationships and high-dimensional feature spaces.

The objective function of XGBoost is expressed as Equation (A3):

\mathrm{Obj} = \sum_{i = 1}^{n} l(y_{i}, \hat{y}_{i}) + \sum_{k = 1}^{K} \Omega(f_{k})

(A3)

where $l(y_{i}, \hat{y}_{i})$ denotes the loss function measuring the discrepancy between the predicted value $\hat{y}_{i}$ and the observed value $y_{i}$, and $\Omega(f_{k})$ represents the regularization term of the $k$-th tree, which controls model complexity and reduces overfitting. By incorporating second-order gradient information, XGBoost achieves improved prediction accuracy and computational efficiency compared with conventional gradient boosting regression models.
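The role of second-order gradient information can be made explicit with the standard derivation underlying XGBoost, reproduced here for reference rather than as a study-specific result. At boosting round $t$, the loss is expanded to second order around the previous prediction $\hat{y}_{i}^{(t-1)}$, and the regularizer takes the form $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j} w_{j}^{2}$ for a tree with $T$ leaves and leaf weights $w_{j}$; $I_{j}$ is the set of samples falling in leaf $j$. Up to a constant:

```latex
\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ g_{i} f_{t}(x_{i}) + \tfrac{1}{2} h_{i} f_{t}^{2}(x_{i}) \right]
  + \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_{j}^{2},
\qquad
g_{i} = \frac{\partial l(y_{i}, \hat{y}_{i}^{(t-1)})}{\partial \hat{y}_{i}^{(t-1)}}, \quad
h_{i} = \frac{\partial^{2} l(y_{i}, \hat{y}_{i}^{(t-1)})}{\partial \big(\hat{y}_{i}^{(t-1)}\big)^{2}}

G_{j} = \sum_{i \in I_{j}} g_{i}, \quad H_{j} = \sum_{i \in I_{j}} h_{i},
\qquad
w_{j}^{*} = -\frac{G_{j}}{H_{j} + \lambda},
\qquad
\mathrm{Obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_{j}^{2}}{H_{j} + \lambda} + \gamma T
```

For the squared-error loss $l = (y_{i} - \hat{y}_{i})^{2}$, $g_{i} = 2(\hat{y}_{i}^{(t-1)} - y_{i})$ and $h_{i} = 2$, so each optimal leaf weight is a shrunken mean of the current residuals, with $\lambda$ controlling the shrinkage.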

SHAP is based on Shapley values from cooperative game theory and decomposes model predictions into marginal contributions of each feature. The SHAP formulation is given in Equation (A4):

g(z') = \phi_{0} + \sum_{i = 1}^{M} \phi_{i} z'_{i}

(A4)

where $g$ is the additive explanation model for sample $x$ and $z'_{i} \in \{0, 1\}$ indicates whether the $i$-th of the $M$ features is present; for the sample being explained, all $z'_{i} = 1$, so $g$ reproduces the model prediction as $\phi_{0} + \sum_{i} \phi_{i}$. Here $\phi_{0}$ represents the baseline prediction, defined as the expected model output without feature inputs, and $\phi_{i}$ denotes the Shapley value of the $i$-th feature, indicating its marginal contribution to the prediction.
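To make Equation (A4) concrete, the sketch below computes exact Shapley values by enumerating feature subsets. The value of a coalition is taken as the average prediction over a background sample in which the coalition's features are set to the explained sample's values (an interventional formulation, assumed here). Brute-force enumeration is only feasible for a handful of features and is shown for illustration; for tree ensembles such as XGBoost, `shap.TreeExplainer` in the `shap` library computes these attributions efficiently. The function name and toy model are hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values for one sample under a background-mean value function.

    predict: callable mapping an (n, M) array to n predictions.
    Returns (phi0, phi), where phi0 + phi.sum() equals the prediction for x.
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    M = x.size

    def value(coalition):
        # Coalition features take the sample's values; the others keep their
        # background values. Average the predictions over background rows.
        X = background.copy()
        idx = list(coalition)
        X[:, idx] = x[idx]
        return float(np.mean(predict(X)))

    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            weight = math.factorial(size) * math.factorial(M - size - 1) / math.factorial(M)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return value(()), phi


def toy_model(X):
    # Hypothetical model: one additive term and one interaction term
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]


rng = np.random.default_rng(42)
background = rng.normal(size=(200, 3))
x = np.array([1.0, 2.0, -1.0])
phi0, phi = exact_shapley(toy_model, x, background)
print(phi0, phi, phi0 + phi.sum())
```

The last printed value equals the model prediction for `x`, which is the local-accuracy property that Equation (A4) expresses; `phi0` is the mean prediction over the background sample.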

Appendix A.5. Detailed Methodology for Spatial Autocorrelation and MGWR Model

Prior to spatial regression, global Moran’s I and Local Indicators of Spatial Association (LISA) were used to test the spatial autocorrelation of perception. Global Moran’s I summarizes the overall degree of spatial autocorrelation across the study area, indicating whether similar values tend to cluster or disperse. LISA further captures regional differences, revealing explicit local association patterns and clusters across subareas.
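Global Moran’s I can be written as $I = \frac{n}{S_{0}} \frac{\sum_{i}\sum_{j} w_{ij} z_{i} z_{j}}{\sum_{i} z_{i}^{2}}$, where $z_{i}$ are deviations from the mean, $w_{ij}$ are spatial weights, and $S_{0} = \sum_{i}\sum_{j} w_{ij}$. A minimal NumPy sketch with a dense weight matrix follows. The study's actual weights (for example, contiguity between hexagonal cells) are not specified here, and the Z-scores and p-values reported in Table A2 would additionally require the analytical variance or a permutation test, which this sketch omits.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for `values` with an (n, n) spatial weight matrix."""
    y = np.asarray(values, dtype=float)
    W = np.asarray(weights, dtype=float)
    z = y - y.mean()
    s0 = W.sum()
    return float(len(y) / s0 * (z @ W @ z) / (z @ z))


# Four cells on a line, each adjacent to its immediate neighbours (binary contiguity)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
clustered = morans_i([1, 1, 0, 0], W)    # similar values adjacent: positive I
alternating = morans_i([1, 0, 1, 0], W)  # dissimilar values adjacent: negative I
print(clustered, alternating)
```

Positive values, as for all six dimensions in Table A2, indicate that similar perception scores tend to occur in neighbouring units.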

MGWR extends conventional GWR by allowing an adaptive, variable-specific optimal bandwidth, thereby capturing spatially varying relationships and scale effects more flexibly. The MGWR model is specified as:

y_{i} = \sum_{j = 1}^{k} \beta_{bw_{j}}(u_{i}, v_{i}) x_{ij} + \varepsilon_{i}

(A5)

where $x_{ij}$ is the $j$-th predictor, $(u_{i}, v_{i})$ denotes the spatial coordinates of observation $i$, $\beta_{bw_{j}}(u_{i}, v_{i})$ is the local coefficient for predictor $j$ estimated with its optimal bandwidth $bw_{j}$, and $\varepsilon_{i}$ is the error term.
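MGWR is typically fitted by backfitting, in which each predictor's coefficient surface is re-estimated with its own bandwidth, for example with the `mgwr` Python package (Oshan et al. [37]). The building block of that procedure is a geographically weighted least-squares fit at each location. The sketch below shows this step for a single shared adaptive bandwidth, expressed as a number of nearest neighbours with a bisquare kernel (one common convention). It is a simplified GWR illustration, not a full MGWR implementation, and the function name is hypothetical.

```python
import numpy as np


def gwr_local_coefficients(coords, X, y, bw):
    """Local intercept and slopes at every observation (single-bandwidth GWR).

    coords: (n, 2) projected coordinates (u, v); X: (n, k) predictors; y: (n,)
    bw: adaptive bandwidth as a number of nearest neighbours (self included).
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    design = np.column_stack([np.ones(n), np.asarray(X, dtype=float).reshape(n, -1)])
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        h = np.sort(d)[bw - 1]                                # distance to the bw-th nearest point
        w = np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)   # bisquare kernel weights
        sw = np.sqrt(w)
        # Weighted least squares via the square-root-weighted design matrix
        betas[i] = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]
    return betas


rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(60, 2))
x = rng.normal(size=60)
y = 2.0 + 3.0 * x                      # a spatially constant relationship
betas = gwr_local_coefficients(coords, x, y, bw=15)
print(betas[:3])
```

Because this toy relationship is spatially constant, every local fit recovers the same intercept and slope. With a spatially varying relationship, the rows of `betas` would differ across locations, and smaller bandwidths would let them vary more locally; MGWR tunes this bandwidth separately for each predictor.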

Appendix B

Table A2. Global spatial autocorrelation (Moran’s I) of six perceptual dimensions.

Perception	Moran’s I	Z-Score	p-Value
Beautiful	0.483673	49.738322	<0.001
Safety	0.483673	49.738322	<0.001
Lively	0.441310	45.386490	<0.001
Wealthy	0.429191	44.141412	<0.001
Boring	0.322115	33.136621	<0.001
Depressing	0.392990	40.419898	<0.001

Table A3. Performance of OLS, GWR and MGWR model.

Model	Adjusted $R^{2}$	RSS	AICc
OLS	0.040	3847.427	11,235.351
GWR	0.529	1578	9229
MGWR	0.542	1594.250	8910.728

References

Lu, Y.; Sarkar, C.; Xiao, Y. The Effect of Street-Level Greenery on Walking Behavior: Evidence from Hong Kong. Soc. Sci. Med. 2018, 208, 41–49. [Google Scholar] [CrossRef] [PubMed]
Schneider, A.; Friedl, M.A.; Potere, D. Mapping Global Urban Areas Using MODIS 500-m Data: New Methods and Datasets Based on ‘Urban Ecoregions’. Remote Sens. Environ. 2010, 114, 1733–1746. [Google Scholar] [CrossRef]
Kong, L.; Liu, Z.; Pan, X.; Wang, Y.; Guo, X.; Wu, J. How Do Different Types and Landscape Attributes of Urban Parks Affect Visitors’ Positive Emotions? Landsc. Urban Plan. 2022, 226, 104482. [Google Scholar] [CrossRef]
Pelton, J.N.; Madry, S. Handbook of Small Satellites; Springer International Publishing: Cham, Switzerland, 2020. [Google Scholar]
Okkels, N.; Kristiansen, C.B.; Munk-Jørgensen, P.; Sartorius, N. Urban Mental Health: Challenges and Perspectives. Curr. Opin. Psychiatry 2018, 31, 258–264. [Google Scholar] [CrossRef] [PubMed]
Gruebner, O.; Rapp, M.A.; Adli, M.; Kluge, U.; Galea, S.; Heinz, A. Cities and Mental Health. Dtsch. Ärzteblatt Int. 2017, 114, 121. [Google Scholar] [CrossRef]
Rogowska, A.M.; Pavlova, I.; Kuśnierz, C.; Ochnik, D.; Bodnar, I.; Petrytsa, P. Does Physical Activity Matter for the Mental Health of University Students during the COVID-19 Pandemic? J. Clin. Med. 2020, 9, 3494. [Google Scholar] [CrossRef]
Grahn, P.; Stigsdotter, U.A. Landscape Planning and Stress. Urban For. Urban Green. 2003, 2, 1–18. [Google Scholar] [CrossRef]
Macintyre, S.; Macdonald, L.; Ellaway, A. Lack of Agreement between Measured and Self-Reported Distance from Public Green Parks in Glasgow, Scotland. Int. J. Behav. Nutr. Phys. Act. 2008, 5, 26. [Google Scholar] [CrossRef]
Kaplan, S. The Restorative Benefits of Nature: Toward an Integrative Framework. J. Environ. Psychol. 1995, 15, 169–182. [Google Scholar] [CrossRef]
Ulrich, R.S.; Simons, R.F.; Losito, B.D.; Fiorito, E.; Miles, M.A.; Zelson, M. Stress Recovery during Exposure to Natural and Urban Environments. J. Environ. Psychol. 1991, 11, 201–230. [Google Scholar] [CrossRef]
Hartig, T.; Mitchell, R.; De Vries, S.; Frumkin, H. Nature and Health. Annu. Rev. Public Health 2014, 35, 207–228. [Google Scholar] [CrossRef]
Brownson, R.C.; Hoehner, C.M.; Day, K.; Forsyth, A.; Sallis, J.F. Measuring the Built Environment for Physical Activity: State of the Science. Am. J. Prev. Med. 2009, 36, S99–S123.e12. [Google Scholar] [CrossRef]
Tang, J.; Long, Y. Measuring Visual Quality of Street Space and Its Temporal Variation: Methodology and Its Application in the Hutong Area in Beijing. Landsc. Urban Plan. 2019, 191, 103436. [Google Scholar] [CrossRef]
Ji, H.; Qing, L.; Han, L.; Wang, Z.; Cheng, Y.; Peng, Y. A New Data-Enabled Intelligence Framework for Evaluating Urban Space Perception. ISPRS Int. J. Geo-Inf. 2021, 10, 400. [Google Scholar] [CrossRef]
Ito, K.; Kang, Y.; Zhang, Y.; Zhang, F.; Biljecki, F. Understanding Urban Perception with Visual Data: A Systematic Review. Cities 2024, 152, 105169. [Google Scholar] [CrossRef]
Zhang, Y.; Li, S.; Dong, R.; Deng, H.; Fu, X.; Wang, C.; Yu, T.; Jia, T.; Zhao, J. Quantifying Physical and Psychological Perceptions of Urban Scenes Using Deep Learning. Land Use Policy 2021, 111, 105762. [Google Scholar] [CrossRef]
Ma, H.; Xu, Q.; Zhang, Y. High or Low? Exploring the Restorative Effects of Visual Levels on Campus Spaces Using Machine Learning and Street View Imagery. Urban For. Urban Green. 2023, 88, 128087. [Google Scholar] [CrossRef]
Kruse, J.; Kang, Y.; Liu, Y.-N.; Zhang, F.; Gao, S. Places for Play: Understanding Human Perception of Playability in Cities Using Street View Images and Deep Learning. Comput. Environ. Urban Syst. 2021, 90, 101693. [Google Scholar] [CrossRef]
Meng, Y.; Sun, D.; Lyu, M.; Niu, J.; Fukuda, H. Measuring Human Perception of Residential Built Environment through Street View Image and Deep Learning. Environ. Res. Commun. 2024, 6, 055020. [Google Scholar] [CrossRef]
Guo, Z.; Xu, H.; Lin, Q.; Li, X. Deep Learning Assessment of Street Spatial Quality in Old Residential Communities of Wuchang, Wuhan, China. Sci. Rep. 2025, 15, 45176. [Google Scholar] [CrossRef] [PubMed]
Redies, C.; Grebenkina, M.; Mohseni, M.; Kaduhm, A.; Dobel, C. Global Image Properties Predict Ratings of Affective Pictures. Front. Psychol. 2020, 11, 953. [Google Scholar] [CrossRef] [PubMed]
Guo, H.; Wan, B.; Li, J.; Li, W.; Yin, J.; Ho, H.C. Visual 3D Morphology of Street-Level Urban Trees and Older Adults’ Emotional Well-Being: A Nonlinear Machine Learning Model Incorporating Temporal Dynamics. Build. Environ. 2025, 284, 113410. [Google Scholar] [CrossRef]
Rui, J. Exploring the Association between the Settlement Environment and Residents’ Positive Sentiments in Urban Villages and Formal Settlements in Shenzhen. Sustain. Cities Soc. 2023, 98, 104851. [Google Scholar] [CrossRef]
Xu, J.; Liu, Y.; Liu, Y.; An, R.; Tong, Z. Integrating Street View Images and Deep Learning to Explore the Association between Human Perceptions of the Built Environment and Cardiovascular Disease in Older Adults. Soc. Sci. Med. 2023, 338, 116304. [Google Scholar] [CrossRef]
Iamtrakul, P.; Chayphong, S.; Kantavat, P.; Nakamura, K.; Hayashi, Y.; Kijsirikul, B.; Iwahori, Y. Assessing Subjective and Objective Road Environment Perception in the Bangkok Metropolitan Region, Thailand: A Deep Learning Approach Utilizing Street Images. Sustainability 2024, 16, 1494. [Google Scholar] [CrossRef]
van Veghel, J.; Dane, G.; Agugiaro, G.; Borgers, A. Human-Centric Computational Urban Design: Optimizing High-Density Urban Areas to Enhance Human Subjective Well-Being. Comput. Urban Sci. 2024, 4, 13. [Google Scholar] [CrossRef]
Ceccato, V.; Kang, Y.; Abraham, J.; Näsman, P.; Duarte, F.; Gao, S.; Ljungqvist, L.; Zhang, F.; Ratti, C. What Makes a Place Safe? Assessing AI-Generated Safety Perception Scores Using Stockholm’s Street View Images. Br. J. Criminol. 2025, 66, 265–289. [Google Scholar] [CrossRef]
Shanghai Municipal People’s Congress. Regulations of Shanghai Municipality on Urban Renewal. 2021. Available online: https://english.shanghai.gov.cn/en-LocalRules/20240913/a3409b4df5b04314900334f6bdd98d74.html (accessed on 18 December 2025).
Ulrich, R. Visual Landscapes and Psychological Well-Being. Landsc. Res. 1979, 4, 17–23. [Google Scholar] [CrossRef]
Ghahramanpouri, A.; Lamit, H.; Sedaghatnia, S. Urban Social Sustainability Trends in Research Literature. Asian Soc. Sci. 2013, 9, 185. [Google Scholar] [CrossRef]
Salesses, P.; Schechtner, K.; Hidalgo, C.A. The Collaborative Image of The City: Mapping the Inequality of Urban Perception. PLoS ONE 2013, 8, e68400. [Google Scholar] [CrossRef] [PubMed]
Ordonez, V.; Berg, T.L. Learning High-Level Judgments of Urban Perception. In Computer Vision—ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2014; Volume 8694, pp. 494–510. ISBN 978-3-319-10598-7.
Cai, Q.; Abdel-Aty, M.; Zheng, O.; Wu, Y. Applying Machine Learning and Google Street View to Explore Effects of Drivers’ Visual Environment on Traffic Safety. Transp. Res. Part C Emerg. Technol. 2022, 135, 103541.
Rui, Q.; Cheng, H. Quantifying the Spatial Quality of Urban Streets with Open Street View Images: A Case Study of the Main Urban Area of Fuzhou. Ecol. Indic. 2023, 156, 111204.
Ma, X.; Ma, C.; Wu, C.; Xi, Y.; Yang, R.; Peng, N.; Zhang, C.; Ren, F. Measuring Human Perceptions of Streetscapes to Better Inform Urban Renewal: A Perspective of Scene Semantic Parsing. Cities 2021, 110, 103086.
Oshan, T.M.; Li, Z.; Kang, W.; Wolf, L.J.; Fotheringham, A.S. mgwr: A Python Implementation of Multiscale Geographically Weighted Regression for Investigating Process Spatial Heterogeneity and Scale. ISPRS Int. J. Geo-Inf. 2019, 8, 269.
Jacobs, J. The Death and Life of Great American Cities; Random House: New York, NY, USA, 1961.
Liu, C.; Wang, T.-Y.; Yuizono, T. Assessing the Landscape Visual Quality of Urban Green Spaces with Multidimensional Visual Indicators. Urban For. Urban Green. 2025, 106, 128727.
Ode, Å.; Hagerhall, C.M.; Sang, N. Analysing Visual Landscape Complexity: Theory and Application. Landsc. Res. 2010, 35, 111–131.
Chen, Z.; Yang, H.; Lin, Y.; Xie, J.; Xie, Y.; Ding, Z. Exploring the Association between the Built Environment and Positive Sentiments of Tourists in Traditional Villages in Fuzhou, China. Ecol. Inform. 2024, 80, 102465.
Wu, Z.; Zheng, M.; Zhang, T. Impact Characteristics and Interaction Effects of Built Environment on Street Space Quality in Megacities: A Case Study of Xi’an, China. City Environ. Interact. 2025, 28, 100257.
Chen, T.; Wang, L.; Huang, B.; Yu, J.; Wu, Y. Pursued Spatial Perception Benefit Considering Attractiveness and Cognitive Load: Appropriate Visual Complexity of Indoor Commercial Space. J. Build. Eng. 2024, 98, 111144.
Guo, Y.; Liu, Y.; Lu, S.; Chan, O.F.; Chui, C.H.K.; Lum, T.Y.S. Objective and Perceived Built Environment, Sense of Community, and Mental Wellbeing in Older Adults in Hong Kong: A Multilevel Structural Equation Study. Landsc. Urban Plan. 2021, 209, 104058.
Huang, B.; Yao, Z.; Pearce, J.R.; Feng, Z.; James Browne, A.; Pan, Z.; Liu, Y. Non-Linear Association between Residential Greenness and General Health among Old Adults in China. Landsc. Urban Plan. 2022, 223, 104406.
Liu, Q.; Zou, G.; Luo, Y. How the Neighborhood Amenity Mix Shapes Urban Vitality: An Exploratory Analysis from a Rhythm Perspective. Appl. Geogr. 2025, 185, 103807.
Morrison, P.S. Local Expressions of Subjective Well-Being: The New Zealand Experience. Reg. Stud. 2011, 45, 1039–1058.
Mouratidis, K. Compact City, Urban Sprawl, and Subjective Well-Being. Cities 2019, 92, 261–272.
Fan, X.; Hu, D.; Fan, Y.; Yang, J.; Liang, H.; Gao, T.; Qiu, L. Urban Restorative Environments: The Critical Role of Building Density, Vegetation Structure, and Multi-Sensory Stimulation in Psychophysiological Recovery. Build. Environ. 2025, 281, 113190.
Mangrio, E.; Zdravkovic, S. Crowded Living and Its Association with Mental Ill-Health among Recently-Arrived Migrants in Sweden: A Quantitative Study. BMC Res. Notes 2018, 11, 609.
Chen, Z.; Yang, H.; Ye, P.; Zhuang, X.; Zhang, R.; Xie, Y.; Ding, Z. How Does the Perception of Informal Green Spaces in Urban Villages Influence Residents’ Complaint Sentiments? A Machine Learning Analysis of Fuzhou City, China. Ecol. Indic. 2024, 166, 112376.
Yang, Z.; Kwan, M.-P.; Liu, D.; Huang, J. How Objective and Subjective Greenspace, Combined with Air and Noise Pollution, Impacts Mental Health through the Mediation of Physical Activity. Urban For. Urban Green. 2025, 105, 128683.
Ma, X.; Song, L.; Hong, B.; Li, Y.; Li, Y. Relationships between EEG and Thermal Comfort of Elderly Adults in Outdoor Open Spaces. Build. Environ. 2023, 235, 110212.
Zhang, F.; Zhou, B.; Liu, L.; Liu, Y.; Fung, H.H.; Lin, H.; Ratti, C. Measuring Human Perceptions of a Large-Scale Urban Region Using Machine Learning. Landsc. Urban Plan. 2018, 180, 148–160.
Nasar, J.L. The Evaluative Image of the City. J. Am. Plan. Assoc. 1990, 56, 41–53.
Kim, H.W.; Kim, J.-H.; Li, W.; Yang, P.; Cao, Y. Exploring the Impact of Green Space Health on Runoff Reduction Using NDVI. Urban For. Urban Green. 2017, 28, 81–87.
Gao, B. NDWI—A Normalized Difference Water Index for Remote Sensing of Vegetation Liquid Water from Space. Remote Sens. Environ. 1996, 58, 257–266.
Cooper, C. Spatial Design Network Analysis (sDNA) Version 3.4 Manual; Cardiff University: Cardiff, UK, 2016.
Rapoport, A. Human Aspects of Urban Form: Towards a Man-Environment Approach to Urban Form and Design; Elsevier: Amsterdam, The Netherlands, 2013.
Ewing, R.; Handy, S. Measuring the Unmeasurable: Urban Design Qualities Related to Walkability. J. Urban Des. 2009, 14, 65–84.
Li, X.; Ratti, C.; Seiferling, I. Mapping Urban Landscapes Along Streets Using Google Street View. In Proceedings of the Advances in Cartography and GIScience; Peterson, M.P., Ed.; Springer International Publishing: Cham, Switzerland, 2017; pp. 341–356.
Stamps, A.E., III. Entropy and Visual Diversity in the Environment. J. Archit. Plan. Res. 2004, 21, 239–256.
Kanuri, V.K.; Hughes, C.; Hodges, B.T. Standing out from the Crowd: When and Why Color Complexity in Social Media Images Increases User Engagement. Int. J. Res. Mark. 2024, 41, 174–193.

Figure 1. Study Area. Source: Created by the authors.

Figure 2. Workflow. Source: Created by the authors.

Figure 3. Spatial Distribution and LISA Cluster Maps of Six Perceptual Dimensions (a–l). Source: Created by the authors. Notes: H–H and L–L indicate high–high and low–low clusters; H–L and L–H indicate high–low and low–high spatial outliers; “Not significant” denotes no significant local spatial autocorrelation.

Figure 4. Spatial patterns of the PEI (a,b). Source: Created by the authors. Notes: H–H and L–L indicate high–high and low–low clusters; H–L and L–H indicate high–low and low–high spatial outliers; “Not significant” denotes no significant local spatial autocorrelation.
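The H–H, L–L, H–L and L–H categories in Figures 3 and 4 correspond to the four quadrants of the Moran scatterplot: each unit's standardized value is compared with the weighted average of its neighbors' standardized values (its spatial lag). The sketch below is a minimal illustration, not the authors' code, and shows only that classification step under assumed row-standardized contiguity weights. It does not run the permutation test that separates significant clusters from the “Not significant” class; in practice that inference is usually delegated to a library such as PySAL's `esda.Moran_Local`. The function name `local_moran_quadrants` and the toy adjacency lists are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def local_moran_quadrants(
    values: Sequence[float], neighbors: Dict[int, List[int]]
) -> List[Tuple[float, str]]:
    """Local Moran's I and Moran-scatterplot quadrant for each spatial unit.

    Weights are row-standardized (each of k neighbors gets weight 1/k) and
    values are z-scored, so I_i = z_i * lag_i. Libraries may rescale I_i by a
    constant factor. No significance test is performed here.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("values are constant; local Moran's I is undefined")
    z = [(v - mean) / sd for v in values]

    results: List[Tuple[float, str]] = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        high_self, high_lag = z[i] > 0, lag > 0
        if high_self and high_lag:
            label = "H-H"  # high value among high neighbors (hot spot)
        elif not high_self and not high_lag:
            label = "L-L"  # low value among low neighbors (cold spot)
        elif high_self:
            label = "H-L"  # high value among low neighbors (outlier)
        else:
            label = "L-H"  # low value among high neighbors (outlier)
        results.append((z[i] * lag, label))
    return results


if __name__ == "__main__":
    # Toy example: five street segments in a chain, one high-scoring segment.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    for idx, (local_i, quad) in enumerate(local_moran_quadrants([1, 1, 10, 1, 1], chain)):
        print(idx, round(local_i, 3), quad)
```

In the toy chain the isolated high segment is labeled H-L with a negative local I, and its immediate neighbors are L-H, which is the pattern the “spatial outlier” classes in the maps describe.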

Figure 5. SHAP beeswarm plots and feature importance for the six perceptual dimensions and PEI (a–g). Source: Created by the authors.

Figure 6. Nonlinear threshold and interaction effects of environmental drivers on the PEI (a–i). Source: Created by the authors. Notes: Subplots (a–f) display the SHAP partial dependence curves for key urban environmental indicators; subplots (g–i) illustrate the top three SHAP interaction pairs.
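Figures 5 and 6 summarize SHAP values in two ways: a global ranking of features and per-feature dependence curves. A common convention, and the one behind the SHAP library's bar plot, ranks features by their mean absolute SHAP value across samples. For thresholds, one simple operationalization is the feature value at which the binned mean SHAP value changes sign, that is, where an indicator switches from lowering to raising the predicted score. The source does not state how its thresholds were located, so the helpers below (`mean_abs_shap`, `zero_crossing_threshold`, and the equal-count binning controlled by `n_bins`) are hypothetical; they assume SHAP values have already been computed, for example with `shap.TreeExplainer` on the fitted XGBoost model.

```python
from typing import List, Optional, Sequence, Tuple


def mean_abs_shap(
    shap_matrix: Sequence[Sequence[float]], names: Sequence[str]
) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP value| over all samples (rows)."""
    n = len(shap_matrix)
    scores = [
        (name, sum(abs(row[j]) for row in shap_matrix) / n)
        for j, name in enumerate(names)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def zero_crossing_threshold(
    x: Sequence[float], shap_values: Sequence[float], n_bins: int = 10
) -> Optional[float]:
    """Hypothetical threshold rule: feature value where binned mean SHAP changes sign.

    Samples are sorted by feature value and split into roughly equal-count
    bins; the first sign change between adjacent bin means is located by
    linear interpolation between bin centers. Returns None if there is none.
    """
    pairs = sorted(zip(x, shap_values))
    size = max(1, len(pairs) // n_bins)
    centers: List[float] = []
    means: List[float] = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        centers.append(sum(p[0] for p in chunk) / len(chunk))
        means.append(sum(p[1] for p in chunk) / len(chunk))

    for k in range(len(means)):
        if means[k] == 0.0:
            return centers[k]
        if k > 0 and means[k - 1] * means[k] < 0:
            x0, x1 = centers[k - 1], centers[k]
            y0, y1 = means[k - 1], means[k]
            return x0 + (x1 - x0) * (-y0) / (y1 - y0)
    return None
```

For SHAP values equal to x - 4.5 over x = 0, 1, ..., 9 with ten bins, the rule returns 4.5. Binning before looking for the sign change keeps single noisy samples from producing spurious thresholds; the bin count is a tuning choice, not something fixed by the method.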

Figure 7. Results of MGWR. Source: Created by the authors.

Table 1. Calculation formulas for SVI-derived visual features.

Category	Feature	Formula	Explanation
Semantic features	Greenery	$G_{i} = \frac{P_{\text{vegetation}}}{1 - P_{\text{sky}}}$	$P_{x}$: pixel proportion of element x (vegetation, building, tree, etc.).
Semantic features	Enclosure	$E_{i} = \frac{P_{\text{build}} + P_{\text{wall}} + P_{\text{fence}}}{1 - P_{\text{sky}}}$	
Semantic features	Walkability	$W_{i} = \frac{P_{\text{sidewalk}}}{P_{\text{sidewalk}} + P_{\text{road}}}$	
Semantic features	Visibility	$V_{i} = \frac{P_{\text{person}} + P_{\text{build}} + P_{\text{symbol}}}{1 - P_{\text{sky}}}$	
Semantic features	Openness	$O_{i} = P_{\text{sky}} \times 100$	
Semantic features	Mixture	$M_{i} = \sum_{c=1}^{N} \mathbb{I}(\text{category } c \text{ present})$	Total count of semantic categories present; $\mathbb{I}(\cdot)$: indicator function; N: number of semantic categories.
Scene features	Visual entropy	$H = -\sum_{i=1}^{n} p_{i} \log p_{i}$	$p_{i}$: probability of occurrence of the i-th semantic category; n: total number of categories.
Scene features	Color complexity	$C_{k} = -\sum_{i=1}^{m} n_{i} \log \left( \frac{n_{i}}{N} \right)$	$m$: number of connected color regions; $n_{i}$: pixel count of the i-th region; N: total pixels of the image.
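The formulas in Table 1 operate on per-image pixel proportions produced by semantic segmentation. The sketch below computes them from a toy label grid, assuming hypothetical label strings that mirror the table's subscripts (`build`, `sidewalk`, `symbol`, and so on); a real segmentation model's class names would have to be mapped onto these. Further assumptions: logarithms are natural, color complexity is computed on an image whose colors have already been quantized to integer ids (quantization not shown), and connected regions use 4-connectivity. The division guards for all-sky images or images without road pixels are an added convenience, not part of the table.

```python
import math
from collections import deque
from typing import Dict, List, Sequence


def label_proportions(labels: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Pixel proportion P_x of every semantic label in a segmented image."""
    counts: Dict[str, int] = {}
    for row in labels:
        for lab in row:
            counts[lab] = counts.get(lab, 0) + 1
    total = sum(counts.values())
    return {lab: c / total for lab, c in counts.items()}


def semantic_features(p: Dict[str, float]) -> Dict[str, float]:
    """Semantic indicators of Table 1 plus visual entropy, from label proportions."""
    def P(name: str) -> float:
        return p.get(name, 0.0)

    def ratio(num: float, den: float) -> float:
        return num / den if den > 0 else 0.0  # guard: all-sky image, no road/sidewalk

    non_sky = 1.0 - P("sky")
    return {
        "greenery": ratio(P("vegetation"), non_sky),
        "enclosure": ratio(P("build") + P("wall") + P("fence"), non_sky),
        "walkability": ratio(P("sidewalk"), P("sidewalk") + P("road")),
        "visibility": ratio(P("person") + P("build") + P("symbol"), non_sky),
        "openness": P("sky") * 100,
        # Number of semantic categories present in the image.
        "mixture": float(sum(1 for v in p.values() if v > 0)),
        # Shannon entropy over semantic categories (natural log assumed).
        "visual_entropy": -sum(v * math.log(v) for v in p.values() if v > 0),
    }


def color_complexity(colors: Sequence[Sequence[int]]) -> float:
    """C = -sum_i n_i * log(n_i / N) over 4-connected regions of equal color id."""
    h, w = len(colors), len(colors[0])
    seen = [[False] * w for _ in range(h)]
    sizes: List[int] = []
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0]:
                continue
            # Breadth-first flood fill of one connected region.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            size = 0
            while queue:
                r, c = queue.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < h and 0 <= cc < w and not seen[rr][cc]
                            and colors[rr][cc] == colors[r][c]):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            sizes.append(size)
    n_total = h * w
    return -sum(n * math.log(n / n_total) for n in sizes)


if __name__ == "__main__":
    grid = [["sky", "sky", "build", "build"],
            ["sky", "vegetation", "build", "build"],
            ["road", "road", "sidewalk", "vegetation"],
            ["road", "road", "sidewalk", "person"]]
    print(semantic_features(label_proportions(grid)))
    print(color_complexity([[0, 0, 1], [0, 1, 1], [2, 2, 2]]))
```

Note that color complexity counts connected regions rather than distinct colors: a row such as `[0, 1, 0]` contains two colors but three regions, so fragmented scenes score higher than scenes with the same palette arranged in large blocks.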

Table 2. MGWR bandwidths and spatial scales of influence.

Variable	Bandwidth	ENP	Adj. t (95%)	DoD
Intercept	43	239.986	3.712	0.339
Visual entropy	74	132.026	3.558	0.411
Color complexity	257	31.944	3.164	0.583
Mixture	123	70.643	3.389	0.487
Visibility	259	29.572	3.142	0.592
NDVI	326	25.567	3.099	0.609
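Population density	4012	1.227	2.047	0.975

Notes: ENP, effective number of parameters; Adj. t (95%), adjusted critical t-value at the 95% confidence level; DoD, degree of dependency.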
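The last two columns of Table 2 can be cross-checked from ENP alone. The reported DoD values agree to three decimals with DoD_j = 1 - ln(ENP_j)/ln(n) when n is taken as 4,012. This n is an assumption: the sample size is not restated in this section, and 4,012 is simply the population-density bandwidth, which a globally acting covariate approaches. The adjusted critical t-values are consistent with a two-sided test at α/ENP_j (α = 0.05), a Bonferroni-style correction commonly used for MGWR inference. The sketch uses Python's `statistics.NormalDist` as a stand-in for the Student t quantile, so it reproduces that column only to about two decimals; it is a consistency check, not the authors' computation, and the names `TABLE2`, `N_ASSUMED`, `degree_of_dependency` and `adjusted_critical_value` are illustrative.

```python
import math
from statistics import NormalDist

# Table 2: variable -> (bandwidth, ENP_j, reported adj. t (95%), reported DoD)
TABLE2 = {
    "Intercept": (43, 239.986, 3.712, 0.339),
    "Visual entropy": (74, 132.026, 3.558, 0.411),
    "Color complexity": (257, 31.944, 3.164, 0.583),
    "Mixture": (123, 70.643, 3.389, 0.487),
    "Visibility": (259, 29.572, 3.142, 0.592),
    "NDVI": (326, 25.567, 3.099, 0.609),
    "Population density": (4012, 1.227, 2.047, 0.975),
}

# ASSUMPTION: sample size n, not stated in this section (set to the largest bandwidth).
N_ASSUMED = 4012


def degree_of_dependency(enp_j: float, n: int) -> float:
    """DoD_j = 1 - ln(ENP_j) / ln(n): near 1 for global terms, lower for local ones."""
    return 1.0 - math.log(enp_j) / math.log(n)


def adjusted_critical_value(enp_j: float, alpha: float = 0.05) -> float:
    """Two-sided critical value at the corrected level alpha / ENP_j.

    Normal approximation: an exact version would use a Student t quantile;
    with n in the thousands the difference appears in the third decimal.
    """
    alpha_j = alpha / enp_j
    return NormalDist().inv_cdf(1.0 - alpha_j / 2.0)


if __name__ == "__main__":
    for name, (bw, enp, t_rep, dod_rep) in TABLE2.items():
        dod = degree_of_dependency(enp, N_ASSUMED)
        t_crit = adjusted_critical_value(enp)
        print(f"{name:<20} bw={bw:<5} DoD {dod:.3f} (table {dod_rep:.3f})  "
              f"t* {t_crit:.3f} (table {t_rep:.3f})")
```

Read alongside the table, the check makes the scale interpretation explicit: a smaller ENP (wider bandwidth) pushes DoD toward 1, which is why population density behaves as a near-global factor while visual entropy and mixture remain local.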

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hu, Z.; Xu, W.; Lu, Z.; Sun, T.; Liu, Y. Revealing the Nonlinear Associations and Spatial Heterogeneity of Urban Environmental Indicators in Emotional Perception: A Machine Learning Perspective from Shanghai. Buildings 2026, 16, 1999. https://doi.org/10.3390/buildings16101999

AMA Style

Hu Z, Xu W, Lu Z, Sun T, Liu Y. Revealing the Nonlinear Associations and Spatial Heterogeneity of Urban Environmental Indicators in Emotional Perception: A Machine Learning Perspective from Shanghai. Buildings. 2026; 16(10):1999. https://doi.org/10.3390/buildings16101999

Chicago/Turabian Style

Hu, Ziyu, Weizhen Xu, Zekun Lu, Tongyu Sun, and Yuxiang Liu. 2026. "Revealing the Nonlinear Associations and Spatial Heterogeneity of Urban Environmental Indicators in Emotional Perception: A Machine Learning Perspective from Shanghai" Buildings 16, no. 10: 1999. https://doi.org/10.3390/buildings16101999

APA Style

Hu, Z., Xu, W., Lu, Z., Sun, T., & Liu, Y. (2026). Revealing the Nonlinear Associations and Spatial Heterogeneity of Urban Environmental Indicators in Emotional Perception: A Machine Learning Perspective from Shanghai. Buildings, 16(10), 1999. https://doi.org/10.3390/buildings16101999
