1. Introduction
Over the past decade, community development has been shaped by rapid urbanization, demographic transitions, and the growing imperative of climate adaptation, resilience building, and social inclusion in cities worldwide. As cities continue to evolve, communities are increasingly recognized as vital subunits for evaluating urban sustainability and livability. This shift in perspective has prompted a broader redefinition of urban planning and management goals [
1]. No longer limited to infrastructure provision and economic growth, contemporary planning agendas now place increasing emphasis on environmental protection, social equity, and transparent governance at the community scale [
2,
3,
4].
In this context, the Environmental, Social, and Governance (ESG) framework—originally developed as a corporate sustainability evaluation tool—offers conceptual and operational advantages for community-scale analysis. Beyond its traditional role in assessing corporate risk and responsibility [
5,
6], ESG has evolved into a versatile analytical lens for evaluating environmental performance, social well-being, and governance capacity in urban settings [
7,
8]. Recent studies have applied ESG principles to urban resilience planning, social equity auditing, and environmental justice assessment, highlighting its potential to align planning strategies with community needs [
9,
10].
Nevertheless, the operationalization of ESG at the community level remains limited. Existing urban ESG assessments often rely on aggregate indicators such as city-wide carbon footprints or governance indices, which obscure local variation and experiential dimensions [
11,
12]. At the same time, studies leveraging street view imagery, POI data, or perception surveys tend to focus on isolated dimensions such as walkability, safety, or greenery [
13,
14]. This fragmentation reflects a broader theoretical gap: the lack of a comprehensive structure capable of integrating environmental, social, and governance factors with human-scale perceptions.
Prior studies have begun to bridge objective environmental indicators with subjective perceptions, demonstrating that greenery, walkability, and spatial openness significantly shape how residents evaluate urban quality. For instance, Helbich showed that exposure to urban greenery was positively linked with perceived mental well-being across European cities [
15]. Li et al. similarly found that neighborhood vegetation and green view indices correlated with residents’ satisfaction and social cohesion [
16]. Mouratidis highlighted how compact urban form and access to parks enhanced perceived liveliness but also produced trade-offs in safety [
17]. Despite these advances, such studies remain fragmented and have rarely been situated within an integrated ESG framework.
In this study, the term community-scale refers to the neighborhood unit where daily activities, social interaction, and governance mechanisms intersect. Operationally, we delineate communities by a 1-km buffer around their geometric centroid, which approximates a 15-min walking radius—a benchmark increasingly used in planning practice [
18,
19]. This definition balances administrative boundaries with functional morphology, ensuring comparability across cases while aligning with contemporary discourses on accessibility and livability.
While numerous studies have leveraged Place Pulse or Google/Baidu Street View imagery to quantify perceptual attributes such as safety, walkability, and greenery, these efforts typically analyze single perceptual or environmental dimensions without embedding them in an integrated Environmental–Social–Governance (ESG) framework at the neighborhood scale [
13,
20,
21,
22]. By contrast, the ESG literature has expanded rapidly but remains predominantly macro-scale, emphasizing corporate or city-level indicators rather than micro-scale, perception-linked diagnostics [
8,
11,
23]. Addressing this gap, our study bridges urban design (public-space configuration and spatial equity), computational geography (pixel-level street-view analysis and factor extraction), and community-scale planning practice (ESG-informed interpretation), thereby operationalizing ESG at the micro scale and linking it to residents’ perceptions in a unified analytical structure.
To this end, this study develops a replicable ESG-based framework that integrates street-level imagery and POI data for multi-dimensional community evaluation. By combining perceptual metrics with environmental and governance indicators, the framework seeks to bridge top-down ESG objectives with bottom-up lived experiences, offering a scalable and transferable tool for human-centered neighborhood assessment.
Specifically, this research aims to answer three key questions:
- (1) How does ESG help build a comprehensive framework for analyzing the spatial environment around urban communities?
- (2) How are these ESG-related spatial structures reflected in residents’ subjective perceptions of their neighborhoods?
- (3) In what ways can a perception-informed interpretation of ESG dimensions contribute to more inclusive, responsive, and sustainable approaches to community-scale urban planning?
In response to these questions, this study introduces a novel ESG-based analytical framework that integrates pixel-level street view analysis and POI data to comprehensively evaluate diverse community environmental characteristics. Exploratory Factor Analysis (EFA) is employed to distill the most representative factors within the environmental, social, and governance dimensions from numerous metrics. Subsequently, subjective perception scores derived from the Place Pulse 2.0 model are analyzed using Spearman correlation analysis to reveal the relationships between community micro-environmental characteristics and resident perceptions. Specifically, the environmental dimension addresses street-level visibility and greenery; the social dimension focuses on pedestrian activity and traffic interference; and the governance dimension evaluates infrastructure management through indicators such as sidewalk and roadway configurations.
Employing this framework, we selected seven communities in the central urban area of Shenyang, China, as case studies, taking into account their construction year, population size, and surrounding amenities to ensure the representativeness and generalizability of the research findings. The analysis reveals marked inter-community variation in green visibility, traffic density, and street configuration—spatial features that collectively shape residents’ subjective evaluations across all six Place Pulse dimensions: “Safety”, “Lively”, “Wealthy”, “Beautiful”, “Depressing”, and “Boring”. Notably, the study identifies three recurrent spatial shortcomings that negatively affect perceptual outcomes and governance performance: unbalanced pedestrian space allocation, homogeneous greening schemes, and perceptual fatigue triggered by hyper-dense commercial activity. Building on these contributions, the following section outlines the methodological design of the study.
2. Materials and Methods
The methodological design of this study is grounded in three key theoretical frameworks: the Socio-Ecological Model, the Broken Windows Theory, and Place-making Theory. Drawing on these frameworks, the study emphasizes micro-scale spatial characteristics such as walkway proportions, green view indices, and crowd aggregation patterns as key components in fostering livable and socially engaging community environments. Together, these three theories provide a robust theoretical foundation for the selection of ESG variables, the construction of factor dimensions, and the interpretation of how community-scale features influence human perception (
Figure 1).
To operationalize the ESG framework at the neighborhood scale, this study first defines and explains specific indicators representing environmental, social, governance, and functional dimensions based on observable features from street-level imagery and POI data.
Environmental indicators include sky rate—a measure of visible sky area in street view imagery widely used to assess street enclosure and openness [
24,
25]; visual permeability—capturing unobstructed sightlines and façade openness, which is linked to perceived safety and spatial legibility [
26]; and green vision rate—the proportion of visible vegetation pixels in imagery, a validated proxy for urban greenery exposure and visual ecological quality [
22,
27]. Social indicators include mobility traffic interference index, measuring the proportion of vehicular presence in carriageway areas, an approach adapted from prior street-level walkability and traffic safety studies [
28,
29]; and walkway ratio, the proportion of pedestrian path pixels relative to total street surface, following established walkability indices from street view data [
13,
30]. Governance indicators include relative walking width—the ratio of pedestrian path width to carriageway width, linked to equitable street space allocation [
31,
32]; crowd aggregation, calculated as the share of human presence in imagery, an indicator for public space use intensity and management needs [
33]; and walking space ratio, the share of pedestrian areas within total visible ground surface, reflecting infrastructure inclusivity [
34]. The POI-based indicator—total number of POI business types—measures functional diversity in neighborhoods, a concept rooted in urban vitality and land-use mix literature [
35,
36].
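The pixel-share indicators defined above can be sketched as simple ratios over a segmentation label map. This is a minimal illustration, not the study's actual pipeline: it assumes the label map uses the standard 19-class Cityscapes train IDs, and the function name `pixel_indicators`, the denominators chosen for walkway ratio and traffic interference, and the demo array are all hypothetical choices made for this example.

```python
import numpy as np

# Cityscapes train IDs (19-class convention used by models trained on Cityscapes).
ROAD, SIDEWALK, VEGETATION, SKY, PERSON = 0, 1, 8, 10, 11
VEHICLES = (13, 14, 15, 16, 17, 18)  # car, truck, bus, train, motorcycle, bicycle


def pixel_indicators(label_map):
    """Compute per-image indicator values from an (H, W) array of class IDs."""
    labels = np.asarray(label_map)
    total = labels.size

    def count(*ids):
        return int(np.isin(labels, ids).sum())

    def ratio(num, den):
        return num / den if den else 0.0

    road, sidewalk, vehicles = count(ROAD), count(SIDEWALK), count(*VEHICLES)
    return {
        "sky_rate": ratio(count(SKY), total),
        "green_vision_rate": ratio(count(VEGETATION), total),
        # Assumption: "total street surface" is taken as sidewalk + road pixels.
        "walkway_ratio": ratio(sidewalk, sidewalk + road),
        # Assumption: carriageway area is taken as road + vehicle pixels.
        "traffic_interference": ratio(vehicles, vehicles + road),
        "crowd_aggregation": ratio(count(PERSON), total),
    }


# Tiny hypothetical label map: one sky row, vegetation/building, then street level.
demo = np.array([
    [10, 10, 10, 10],
    [8, 8, 2, 2],
    [1, 0, 0, 13],
    [1, 0, 0, 11],
])
indicators = pixel_indicators(demo)
```

Width-based measures such as relative walking width would need geometric information beyond raw pixel shares, so they are left out of this sketch.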
To account for spatial distribution and aggregation effects, Kernel Density Analysis is applied to all ESG-related indicators prior to factor extraction. This step helps smooth local variations, reveal spatial patterns, and ensure comparability across communities.
Next, EFA is conducted to identify latent components within each ESG category and reduce multicollinearity among spatial variables. These factor scores serve as the basis for subsequent perceptual analysis.
Subjective perception data, which cover dimensions such as safety, liveliness, and beauty, are extracted using the Place Pulse 2.0 model. Spearman correlation analysis is then used to examine the relationship between ESG-based spatial factors and perception scores, revealing which environmental, social, or governance components most strongly influence residents’ lived experiences at the neighborhood scale.
2.1. Study Area
As one of China’s major northeastern cities undergoing a critical phase of post-industrial transition and urban renewal, Shenyang offers a distinctive urban context for exploring the integration of ESG-related environmental and perceptual indicators at the community level [
37]. Despite its status as a provincial capital with a sizable population and extensive infrastructure, Shenyang remains underrepresented in studies that apply fine-grained spatial and perception-based analysis—especially when compared to megacities like Beijing or Shanghai [
38]. This relative research gap, coupled with the city’s complex mixture of aging industrial districts, new residential zones, and evolving green infrastructure, makes it a compelling case for examining the spatial heterogeneity and livability challenges of contemporary Chinese urban communities [
39].
This study focuses on seven communities in the core urban area of Shenyang, chosen according to completion year, surrounding environmental factors, and geographical location. Due to data availability constraints at the community scale, demographic variables such as age, income, and occupation could not be included, as these statistics are only reported at the district or city level in official sources. These communities were selected for their representativeness of a wide range of characteristics within the city.
Table 1 provides a detailed description of the seven selected communities, emphasizing their distinct spatial and functional characteristics—such as the presence of overpasses, riverside landscapes, core metro hubs, government office zones, and large-scale parks—alongside differences in completion year, residential density, and urban location, which jointly reflect variations in environmental context, infrastructural governance, and socio-spatial composition. The data were sourced from the Lianjia Real Estate Brokerage Company, a leading property information platform in China [
40].
2.2. Data Sources
This study primarily utilized Baidu Street View imagery captured in January 2023, supplemented by point-of-interest (POI) data from Gaode Maps dated to 2021, and road network information sourced from OpenStreetMap. The key areas of interest are seven communities in the core urban area of Shenyang, China. In this study, a 1-km buffer zone was delineated around the geometric centroid of each selected community to define its walkable service area. This spatial unit approximates a 15-min walking radius, which has been widely adopted as a planning benchmark for neighborhood-scale accessibility and livability assessments [
19,
41]. The buffer serves as the analytical boundary for aggregating spatial indicators and capturing the environmental and functional context that residents routinely experience in daily life. Using ArcGIS, street coordinates were extracted at 50-m intervals within the 1-km buffer zones, as can be seen in
Figure 2. These captured coordinates were then used to collect street-level imagery for review.
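The 50-m sampling step was carried out in ArcGIS; the idea behind it can be illustrated with a short sketch. The function `sample_along` below is a hypothetical helper, and it assumes road centerlines are given as vertex lists in a projected coordinate system whose units are metres.

```python
import math


def sample_along(line, spacing=50.0):
    """Return points every `spacing` metres along a polyline of (x, y) vertices."""
    points = [line[0]]       # always keep the start of the line
    next_at = spacing        # distance along the line of the next sample
    travelled = 0.0          # length of the segments already processed
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        while seg > 0 and next_at <= travelled + seg:
            t = (next_at - travelled) / seg   # fraction along this segment
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing
        travelled += seg
    return points


# Hypothetical L-shaped street, 180 m long in total.
street = [(0.0, 0.0), (120.0, 0.0), (120.0, 60.0)]
samples = sample_along(street)
```

Each returned coordinate would then serve as one street view request location, after filtering to the 1-km buffer around the community centroid.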
To ensure transparency and reproducibility, all datasets employed in this study are summarized in
Table 2, including the specific study objects, acquisition periods, and exact sources. This structured overview clarifies the provenance of each dataset used in the ESG-based framework, covering both primary spatial layers and supplementary attribute information.
2.3. Methodology
2.3.1. Evaluation Framework
To construct a multidimensional evaluation framework tailored to community-scale ESG analysis, this study draws upon established theoretical perspectives that link urban form with individual perceptions and behaviors. First, the Socio-Ecological Model posits that individual behavior is shaped by interactions across multiple environmental levels, including physical settings, social systems, and institutional structures [
42]. This model provides a conceptual foundation for the ESG-based framework adopted in this study, enabling a multidimensional analysis of how street-level features such as green visibility, pedestrian density, and the spatial configuration of roads affect residents’ perceptions and social behaviors. Second, the Broken Windows Theory emphasizes how signs of physical disorder in the community environment (e.g., deteriorating infrastructure, chaotic traffic, or spatial imbalance) can lead to negative perceptions of safety and governance, thereby undermining residents’ sense of satisfaction and community cohesion [
43]. This perspective informs the operationalization of governance-related indicators in the study, such as the “road extremization factor,” and underpins the use of Spearman correlation analysis to assess their associations with perceived safety and governance efficiency. Lastly, Place-making Theory highlights the role of design interventions and citizen participation in shaping public spaces that are vibrant, inclusive, and supportive of place identity [
44]. The theoretical basis of the study combines principles of urban design with pedestrians’ subjective perceptions of the environment. Drawing on the Place Pulse 2.0 framework, we translated subjective psychological perceptions of the street environment into quantifiable dimensions. These were classified according to the ESG framework, as shown in
Table 3 below, each dimension making a different contribution to the quality of the environmental perception. In this study, “crowd aggregation” is classified under the governance dimension rather than the social dimension because it functions as an indicator of public space management and spatial equity, both of which are core concerns in urban governance. While crowd density reflects the level of social interaction, persistent concentration in specific public areas often signals governance-related issues such as unequal distribution of pedestrian infrastructure, imbalanced allocation of open space, or insufficient regulatory oversight of public realm use. Prior urban governance scholarship highlights that the organization, accessibility, and safety of public spaces are shaped not only by social behaviors but also by the capacity of governance systems to manage spatial resources effectively and equitably [
45,
46]. From this perspective, crowd aggregation serves as a proxy for evaluating how well a community’s governance structure ensures balanced and inclusive use of shared spaces. This study draws upon previous research on environmental assessment in urban studies to systematically define and calculate ESG-related environmental indicators. These indicators are integrated into the proposed ESG framework as ESG-mapped dimensions, each further characterized by its underlying interpretive mechanism. These indicators were selected based on their interpretability and applicability as demonstrated in prior perception-based and spatial analysis studies [
13,
21,
30].
2.3.2. Image Segmentation
Image segmentation plays a foundational role in urban informatics by enabling the extraction of meaningful spatial features from street view imagery. Each segment will represent certain features of an urban environment. While image segmentation can be performed using methods ranging from traditional thresholding and edge detection to advanced CNN-based models, this study adopts deep learning techniques due to their superior ability to capture complex spatial features and object boundaries in street-level imagery [
47]. In this study, we employed the Pyramid Scene Parsing Network (PSPNet), an advanced CNN architecture specifically designed for semantic segmentation [
47]. When paired with the Cityscapes Dataset, this structure allows for fine-grained classification of urban features—such as roads, buildings, and vehicles—achieving high accuracy in dense urban image segmentation and supporting a refined level of spatial analysis [
48].
In this study, we gathered 35,588 street view images from 8897 sampling points using Baidu Street View. We set the vertical angle to 0 degrees and captured images in four directions with heading = 0, 90, 180, 270 at each sampling point.
Figure 3 below is an example of the visualization we used in our study.
To assess the model’s applicability to the Shenyang context, we relied on PSPNet pre-trained on the Cityscapes dataset without additional fine-tuning. All Baidu Street View images were preprocessed by resizing to 1024 × 512 pixels, normalizing to the ImageNet mean and standard deviation, and center-cropping to maintain consistent framing across headings. PSPNet achieved a mean Intersection over Union (mIoU) of 85.4% on the PASCAL VOC 2012 dataset and 80.2% on the Cityscapes dataset [
47,
48]. These official benchmarks provide a reliable reference, and our local validation confirmed that segmentation accuracy was sufficient to support the extraction of ESG-related spatial indicators in the Shenyang case study. Future work will incorporate systematic fine-tuning and quantitative evaluation (mIoU, pixel accuracy) on manually labeled subsets of Baidu Street View imagery.
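For the quantitative evaluation mentioned as future work, pixel accuracy and mIoU can both be derived from a confusion matrix between predicted and manually labeled class IDs. The sketch below is one common way to compute them, not a description of the study's procedure; the function name `segmentation_scores` and the toy labels are hypothetical. It assumes labels outside `0..num_classes-1` (such as an "ignore" value) should be skipped.

```python
import numpy as np


def segmentation_scores(pred, truth, num_classes):
    """Return (pixel_accuracy, mean_iou) for integer label arrays of equal shape."""
    pred = np.asarray(pred).ravel()
    truth = np.asarray(truth).ravel()
    keep = (truth >= 0) & (truth < num_classes)   # drop ignore labels
    pred, truth = pred[keep], truth[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        truth * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0                            # skip classes absent from both
    iou = tp[present] / union[present]
    return tp.sum() / conf.sum(), iou.mean()


# Toy example with three classes and six pixels.
truth_demo = [0, 0, 1, 1, 2, 2]
pred_demo = [0, 1, 1, 1, 2, 0]
pixel_acc, mean_iou = segmentation_scores(pred_demo, truth_demo, num_classes=3)
```

Reporting both numbers is useful because pixel accuracy is dominated by large classes such as road and sky, while mIoU weights every class equally.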
2.3.3. Kernel Density Analysis
We extracted the raw data needed for this study: the proportion of different features in each street view image. Each segmented category was matched to analytically relevant ESG indicators (Table 3). To capture the spatial distribution patterns of these features, we employed kernel density analysis, which is particularly well-suited for handling the inherently complex and heterogeneous spatial structure of urban environments. This method enables the identification of local intensity variations and spatial clustering, offering a clearer understanding of how specific urban features are distributed across the study area [
49]. Using kernel density analysis, this study reveals complexities in the spatial distribution of specific environmental features across a variety of urban settings [
50]. Kernel Density Analysis was implemented using the ArcGIS Pro “Kernel Density” tool. We applied the default quartic kernel function, and the search radius (bandwidth) was set automatically by the software according to the spatial distribution of the input point data. No manual adjustment of the smoothing factor was introduced in this study, ensuring that results are fully reproducible using ArcGIS’s default settings. Future studies may further test alternative bandwidths to evaluate the robustness of clustering patterns.
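To make the smoothing step concrete, the sketch below evaluates a quartic kernel density at a single location, following the form given in the ArcGIS documentation for point features: each point within the search radius contributes (3/π) · w · (1 − (d/r)²)², and the sum is divided by r². The function name `quartic_density` is hypothetical, the search radius is passed in explicitly rather than derived by the software's default rule, and treating an indicator value as the point weight is an assumption of this example.

```python
import math


def quartic_density(x, y, points, radius):
    """Kernel density at (x, y) from weighted points [(px, py, weight), ...]."""
    total = 0.0
    for px, py, weight in points:
        d2 = ((x - px) ** 2 + (y - py) ** 2) / radius ** 2   # (d / r) squared
        if d2 < 1.0:                                          # inside search radius
            total += (3.0 / math.pi) * weight * (1.0 - d2) ** 2
    return total / radius ** 2


# Hypothetical sampling points (metres) weighted by green vision rate.
sample_points = [(0.0, 0.0, 0.30), (40.0, 0.0, 0.10), (500.0, 0.0, 0.50)]
density_here = quartic_density(20.0, 0.0, sample_points, radius=100.0)
```

Because the kernel falls to zero at the search radius, the third point, 480 m away, has no influence on `density_here`; a larger bandwidth would pull it in and flatten local peaks, which is why bandwidth sensitivity is worth testing.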
2.3.4. Factor Analysis and Place Pulse Perception Scoring
EFA is used here to identify and extract the major latent factors within the ESG dimensions of the selected communities. The method is especially suited to investigating data structures when the underlying factors are not predetermined, and it efficiently identifies common latent traits.
After extraction, the factors were rotated using the Varimax method, which maximizes the variance of the squared loadings within each factor so that the relationship between the factors and the observed variables becomes more distinct. This rotation highlighted the separate ESG-related dimensions, along with their individual contributions to the community environment.
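As an illustration of what the rotation does, the following sketch implements the widely used SVD-based iteration for Varimax on an unrotated loading matrix. It is a generic textbook-style routine rather than the statistical software used in the study, and the loading values in `demo_loadings` are invented for the example.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a (variables x factors) loading matrix.

    Returns the rotated loadings and the rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(
            (rotated ** 2).sum(axis=0)
        )
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                 # nearest orthogonal matrix
        current = s.sum()
        if current < previous * (1 + tol):
            break                         # no meaningful improvement
        previous = current
    return L @ rotation, rotation


# Hypothetical unrotated loadings for four indicators on two factors.
demo_loadings = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.6, -0.6],
    [0.5, -0.7],
])
rotated_loadings, rotation_matrix = varimax(demo_loadings)
```

Since the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the way that variance is split between factors shifts, which is what makes the rotated factors easier to label.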
We use ResNet50 [
51,
52], a deep convolutional neural network, and the Place Pulse 2.0 dataset to calculate the perceptual scores from street view images. Place Pulse 2.0, developed at MIT, contains over 100,000 images from 56 global regions rated by participants across six perceptual dimensions (e.g., safety, liveliness, beauty) via a web-based crowdsourcing platform. This approach enables reliable perception predictions, supporting analysis of how ESG factors shape community perceptions [
20].
3. Results
3.1. ESG Factors Distribution via Kernel Density Analysis
Data derived from image segmentation of the street view sampling points, specifically those corresponding to the Secondary Dimension indicators in Table 3 (such as sky rate, green vision rate, and walkway ratio), were visualized and analyzed in ArcGIS as georeferenced point features to examine their spatial distribution across communities. Kernel Density Analysis was applied to map the results for each dimension and show their spatial distribution patterns in detail (Figure 4, Figure 5 and Figure 6). This approach emphasizes intra-dimensional variations and reveals spatial patterns of ESG-related indicators across the studied communities. The kernel density analysis results show that features such as walkway ratio and green vision rate tend to cluster around riverside areas and large parks, whereas traffic interference and crowding indicators are more concentrated near metro hubs and commercial centers. These spatial insights contribute to the subsequent discussion by illustrating how the distribution of environmental and social features may influence residents’ perceptual experiences and local livability. These spatial distribution patterns also reflect the influence of physical layout on social dynamics. For example, communities located adjacent to rivers or large parks (e.g., C and E) exhibited concentrated clusters of greenery and walkability, supporting pedestrian interaction and community cohesion. By contrast, commercial and metro-centered communities (e.g., B and F) showed pronounced clustering of traffic interference and crowd aggregation, creating vibrant but sometimes stressful conditions for everyday social interaction. This comparison illustrates how proximity to ecological or commercial nodes not only structures environmental quality but also shapes the lived dynamics of urban communities.
3.2. Exploratory Factor Analysis of Community-Level ESG Factors
To reduce dimensionality and identify latent structures among the numerous ESG-related indicators, factor analysis was conducted. This technique groups highly correlated variables into underlying composite factors, thereby simplifying the multivariate data and facilitating interpretation across dimensions. First, the adequacy of the dataset for factor analysis was checked. The Kaiser-Meyer-Olkin (KMO) statistic, which evaluates the proportion of variance among variables that might be common variance, was 0.686. Since this value exceeds the commonly accepted threshold of 0.6, the sample is adequate for factor analysis [
53]. Bartlett’s test of sphericity [
54] yielded a p-value below 0.05, indicating that the correlation matrix differs significantly from an identity matrix; thus, the data are suitable for factor analysis.
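Both adequacy checks can be computed directly from the correlation matrix of the indicators. The sketch below uses the standard definitions: KMO compares squared correlations with squared partial correlations, and Bartlett's statistic is −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom. The function name and the synthetic data are hypothetical, and the p-value is left out because it requires a chi-square distribution function from a statistics library.

```python
import numpy as np


def kmo_and_bartlett(data):
    """Return (KMO, Bartlett chi-square, degrees of freedom) for an (n, p) array."""
    X = np.asarray(data, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)

    # Partial correlations come from the scaled inverse correlation matrix.
    inv = np.linalg.inv(R)
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))

    off_diag = ~np.eye(p, dtype=bool)
    r2 = (R[off_diag] ** 2).sum()
    a2 = (partial[off_diag] ** 2).sum()
    kmo = r2 / (r2 + a2)

    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    dof = p * (p - 1) // 2
    return kmo, chi_square, dof


# Synthetic data: four indicators driven by one shared factor plus noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))
synthetic = shared + 0.5 * rng.normal(size=(200, 4))
kmo, chi_square, dof = kmo_and_bartlett(synthetic)
```

A KMO near 1 means partial correlations are small relative to raw correlations, so the variables share enough common variance for factoring; a large Bartlett statistic relative to its degrees of freedom corresponds to the small p-value reported above.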
In factor analysis, factor loadings represent the correlation coefficients between observed variables and underlying latent factors [
55]. To improve the interpretability of the factor structure, rotation techniques such as Varimax (orthogonal) or Promax (oblique) are often applied after factor extraction. These rotations redistribute the factor loadings without altering the underlying solution, aiming to simplify the structure so that each variable loads strongly onto one factor and weakly onto others. Rotated factor loadings therefore clarify the contribution of each variable to specific factors, facilitating clearer interpretation and labeling of latent constructs [
56]. According to the factor analysis results in the Environmental dimension,
Table 4 and
Table 5 denote Factor 1 as the Environmental Openness Factor (E Factor 1). It is so designated because it is strongly positively correlated with Sky Rate (0.962) and Visual Permeability (0.876), suggesting that this factor essentially measures the openness of the sky and visual permeability within a community. Conversely, its negative correlations with POI Business Completeness (−0.925) and Total Number of POIs (−0.957) suggest that denser commercial facilities are associated with lower environmental openness.
In contrast, Factor 2 is highly positively correlated with the Green Vision Rate (0.972), which captures the extent of green vegetation within the community and the visible presence of trees or other natural elements in street views. This factor is therefore named the Green Ecological Factor (E Factor 2), since it reflects the presence and visibility of natural greenery and its contribution to environmental quality in the urban environment.
As indicated from the factor analysis results for the Social dimension represented in
Table 6 and
Table 7, Factor 1 is a Traffic and Commercial Density Factor (S Factor 1). The factor scores denote the degree of interference by traffic and the density of commercial functionality within a community, reflecting the effect of transportation and commercial facilities on the social environment.
Factor 2 reflects walkable community environments. A higher walkway ratio indicates more pedestrian facilities, better walkability, and more opportunities for social interaction. Factor 2 is therefore termed the Walkability Factor (S Factor 2), since it captures a pedestrian-friendly and socially conducive environment.
According to the results of factor analysis related to the Governance dimension represented in
Table 8 and
Table 9, Factor 1 is labeled the Crowd Aggregation Factor (G Factor 1). It measures the level of crowding and the concentration of commercial facilities in the community: higher scores indicate more severe crowding, greater commercial activity, and higher foot traffic.
Factor 2 captures the road extremization phenomenon, in which pedestrian space and motor vehicle lanes are inversely related. A high score indicates a pronounced imbalance in the distribution of pedestrian space: smaller streets have very little pedestrian area, while large roads have disproportionately large open spaces for pedestrians. This inequality creates differences in the walking environment throughout the community. Factor 2 is therefore identified as the Road Extremization Factor (G Factor 2), which forms the basis for assessing equity in road space distribution related to walkability.
The eigenvalues, explained variance, and factor loadings with weights for the Environmental, Social, and Governance dimensions are summarized in
Table 10. These results clarify the contribution of each factor within its respective ESG dimension and provide a transparent basis for the subsequent correlation analysis linking ESG factors to Place Pulse perceptions. It should be noted that the EFA was conducted solely on ESG-related indicators, and not on perception scores. The explained variance values therefore represent the proportion of variance captured within the ESG indicator sets in
Table 10. The subsequent Spearman correlation analysis links these ESG factors to Place Pulse perception outcomes such as safety and liveliness.
3.3. Multidimensional Perception Analysis Using Place Pulse 2.0
A multidimensional perception analysis of street view images from seven communities in Shenyang was conducted using the Place Pulse 2.0 model, which evaluates each image based on six perception dimensions: “Safety”, “Lively”, “Wealthy”, “Beautiful”, “Depressing”, and “Boring”.
As shown in
Figure 7, the boxplots reveal marked variation in perceptual scores across the seven communities, indicating that residents’ cognitive responses to the built environment are far from homogeneous. For instance, the mean Place Pulse scores for the “Lively” dimension range from over 80 in Community B and Community F to below 50 in Community E, suggesting substantial inter-community disparity in perceived liveliness. Similarly, standard deviations within individual communities remain high, exceeding 20 in most cases, which implies pronounced internal heterogeneity. These results highlight both spatial inequality in urban perception and micro-environmental fragmentation that may not be captured through aggregated indicators alone.
Such divergence in subjective evaluation points to potential socio-spatial mechanisms shaping the lived experience, possibly driven by contextual factors such as traffic density, pedestrian accessibility, or land-use diversity. These hypotheses will be further investigated in the subsequent correlation analysis section, which examines how ESG-related spatial features relate to perceptual outcomes.
3.4. Spearman Correlation Coefficient Analysis
After obtaining the perception scores, the six perceptual dimensions were combined with the latent factors extracted through factor analysis, enabling an integrated view of how objective urban features align with subjective perceptions. Spearman correlation coefficients were then computed to examine the direction and strength of association between these variables (Figure 8). The results reveal several notable relationships. For example, S Factor 2 (Walkability) shows a strong positive correlation with G Factor 2 (Road Extremization), suggesting that communities with more balanced pedestrian infrastructure tend to exhibit better governance in terms of equitable road allocation. This finding points to a non-negligible connection between walkability and spatial justice in street design.
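Spearman's coefficient is the Pearson correlation of the ranks of two variables, which makes it suitable for factor scores and perception scores that need not be linearly related. The sketch below is a minimal pure-Python version with average ranks for ties; the function names and the paired values are hypothetical and are not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical factor scores and perception scores at five sampling points.
factor_scores = [1, 2, 3, 4, 5]
perception_scores = [2, 4, 5, 4, 5]
rho = spearman(factor_scores, perception_scores)
```

In practice a library routine would also report a p-value; this version returns only the coefficient and assumes neither input is constant.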
In addition, E Factor 2 (Green Ecological Quality) is positively correlated with both “More Beautiful” and “More Depressing” perceptions. This dual effect may indicate that while greenery enhances aesthetic appeal, excessive or poorly integrated green space may also induce feelings of monotony or isolation. Another key insight is the moderate positive correlation between S Factor 1 (Traffic and Commercial Density) and “More Boring”, reflecting the paradoxical role of dense traffic and commercial zones in both increasing urban vibrancy and diminishing perceptual diversity. Taken together, these findings underscore the complex, and sometimes counterintuitive, ways in which spatial design influences subjective experiences—highlighting the importance of context-sensitive ESG strategies that balance functionality, aesthetics, and emotional impact in future community planning.
4. Discussion
4.1. Identification of Latent Urban Structure Through Factor Analysis
This study used factor analysis to examine the environmental, social, and governance dimensions of seven communities in Shenyang and found that building layout, green infrastructure, traffic flow characteristics, and governance mechanisms exert complex influences on perception and on the sustainable development of the communities (Figure 9). In this research, the Environmental dimension refers to the features that affect physical openness and ecological quality, such as sky visibility, vegetation coverage, and view corridors. The Social dimension focuses on elements that influence pedestrian experience and social interaction, such as walkway availability and traffic interference. The Governance dimension relates to spatial equity and management efficiency, reflected in road allocation balance, commercial crowding, and the distribution of walkable space. These dimensions, as operationalized through factor analysis, form the empirical basis for interpreting the urban qualities of each community in the subsequent sections.
4.1.1. Factor Analysis of Environmental Dimension in ESG
Our results indicate that building density, green space design, and geographic location are key factors that significantly affect the environmental openness and green ecological quality of the communities. The highly built-up communities in this study, Community A and Community F (Table 1, Figure 2), had low environmental openness because overpasses and high commercial density create visual obstruction; the high commercial density is reflected in their significantly higher number of POIs compared with other communities. This also echoes previous research showing that higher building densities suppress the visual permeability of an area [57].
Conversely, riverside communities such as C and E (Table 1, Figure 2) demonstrated better results in green ecological indicators, owing to both higher green coverage and natural scenic views. However, Community G, despite being adjacent to a large park, failed to integrate external natural resources into its internal design, falling short in leveraging its ecological potential. These findings suggest that proximity to nature alone is insufficient: internal spatial planning must actively incorporate and extend ecological assets to unlock their full environmental potential. Urban planners should prioritize integrated green design, both within community boundaries and in connection with surrounding ecological infrastructure, to enhance perceived environmental quality and long-term sustainability.
4.1.2. Factor Analysis of Social Dimension in ESG
Traffic and commercial density act as a double-edged sword for the social dimension: they introduce vibrancy but also create challenges. For example, communities B and F are central metro hubs and commercial centers, and they are highly socially vibrant. However, their dense commercial facilities and heavy traffic interference degrade the pedestrian experience. This is consistent with previous research reporting that highly dense areas often have poor walkability [29].
In contrast, well-designed walkways and adequate pedestrian spaces make communities D and G convenient and conducive to social interaction among residents. This again affirms the important role that pedestrian-friendly space plays in strengthening the social dimension of communities [58]. Optimizing the social dimension thus calls for a balance between commercial density and pedestrian space design that meets residents’ needs for both access and comfort. Effective social design must go beyond promoting activity density and instead optimize the spatial configuration of walkways and interaction zones to foster positive and equitable social experiences.
4.1.3. Factor Analysis of Governance Dimension in ESG
The governance dimension was most affected by the interaction between road allocation and population concentration. Communities A and F, characterized by road extremization—an imbalance favoring motor vehicle lanes over pedestrian zones—experienced weaker governance performance in terms of spatial equity and mobility management. In contrast, communities C and D exhibited more balanced road layouts, contributing to better pedestrian experiences and perceived governance efficiency.
Notably, Community B presented a governance paradox: despite high crowd density and commercial activity, it struggled with space management, underscoring the complexity of governing hyper-dense environments. This supports prior work identifying crowd aggregation and spatial pressure as major governance challenges in dense urban contexts [
45]. Strengthening governance in urban neighborhoods requires proactive management of spatial equity, including balanced allocation of road space and strategic control of crowding. Our findings highlight the need for fine-grained, design-sensitive governance tools rather than one-size-fits-all approaches. In practical terms, road extremization in communities A and F manifests in narrow, congested sidewalks where pedestrian flows are frequently obstructed by parked bicycles or informal vendor stalls, forcing walkers into carriageways and raising safety risks. Conversely, in the high-density commercial hub of Community B, crowd aggregation is visible in overcrowded intersections and metro exits, where large volumes of foot traffic converge with limited spatial management, leading to congestion, longer waiting times, and reduced comfort in daily mobility. These examples illustrate how governance shortcomings directly affect everyday experiences of safety, accessibility, and spatial equity in the studied communities.
4.2. Linking ESG-Based Spatial Factors with Subjective Perceptions
This section explores how latent ESG-based spatial factors—derived through factor analysis—correlate with residents’ subjective perceptions of their communities, using six evaluative dimensions from the Place Pulse 2.0 model. By applying correlation analysis, we identify meaningful associations that shed light on the mechanisms through which specific urban forms shape perceived environmental quality and livability.
A prominent finding is the strong positive correlation between S Factor 2 (Walkability) and G Factor 2 (Road Extremization), indicating that imbalanced road allocation—where motor vehicle lanes dominate over pedestrian space—is directly linked to reduced walkability and governance inefficiency. The rapid growth of motor vehicles has disrupted pedestrian networks and made city streets less friendly for pedestrians [
32]. Communities with poorly proportioned pedestrian environments tend to perform worse in governance-related dimensions, particularly in delivering equitable and accessible mobility. This suggests that walkability should be reframed not merely as a mobility concern, but as a governance issue essential to inclusive urban design.
In terms of greenery, the results reveal a nuanced pattern. E Factor 2 (Green Ecological Quality) shows a dual relationship: it is positively correlated with both “More Beautiful” and “More Depressing” perceptions. Existing studies consistently report positive effects of urban green space on emotional wellbeing. For instance, urban greenery has been associated with significantly improved emotional wellbeing, supporting relaxation and stress relief among urban residents [59]. Likewise, green views from a residence have been positively correlated with community satisfaction, implying that even the mere sight of greenery can enhance emotional wellbeing by increasing satisfaction with one’s living environment [60]. Against this background, our result reflects the ambivalent psychological effects of green space: while it enhances aesthetic value and may promote calmness, overly uniform or isolated greenery may also induce feelings of dullness or emotional detachment. This echoes recent findings that vegetation affects both ecological functioning and behavioral responses, including crime avoidance and stress reduction [33]. The implication is clear: the design of urban green spaces must go beyond quantity to consider spatial integration, diversity, and emotional resonance. It should be noted that this dual relationship was observed as a general pattern across all communities in our sample, rather than stratified by demographic characteristics. Due to the absence of community-level demographic data (e.g., age structure or socioeconomic composition), we were unable to examine subgroup variations. Future studies integrating such data could reveal whether different population groups perceive greenery differently, thereby offering more targeted guidance for urban planners.
Another revealing correlation is between S Factor 1 (Traffic and Commercial Density) and both G Factor 1 (Crowd Aggregation) and the perception dimensions “Livelier” and “More Boring”. This paradox illustrates a core challenge in urban planning: high density and commercial vibrancy can enhance energy and activity [
61], yet without adequate spatial variety and leisure infrastructure, such environments may be perceived as monotonous. Urban vibrancy, in other words, does not automatically translate to experiential richness. Spatial diversity and informal social spaces must be designed in tandem with commercial hubs to avoid perceptual fatigue.
Lastly, E Factor 1 (Environmental Openness) was negatively associated with perceptions of safety and liveliness. Contrary to findings in European contexts where openness is positively received [
62], our results suggest that in high-density East Asian cities, expansive but poorly programmed spaces may create discomfort or insecurity [
63]. This points to a perceptual mismatch: increasing visual openness without corresponding social activation may reduce rather than enhance community vitality.
In sum, the discussion in the previous sections underscores a central argument of this study: urban form and perceptual quality are linked through multidimensional trade-offs, not linear relationships. ESG-based spatial features, especially walkability, greenery, and commercial density, exert complex and sometimes conflicting effects on how communities are perceived. Designing livable cities thus requires not only physical improvements but also a careful calibration of spatial equity, sensory experience, and psychological comfort.
4.3. Policy Implications and Urban Renewal Strategies
The findings of this study have direct implications for community-scale urban policy and renewal initiatives in rapidly transforming cities such as Shenyang. First, enhancing walkability should be prioritized as both a mobility and governance goal. This can be achieved by reallocating street space to expand pedestrian zones, retrofitting narrow sidewalks, and introducing traffic-calming measures to mitigate vehicular interference. International best practices, such as complete street design [
31], can be adapted to local contexts to balance motorized and non-motorized mobility.
Second, green space integration must move beyond quantity-based targets toward design approaches that diversify vegetation types, improve spatial connectivity, and incorporate greenery into street-level infrastructure. Linear green corridors, pocket parks, and green façades can be deployed to maximize visual permeability while avoiding the monotony that emerged as a potential drawback in our results.
Third, commercial and social vibrancy management should be incorporated into neighborhood revitalization plans. High-density commercial areas benefit from programmed public spaces—such as small plazas, shaded seating, and cultural amenities—that counteract perceptual fatigue and enrich experiential diversity. Urban design guidelines should include a diversity index for land-use mix to ensure that vibrancy does not translate into visual or functional monotony.
Fourth, data-driven governance can leverage ESG-based spatial indicators to target resource allocation and monitor policy outcomes. Routine integration of street view imagery and POI analytics into municipal planning systems can provide up-to-date, fine-grained diagnostics of community conditions, enabling rapid response to emerging issues.
Finally, participatory planning mechanisms should be embedded in community renewal projects. By aligning top-down ESG objectives with residents’ lived experiences—captured through perception surveys and digital engagement platforms—policy interventions can achieve greater legitimacy, equity, and long-term sustainability.
Collectively, these strategies provide a roadmap for translating the ESG-based analytical framework into actionable urban policy. Their implementation requires cross-sectoral coordination between municipal agencies, private developers, and community organizations, supported by transparent monitoring and iterative feedback loops.
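The third recommendation above refers to a diversity index for land-use mix without fixing a formula. One common candidate is the normalized Shannon entropy of POI categories; the sketch below assumes that choice, and the function name, parameter names, and category labels are hypothetical rather than taken from this study.

```python
from collections import Counter
from math import log


def land_use_mix(poi_categories, n_categories=None):
    """Normalized Shannon entropy of POI categories, in the range [0, 1].

    0 means every POI falls into a single category; 1 means all categories
    are equally represented. n_categories is the size of the classification
    scheme; if omitted, the number of categories observed in the input is used.
    """
    counts = Counter(poi_categories)
    total = sum(counts.values())
    k = n_categories if n_categories is not None else len(counts)
    if total == 0 or k < 2:
        return 0.0
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(k)


# Hypothetical POI category lists for two street segments.
commercial_strip = ["retail", "retail", "retail", "food"]
mixed_block = ["retail", "food", "park", "school"]

print(round(land_use_mix(commercial_strip), 3))
print(round(land_use_mix(mixed_block), 3))
```

Passing a fixed `n_categories` keeps scores comparable across communities, because otherwise an area with only two evenly split categories would score as high as an area with many evenly split categories.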
4.4. Limitations and Future Work
This study is not without limitations. One notable constraint lies in the spatial definition of community boundaries. While a 1-km circular buffer was adopted to approximate a 15-min walkable environment, this uniform approach may not accurately capture the actual scale and morphological diversity of each community, particularly given their varying shapes and built environments. Future research could explore more flexible or data-driven boundary delineation methods to better reflect residents’ lived spatial experience. Additionally, although street view imagery has proven effective for extracting environmental indicators, it does not perfectly simulate the pedestrian perspective. The inherent differences between vehicle-mounted and eye-level pedestrian viewpoints may influence visual interpretation, particularly in aspects such as enclosure, obstruction, and perceived safety. Addressing the vehicle–pedestrian visual gap will be a central focus of our subsequent studies, which aim to develop more refined methods for human-scale environmental assessment. Potential methodological improvements include the integration of pedestrian-collected street view imagery using portable 360-degree cameras, low-altitude drone imagery for areas with limited walkability, and augmented/virtual reality tools to better simulate human-scale perspectives. These approaches can help bridge the current gap between vehicle-mounted imagery and lived pedestrian experiences, thereby improving the contextual validity of perception-based analyses.
Although the Place Pulse 2.0 model has demonstrated robust performance in predicting perceptual attributes from street view imagery across multiple global contexts, its application to Baidu Street View images in Shenyang may introduce transferability limitations. Differences in cultural context, urban morphology, and image acquisition parameters (e.g., camera height, field of view) between Google Street View and Baidu Street View could affect the model’s perception predictions. Such variations may influence how certain environmental cues—such as signage, building facades, or public space configurations—are interpreted in terms of safety, liveliness, or beauty. While prior research has applied cross-platform transfer of perception models with reasonable accuracy [
64,
65], future studies could benefit from local calibration or domain adaptation techniques to mitigate these potential biases and improve the contextual validity of perception scoring.
The POI dataset used in this study was last updated in 2021, whereas the street view imagery was captured in 2023. While the functional composition of communities in Shenyang’s urban core tends to remain relatively stable over short periods, some localized changes—such as new developments, business turnover, or land-use conversions—may have occurred during this two-year interval. As our POI analysis focuses on aggregated structural diversity rather than the operational status of individual establishments, we expect the temporal mismatch to have a limited impact on the interpretation of co-located spatial features. Nonetheless, this difference is acknowledged as a potential limitation, and future work could incorporate temporally aligned datasets or multi-year POI updates to improve accuracy in fast-evolving urban areas. Another limitation is the lack of community-scale building density indicators such as floor area ratio (FAR). While FAR values are available at the district or city level through official planning documents, they are not publicly reported for individual residential communities in Shenyang. As our unit of analysis is the community, we were unable to incorporate FAR into
Table 1. Instead, we used the total number of POIs as a proxy for commercial density. Future research could enrich the ESG–perception framework by integrating community-level building density metrics once such data become accessible through local planning bureaus or detailed cadastral datasets.
The use of a fixed 1-km buffer to delineate neighborhood units provides a standardized measurement framework aligned with established walkability and “15-min city” planning benchmarks [
19,
41], and supports comparability across case study sites. However, the morphological diversity of the selected communities may influence the proportion and type of functional areas included within the buffer. Given that our analysis emphasizes relative differences in spatial indicators rather than absolute counts, we expect this effect to be limited. Nonetheless, future research could incorporate sensitivity checks—such as varying buffer radii or employing network-based accessibility polygons—to evaluate the robustness of results across neighborhoods with differing spatial forms.
While PSPNet was trained on the Cityscapes dataset, which contains high-resolution street-level imagery from European cities [
48], some visual features in Chinese urban settings—such as signage, pavement materials, and street furniture—differ from those in the training set. These differences may lead to occasional misclassification of certain object classes, particularly where visual cues are culturally or regionally specific. To mitigate this, we manually inspected a subset of segmentation outputs to ensure plausibility of the extracted indicators. Nonetheless, future work could apply domain adaptation techniques [
66] or fine-tune the model using locally annotated Baidu Street View images to further improve segmentation accuracy and reduce potential bias.
Another limitation concerns the adoption of a uniform 1 km circular buffer around each community centroid to approximate a 15-min walkable environment. While widely used in accessibility studies, this approach ignores actual network distances and physical barriers, potentially distorting both environmental and social indicators. For example, in communities bounded by expressways or rivers lacking pedestrian bridges, the 1 km Euclidean buffer may include areas that are not truly walkable, thereby inflating walkway ratio and misrepresenting walkability–perception correlations. Conversely, in communities with highly connected street grids, a 1 km straight-line buffer might underestimate the accessible area, omitting relevant amenities or green spaces just beyond the circle. Similar effects could occur for greenery metrics if significant parks or waterfronts lie marginally outside the buffer, dampening their observed associations with aesthetic or liveliness perceptions. Future research should conduct sensitivity analyses using alternative spatial units—such as network-based service areas, isochrone polygons, or buffers adjusted for physical barriers—to assess the robustness of ESG–perception linkages to neighborhood boundary definitions.
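A simple form of the buffer-radius sensitivity check suggested above can be sketched as follows. The example counts POIs within several straight-line radii of a community centroid using great-circle distance; the coordinates, radii, and function names are hypothetical assumptions, and the sketch does not address network distances or physical barriers, which would require street-network data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def poi_counts_by_radius(centroid, pois, radii_m):
    """Count the POIs that fall within each straight-line buffer radius."""
    lat0, lon0 = centroid
    distances = [haversine_m(lat0, lon0, lat, lon) for lat, lon in pois]
    return {r: sum(d <= r for d in distances) for r in radii_m}


# Hypothetical centroid and POI coordinates; not data from the study.
centroid = (41.80, 123.43)
pois = [
    (41.80, 123.43),   # at the centroid
    (41.804, 123.43),  # roughly 445 m north
    (41.81, 123.43),   # roughly 1.1 km north
]

counts = poi_counts_by_radius(centroid, pois, [500, 1000, 1500])
print(counts)
```

If indicator values or their rank order across communities change markedly between radii, the reported correlations would be sensitive to the boundary definition and should be interpreted with corresponding caution.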
5. Conclusions
This study develops and tests a novel ESG-based community assessment framework to reveal how surrounding environmental characteristics of seven representative communities in Shenyang influence residents’ subjective perceptions. The framework integrates pixel-level analysis of street-view imagery and POI data, utilizing EFA to identify the most representative key factors within environmental, social, and governance dimensions. Furthermore, resident perception data derived from the Place Pulse 2.0 model were analyzed using Spearman correlation analysis to elucidate intrinsic relationships between community environmental features and subjective perceptions.
The findings confirm that the proposed micro-scale ESG framework effectively captures and explains differences among communities concerning environmental quality, social interaction, and governance mechanisms. Within the environmental dimension, the factor of “Environmental Openness,” positively correlated with sky visibility and visual permeability, was negatively influenced by the density of commercial facilities. Meanwhile, the “Green Ecological” factor effectively reflected the influence of internal and external greenery on residents’ environmental perception. The social dimension analysis indicates that higher density of traffic and commercial facilities, although enhancing community vitality, simultaneously negatively affects the pedestrian experience. Conversely, the proper allocation of pedestrian spaces significantly enhances community livability and facilitates social interaction. Regarding the governance dimension, polarization in roadway space and unequal pedestrian-space distribution directly affected governance efficiency and residents’ satisfaction with community management.
Further perceptual analysis demonstrated a significant correlation between walkability and the polarization of road space. Communities with imbalanced spatial distribution typically exhibited deficiencies in governance efficiency. The “Green Ecological” factor displayed a complex perceptual impact, wherein communities with higher greenery coverage demonstrated strong aesthetic appeal, yet certain design elements might simultaneously induce feelings of monotony or depression, highlighting the necessity of detailed green-space design. Additionally, the dual impacts of traffic and commercial density warrant attention, as these factors, despite contributing to community vitality, may lead to perceptions of monotony due to insufficient spatial diversity. Finally, the study found a negative correlation between environmental openness and residents’ sense of safety and vitality, a finding notably divergent from conclusions commonly drawn in European and American contexts [
67], warranting further exploration within China’s specific urban context.
Overall, by employing a micro-scale ESG framework coupled with refined pixel-level analytical methods, this research systematically illustrates how multidimensional community environmental factors collectively shape residents’ subjective perceptions. The outcomes not only provide new theoretical insights and analytical approaches for academia but also offer critical empirical evidence and practical tools for urban planners and policymakers aiming for finer-grained governance and sustainable urban planning practices. Ultimately, this contributes to developing more equitable, livable, and sustainable community environments in urban development. In practical terms, the ESG-based framework can inform urban regeneration policies in rapidly transforming Chinese cities by highlighting the importance of rebalancing pedestrian and vehicular space, integrating diverse greenery into street-level design, and ensuring functional land-use variety in commercial hubs. These guidelines are particularly relevant for post-industrial cities such as Shenyang, where large-scale renewal initiatives are underway, and can help planners implement regeneration strategies that are both sustainable and responsive to residents’ lived experiences.