Article

Unraveling the Non-Linear Impact of the Built Environment on Population-Based Residential Vitality at the Block Scale: An Explainable AI Approach Using Multi-Source Open Data in Zhengzhou, China

1 School of Architecture, Henan University of Technology, Zhengzhou 450001, China
2 School of Art and Design, Zhengzhou University of Aeronautics, Zhengzhou 450015, China
3 School of Architecture and Urban Planning, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
4 Design Institute of Henan University of Technology, Zhengzhou 450001, China
* Author to whom correspondence should be addressed.
Buildings 2026, 16(11), 2229; https://doi.org/10.3390/buildings16112229
Submission received: 23 April 2026 / Revised: 21 May 2026 / Accepted: 23 May 2026 / Published: 1 June 2026
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

Abstract

Understanding the complex relationship between the built environment and urban vitality is essential for evidence-based urban renewal. However, most existing studies rely on linear regression models that fail to capture the non-linear threshold effects inherent in urban systems and depend on costly proprietary datasets that limit reproducibility. This study proposes a scalable, open-data-driven framework to decode the non-linear mechanisms governing population-based urban vitality in Zhengzhou, a rapidly regenerating metropolis in Central China. Using Areas of Interest (AOIs) as functional spatial units to mitigate the Modifiable Areal Unit Problem (MAUP), we construct a multidimensional built environment indicator system (5D+S: Density, Diversity, Design, Distance to Transit, Destination Accessibility, and Surroundings) from multi-source open data, including 100 m WorldPop population grids, OpenStreetMap building vectors, Points of Interest (POIs), and transit station data. An explainable machine learning approach combining XGBoost with SHapley Additive exPlanations (SHAP) is employed to identify the relative importance of built environment factors and quantify their non-linear threshold effects on population-based urban vitality (operationally defined as residential population density derived from WorldPop 100 m grids). Across 3920 AOIs, XGBoost (R² = 0.846, RMSE = 0.104) substantially outperforms Ordinary Least Squares regression (R² = 0.634), confirming pervasive non-linear relationships, with a 5-fold cross-validated R² of 0.713 ± 0.115. SHAP analysis reveals four dominant drivers: Distance to Commercial Core (DistCBD), Bus Station Density within 500 m (BusDen500), Green Coverage Ratio (GreenRatio), and Building Density (BD). Critical thresholds are identified: vitality contributions decay sharply beyond approximately 4.3 km from the CBD; at least 4 bus stations within 500 m are required for meaningful transit benefit; building density delivers positive returns within a 2–30% range; and excessive green coverage above 8.5% within 500 m is associated with declining population-based vitality, a finding that reflects spatial competition between ecological land use and residential density rather than a negative effect of greenery per se. These findings provide quantitative design guidelines for precision urban renewal, moving beyond “the more, the better” planning assumptions to identify optimal intervention ranges.

1. Introduction

1.1. Urban Vitality and the Paradigm Shift in Urban Renewal

As global urbanization transitions from rapid spatial expansion to quality-oriented regeneration, urban vitality has emerged as a crucial indicator of a city’s socio-economic health and sustainable development potential [1,2]. High urban vitality is associated with economic prosperity, social cohesion, enhanced public safety, and improved human well-being [3]. Empirical work has further demonstrated that consumer-amenity agglomeration contributes to such outcomes [4], and that urban landscape configuration shapes vitality patterns across Chinese metropolises [5]. Consequently, decoding the complex relationship between the built environment and urban vitality has become a central agenda for urban planners and policymakers seeking to implement “precision urban renewal” strategies [6]. It should be emphasized at the outset that, while this paper uses the umbrella term “urban vitality” throughout for readability and continuity with prior literature, the dependent variable is operationally defined as population-based residential vitality, measured by high-resolution gridded population density (see Section 3.4.1 for details); the non-linear thresholds reported in this study therefore apply specifically to this residential dimension rather than to the broader multidimensional construct of vitality. However, achieving precision renewal requires addressing fundamental methodological challenges, particularly the choice of spatial analysis units that faithfully represent the functional structure of urban space.

1.2. The Spatial Unit Dilemma: From Arbitrary Grids to Functional Patches

Despite extensive research on urban vitality, a fundamental methodological challenge persists in the form of the Modifiable Areal Unit Problem (MAUP) [7]. Most existing studies rely on arbitrary spatial grids (e.g., 500 m × 500 m fishnets) or road-bounded blocks as basic analysis units [8,9]. These mechanically defined units often sever the continuous functional semantics of the urban fabric, failing to reflect how residents actually perceive and use urban space. This study addresses this limitation by introducing Areas of Interest (AOIs), urban functional patches such as residential compounds, parks, commercial complexes, and industrial zones, as the fundamental spatial analysis units. Unlike arbitrary grids, AOIs represent physical and semantic boundaries with clear functional identities, providing a more ecologically valid scale for micro-renewal interventions [10]. While AOIs offer a more functionally meaningful spatial framework, the data required to characterize built environment conditions at such fine-grained units has traditionally been prohibitively expensive and proprietary.

1.3. The Data Divide: Democratizing Urban Analytics with Open Data

Historically, high-precision urban vitality studies have relied heavily on costly, proprietary, and privacy-sensitive datasets, such as mobile phone signaling data or bespoke surveys [11]. This data dependency significantly limits the reproducibility and scalability of research findings. This study proposes a scalable, low-cost paradigm by fusing exclusively multi-source open data, including the WorldPop 100 m population raster [12], Points of Interest (POIs) from Amap [13], OpenStreetMap building vectors and AOIs, and public transit station data. By demonstrating that open-source data can achieve high-fidelity vitality analysis, this approach enhances the reproducibility of urban vitality research across diverse urban contexts. With data accessibility resolved, the remaining analytical bottleneck lies in the modeling approach itself; specifically, the inability of conventional linear models to capture the complex, non-monotonic relationships that characterize urban systems.

1.4. The Linearity Trap: Decoding Non-Linear Thresholds via Explainable AI

A significant limitation of traditional urban morphology studies is the over-reliance on linear regression models such as Ordinary Least Squares (OLS) or spatial autoregressive models [14,15]. These models implicitly assume monotonic relationships between built environment variables and urban vitality; for example, that greater building density invariably produces greater vitality. However, urban systems are inherently complex and non-linear. Excessive density or over-commercialization often leads to congestion, environmental degradation, and a subsequent decline in vitality, a phenomenon known as the “threshold effect” [16,17]. Identifying these “tipping points” is critical for avoiding over-development.
This study employs an Explainable Artificial Intelligence (XAI) approach, specifically the eXtreme Gradient Boosting (XGBoost) algorithm coupled with SHapley Additive exPlanations (SHAP) [18,19]. This combination opens the “black box” of machine learning by not only achieving superior predictive accuracy but also quantifying the non-linear marginal effects and optimal threshold ranges of each built environment factor. Building upon these three methodological advances, namely functional spatial units, open data fusion, and explainable non-linear modeling, this study integrates them into a unified analytical framework applied to a rapidly regenerating Chinese metropolis.

1.5. Research Objectives and Contributions

Taking the central urban area of Zhengzhou, a major national transportation hub and rapidly regenerating metropolis in Central China, as a case study, this paper aims to answer the following questions:
  • How can a high-precision, AOI-based urban vitality assessment model be constructed using exclusively free, multi-source open data?
  • What is the relative importance of different built environment dimensions (5D+S) in driving urban vitality?
  • What are the specific non-linear threshold effects (optimal ranges) of key morphological and functional indicators on vitality, and how can they inform localized urban renewal policies?
The key contributions of this study are threefold: (i) the adoption of AOIs as functional spatial units to mitigate MAUP; (ii) the construction of an entirely open-data-based analytical framework with high reproducibility; and (iii) the identification of quantitative non-linear thresholds for precision urban renewal through explainable machine learning.

2. Literature Review

2.1. Built Environment and Urban Vitality

Urban vitality, a concept rooted in Jane Jacobs’ seminal work on urban diversity and street-level activity [20], has evolved into a multidimensional construct encompassing population density, economic activity, social interaction, and functional mixing [1,21]. The “5D” framework (Density, Diversity, Design, Distance to transit, and Destination accessibility), initially proposed by Cervero and Kockelman [22] for travel behavior research, has been widely adapted to study how the built environment shapes urban vitality [8,23]. Recent studies have incorporated additional dimensions such as Surroundings/Ecology [24], recognizing that green space and environmental quality play significant roles in shaping urban attractiveness. However, despite the growing sophistication of indicator systems, most studies continue to rely on linear regression approaches that inherently limit the detection of complex interactions and threshold effects [9,25]. Moreover, a critical examination of the existing literature reveals three recurrent limitations. First, most studies operationalize urban vitality using proprietary datasets, such as mobile phone signaling data [11,16] or social media check-ins [1], which severely limit reproducibility and cross-city comparison. Second, the spatial analysis units adopted vary widely (from 500 m grids to census tracts), with few studies systematically addressing how unit choice affects results [7]. Third, even when non-linear methods are employed, the extracted thresholds are rarely translated into concrete design or planning guidelines, limiting their practical utility for urban renewal [26]. Table 1 summarizes representative studies on built environment and urban vitality. Complementary recent work has linked vitality to street-network centrality [27], multi-source spatial big data on urban vibrancy [28], and the use of urban parks [29].

2.2. From Grids to Functional Patches: The AOI Approach

The MAUP represents a persistent challenge in spatial analysis [7]. Studies using different spatial unit sizes or configurations can yield substantially different results. While some researchers have adopted traffic analysis zones (TAZs) or census tracts as analysis units [31], these administrative boundaries similarly fail to capture the functional heterogeneity of urban space. The AOI-based approach, which delineates analysis units according to actual land-use parcels and functional identities, represents a promising alternative. AOIs align with how urban space is organized and perceived by residents, making them particularly suitable for block-scale renewal planning [10,32]. Functional units have also been delineated from large-scale ride-hailing trajectories [33] and from social-media point patterns clustered with dynamic time warping [34]. This study builds upon this emerging approach by using OpenStreetMap-derived AOIs encompassing 15 distinct land-use categories. Despite its theoretical appeal, the AOI-based approach remains underexplored in urban vitality research. Most existing studies employing functional patches focus on travel behavior or land-use classification rather than vitality assessment, and none have combined AOIs with explainable machine learning to quantify non-linear built environment thresholds.

2.3. Explainable AI in Urban Studies

Machine learning methods such as Random Forest and Gradient Boosting have demonstrated superior performance over linear models in capturing complex urban patterns [35,36]. However, the “black-box” nature of these models has limited their adoption in planning practice, where interpretability is essential for policy guidance. The introduction of SHAP (SHapley Additive exPlanations) [19] has transformed this landscape by providing theoretically grounded, consistent explanations for individual predictions. SHAP values quantify the contribution of each feature to a specific prediction, enabling the visualization of non-linear relationships, threshold effects, and interaction effects [18]. Recent applications in urban studies have demonstrated the power of SHAP in identifying non-linear built environment effects on travel behavior [37], housing prices [38], and urban vitality [16,30]. This study extends this line of research by applying XGBoost-SHAP to AOI-level vitality analysis using exclusively open data. However, several gaps persist: (i) most XAI-based urban studies still rely on proprietary data, undermining the reproducibility that open science demands; (ii) few studies report spatial cross-validation metrics, raising concerns about inflated performance due to spatial autocorrelation; and (iii) the transition from SHAP-identified thresholds to actionable design guidelines remains underdeveloped in the literature.

2.4. Research Gaps and Present Contribution

The above review identifies three interconnected gaps that this study aims to address. Gap 1 (Data accessibility): The majority of high-resolution vitality studies depend on proprietary or restricted datasets, limiting reproducibility and scalability to other urban contexts [11,16]. Gap 2 (Spatial unit validity): Arbitrary grid-based or administrative spatial units fail to reflect the functional organization of urban space, yet functionally meaningful units (AOIs) have rarely been applied to vitality-focused machine learning analysis [7,10]. Gap 3 (Threshold-to-design translation): While recent studies have successfully identified non-linear relationships using SHAP, the translation of statistical thresholds into actionable urban design and planning recommendations remains largely absent [16,26,30]. This study addresses all three gaps simultaneously by proposing an entirely open-data-driven, AOI-based, explainable machine learning framework that not only identifies non-linear thresholds but also translates them into spatially explicit design guidelines for precision urban renewal.

3. Materials and Methods

3.1. Study Area

Zhengzhou, the capital of Henan Province, is a major transportation hub and one of the fastest-growing metropolitan areas in Central China (Figure 1). With a resident population exceeding 12.8 million, the city serves as a national center for railway logistics and is undergoing intensive urban regeneration. The study area encompasses the administrative boundary of Zhengzhou Municipality (approximately 7567 km²), spanning from the urban core to suburban and rural-urban fringe areas. This spatial extent captures the full gradient of urbanization intensity, from dense commercial districts to agricultural peripheries, providing a comprehensive context for analyzing built environment effects on vitality.
The five central business districts (CBDs) used to calculate the DistCBD variable were identified based on the Zhengzhou Master Plan (2018–2035) and verified against commercial POI agglomeration patterns. They are: (1) Erqi Commercial District (二七商圈), the traditional retail core centered on Erqi Square; (2) Huayuan Road Commercial District (花园路商圈), a mature mixed-use corridor along the northern urban axis; (3) Zhengdong New District CBD (郑东新区CBD), the planned financial center featuring the landmark “Big Corn” tower; (4) High-Tech Zone Core (高新区核心区), anchored by technology parks and university clusters; and (5) Longhu–Jinshui Commercial Area (龙湖–金水商圈), an emerging mixed-use hub near the high-speed rail station. Their locations are marked as red stars in Figure 1. The centroid of each CBD polygon was used as the reference point for Euclidean distance calculation.
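The distance calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that DistCBD is the distance to the nearest of the five centroids (the text does not state this explicitly) and that coordinates are already in a metric projection. The function name and the coordinates are hypothetical.

```python
import math

def dist_to_nearest_cbd(aoi_xy, cbd_centroids):
    """Euclidean distance (m) from one AOI reference point to the closest CBD centroid.

    Assumes projected coordinates in metres; 'nearest centroid' is an assumption.
    """
    ax, ay = aoi_xy
    return min(math.hypot(ax - cx, ay - cy) for cx, cy in cbd_centroids)

# Hypothetical projected coordinates in metres (not the real CBD locations).
cbd_centroids = [(0.0, 0.0), (10000.0, 10000.0)]
dist_cbd = dist_to_nearest_cbd((3000.0, 4000.0), cbd_centroids)
print(f"DistCBD = {dist_cbd:.0f} m")
```

Straight-line distance ignores the street network, so it should be read as a proxy for centrality rather than as travel distance.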

3.2. Data Sources

All data used in this study are derived from free, publicly accessible sources (Table 2).

3.3. Spatial Analysis Unit: AOI Delineation

AOI polygons derived from OpenStreetMap were used as the fundamental spatial analysis units. Each AOI represents a distinct urban functional patch with a clear boundary and land-use identity. The 15 land-use categories include park (n = 1218), residential (n = 1209), grass (n = 483), industrial (n = 347), commercial (n = 218), farmland (n = 204), forest (n = 136), retail (n = 38), and others. After filtering out slivers smaller than 500 m², 3930 AOIs were retained for analysis. All spatial data were reprojected to UTM Zone 49N (EPSG:32649) to enable metric-based calculations.
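The sliver filter can be illustrated with a short sketch. It assumes each AOI is a simple polygon (no holes) whose vertices are already in a metric projection such as the UTM zone named above; the helper names and the two test squares are hypothetical, and a GIS library would normally handle this step.

```python
def polygon_area(ring):
    """Planar area (m^2) of a simple polygon given as [(x, y), ...], via the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def drop_slivers(polygons, min_area_m2=500.0):
    """Keep only polygons whose area is at least min_area_m2 (500 m^2 in the text)."""
    return [p for p in polygons if polygon_area(p) >= min_area_m2]

square_20m = [(0, 0), (20, 0), (20, 20), (0, 20)]  # 400 m^2, treated as a sliver
square_30m = [(0, 0), (30, 0), (30, 30), (0, 30)]  # 900 m^2, retained
kept = drop_slivers([square_20m, square_30m])
```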

3.4. Variable Measurement

3.4.1. Dependent Variable: Urban Vitality Index

Urban vitality is operationalized as 100 m-gridded population density derived from the WorldPop 2026 constrained population dataset (release R2025A). For each AOI, the population value was extracted using zonal statistics (area-weighted mean within polygon, with all_touched enabled to handle small AOIs); centroid-based sampling was used as a fallback for the 0.7% of AOIs that contained no fully contained pixel. The raw population values were normalized using Min-Max scaling, as defined in Equation (1):
$$Y_i = \frac{\mathrm{Pop}_i - \mathrm{Pop}_{\min}}{\mathrm{Pop}_{\max} - \mathrm{Pop}_{\min}} \tag{1}$$
where $Y_i$ is the normalized vitality index for AOI $i$, $\mathrm{Pop}_i$ is the zonal mean population density, and $\mathrm{Pop}_{\min}$ and $\mathrm{Pop}_{\max}$ are the minimum and maximum of that density across all AOIs. Although the original research design also planned to incorporate the NPP/VIIRS Day/Night Band radiance composite as a complementary economic-vitality dimension, the annual radiance composite (avg_rade9.tif) requires NASA Earthdata registration, a free but gated access process inconsistent with our fully open-data methodology. Moreover, the 500 m native resolution of VIIRS is substantially coarser than the 100 m WorldPop grid, introducing potential scale mismatch at the AOI level. We therefore retained population density as the sole dependent variable in the present analysis. This conservative choice further isolates the dependent variable from any POI-based built environment indicator.
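As a minimal sketch of the Min-Max scaling in Equation (1), the snippet below rescales a list of zonal mean densities to the [0, 1] range. The input values are hypothetical and the function name is illustrative only.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] following Equation (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; Min-Max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical zonal mean population densities for four AOIs.
pop_density = [120.0, 480.0, 3000.0, 9120.0]
vitality = min_max_scale(pop_density)
```

Because the scaling depends on the sample minimum and maximum, a single extreme AOI compresses all other values toward zero, which is one reason the outlier treatment applied before modeling matters.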
It is important to distinguish between the broad concept of urban vitality, which encompasses economic activity, social interaction, cultural vibrancy, and street-level liveliness [3,20], and the narrower operationalization adopted in this study. Our dependent variable captures only the residential dimension of vitality, i.e., where people live at relatively high densities. While population concentration is a necessary condition for Jacobs-style vitality, it is not sufficient: a dormitory suburb may exhibit high residential density without the functional mixing and street-level activity that constitute broader urban vitality. This operationalization was chosen for three reasons: (i) the WorldPop 100 m grid is the highest-resolution, freely accessible population product available for Chinese cities; (ii) using population density avoids circular reasoning with POI-based independent variables; and (iii) it ensures full reproducibility without proprietary data. All subsequent references to “urban vitality” in this paper should therefore be understood as shorthand for “population-based residential vitality,” and the identified thresholds apply specifically to this dimension.

3.4.2. Independent Variables: Multidimensional Built Environment (5D+S)

Based on the extended “5D+S” framework, ten independent variables were constructed across six dimensions (Table 3).
Building Density (BD) is calculated as the ratio of total building footprint area to AOI area. Average Building Height (AH) uses area-weighted averaging to account for the varying sizes of buildings. Floor Area Ratio (FAR) estimates total floor area from the number of storeys, obtained by dividing building height by an assumed floor height of 3.0 m. Functional Mix (ENT) uses Shannon Entropy computed from POI category distributions within each AOI and a 100 m buffer zone to capture edge effects, as defined in Equation (2):
$$\mathrm{ENT}_i = -\sum_{j=1}^{K} P_j \ln(P_j) \tag{2}$$
where $K$ is the total number of POI categories and $P_j$ is the proportion of the $j$-th category. Green Coverage Ratio (GreenRatio) is calculated as the proportion of the nearby area (500 m buffer) classified as green land use (park, forest, grass, nature reserve, scrub, or orchard).
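Two of these indicators lend themselves to a short worked sketch. The entropy function follows Equation (2). The FAR function assumes the conventional definition (total floor area divided by AOI area), with storeys estimated from the 3.0 m floor height stated above. Function names and input values are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(poi_categories):
    """Functional mix (ENT) per Equation (2): -sum(P_j * ln P_j) over observed categories."""
    counts = Counter(poi_categories)
    n = sum(counts.values())
    if n == 0:
        return 0.0  # no POIs: treated here as zero mix (an assumption)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def floor_area_ratio(buildings, aoi_area_m2, floor_height_m=3.0):
    """FAR sketch: buildings is a list of (footprint_m2, height_m) pairs.

    Storeys are estimated as height / floor_height_m; dividing the summed
    floor area by the AOI area is the conventional FAR definition (assumed).
    """
    total_floor_area = sum(fp * (h / floor_height_m) for fp, h in buildings)
    return total_floor_area / aoi_area_m2

ent = shannon_entropy(["retail", "food", "school", "clinic"])        # four equal shares
far = floor_area_ratio([(500.0, 18.0), (300.0, 9.0)], 10000.0)      # 6 and 3 storeys
```

With four equally represented categories the entropy equals ln 4, the maximum for K = 4, which gives a sense of scale for the ENT values of 2.19 and 2.47 reported later for individual AOIs.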

3.5. Data Preprocessing

Outliers were removed using the interquartile range (IQR) method at the 1st and 99th percentiles with a 1.5 × IQR threshold, reducing the sample from 3930 to 3920 AOIs. Variance Inflation Factor (VIF) analysis was conducted to assess multicollinearity; all variables exhibited VIF values below 10 (maximum VIF = 6.78 for FAR, followed by 6.11 for BD), indicating acceptable multicollinearity levels. Therefore, all ten variables were retained for modeling.
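For readers less familiar with the diagnostic, the VIF of predictor $j$ follows the standard definition below, where $R_j^2$ is the coefficient of determination obtained by regressing predictor $j$ on the remaining predictors. Inverting it gives a sense of what the two largest reported values imply:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \quad\Longrightarrow\quad R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}$$

$$R_{\mathrm{FAR}}^2 = 1 - \frac{1}{6.78} \approx 0.853, \qquad R_{\mathrm{BD}}^2 = 1 - \frac{1}{6.11} \approx 0.836$$

In other words, roughly 85% of the variance in FAR and 84% of the variance in BD is explained by the other predictors. This is below the conventional cut-off of 10, although it suggests that the SHAP attributions of these two density measures may be partly interchangeable.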

3.6. Explainable Machine Learning Framework

3.6.1. XGBoost

To capture non-linear relationships between built environment features and urban vitality, we employ XGBoost (eXtreme Gradient Boosting), a scalable tree-boosting algorithm [18]. Unlike OLS regression, XGBoost mitigates overfitting through L1 and L2 regularization and does not require strict assumptions about variable distributions or independence. The objective function minimized during training is given in Equation (3):
$$\mathrm{Obj} = \sum_{i=1}^{N} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{3}$$
where $l$ is a differentiable loss function and $\Omega$ penalizes tree complexity. The dataset was split 80/20 for training and testing. Hyperparameter tuning was performed via 5-fold cross-validated grid search over learning rate {0.05, 0.1}, max depth {4, 6, 8}, and number of estimators {100, 200, 300}. Model performance was evaluated using R², Root Mean Square Error (RMSE), and Mean Absolute Error (MAE), and compared against OLS and Random Forest baselines. We acknowledge that the random train–test split does not enforce spatial separation, which may lead to optimistic performance estimates due to spatial autocorrelation in the dependent variable. To assess this risk, we report both the random 5-fold CV R² and discuss the implications of potential spatial information leakage in Section 4.2 and Section 5.4. A full spatial block cross-validation is deferred to future work as it requires careful definition of spatially contiguous blocks that balance sample sizes across folds.
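To make the size of the search space concrete, the sketch below enumerates the grid described above using only the standard library. The dictionary keys follow common XGBoost parameter naming, and the helper function is hypothetical; the authors' tuning code is not given in the text.

```python
from itertools import product

# Search space as stated in the text.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [4, 6, 8],
    "n_estimators": [100, 200, 300],
}

def expand_grid(grid):
    """Return every parameter combination in the grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]

candidates = expand_grid(param_grid)
n_folds = 5
n_fits = len(candidates) * n_folds  # model fits needed for the full search
print(len(candidates), "candidates,", n_fits, "fits")
```

The grid is small (2 × 3 × 3 combinations), so an exhaustive search is inexpensive. The same dictionary could be passed to a grid-search utility such as scikit-learn's GridSearchCV.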

3.6.2. SHAP Interpretation

To interpret the XGBoost model, we employ SHAP (SHapley Additive exPlanations) based on cooperative game theory [19]. SHAP assigns each feature an importance value for a particular prediction following the additive attribution form in Equation (4):
$$g(z) = \phi_0 + \sum_{j=1}^{M} \phi_j z_j \tag{4}$$
where $\phi_j$ is the Shapley value of the $j$-th feature. Through SHAP summary plots, we identify the global feature importance ranking. Through SHAP dependence plots, we extract the non-linear relationship curves and identify threshold effects, that is, the points at which a feature’s contribution to vitality shifts from positive to negative or vice versa. These thresholds provide quantitative boundaries for precision urban renewal interventions.
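The additive property in Equation (4) can be checked on a toy example. The sketch below computes exact Shapley values by averaging marginal contributions over all feature orderings, relative to a single baseline point. It is not the tree-based algorithm used by SHAP for XGBoost models, and the toy model, inputs, and baseline are hypothetical.

```python
from itertools import permutations

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline, via all feature orderings.

    Cost grows factorially with the number of features, so this is for illustration only.
    """
    m = len(x)
    phi = [0.0] * m
    orders = list(permutations(range(m)))
    for order in orders:
        z = list(baseline)
        prev = f(z)
        for j in order:
            z[j] = x[j]            # switch feature j from baseline to its actual value
            cur = f(z)
            phi[j] += cur - prev   # marginal contribution of j in this ordering
            prev = cur
    return [p / len(orders) for p in phi]

def toy_model(z):
    # Hypothetical model with one additive term and one interaction term.
    return 2.0 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
phi0 = toy_model(baseline)
# Additivity: phi0 + sum(phi) reproduces the model output at x.
```

In this example the interaction term is split equally between the two features involved, which illustrates why SHAP values for correlated or interacting built environment variables should be interpreted jointly.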

4. Results

4.1. Descriptive Statistics

The spatial distribution of normalized urban vitality across the 3920 AOIs is shown in Figure 2. A pronounced monocentric–polycentric pattern is visible, with high-vitality clusters concentrated around the five CBDs and along major arterial corridors, decaying rapidly toward the urban periphery.
Table 4 summarizes the descriptive statistics of all variables. Substantial variation exists across indicators: the normalized vitality ranges from 0 to 1 with a median of 0.16 and a mean of 0.27, reflecting the strong positive skew typical of urban population distributions. Building Density ranges from 0 to 0.72 (mean = 0.07), reflecting the heterogeneity between built-up areas and open spaces. POI Density exhibits extreme variation (0 to 14,289 per km²), with a median of only 82 per km², indicating strong spatial concentration. Building Age averages 26.9 years, with most buildings constructed in the mid-1990s. Distance to CBD ranges from 50 m to 88.5 km, capturing the full urban-suburban gradient.

4.2. Model Performance Comparison

Table 5 and Figure 3 compare the performance of the three models. XGBoost achieves the highest R² (0.846), followed by Random Forest (0.833) and OLS (0.634). The substantial improvement from OLS to XGBoost (a relative gain of 33.4% in R²) confirms the presence of significant non-linear relationships between built environment variables and urban vitality that linear models fail to capture. The 5-fold cross-validated R² (0.713 ± 0.115) provides a more conservative estimate of generalization performance.
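The 33.4% figure is a relative gain rather than a difference in percentage points, as the following arithmetic on the reported values shows:

$$\frac{R^2_{\mathrm{XGB}} - R^2_{\mathrm{OLS}}}{R^2_{\mathrm{OLS}}} = \frac{0.846 - 0.634}{0.634} = \frac{0.212}{0.634} \approx 0.334$$

The absolute gain is therefore 0.212, i.e., about 21 additional percentage points of explained variance on the test set.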
The gap between the test R² (0.846) and the mean cross-validated R² (0.713) warrants careful interpretation. Three factors likely contribute to this discrepancy. First, the random 80/20 train–test split does not account for spatial proximity: nearby AOIs share similar locational and morphological characteristics, allowing spatially adjacent training samples to “leak” information to test samples. This inflates the test R² relative to the CV estimate, which, through its fold rotation, partially mitigates such leakage. Second, the relatively high standard deviation across folds (±0.115) indicates substantial heterogeneity between spatial subsets of the data, suggesting that model performance varies across different urban zones (e.g., core vs. periphery). Third, the regularization parameters employed (reg_alpha = 0.1, reg_lambda = 1.0, max_depth = 6) were specifically chosen to constrain tree complexity and reduce overfitting risk. We therefore interpret the CV R² of 0.713 as the more reliable indicator of out-of-sample performance, while acknowledging that spatial cross-validation (discussed in Section 5.4) would provide an even more conservative estimate. Despite this gap, both metrics substantially exceed the OLS baseline (R² = 0.634), confirming the presence of genuine non-linear relationships.
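For illustration only, a spatial block assignment of the kind referred to above could look like the sketch below. It was not part of the reported analysis. The 5 km block size, the round-robin fold assignment, and the function name are assumptions chosen for simplicity, and the scheme does not balance sample sizes across folds.

```python
def spatial_block_folds(points, block_size_m=5000.0, n_folds=5):
    """Assign each (x, y) point to a CV fold so that all points in a grid block share a fold.

    Coordinates are assumed to be projected (metres). Blocks are assigned to
    folds round-robin in sorted order; this is a simplification.
    """
    block_ids = [(int(x // block_size_m), int(y // block_size_m)) for x, y in points]
    unique_blocks = sorted(set(block_ids))
    fold_of_block = {b: i % n_folds for i, b in enumerate(unique_blocks)}
    return [fold_of_block[b] for b in block_ids]

# Hypothetical AOI centroids: the first two fall in the same 5 km block.
centroids = [(100.0, 100.0), (200.0, 4900.0), (5100.0, 100.0), (12000.0, 100.0)]
folds = spatial_block_folds(centroids)
```

Holding out whole blocks means that a test AOI's immediate neighbours are never in the training set. This is the mechanism by which spatial cross-validation limits leakage from spatial autocorrelation.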
The optimal XGBoost hyperparameters were learning rate = 0.05, max depth = 6, n_estimators = 300, subsample = 0.8, colsample_bytree = 0.8, reg_alpha = 0.1, and reg_lambda = 1.0.

4.3. Feature Importance Analysis

Figure 4 presents the SHAP-based global feature importance ranking. Distance to CBD (DistCBD) emerges as the dominant factor, with a mean absolute SHAP value of 0.134, approximately three times greater than the second-ranked feature. This finding underscores the persistent centrality of spatial location in determining urban vitality, consistent with classical central place theory.
The directional contribution of each feature across all 3920 AOIs is visualized in the SHAP beeswarm plot (Figure 5). Each point is one AOI; horizontal position indicates the feature’s contribution to that AOI’s predicted vitality, and colour encodes the underlying feature value.
The remaining features form four tiers of importance:
  • Tier 2 (mean |SHAP| ≈ 0.04–0.05): Bus Station Density (BusDen500, 0.047) is the second most important feature, demonstrating that fine-grained transit service provision (within walking distance) outweighs even the proximity to a single bus station (DistBus).
  • Tier 3 (mean |SHAP| ≈ 0.02–0.04): Green Coverage Ratio (GreenRatio, 0.035) and Building Density (BD, 0.023) form the next tier, reflecting the importance of ecological context and physical morphology.
  • Tier 4 (mean |SHAP| ≈ 0.01–0.02): Distance to Bus Station (DistBus), POI Density (PD), Floor Area Ratio (FAR), and Functional Mix (ENT) contribute moderately.
  • Tier 5 (mean |SHAP| < 0.012): Building Age (BldgAge) and Average Building Height (AH) contribute the least, suggesting that vertical density and historical maturity are weak predictors of population-based vitality once location and other density factors are accounted for.

4.4. Non-Linear Threshold Effects

The SHAP dependence plots (Figure 6) reveal critical non-linear thresholds for the top four features. Each panel plots one feature’s value (x-axis) against its SHAP contribution (y-axis); the dashed red line marks the empirically identified threshold, and the black curve is a moving-window mean.
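The text does not specify how the thresholds were extracted from the dependence plots. One plausible procedure, sketched below under that caveat, is to smooth the SHAP values with a moving-window mean along the sorted feature axis and locate the first sign change by linear interpolation. The window size, function name, and sample values are hypothetical.

```python
def moving_mean_threshold(feature, shap_values, window=3):
    """Return the feature value where the smoothed SHAP curve first changes sign, or None."""
    pairs = sorted(zip(feature, shap_values))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    half = window // 2
    smooth = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half), min(len(ys), i + half + 1)
        smooth.append(sum(ys[lo:hi]) / (hi - lo))  # centred moving mean, truncated at the ends
    for i in range(1, len(smooth)):
        a, b = smooth[i - 1], smooth[i]
        if a == 0.0:
            return xs[i - 1]
        if a * b < 0:
            t = a / (a - b)  # linear interpolation between neighbouring points
            return xs[i - 1] + t * (xs[i] - xs[i - 1])
    return None

# Hypothetical distance (m) and SHAP pairs with a positive-to-negative transition.
dist = [1000, 2000, 3000, 4000, 5000, 6000, 7000]
shap_vals = [0.30, 0.20, 0.10, 0.05, -0.05, -0.10, -0.20]
threshold = moving_mean_threshold(dist, shap_vals)
```

The estimated crossing depends on the window size, so reporting thresholds as approximate values, as done throughout this section, is appropriate.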

4.4.1. Distance to CBD (DistCBD)

DistCBD exhibits a pronounced non-linear decay pattern. Within approximately 4.3 km of the CBD, AOIs receive strong positive SHAP contributions (up to +0.40), reflecting the agglomeration benefits of central location. Contributions cross zero at this threshold and continue to decline sharply between 4.3 and 20 km. Beyond 20 km, contributions stabilize at a moderate negative level (~−0.10), indicating that peripheral locations are consistently associated with lower vitality regardless of other built environment characteristics. This threshold of approximately 4.3 km can be interpreted as the effective “vitality radius” of Zhengzhou’s CBDs.

4.4.2. Bus Station Density Within 500 m (BusDen500)

Bus station density exhibits a clear S-shaped relationship. AOIs with zero or one bus stop within 500 m receive negative SHAP contributions (~−0.08). The threshold of 4 stations marks the onset of consistent positive contributions, with effects increasing approximately linearly between 4 and 15 stations and plateauing at higher densities. This finding offers a direct planning rule of thumb: ensuring at least 4 bus stops within walking distance (500 m) of a block is associated with meaningful vitality enhancement. The dominance of BusDen500 over the simpler “distance to nearest stop” metric (DistBus) confirms that transit availability matters more than mere proximity.

4.4.3. Green Coverage Ratio (GreenRatio)

It is essential to note that the following finding is a direct consequence of operationalizing vitality as population density; it should not be interpreted as evidence that green space reduces broader urban quality or liveability. GreenRatio reveals a counter-intuitive but theoretically important pattern. SHAP values are mildly positive at near-zero green coverage and turn negative beyond a threshold of approximately 8.5% green-land coverage within a 500 m buffer. The most negative contributions (~−0.30) occur between 30% and 60% green coverage. This does not imply that “green is bad”; rather, it reflects the operational definition of vitality as resident population density. Areas dominated by parks, forests, or open green spaces are by definition residentially under-utilized. From a renewal-policy standpoint, this finding cautions against equating “more green space” with “more vital neighborhoods,” and instead supports a balanced compact-city model in which green spaces serve, but do not dominate, residential fabric. Abundant international evidence documents the well-being and health benefits of urban greenery [39,40], and our findings should therefore not be read as a value judgement against green space. What the model captures is a mechanical land-use trade-off within fixed AOI extents: above approximately 8.5% green coverage, ecological or recreational land use begins to displace residential floor area at the AOI scale, which the index registers as lower population density.

4.4.4. Building Density (BD)

Building Density exhibits a classic inverted-U-shaped non-linear relationship. SHAP values rise sharply from negative to positive at the threshold of approximately 0.021 (2.1% footprint coverage), peak between 0.10 and 0.30, and gradually decline at very high densities (>0.50). This pattern provides quantitative evidence of the long-hypothesized “tipping point” in urban density: an extremely sparse built fabric (BD < 2%) cannot support critical-mass population vitality, while extreme over-development (BD > 50%) yields diminishing returns. The optimal range of approximately 10–30% building density coincides with the typical morphology of mature mixed-use mid-rise neighborhoods.
Figure 7 provides visual evidence of the urban morphologies corresponding to the identified BD thresholds. Figure 7a illustrates a sparse residential AOI (BD = 1.9%) typical of peri-urban areas with scattered low-rise structures and large vacant lots, where the built fabric lacks sufficient critical mass to support population vitality. Figure 7b,c exemplify the optimal BD range: Figure 7b shows a mid-rise residential compound (BD = 14.9%) with courtyard-plan buildings of 4–8 stories interspersed with ground-level amenities, achieving the highest vitality score (0.97); Figure 7c depicts a denser mixed-use configuration (BD = 26.5%) with taller buildings and more compact street frontage, still maintaining high vitality (0.93). Figure 7d reveals an industrial site (BD = 52.0%) where bulky factory buildings cover more than half of the AOI footprint, yielding low residential vitality (0.13). These morphological examples confirm that the statistical thresholds identified by SHAP correspond to recognizable and distinct urban typologies, reinforcing the practical relevance of the quantitative findings for architectural and urban design practice.

4.5. Visual Representativeness of Vitality Classes

To complement the threshold analysis with concrete visual evidence and to clarify the morphological character of the vitality classes generated by the model, Figure 8 presents one representative AOI from each of the lowest, middle, and highest vitality quintiles. The three examples were selected by spatially querying the AOI dataset for the candidate locations and confirming that each falls comfortably within its respective quintile range. Figure 8(a1,a2) shows AOI 669 in the agricultural fringe of Zhongmu County (Y = 0.02, BD = 0.00, FAR = 0.00, DistCBD = 32,492 m), characterized by farmland and scattered village structures. Figure 8(b1,b2) shows AOI 1896 along Daxue Road–Huaihe Road in Erqi District (Y = 0.51, BD = 0.09, FAR = 0.55, ENT = 2.19, DistCBD = 4070 m), a traditional mid-density mixed-use street with active ground-floor commerce and tree-lined sidewalks. Figure 8(c1,c2) shows AOI 3030 along Jingsan Road–Hongzhuan Road in Jinshui District (Y = 0.81, BD = 0.20, FAR = 1.21, ENT = 2.47, DistCBD = 783 m), a dense but moderately scaled commercial-residential core in proximity to the CBD. Crucially, the highest-vitality example exhibits moderate rather than extreme density values (BD = 0.20 and FAR = 1.21 sit near the centre of the inverted-U range identified in Section 4.4.4), and is associated with a high land-use entropy of 2.47. This is empirically important: it indicates that what the model identifies as “high vitality” in Zhengzhou is associated with the traditional dense-but-mixed urban fabric described by Jacobs [20] and elaborated in the New Urbanism tradition [39,41], not with isolated high-rise tower clusters or large-footprint dormitory housing. We return to the ethical implications of this finding in Section 5.5.
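The quintile check described above can be illustrated with a short sketch. It assumes rank-based quintiles over the AOI vitality scores; the function name `quintile_labels` is hypothetical, and only the scores 0.02, 0.51, and 0.81 correspond to the three AOIs reported in this section, while the remaining values are invented to make the example runnable.

```python
def quintile_labels(values):
    """Rank-based quintiles: 0 = lowest fifth, 4 = highest fifth."""
    n = len(values)
    # Indices of the observations, ordered from lowest to highest value.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n
    return labels


# 0.51, 0.02 and 0.81 are the Y values of AOIs 1896, 669 and 3030;
# all other scores are invented for illustration.
vitality = [0.51, 0.02, 0.97, 0.12, 0.81, 0.05, 0.60, 0.20, 0.70, 0.55]
labels = quintile_labels(vitality)

for score, label in zip(vitality, labels):
    print(f"Y = {score:.2f} -> quintile {label}")
```

In this toy sample the three reported scores fall in the lowest, middle, and highest quintiles respectively; with the full AOI dataset the class boundaries would of course differ.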

5. Discussion

5.1. Implications

This study’s primary contribution is the quantification of non-linear thresholds governing the built-environment–vitality nexus. Before discussing these findings, it is important to reiterate that all thresholds identified below apply to population-based residential vitality as operationally defined in Section 3.4.1, and should not be directly extrapolated to broader conceptions of urban vitality encompassing economic, social, or cultural dimensions. Four key findings merit emphasis, organized here into theoretical, practical, and policy implications.

5.1.1. Theoretical Implications

First, the dominance of DistCBD confirms that spatial location remains the most fundamental determinant of urban vitality, consistent with central place theory and economic geography [42]. For Zhengzhou’s planners, this implies that urban renewal investments within the 4.3 km CBD vitality radius are likely to yield the highest returns, while peripheral interventions require more comprehensive strategies (e.g., sub-centre cultivation) to overcome locational disadvantages.
Second, the second-place ranking of fine-grained transit availability (BusDen500) over coarse proximity (DistBus) provides strong evidence for network-thinking in transit planning: scattering individual stops is far less effective than concentrating multiple stops within walkable catchments. The actionable threshold of “≥4 stops within 500 m” can be incorporated directly into local renewal codes.

5.1.2. Practical Implications for Urban Design

Translating the above statistical thresholds into actionable urban design strategies requires bridging the gap between numerical indicators and spatial interventions at the neighborhood and block scale. For the DistCBD threshold (<4.3 km), planners should prioritize urban renewal investments within the CBD vitality radius, where returns are highest; interventions in peripheral areas (>20 km) require complementary strategies such as sub-center cultivation or transit-oriented development to overcome locational disadvantages. For the BusDen500 threshold (≥4 stations), renewal codes should mandate a minimum of four bus stops within 500 m walking distance of each block; this can be achieved through micro-circulation routes and demand-responsive transit rather than solely through trunk-line expansion. For the Building Density threshold (10–30% BD), renewal schemes should target the morphological typology of 4–8 story courtyard-plan residential compounds interspersed with ground-level retail, a configuration prevalent in Zhengzhou’s 1990s–2000s urban fabric that naturally achieves this density range. For the GreenRatio threshold (<8.5% within 500 m when residential vitality is the priority), green space should be integrated as pocket parks, linear greenways, or rooftop gardens embedded within residential fabric, rather than as large contiguous parks that displace residential density. These design translations demonstrate that the identified thresholds are not abstract numerical targets but correspond to recognizable urban typologies and implementable spatial strategies.
Third, the non-monotonic effect of Green Coverage Ratio challenges the implicit assumption in many sustainability rubrics that “more green is always better.” When the operational definition of vitality is residential population density, ecological land use and human density compete for the same spatial budget. The 8.5% threshold, beyond which green coverage begins to detract from population vitality, can serve as a quantitative guideline for allocating green-grey balance in compact-city strategies.
Fourth, the inverted-U pattern of Building Density supplies an empirical answer to one of the most enduring debates in urban morphology: how dense is “too dense”? In Zhengzhou, the 10–30% BD band emerges as the empirically optimal range. Interventions that push density above 50% footprint coverage (typical of certain super-block redevelopment proposals) are unlikely to deliver further vitality gains and may erode urban quality.

5.1.3. Policy and Managerial Implications

From a policy and urban management perspective, this study offers three contributions. First, the open-data-based framework substantially lowers the technical and financial barriers to evidence-based urban analysis. Municipal planning bureaus, particularly in resource-constrained cities, can replicate this approach using freely available WorldPop, OpenStreetMap, and POI data without procuring expensive mobile phone signaling datasets. Second, the non-linear thresholds can be directly embedded into regulatory planning instruments. For instance, the optimal building density range of 10–30% can inform plot ratio and building coverage ratio caps in detailed regulatory plans, while the transit threshold of ≥4 stops within 500 m can guide public transit service standards in renewal zones. Third, the counterintuitive finding regarding green coverage challenges the prevailing “more is better” approach to urban greening targets. Planning authorities should reconsider performance indicators that incentivize maximizing green coverage ratio without accounting for its interaction with residential density and broader vitality outcomes.
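As an illustration of how such thresholds might be encoded in a screening tool, the sketch below checks a single AOI against the four values reported in Section 4.4. It is a hypothetical example, not part of the study's pipeline: the `AOI` class, its field names, and the `screen` function are assumptions, and the input values are invented. The thresholds apply only to population-based residential vitality and are necessary rather than sufficient conditions.

```python
from dataclasses import dataclass


@dataclass
class AOI:
    dist_cbd_m: float        # distance to the nearest commercial core, metres
    bus_stations_500m: int   # bus stations within 500 m
    green_ratio: float       # green coverage within 500 m, as a fraction 0-1
    building_density: float  # building footprint coverage, as a fraction 0-1


def screen(aoi):
    """Compare one AOI with the thresholds reported in Section 4.4."""
    return {
        "within_cbd_radius": aoi.dist_cbd_m < 4300,             # about 4.3 km
        "transit_ok": aoi.bus_stations_500m >= 4,                # at least 4 stations
        "green_below_threshold": aoi.green_ratio < 0.085,        # about 8.5%
        "bd_in_peak_band": 0.10 <= aoi.building_density <= 0.30, # 10-30%
    }


# Invented input values, used only to show the call pattern.
example = AOI(dist_cbd_m=783.0, bus_stations_500m=6,
              green_ratio=0.05, building_density=0.20)
result = screen(example)
print(result)
```

A boolean screen of this kind discards the shape of the SHAP curves, so it is best treated as a first-pass filter alongside the qualitative assessments discussed in Section 5.5.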

5.2. Comparison with Existing Studies

Our finding that XGBoost outperforms OLS by approximately 33% in relative R2 terms (0.846 versus 0.634, an absolute gain of 0.212) is consistent with recent urban vitality studies employing machine learning approaches [16,30,35]. The R2 value of 0.846, with a stable cross-validated R2 of 0.713, is at the upper end of values reported in similar studies (typically 0.45–0.75) [9,43], a result particularly notable considering our exclusive reliance on open data.
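For clarity, the comparison between the two models can be reproduced from the reported test-set R2 values alone. The short calculation below uses only figures stated in this article; the variable names are illustrative.

```python
r2_xgb = 0.846  # XGBoost, held-out test set
r2_ols = 0.634  # Ordinary Least Squares, held-out test set

absolute_gain = r2_xgb - r2_ols          # difference in explained variance
relative_gain = absolute_gain / r2_ols   # gain relative to the OLS baseline

print(f"absolute gain: {absolute_gain:.3f}")
print(f"relative gain: {relative_gain:.1%}")
```

The distinction matters when comparing across studies, since an absolute gain of about 0.21 and a relative gain of about one third describe the same result.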
The dominance of destination accessibility (DistCBD) over morphological density indicators (BD, AH, FAR) echoes findings by Wu et al. [8] and Chen et al. [44], who similarly found that locational factors outweigh physical form in determining vitality. However, our SHAP-based analysis adds critical nuance: it quantifies exactly where these relationships shift from positive to negative, providing actionable thresholds rather than mere directional associations.

5.3. Methodological Contributions

The AOI-based approach addresses the MAUP by using semantically meaningful spatial units rather than arbitrary grids. Unlike fixed-size grids that may split a residential compound or merge distinct functional zones, AOIs respect the actual boundaries of urban land-use parcels. This alignment with planning practice enhances the transferability of our findings to real-world renewal decisions.
The exclusive use of open data represents a deliberate methodological choice. The upgrade from a 1 km to a 100 m WorldPop raster substantially improved model performance (R2 rising from 0.59 to 0.85 and CV-R2 rising from 0.19 ± 0.37 to 0.71 ± 0.12), demonstrating that the resolution of the dependent variable is a critical, and often under-discussed, bottleneck in vitality research. Our framework demonstrates that meaningful, robust non-linear analysis is achievable with freely available data, substantially lowering the barrier for replicating similar analyses in other cities, particularly in developing regions.

5.4. Limitations and Future Work

Several limitations warrant acknowledgment. First, although the 100 m WorldPop raster provides much-improved spatial detail compared with its 1 km counterpart, it remains a modelled product subject to the assumptions of the underlying dasymetric framework. Higher-frequency proxies such as mobile phone signaling data or social media check-in density would enable temporally resolved (e.g., diurnal) vitality analysis.
Second, the present analysis uses population density alone as the dependent variable. Although the original research design also intended to incorporate the NPP/VIIRS Day/Night Band radiance composite as a complementary economic-vitality dimension, only the cloud-free coverage band was directly accessible without registration. Future work integrating the radiance composite (avg_rade9.tif) is expected to broaden the operational definition of vitality from “where people live” to “where people and economic activity concentrate.”
Third, the cross-sectional design cannot establish causality. The identified thresholds represent associations, and longitudinal studies tracking renewal interventions over time are needed to validate causal mechanisms.
Fourth, the study considers only Zhengzhou, and threshold values may vary across cities with different urban structures. Multi-city comparative studies would enhance the generalizability of findings.
Fifth, the current model does not explicitly account for spatial autocorrelation in the dependent variable. Population density is inherently spatially clustered, meaning that nearby AOIs tend to share similar vitality levels. While XGBoost’s tree-based structure can implicitly capture some spatial patterns through location-related features (e.g., DistCBD), the random cross-validation employed in this study does not enforce spatial separation between training and testing folds, potentially inflating performance estimates. This likely explains part of the gap between the test R2 (0.846) and the CV R2 (0.713) discussed in Section 4.2. Future work should implement spatial block cross-validation, where folds are defined by non-overlapping geographic zones, to obtain more conservative and spatially honest performance estimates. Additionally, integrating spatial econometric approaches such as Multi-scale Geographically Weighted Regression (MGWR) could capture spatial heterogeneity in the relationships and reveal whether threshold values vary systematically across urban zones.
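The spatial block cross-validation suggested above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes each AOI is represented by a centroid in a projected coordinate system with units of metres, and the 5 km block size, the two folds, the sample coordinates, and the function names `spatial_block_folds` and `train_test_indices` are all hypothetical choices.

```python
import math


def spatial_block_folds(coords, block_size_m, n_folds):
    """Assign each point to a fold so that all points falling in the
    same square block share a fold (blocks, not points, are split)."""
    block_of = [
        (math.floor(x / block_size_m), math.floor(y / block_size_m))
        for x, y in coords
    ]
    blocks = sorted(set(block_of))
    # Deterministic round-robin assignment of blocks to folds.
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[b] for b in block_of]


def train_test_indices(folds, k):
    """Indices of training and testing points when fold k is held out."""
    test = [i for i, f in enumerate(folds) if f == k]
    train = [i for i, f in enumerate(folds) if f != k]
    return train, test


# Invented AOI centroid coordinates in metres.
coords = [(100, 200), (900, 800), (5200, 300),
          (5900, 4100), (10400, 9700), (300, 6100)]
folds = spatial_block_folds(coords, block_size_m=5000, n_folds=2)
train_idx, test_idx = train_test_indices(folds, k=1)
print(folds, train_idx, test_idx)
```

Because neighbouring blocks can still be assigned to different folds, larger blocks or a buffer between training and testing blocks may be needed to limit leakage from spatial autocorrelation.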

5.5. Ethical Scope and Operational Boundaries of the Vitality Index

While the empirical findings reported above provide actionable thresholds for urban renewal, several boundary conditions must be acknowledged to forestall misinterpretation. The vitality index operationalized in this study is, as emphasized in Section 1.1 and Section 3.4.1, a population-based residential measure: it captures the density of human occupation at the AOI scale via the WorldPop gridded population, and nothing more. It is not, and was never intended as, a measure of liveability, walkability, sense of place, social cohesion, environmental quality, or any of the other dimensions that constitute “good” urbanism in the Jacobsian or New Urbanism traditions [20,39,41]. All thresholds reported in Section 4.4 should be read as bounded statements about this single dimension, not as universal claims about urban quality.
A specific implication concerns extremely high-density configurations. Because the index measures where people are rather than how well they live, it can in principle assign high values to any high-density configuration, including configurations that may be undesirable from a human-development perspective, such as poorly serviced dormitory housing or speculative tower clusters lacking mixed-use ground-floor activation. Figure 8 mitigates this concern empirically by showing that the AOIs at the high end of our distribution in Zhengzhou are dominated by traditional mixed-use mid-density forms (BD ≈ 0.20, FAR ≈ 1.21, land-use entropy ≈ 2.47) rather than extreme tower clusters, and the optimal BD range of 10–30% identified by SHAP (Section 4.4.4) is itself incompatible with the very high footprint coverage typical of isolated dormitory complexes. Nevertheless, the conceptual risk remains for any future application of the index to other contexts, and we urge readers to treat the thresholds in Section 4.4 as necessary but not sufficient design parameters: a renewal scheme that achieves high BD, low DistCBD, and high entropy will satisfy our model but may still produce poor outcomes if standards for ground-floor activation, walkability, mix of uses, and access to social infrastructure are not simultaneously upheld. These principles are consolidated in the UN New Urban Agenda [45] and the broader healthy-urbanism literature [14,20,39,40,41,46].
More broadly, the index complements rather than replaces the qualitative, ethnographic, and design-led methods through which healthy urbanism is ultimately assessed [20,39]. We therefore recommend that practitioners using the thresholds derived from this work pair them with: (i) qualitative audits of mix of uses and ground-floor activation along the lines proposed by Sung and Lee [46]; (ii) pedestrian-level walkability assessments grounded in the evidence base summarized by Ewing and Cervero [14]; and (iii) subjective well-being indicators along the dimensions reviewed by Mouratidis [40]. The vitality index is thus best understood as a useful, reproducible, open-data summary of one important dimension of urban form–population interaction. This is a starting point for evidence-based renewal, not a substitute for the multidimensional evaluation that responsible practice requires.

6. Conclusions

This study developed an explainable machine learning framework to decode the non-linear impact of the built environment on urban vitality at the block scale in Zhengzhou, China. Using exclusively multi-source open data and AOI-based functional spatial units, we constructed a comprehensive 5D+S indicator system and applied XGBoost with SHAP interpretation. The key conclusions are:
  • Non-linear superiority confirmed. XGBoost (R2 = 0.846; CV R2 = 0.713 ± 0.115) substantially outperforms OLS (R2 = 0.634), indicating that built-environment–vitality relationships are non-linear and threshold-driven rather than linear.
  • Location dominates. Distance to the commercial core is the single most important predictor (mean |SHAP| = 0.134), with a critical vitality radius of approximately 4.3 km. Urban renewal within this radius offers the highest vitality return.
  • Threshold-based design guidelines. Specific thresholds were identified for precision renewal: (i) at least 4 bus stations should be accessible within 500 m of a block; (ii) green-land coverage should not exceed approximately 8.5% within 500 m if residential vitality is the planning objective; and (iii) building density delivers positive returns within an inverted-U range of approximately 2–50%, with peak effects at 10–30%.
  • Open data viability. The framework demonstrates that reproducible, high-fidelity non-linear vitality analysis (R2 > 0.8) is achievable using exclusively free, open-source data, lowering the barrier for evidence-based urban renewal planning across diverse urban contexts.
These findings provide a scientific foundation for Zhengzhou’s ongoing urban regeneration, offering quantitative benchmarks to guide density targets, transit planning, and green-grey balance decisions. Future work should extend this framework to multi-city comparisons, integrate the VIIRS radiance composite, and incorporate temporal dynamics to track how renewal interventions shift vitality trajectories over time. It should be noted that the thresholds identified in this study pertain specifically to population-based residential vitality. Future research should integrate multi-dimensional vitality proxies, such as nighttime light intensity, social media check-in frequency, and street-level pedestrian counts, to validate whether these thresholds hold for broader conceptions of urban vitality encompassing economic, social, and cultural dimensions.

Author Contributions

Conceptualization, X.L. and S.N.; methodology, X.L., H.Z. and W.L.; software, H.Z. and Y.L.; validation, X.L., W.L. and Z.X.; formal analysis, X.L. and H.Z.; investigation, Y.L. and Z.X.; data curation, H.Z. and Y.L.; writing, original draft preparation, X.L. and H.Z.; writing, review and editing, S.N., W.L. and Z.X.; visualization, H.Z. and Y.L.; supervision, S.N.; project administration, S.N.; funding acquisition, S.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in this study are derived from publicly accessible open-source repositories. The 100 m WorldPop constrained population grid (release R2025A) is available at https://www.worldpop.org/. OpenStreetMap building footprints and AOI polygons can be obtained via the OSM Overpass API (https://overpass-api.de/ (accessed on 15 April 2026)) or Geofabrik regional extracts (https://download.geofabrik.de/ (accessed on 15 April 2026)). Points-of-Interest and bus station data were retrieved through the Amap (Gaode) Open Platform (https://lbs.amap.com/ (accessed on 15 April 2026)). The processed analysis-ready dataset, full Python 3.11 pipeline (data_processing.py), trained XGBoost model and figure-generation scripts used in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wu, C.; Ye, X.; Ren, F.; Du, Q. Check-in behaviour and spatio-temporal vibrancy: An exploratory analysis in Shenzhen, China. Cities 2018, 77, 104–116.
  2. Lan, F.; Gong, X.; Da, H.; Wen, H. How do population inflow and social infrastructure affect urban vitality? Evidence from 35 large- and medium-sized cities in China. Cities 2020, 100, 102454.
  3. Montgomery, J. Making a city: Urbanity, vitality and urban design. J. Urban Des. 1998, 3, 93–116.
  4. Glaeser, E.L.; Kolko, J.; Saiz, A. Consumer city. J. Econ. Geogr. 2001, 1, 27–50.
  5. Zhang, A.; Li, W.; Wu, J.; Lin, J.; Chu, J.; Xia, C. How can the urban landscape affect urban vitality at the street block level? A case study of 15 metropolises in China. Environ. Plan. B Urban Anal. City Sci. 2021, 48, 1245–1262.
  6. Yue, W.; Chen, Y.; Thy, P.T.M.; Fan, P.; Liu, Y.; Zhang, W. Identifying urban vitality in metropolitan areas of developing countries from a comparative perspective: Ho Chi Minh City versus Shanghai. Sustain. Cities Soc. 2021, 65, 102573.
  7. Openshaw, S. The Modifiable Areal Unit Problem; Geobooks: Norwich, UK, 1984.
  8. Wu, J.; Lu, Y.; Gao, H.; Wang, M. Cultivating historical heritage area vitality using urban morphology approach based on big data and machine learning. Comput. Environ. Urban Syst. 2022, 91, 101716.
  9. Li, M.; Liu, J.; Lin, Y.; Xiao, L.; Zhou, J. Revitalizing historic districts: Identifying built environment predictors for street vibrancy based on urban sensor data. Cities 2021, 117, 103305.
  10. Chen, Y.; Yu, B.; Shu, B.; Yang, L.; Wang, R. Exploring the spatiotemporal patterns of residential electricity consumption in Nanjing, China. Sustain. Cities Soc. 2023, 96, 104629.
  11. Xia, C.; Yeh, A.G.-O.; Zhang, A. Analyzing spatial relationships between urban land use intensity and urban vitality at street block level: A case study of five Chinese megacities. Landsc. Urban Plan. 2020, 193, 103669.
  12. WorldPop. Open Spatial Demographic Data and Research. Available online: https://www.worldpop.org/ (accessed on 15 April 2026).
  13. Amap (Gaode Map). POI Data Service. Available online: https://lbs.amap.com/ (accessed on 15 April 2026).
  14. Ewing, R.; Cervero, R. Travel and the built environment: A meta-analysis. J. Am. Plan. Assoc. 2010, 76, 265–294.
  15. Zhang, J.; Tan, P.Y.; Zeng, H.; Zhang, Y. Walkability assessment in a rapidly urbanizing city and its relationship with residential estate value. Sustainability 2019, 11, 2205.
  16. Wu, W.; Niu, X. Influence of built environment on urban vitality: Case study of Shanghai using mobile phone location data. J. Urban Plan. Dev. 2019, 145, 04019007.
  17. Delclòs-Alió, X.; Miralles-Guasch, C. Looking at Barcelona through Jane Jacobs’s eyes: Mapping the basic conditions for urban vitality in a Mediterranean conurbation. Land Use Policy 2018, 75, 505–517.
  18. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  19. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774.
  20. Jacobs, J. The Death and Life of Great American Cities; Vintage Books: New York, NY, USA, 1961.
  21. Kang, C.; Fan, D.; Jiao, H. Validating activity, time, and space diversity as essential components of urban vitality. Environ. Plan. B Urban Anal. City Sci. 2021, 48, 1180–1197.
  22. Cervero, R.; Kockelman, K. Travel demand and the 3Ds: Density, diversity, and design. Transp. Res. Part D Transp. Environ. 1997, 2, 199–219.
  23. Zeng, C.; Song, Y.; He, Q.; Shen, F. Spatially explicit assessment on urban vitality: Case study in Chicago. Sustainability 2018, 10, 4861.
  24. Li, X.; Li, Y.; Jia, T.; Zhou, L.; Hijazi, I.H. The six dimensions of built environment on urban vitality: Fusion evidence from multi-source data. Cities 2022, 121, 103482.
  25. Lu, S.; Shi, C.; Yang, X. Impacts of built environment on urban vitality: Regression analyses of Beijing and Chengdu, China. Int. J. Environ. Res. Public Health 2019, 16, 4592.
  26. Tu, W.; Zhu, T.; Zhong, C.; Zhang, X.; Xu, Y.; Li, Q. Exploring urban vitality and its driving mechanism through multi-source data: A case study of Shanghai. Sustain. Cities Soc. 2024, 100, 105050.
  27. Yue, H.; Zhu, X. Exploring the relationship between urban vitality and street centrality based on social network review data in Wuhan, China. Sustainability 2019, 11, 4356.
  28. Huang, B.; Zhou, Y.; Li, Z.; Song, Y.; Cai, J.; Tu, W. Evaluating and characterizing urban vibrancy using spatial big data: Shanghai as a case study. Environ. Plan. B Urban Anal. City Sci. 2020, 47, 1543–1559.
  29. Lyu, F.; Zhang, L. Using multi-source big data to understand the factors affecting urban park use in Wuhan. Urban For. Urban Green. 2019, 43, 126367.
  30. Wang, Z.; Jiao, L.; Xu, G.; Luo, X.; Wang, C. Unraveling the Impact Mechanisms of Built Environment on Urban Vitality: Integrating Scale, Heterogeneity, and Interaction Effects. Buildings 2026, 16, 29.
  31. Kim, Y.-L. Seoul’s Wi-Fi hotspots: Wi-Fi access points as an indicator of urban vitality. Comput. Environ. Urban Syst. 2018, 72, 13–24.
  32. Xu, X.; Xu, X.; Guan, P.; Ren, Y.; Wang, W.; Xu, N. The cause and evolution of urban street vitality under the time dimension: Nine cases of streets in Nanjing City, China. Sustainability 2018, 10, 2797.
  33. Ma, S.; Long, Y. Functional urban area delineations of cities on the Chinese mainland using massive Didi ride-hailing records. Cities 2020, 97, 102532.
  34. Chen, Y.; Liu, X.; Li, X.; Liu, X.; Yao, Y.; Hu, G.; Xu, X.; Pei, F. Delineating urban functional areas with building-level social media data: A dynamic time warping (DTW) distance based k-medoids method. Landsc. Urban Plan. 2017, 160, 48–60.
  35. Yang, L.; Liu, J.; Liang, Y.; Lu, Y.; Yang, H. Spatially varying effects of street greenery on walking time of older adults. ISPRS Int. J. Geo-Inf. 2021, 10, 596.
  36. Liu, S.; Zhang, L.; Long, Y.; Long, Y.; Xu, M. A new urban vitality analysis and evaluation framework based on human activity modeling using multi-source big data. ISPRS Int. J. Geo-Inf. 2020, 9, 617.
  37. Ding, C.; Cao, X.; Næss, P. Applying gradient boosting decision trees to examine non-linear effects of the built environment on driving distance in Oslo. Transp. Res. Part A Policy Pract. 2018, 110, 107–117.
  38. Li, Y.; Pan, Y.; Ning, C.; Ding, C. Examining the effect of the built environment on housing price at the macro and micro levels using a recursive approach. Sustainability 2019, 11, 3629.
  39. Gehl, J. Cities for People; Island Press: Washington, DC, USA, 2010.
  40. Mouratidis, K. Urban planning and quality of life: A review of pathways linking the built environment to subjective well-being. Cities 2021, 115, 103229.
  41. Mehaffy, M.W.; Haas, T. New Urbanism in the New Urban Agenda: Threads of an unfinished reformation. Urban Plan. 2020, 5, 441–452.
  42. Christaller, W. Central Places in Southern Germany; Prentice-Hall: Englewood Cliffs, NJ, USA, 1966.
  43. Ye, Y.; Li, D.; Liu, X. How block density and typology affect urban vitality: An exploratory analysis in Shenzhen, China. Urban Geogr. 2018, 39, 631–652.
  44. Chen, L.; Zhao, L.; Xiao, Y.; Lu, Y. Investigating the spatiotemporal pattern between the built environment and urban vibrancy using big data in Shenzhen, China. Comput. Environ. Urban Syst. 2022, 95, 101827.
  45. UN-Habitat. New Urban Agenda; United Nations: Quito, Ecuador, 2017.
  46. Sung, H.; Lee, S. Residential built environment and walking activity: Empirical evidence of Jane Jacobs’ urban vitality. Transp. Res. Part D Transp. Environ. 2015, 41, 318–329.
Figure 1. Study area: Zhengzhou metropolitan region with 4084 OpenStreetMap AOIs colored by primary land-use class, 8612 bus stations (grey dots), and the five Main Business Districts (red stars). The black triangle in the upper-right indicates the north arrow.
Figure 2. Spatial distribution of AOI-level urban vitality (population-density based, min–max normalized). Red stars denote CBDs; the colour ramp is stretched to the 98th percentile to enhance contrast in the suburban fabric.
Figure 3. Held-out test-set performance of OLS, Random Forest and XGBoost across three error metrics. XGBoost dominates on every metric, and its advantage over OLS quantifies the share of explained variance attributable to non-linear effects.
Figure 4. Global feature importance ranked by mean |SHAP value|. The top four features (highlighted) jointly account for the majority of the model’s explanatory power.
Figure 5. SHAP summary (beeswarm). Features are ordered bottom-to-top by mean |SHAP|. Red points (high feature values) lying on the right of the zero line indicate a positive marginal effect; the colour-position pattern reveals each driver’s directionality and dispersion.
Figure 6. Non-linear SHAP dependence for the four dominant drivers. Red dashed lines mark the inferred thresholds: DistCBD ≈ 4.3 km, BusDen500 ≈ 4 stations, GreenRatio ≈ 8.5%, BD ≈ 2.1% (with an inverted-U peak between 10% and 30%).
Figure 7. Building footprint morphology of four representative AOIs spanning the identified BD threshold ranges. Building color indicates height (light: low-rise; dark: high-rise). The dashed red outline marks the AOI boundary. (a) Sparse peri-urban AOI (BD = 1.9%) with scattered low-rise structures, lacking critical mass for population vitality; (b) Mid-rise residential compound (BD = 14.9%) of 4–8-story courtyard-plan buildings with ground-level amenities, vitality score 0.97; (c) Denser mixed-use configuration (BD = 26.5%) with taller buildings and compact street frontage, vitality score 0.93; (d) Industrial site (BD = 52.0%) with bulky factory buildings covering >50% of the AOI, low residential vitality score 0.13. Panels (b,c), within the optimal BD range of 10–30%, exhibit the highest vitality scores.
Figure 8. Visual exemplars of three urban vitality classes in Zhengzhou. Each column shows one representative AOI from a vitality quintile, with a street-level view (top) and an aerial view (bottom). Panels (a1,a2): AOI 669, lowest quintile (Y = 0.02), agricultural fringe of Zhongmu County, 34.840° N, 114.051° E. Panels (b1,b2): AOI 1896, middle quintile (Y = 0.51), traditional mid-density mixed-use neighbourhood along Daxue Road–Huaihe Road, Erqi District, 34.720° N, 113.657° E. Panels (c1,c2): AOI 3030, highest quintile (Y = 0.81), dense commercial-residential core along Jingsan Road–Hongzhuan Road, Jinshui District, 34.772° N, 113.682° E. These exemplars illustrate the physical morphology associated with the population-based residential vitality index defined in Section 3.4.1. They do not imply that high-vitality areas are necessarily more liveable in the broader Jacobsian sense; see Section 5.5 for a discussion of the index's ethical scope and limitations.
Table 1. Comparison of representative studies on built environment and urban vitality.
| Study | City | Spatial Unit | Vitality Proxy | Method | Key Limitation |
|---|---|---|---|---|---|
| Wu et al., 2019 [16] | Shanghai | Grid (1 km) | Mobile phone | RF + GBT | Proprietary data; MAUP; no thresholds |
| Li et al., 2021 [9] | Shenzhen | Street segment | Pedestrian count | OLS regression | Linear assumption; single proxy |
| Xia et al., 2020 [11] | Beijing etc. | Street block | POI + social media | Spatial regression | Linear; proprietary data |
| Wu et al., 2022 [8] | Multiple | Grid (500 m) | Mobile phone | ML + morphology | Proprietary data; no SHAP |
| Wang et al., 2026 [30] | Wuhan | Grid (500 m) | Mobile phone | XGBoost + SHAP | Proprietary data; grid MAUP |
| Li et al., 2022 [24] | Wuhan | Grid (500 m) | Multi-source | GWR | Linear; arbitrary grid |
| Lu et al., 2019 [25] | Beijing/Chengdu | TAZ | Population | OLS | Linear; admin. boundary |
| This study | Zhengzhou | AOI (functional) | WorldPop 100 m | XGBoost + SHAP | Open data only; see Section 5.4 |
Table 2. Data sources and descriptions.
| Data Type | Source | Resolution/Scale | Year | Records |
|---|---|---|---|---|
| Population density | WorldPop (constrained, R2025A) | 100 m | 2026 | 1.06 M valid pixels in study area |
| AOI polygons | OpenStreetMap | Vector | 2024 | 4084 polygons (15 land-use classes) |
| Building footprints | Open Building Data | Vector | 2024 | 201,584 buildings (height, function, age, quality) |
| Points of Interest | Amap (Gaode) | Point | 2024 | 630,150 POIs (22 major categories) |
| Bus stations | Amap (Gaode) | Point | 2024 | 8612 records |
| Main Business Districts | Urban planning data | Polygon | 2024 | 5 CBD locations |
| Study boundary | Administrative boundary | Polygon | 2024 | Zhengzhou Municipality |
Table 3. Independent variable definitions and data sources.
| Dimension | Variable | Abbrev. | Formula | Source |
|---|---|---|---|---|
| D1: Density | Building Density | BD | Total building footprint area/AOI area | Building.shp |
| D1: Density | Average Building Height | AH | Area-weighted mean height (m) | Building.shp |
| D1: Density | Floor Area Ratio | FAR | Total floor area/AOI area | Building.shp |
| D2: Diversity | Functional Mix | ENT | Shannon Entropy of POI categories within AOI + 100 m buffer | POI data |
| D2: Diversity | POI Density | PD | POI count/AOI area (per km²) | POI data |
| D3: Design | Building Age | BldgAge | Area-weighted mean building age (years since construction) | Building.shp |
| D4: Transit | Distance to Bus Station | DistBus | Euclidean distance from AOI centroid to nearest bus station (m) | Bus Station.shp |
| D4: Transit | Bus Station Density | BusDen500 | Count of bus stations within 500 m buffer of AOI centroid | Bus Station.shp |
| D5: Destination | Distance to CBD | DistCBD | Euclidean distance from AOI centroid to nearest CBD centroid (m) | CBD.shp |
| S: Surroundings | Green Coverage Ratio | GreenRatio | Proportion of green-class AOI area within 500 m buffer | AOI.shp |
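Three of the Table 3 formulas can be sketched in a few lines of Python to make the definitions concrete. This is an illustrative sketch only: the function names are hypothetical, the natural logarithm is an assumed base for the Shannon entropy (the table does not state one), and weighting heights by footprint area is one plausible reading of "area-weighted".

```python
import math
from collections import Counter


def building_density(footprint_areas, aoi_area):
    """BD: total building footprint area divided by AOI area (same units)."""
    return sum(footprint_areas) / aoi_area


def area_weighted_height(footprint_areas, heights):
    """AH: mean building height weighted by footprint area (assumed weighting)."""
    total_area = sum(footprint_areas)
    if total_area == 0:
        return 0.0  # AOI without buildings
    return sum(a * h for a, h in zip(footprint_areas, heights)) / total_area


def shannon_entropy(poi_categories):
    """ENT: Shannon entropy of POI category labels (natural log assumed)."""
    counts = Counter(poi_categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no POIs in the AOI and its buffer
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical AOI: two buildings on a 1000 m² parcel, four POIs in two categories.
bd = building_density([100.0, 50.0], 1000.0)
ah = area_weighted_height([100.0, 300.0], [10.0, 20.0])
ent = shannon_entropy(["food", "food", "retail", "retail"])
```

An AOI whose POIs all fall in one category has entropy 0, while an even split across more categories pushes the value up, which is why ENT serves as a functional-mix indicator.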
Table 4. Descriptive statistics of variables (n = 3920).
| Variable | Mean | Std. Dev. | Min | Median | Max |
|---|---|---|---|---|---|
| Vitality (Y) | 0.265 | 0.266 | 0.000 | 0.159 | 1.000 |
| BD | 0.067 | 0.113 | 0.000 | 0.000 | 0.720 |
| AH (m) | 8.09 | 11.08 | 0.00 | 0.00 | 86.49 |
| FAR | 0.39 | 0.73 | 0.00 | 0.00 | 7.90 |
| ENT | 1.51 | 0.85 | 0.00 | 1.85 | 2.64 |
| PD (per km²) | 629 | 1258 | 0 | 82 | 14,289 |
| BldgAge (years) | 26.9 | 7.69 | 6.0 | 27.5 | 39.8 |
| DistBus (m) | 622 | 1419 | 6 | 263 | 15,605 |
| BusDen500 | 5.79 | 6.32 | 0 | 4 | 42 |
| DistCBD (m) | 16,017 | 15,518 | 50 | 10,710 | 88,461 |
| GreenRatio | 0.104 | 0.139 | 0.000 | 0.053 | 1.292 |
Table 5. Model performance comparison.
| Model | R² (Test) | RMSE | MAE | 5-Fold CV R² |
|---|---|---|---|---|
| OLS | 0.634 | 0.161 | 0.124 | |
| Random Forest | 0.833 | 0.108 | 0.077 | |
| XGBoost | 0.846 | 0.104 | 0.073 | 0.713 ± 0.115 |
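For readers who want to reproduce the metric definitions behind Table 5, the sketch below computes R², RMSE and MAE from paired observations and predictions, and summarizes fold-level scores as mean ± standard deviation. The function names and toy numbers are assumptions for illustration; in particular, the use of the sample standard deviation across folds is a guess, since the table does not say which variant was reported.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


def summarize_folds(fold_scores):
    """Mean and sample standard deviation of per-fold scores (assumed convention)."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


# Toy example, not data from the study.
metrics = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 5.0])
cv_mean, cv_std = summarize_folds([0.6, 0.7, 0.8])
```

A cross-validated R² noticeably below the single test-split R², as in the XGBoost row, is the kind of gap this summary is meant to expose.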
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lu, X.; Zhang, H.; Li, W.; Li, Y.; Xu, Z.; Niu, S. Unraveling the Non-Linear Impact of the Built Environment on Population-Based Residential Vitality at the Block Scale: An Explainable AI Approach Using Multi-Source Open Data in Zhengzhou, China. Buildings 2026, 16, 2229. https://doi.org/10.3390/buildings16112229
