Article

A Study on Imbalances in Urban Internal Spatial Capacity Allocation Based on High-Precision Population and Land Value Distribution Data

by Peiru Wu, Maojun Zhai and Lingzhu Zhang *
College of Architecture and Urban Planning, Tongji University, Shanghai 200092, China
* Author to whom correspondence should be addressed.
Smart Cities 2025, 8(4), 110; https://doi.org/10.3390/smartcities8040110
Submission received: 17 April 2025 / Revised: 25 June 2025 / Accepted: 26 June 2025 / Published: 1 July 2025

Highlights

What are the main findings?
  • The study constructed a composite land value prediction model using multi-source data, producing a high-precision spatial distribution of land values in Shanghai.
  • The study compared the spatial distributions of population density and land values in Shanghai, and the results indicate a clear separation between the two.
What is the implication of the main finding?
  • By integrating high-precision data on spatial distributions of population density and land values, the study enhanced the accuracy of identifying urban spatial mismatches, which will serve as a reference for optimizing the allocation of spatial resources.
  • The separation between spatial distributions of population density and land values implies that spatial resources are not optimally allocated. This inefficiency could adversely affect urban operational performance and development, and it indicates that there is significant room for adjustments in current urban spatial management policies.

Abstract

In the context of intelligent and fine-grained urban governance, the coordinated configuration of spatial capacity and locational value has become a key proposition for optimizing urban resource utilization. Using Shanghai as a case study, this paper represents spatial capacity with population density and locational value with land values. By quantifying the degree of spatial mismatch between population density and land values, the study reveals the imbalance between spatial capacity and locational value. First, the research calibrates the population grid data to obtain the population distribution within the study area; subsequently, a composite land value prediction model is constructed to compute the land value distribution across the study region; finally, the spatial mismatch index is calculated using the regression residual method to quantify the degree to which population density deviates from land values. The results indicate that there is a significant spatial mismatch between population density and land values in Shanghai, which unveils an imbalance in the allocation of spatial capacity within the city. This framework can be integrated into smart city digital twins and real-time monitoring platforms, providing 100 m resolution decision support for spatial resource optimization.

1. Introduction

China’s urbanization process is currently at a critical stage of transformation, shifting from decades of rapid, extensive urban expansion to a new development model that emphasizes quality [1]. The efficient integration and rational allocation of urban resources have become urgent focal issues in urban governance. Over time, a series of problems induced by previous extensive development have gradually emerged. On the economic front, excessively high housing prices constrain residents’ living choices, thereby triggering a separation between employment and residence—an issue particularly evident in the suburbs of large cities [2]. This distortion of the relationship between work and living, by diminishing the effective labor market size, undermines urban operational efficiency and becomes an invisible cost of urban development [3]. From a spatial perspective, experience-based controls on development intensity and the continued reliance on historically inherited spatial layouts have largely dictated urban form. In parallel, inflexible restrictive policies—such as mandatory daylight setbacks and rigid single-use zoning—have generated extensive areas that are functionally single-purpose and spatially homogeneous [4]. As a result, the actual utilization of spatial resources across different city locations has diverged markedly from their underlying locational advantages.
This study defines this sub-optimal allocation of spatial resources as a mismatch between urban spatial capacity and locational value. Spatial capacity refers to the buildings and urban spaces that support population and activities, while locational value reflects the comprehensive worth of a location as determined by factors such as transportation accessibility, public services, and environmental quality. The misalignment between these two indicates that the areas with the highest locational value are not accommodating a population and functions commensurate with that value; conversely, too many people are concentrated in areas with relatively lower locational value. At its core, the dilemma stems from a breakdown in the coordination of institutions and market forces: rigid zoning regulations and experience-based floor area controls prevent urban space supply from flexibly adjusting to demand [5]. The resulting price–signal distortion prevents the true scarcity of land from being accurately conveyed through market prices, disabling the market-driven housing filtering mechanism. As a consequence, supply caps in high-value areas drive land prices upward [6]; the inflated prices, in turn, foster “hold-for-value” vacancy and land hoarding [7], creating a “high-price–low-FAR” paradox [8]. Scarcity-induced competition then triggers “reverse filtering”, with higher-income households bidding away stock once intended for lower-income groups [9]. Conversely, low-value districts accumulate idle capacity, and oversupply further widens the spatial mismatch [10]. This causal chain underlies the economic and spatial dysfunctions discussed earlier and recurs across a variety of cities.
To measure the mismatch between spatial capacity and locational value, this study uses the relationship between population density and land price as its entry point. On one hand, higher population density implies that more people can be accommodated in the same area, suggesting a higher floor area ratio or a greater number of housing units [11]—in other words, a more efficient utilization of spatial resources. Thus, population density is used as a proxy for spatial capacity. On the other hand, land value is a critical measure of a location’s advantages [12], representing the capitalization of those comprehensive locational benefits, including cost savings and benefits gained from convenient transportation and abundant facilities [13]. Therefore, this study employs land values to represent locational value.
Classical urban economic theories provide a solid foundation for the relationship between population density and land values. As early as in Clark’s work [14], it was proposed that urban population density decays exponentially from the center outward. Subsequently, Alonso [15], Mills [16], and Muth [17] developed the monocentric city model, showing that land prices and population density decrease in tandem with increasing distance from the city center. As Brueckner [18] noted, higher land prices correspond to greater land use intensity and thus higher population densities, so the two generally exhibit similar spatial distribution patterns. Numerous studies have confirmed a significant positive correlation between land prices and population density [19,20]. Therefore, under ideal conditions, a city should display a coordinated distribution of population density and land prices; however, in reality, this correspondence is not always well aligned, and the literature has not sufficiently examined such cases of spatial divergence.
To explain the separation between population density and land values, this study introduces the theoretical framework of spatial mismatch. In 1968, Kain introduced the concept of spatial mismatch in his study of the employment difficulties faced by African American residents in U.S. cities. Subsequent studies extended this concept to explain various resource allocation imbalances [21], such as the uneven spatial distribution of green spaces, housing, and public service facilities [22,23,24]. This study focuses on the spatial mismatch between population density and land prices, manifested both as shortages from insufficient land supply and as surpluses from overdevelopment [25]. Existing research addresses this issue at three levels. First, at the descriptive level, related research has shown that empirical population–land-price relationships deviate from theoretical models [26], and that this divergence has intensified as cities evolve [27], confirming the widespread nature of such mismatches. Second, at the analytical level, investigations have identified micro-scale policy controls that misalign density signals and price signals [28], as well as macro-scale discrepancies between population urbanization and land urbanization [29]. This indicates that mismatch should be defined by deviations from optimal efficiency, equity, and urban form. Third, at the measurement level, various methods have been proposed to quantify mismatch, including the supply–demand elasticity differential index (population growth rate minus land development rate) [30], regulatory tax incidence metrics [31], and spatial statistical techniques such as Moran’s I and local indicators of spatial association (LISA). However, most existing studies focus on regional scales, with limited micro-scale analyses of population density–land-price mismatches within individual cities.
In order to explore the spatial mismatch phenomenon in a real-world setting, this study adopts Shanghai as its case city. First, Shanghai’s land market is highly developed, and the publicly available population and land value datasets are both comprehensive and reliable. Second, previous research has identified a ring-shaped imbalance between employment and residence within Shanghai’s core [32] and a cost-driven spatial stratification of population and land values [33], suggesting that the distribution of key urban elements may be distorted. Third, as a representative Chinese megacity, insights gained from Shanghai can guide the optimization of spatial capacity allocation in other large cities.
In summary, this study quantitatively identifies and analyzes the degree and spatial pattern of mismatch between population density and land value within Shanghai, thereby providing a basis for optimizing spatial resource allocation. The study’s innovations and contributions lie on three levels. First, at the research-context level, it focuses on the key pain points of China’s current transition from an extensive to an intensive real-estate development phase and exposes the spatial resource misallocations that have emerged during past rapid expansion. Second, at the theoretical level, whereas traditional research emphasizes the correlation between land value and population distribution, this work shifts attention to their spatial separation and investigates the underlying causes of such mismatch. Finally, at the methodological level, it proposes a new land value prediction model which, when combined with high-resolution population data, enables intra-urban mismatch analysis at a 100 m raster scale. Therefore, this spatial mismatch identification framework can serve as a core module of a digital twin platform, seamlessly connecting with real-time population sensing and land transaction data to achieve high-frequency, high-precision monitoring of spatial resource allocation in real time. Furthermore, this method can be used to dynamically assess the impacts of varying development intensities or land use policies on the distribution of urban spatial capacity, thereby facilitating more precise spatial resource optimization decisions.
The experimental procedure and the structure of this paper are arranged as follows: First, high-resolution population grid data are corrected using census data to provide a detailed depiction of the spatial distribution of the population within the study area. Second, by combining multi-source data, the relevant variables for land value prediction are calculated, and a composite model is constructed by integrating ElasticNet regression, LightGBM, and geographically weighted regression models. This model is used in conjunction with these variables to predict land values, thereby obtaining the spatial distribution of land values within the study area. Finally, the regression residual method is employed to calculate the spatial mismatch index, which measures the degree of spatial mismatch between population density and land values. Through these methods, this paper quantifies the spatial mismatch between population density and land values within Shanghai and explores the spatial distribution patterns of the mismatch index, providing a basis for the rational allocation of spatial capacity.

2. Materials and Methods

2.1. Population Raster Data and Correction

This study uses the 100 m resolution constrained population raster provided by WorldPop and applies zonal corrections to it using census data, establishing the data foundation for representing the population distribution. Existing population distribution data fall into three main categories: first, registered or permanent resident counts by administrative division; second, population counts on a per-residential-unit basis; and third, population raster data. Compared with the first two categories, population raster data offer a more continuous distribution and finer spatial units, and they can reflect the intensity of population activity across land uses, which is why they are widely applied in urban planning, land value appraisal, and other fields. A wide range of gridded population datasets are now available from sources such as WorldPop and LandScan. Population rasters are typically generated using a top-down approach: models drawing on multi-source data estimate a probability surface of population distribution, and the official census total is then allocated to each grid cell in proportion to that surface. Because the inputs typically include high-resolution data such as nighttime light remote sensing, the resulting gridded datasets can achieve spatial resolutions of 100 m or finer.
The constrained population raster utilized in this paper comes from WorldPop, which models population distribution using the Random Forest-based dasymetric redistribution method developed by Stevens et al. [34]. Unlike traditional unconstrained population data, it limits the population distribution to actual residential areas, thereby enhancing the accuracy of population estimation and avoiding the long-tail effect [35,36]. The study defines the study area as the land area of Shanghai, excluding its islands, extracts the relevant raster data, and subsequently conducts population raster correction, land value prediction, and spatial mismatch index calculation within this rasterized region. The initial population raster is shown in Figure 1.
After obtaining the population raster, the study calibrated it using district-level census data from 2020, with each district’s census data based on the results of Shanghai’s seventh national census published by the municipal government. This calibration method is founded on two considerations. First, conventional population rasters are produced by distributing national census counts across administrative units, which inevitably introduces biases at the city level or finer scales. By applying the same top-down procedure used in the original raster creation—redistributing municipal or sub-municipal census totals back onto each grid cell—these biases can be effectively corrected. Second, the covariates driving the raster model—such as annual average nighttime lights, temperature, and precipitation—are reported on the same annual timescale as the census data. Calibrating the rasters with census totals therefore preserves temporal consistency and yields more accurate estimates of yearly average population activity. This approach of combining gridded estimates with authoritative census figures for population distribution mapping has been widely adopted [37].
The specific calibration process employed in this study is as follows: first, the population raster is aggregated by administrative district to calculate the total raster population within each district; then, these statistics are compared with the corresponding census data to compute the deviation ratio between the two; finally, the raster data within each district are adjusted according to this ratio, ensuring that the corrected total population in each district aligns with the census figures. The correction process can be represented by the following equation:
$$ y_i' = y_i \cdot \frac{P_C}{\sum_{j \in C} y_j} $$
where $y_i$ denotes the initial population value assigned to raster cell $i$, and $y_i'$ is the calibrated population value for the same cell after correction. $P_C$ represents the official census total population of administrative region $C$. The denominator $\sum_{j \in C} y_j$ is the sum of all initial raster-cell population values within region $C$, serving as a baseline against which the raster estimates are rescaled. By multiplying each $y_i$ by the ratio $P_C / \sum_{j \in C} y_j$, every cell is proportionally adjusted so that the aggregate of the corrected raster values exactly equals the census-reported total for region $C$, thereby eliminating systematic bias while preserving the relative spatial pattern.
In this manner, the corrected data not only maintain the accuracy of the census but also preserve the high spatial precision of the raster data, thereby providing a more reliable basis for subsequent spatial mismatch analysis.
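The zonal correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the district labels, and the toy cell values are hypothetical, and the sketch assumes every district has a non-zero raster total.

```python
from collections import defaultdict

def calibrate_population(cells, census_totals):
    """Rescale raster cells so each district sums to its census total.

    cells: list of (district_id, initial_population) pairs, one per raster cell.
    census_totals: dict mapping district_id -> census population P_C.
    Returns (calibrated values in input order, per-district ratios).
    Assumes each district's raster total is non-zero.
    """
    # Zonal aggregation: sum of initial cell values per district.
    raster_totals = defaultdict(float)
    for district, value in cells:
        raster_totals[district] += value

    # Ratio P_C / sum_j y_j for each district.
    ratios = {d: census_totals[d] / raster_totals[d] for d in raster_totals}

    # Proportional adjustment of every cell by its district's ratio.
    calibrated = [value * ratios[district] for district, value in cells]
    return calibrated, ratios

# Hypothetical toy data: two districts, five cells.
cells = [("A", 120.0), ("A", 80.0), ("B", 50.0), ("B", 30.0), ("B", 20.0)]
census = {"A": 150.0, "B": 90.0}
calibrated, ratios = calibrate_population(cells, census)
```

Because every cell in a district is multiplied by the same factor, the relative pattern within the district is unchanged while the district total is forced to the census figure.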

2.2. Land Value Prediction Model Construction

This study builds a land value prediction model from existing land transaction data and the detailed attributes of each parcel. To ensure temporal consistency with the population data, the research centers on 2020 and uses five years of land transaction data (2018 to 2022) sourced from the Shanghai Land Market website [38]. First, the geographic location of each parcel is checked and corrected individually, and records with unclear coordinates or missing transaction information are removed; a total of 1248 valid transaction records are retained as the basis for training the land value prediction model, with plot locations shown in Figure 2. During model training, K-fold cross-validation (K = 5) divides the data into training and validation sets, with each training set containing about 999 records and each validation set about 249 records, to ensure accurate and stable model evaluation.
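The fold sizes implied by a 5-fold split of 1248 records can be checked with a short sketch. The source does not say how the folds were formed, so the contiguous, unshuffled split and the helper names below are assumptions made only for illustration; the size rule (the first `n mod k` folds get one extra record) is a common convention.

```python
def fold_sizes(n_records, k):
    """Sizes of k validation folds; the first n % k folds get one extra record."""
    base, extra = divmod(n_records, k)
    return [base + 1 if i < extra else base for i in range(k)]

def kfold_indices(n_records, k):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    start = 0
    for size in fold_sizes(n_records, k):
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_records))
        yield train, val
        start += size

sizes = fold_sizes(1248, 5)
splits = list(kfold_indices(1248, 5))
```

Since 1248 is not divisible by 5, the validation folds cannot all be the same size, so the per-fold counts are necessarily approximate.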
In this paper, following existing studies [39,40], a total of 50 land value prediction variables are selected and categorized into four groups: facility quantity, facility distance, accessibility indicators, and non-spatial indicators. The specific variables are provided in Table 1.
Firstly, facility quantity reflects the density of specific types of facilities surrounding a parcel; it is intended to capture the agglomeration effect of regional service facilities and can indicate the richness and service level of basic living and service amenities in the area. Its calculation method is based on constructing a 5000 m service area using the actual road network and counting the number of POIs covered within that area.
Secondly, facility distance represents the shortest distance between the parcel and key urban facilities in its vicinity, such as large commercial centers, major roads, or large hospitals; this indicator is intended to reflect the ease of travel from the parcel to important facilities. Its calculation method is to compute the shortest distance from the parcel to a certain category of POIs, line segments, or AOIs using the real road network.
Thirdly, this paper employs space syntax indicators to characterize accessibility, so as to better accommodate the complexity of urban spatial networks and to compensate for the limitations of traditional monocentric models. In recent years, space syntax indicators have increasingly appeared in the prediction of land values and population density [41,42,43]. This study uses the sDNA spatial design network analysis software [44] under the ArcGIS platform, combined with the hybrid measurement method based on metric distance and angular changes [45,46]—with equal weights set at 1:1 [47]—to calculate closeness and betweenness centrality under different analysis radii. The closeness indicator measures the average shortest path length between a node and all other nodes; the closer the distance, the higher its value. The study uses MHD to reflect closeness, which in fact measures farness (the reciprocal of closeness) [36], which is defined as
$$ \mathrm{MHD}(x) = \frac{\sum_{y \in R_x} d_H(x, y)\, W(y)\, P(y)}{\sum_{y \in R_x} W(y)\, P(y)} $$
where $R_x$ is the set of polylines within link $x$’s network radius. The distance according to a metric M, along a geodesic defined by M, between an origin polyline $x$ and a destination polyline $y$ is denoted as $d_H(x, y)$. The weight of a polyline $y$ is denoted as $W(y)$. The proportion of any polyline $y$ within the radius is denoted as $P(y)$.
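To make the MHD definition concrete, the following sketch evaluates the weighted mean distance for a single origin link. It is only an arithmetic illustration of the formula: in practice sDNA computes the distances on the road network, whereas here the (distance, weight, proportion) triples are hypothetical inputs supplied by hand.

```python
def mean_hybrid_distance(destinations):
    """Weighted mean distance from one origin link to its reachable links.

    destinations: list of (distance, weight, proportion) triples, one for each
    polyline y inside the origin's network radius R_x.
    """
    numerator = sum(d * w * p for d, w, p in destinations)
    denominator = sum(w * p for _, w, p in destinations)
    return numerator / denominator

# Hypothetical example: three reachable links, the last only half inside the radius.
links = [(100.0, 1.0, 1.0), (300.0, 1.0, 1.0), (500.0, 1.0, 0.5)]
mhd = mean_hybrid_distance(links)
```

A lower MHD means the origin is closer, on average, to the rest of the network within the radius, which is why the indicator measures farness rather than closeness.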
The other part, the betweenness indicator, reflects the frequency with which a node appears in the shortest paths between other nodes; the more often it acts as an “intermediary”, the higher its value. The study uses the BtH indicator to represent betweenness. The formula is shown below:
$$ \mathrm{BtH}(x) = \sum_{y \in N} \sum_{z \in R_y} W(y)\, W(z)\, P(z)\, OD(y, z, x) $$
where the set of polylines in the global spatial system is denoted as $N$. Starting from an origin $y$, the set of reachable destinations within the given analysis range is denoted as $R_y$, with $z$ representing an element within that set. $OD(y, z, x)$ is used to determine whether the subject $x$ plays an intermediary role in the shortest path between each pair of origin $y$ and destination $z$, with its value ranging from 0 to 1 under different conditions [48].
Finally, the non-spatial indicators encompass other attributes of the parcel, such as the transaction year, land use classification, surrounding slope, and the socioeconomic indicators of the area, serving as a complement to the aforementioned spatial indicators.
Subsequently, the study constructs a composite land value prediction model consisting of three sub-models: firstly, an ElasticNet model, which is mainly used to capture the linear relationship between the independent variables and land value; secondly, a LightGBM model, used to reflect the nonlinear relationships among variables; and thirdly, a geographically weighted regression (GWR) model, whose primary function is to capture the spatial dependency among variables. Finally, the outputs of the three sub-models are combined using linear regression to produce the final land value predictions.
Early land value prediction mainly relied on hedonic regression methods [39,49,50], which used multiple linear regression to explain land value differences by linearly fitting the marginal contributions of various variables; however, due to the simplicity of the model construction, such models cannot capture the spatial dependency of variables. To further improve on this shortcoming, subsequent studies introduced regression kriging [51], whose core operation is to apply kriging interpolation to the regression residuals and then add the results to the hedonic model’s predictions; this approach can account for some variables with spatial autocorrelation that were not considered, thereby providing a stronger explanatory power for land value. However, whether using hedonic regression or regression kriging, both methods can only capture the linear relationships among variables, making it difficult to reflect complex nonlinear effects, and their explanation of spatial dependency is not sufficient. Therefore, this study further expands and optimizes the traditional model by integrating three sub-models—ElasticNet, LightGBM, and GWR—which not only take into account the linear and nonlinear relationships between variables and their spatial dependencies but also maintain the simplicity of the model structure, thereby providing an efficient and robust solution for large-scale, full-domain land value prediction.
The first sub-model in this study uses ElasticNet regression, a linear method that combines L1 regularization and L2 regularization; it can eliminate variables that do not contribute to the prediction or contain excessive noise, and impose appropriate penalties on the retained variables, thus avoiding overfitting and enhancing the model’s generalization capability. The model estimates the coefficients of the variables by minimizing the weighted sum of the prediction error and the regularization penalty term. The specific objective function is expressed as
$$ \min_{\beta_0, \beta} \; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \left[ (1 - \alpha) \frac{1}{2} \sum_{j=1}^{p} \beta_j^2 + \alpha \sum_{j=1}^{p} \left| \beta_j \right| \right] $$
where the first term is the mean squared error loss function used to measure the deviation between the model’s predictions and the actual values; the second term is the regularization penalty, where $\lambda$ controls the overall strength of regularization and $\alpha$ determines the weight distribution between L1 and L2 regularization. When $\alpha = 1$, the model degenerates into a pure LASSO regression, and when $\alpha = 0$, it becomes ridge regression. Through this joint penalty approach, ElasticNet can not only select variables that significantly impact land value but also effectively handle multicollinearity issues.
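The objective can be evaluated directly, which makes the roles of $\lambda$ and $\alpha$ easy to see. The sketch below is a plain-Python illustration with hypothetical toy data; it computes the objective for given coefficients and does not perform the minimization. The source does not state which software was used to fit the model. For reference, scikit-learn's `ElasticNet` documents the same functional form, with its `alpha` playing the role of $\lambda$ and `l1_ratio` the role of $\alpha$.

```python
def elastic_net_objective(beta0, beta, X, y, lam, alpha):
    """Value of the ElasticNet objective for given coefficients.

    beta0: intercept; beta: list of p slopes; X: list of N rows of p features;
    y: list of N targets; lam: overall penalty strength; alpha: L1/L2 mix.
    """
    n = len(y)
    squared_error = 0.0
    for row, target in zip(X, y):
        prediction = beta0 + sum(b * v for b, v in zip(beta, row))
        squared_error += (target - prediction) ** 2
    loss = squared_error / (2 * n)

    l2 = sum(b * b for b in beta)      # ridge part
    l1 = sum(abs(b) for b in beta)     # LASSO part
    penalty = lam * ((1 - alpha) / 2 * l2 + alpha * l1)
    return loss + penalty

# Hypothetical data fitted exactly by beta = [2, 3], so only the penalty remains.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]
lasso_value = elastic_net_objective(0.0, [2.0, 3.0], X, y, lam=0.1, alpha=1.0)
ridge_value = elastic_net_objective(0.0, [2.0, 3.0], X, y, lam=0.1, alpha=0.0)
mixed_value = elastic_net_objective(0.0, [2.0, 3.0], X, y, lam=0.1, alpha=0.5)
```

With a perfect fit the loss term vanishes, so the three values isolate the pure L1 penalty, the pure L2 penalty, and an equal mix of the two.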
The second sub-model of this study is LightGBM, an algorithm based on the Gradient Boosting Decision Tree (GBDT) framework, which is used to capture complex nonlinear relationships among variables. Unlike traditional linear models, LightGBM builds multiple decision trees in sequence, each fitted to the residuals left by the previous ones, thereby progressively improving prediction performance. The model also employs a histogram-based algorithm and a leaf-wise growth strategy, which make training efficient, while a regularization term on tree complexity helps prevent overfitting. The prediction function of LightGBM can be expressed as the sum of the outputs of multiple decision trees, mathematically represented as
$$ \hat{y}(x) = \sum_{k=1}^{K} f_k(x) $$
where $f_k(x)$ represents the prediction contribution of the $k$th regression tree for input $x$, and $K$ is the total number of trees. The model’s objective function consists of the loss function and a regularization term. The objective function is specified as follows:
$$ \mathcal{L} = \sum_{i=1}^{N} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) $$
where $\mathcal{L}$ is the objective function, $L(y_i, \hat{y}_i)$ represents the prediction error for the $i$th sample, and $\Omega(f_k)$ is used to measure the complexity of the $k$th tree to help control model complexity and prevent overfitting. Through this iterative approximation combined with regularization, LightGBM can efficiently capture nonlinear features in large-scale, high-dimensional data and significantly improve the accuracy of land value prediction.
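The additive structure of the prediction function can be illustrated with a deliberately simplified gradient-boosting sketch. This is not LightGBM: it uses depth-one trees (stumps) on a single feature, squared-error loss, no histogram binning, no leaf-wise growth, and no complexity penalty. The learning rate, tree count, and toy data are hypothetical choices, and the feature is assumed to take at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals on a 1-D feature."""
    best = None
    # Candidate thresholds exclude the maximum so both sides are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Return a predictor: the mean of ys plus a sum of shrunken stumps."""
    base = sum(ys) / len(ys)
    trees, preds = [], [base] * len(ys)
    for _ in range(n_trees):
        # Each new tree is fitted to what the current ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

# Hypothetical nonlinear toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
predict = boost(xs, ys)

mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Each stump plays the role of one $f_k$; the final prediction is their sum, and the training error falls as trees are added.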
The third sub-model of this study is the geographically weighted regression (GWR) model, a regression method capable of capturing spatial heterogeneity and local dependency. Unlike traditional global regression models, GWR allows the regression coefficients of each parcel to vary with its geographic location, thereby not only revealing global land value trends but also identifying abnormal patterns or local effects within specific regions. The GWR model is formulated as
$$ y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \epsilon_i $$
where $y_i$ is the land value of the $i$th parcel, $(u_i, v_i)$ represents the geographic coordinates of that parcel, $\beta_k(u_i, v_i)$ are the local regression coefficients varying with location, $x_{ik}$ is the $k$th independent variable for the $i$th parcel, and $\epsilon_i$ is the error term.
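The idea of location-specific coefficients can be sketched as a weighted least squares fit repeated at each target location. The source does not specify the kernel or how the bandwidth was chosen, so the Gaussian kernel, the fixed bandwidth, the single predictor, and the toy data below are all assumptions made for illustration.

```python
import math

def gwr_local_fit(target, coords, x, y, bandwidth):
    """Local intercept and slope at `target` via Gaussian-weighted least squares.

    target: (u, v) location where coefficients are estimated.
    coords: list of (u, v) for each observation; x, y: predictor and response.
    bandwidth: kernel bandwidth in the same units as the coordinates (assumed fixed).
    """
    u0, v0 = target
    # Nearby observations receive weights near 1; distant ones near 0.
    w = [math.exp(-0.5 * (math.hypot(u - u0, v - v0) / bandwidth) ** 2)
         for u, v in coords]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data: a western cluster where y = x and an eastern cluster where y = 3x.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0, 3.0, 6.0, 9.0, 12.0]
west_intercept, west_slope = gwr_local_fit((0.5, 0.5), coords, x, y, bandwidth=1.0)
east_intercept, east_slope = gwr_local_fit((10.5, 0.5), coords, x, y, bandwidth=1.0)
```

A single global regression would return one compromise slope for both clusters, whereas the local fits recover a different slope in each area, which is the spatial heterogeneity GWR is meant to capture.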
After obtaining the prediction results from the ElasticNet, LightGBM, and GWR sub-models, this study integrates them using a linear regression model. The purpose of this step is to automatically learn the optimal combination of weights for each model’s predictions, thereby fully integrating the information contained in them. Specifically, the study first merges the prediction results of the validation sets from each fold of the cross-validation of the three sub-models into a complete dataset in the order of samples; then, using these out-of-sample prediction results as inputs, a final fusion model is trained via linear regression. The rationale behind this approach is that cross-validation ensures that each sample’s prediction comes from an out-of-sample result not used in training, thereby accurately reflecting the model’s generalization performance, and that unifying the out-of-sample predictions from all folds fully utilizes the information from the entire dataset, enhancing the robustness of the fusion model—a strategy well validated in the literature [52,53]. The ensemble prediction model is expressed as
$$ \hat{y} = \beta_0 + \beta_1 \hat{y}_{\mathrm{ElasticNet}} + \beta_2 \hat{y}_{\mathrm{LightGBM}} + \beta_3 \hat{y}_{\mathrm{GWR}} $$
where $\beta_0$ represents the intercept, and $\beta_1$, $\beta_2$, and $\beta_3$ are the weights automatically obtained through linear regression, reflecting the respective contributions of the ElasticNet, LightGBM, and GWR predictions in the final model.
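The fusion step amounts to an ordinary least squares fit on the stacked out-of-fold predictions. The sketch below assumes those predictions are already assembled as one row per sample; the function names and the toy numbers are hypothetical, and the normal equations are solved with a small hand-written Gaussian elimination to keep the example dependency-free.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def fit_blend(oof_preds, y):
    """Least squares weights [b0, b1, b2, b3] for the three sub-model predictions.

    oof_preds: rows of out-of-fold predictions [p_elasticnet, p_lightgbm, p_gwr].
    """
    X = [[1.0] + list(row) for row in oof_preds]  # prepend intercept column
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)

# Hypothetical out-of-fold predictions and targets built from known weights.
oof = [[1, 2, 4], [2, 1, 3], [3, 5, 1], [4, 3, 6], [5, 4, 2], [6, 7, 5]]
true_weights = [1.0, 0.2, 0.5, 0.3]
targets = [true_weights[0] + sum(w * p for w, p in zip(true_weights[1:], row))
           for row in oof]
weights = fit_blend(oof, targets)
```

Because the toy targets are an exact linear combination of the three prediction columns, the fit recovers the weights used to build them; with real out-of-fold predictions the weights instead reflect how much each sub-model contributes.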

2.3. Design of the Spatial Mismatch Index

This study employs the regression residual method to quantify the spatial mismatch between population density and land value. The specific steps are as follows: First, the study adopts a power law relationship to describe the correspondence between the two sets of values. Mills’ research indicates that population density can be approximately regarded as a fixed power of land value, and this exponent reflects residents’ sensitivity to changes in housing prices [54]. Subsequent research has further expanded this formula [26], but the core remains based on the power law relationship—that is, on a logarithmic scale, land value and population density exhibit a linear relationship. Mills’ original power law relationship is as follows:
$$ D(x) = R(x)^{1 - B} $$
where $D(x)$ denotes the population density function, $R(x)$ denotes the land value function, $x$ is the distance from the city center, and $B$ is a parameter. The parameter $B$ is defined as $B = a(1 + E)$, where $a$ is the share of land in housing production and $E$ is the price elasticity of housing demand. Later, to simplify the relationship between population density and land value, the study applies a logarithmic transformation to convert the original power law relationship into a linear one. After logarithmic transformation, the relationship becomes
$$ \log(\mathrm{Value}) = a + b \log(\mathrm{Pop}) + \epsilon $$
where $a$ and $b$ are estimated using the least squares method and $\epsilon$ is the random error term. Finally, the spatial mismatch index is defined as the regression residual:
$$ \mathrm{Mismatch} = \epsilon = \log(\mathrm{Value}) - \left( a + b \log(\mathrm{Pop}) \right) $$
The advantage of this method is that extensive research has shown that population and land value exhibit a linear relationship on a logarithmic scale, and the residual removes the variation in land value explained by population, reflecting only the variation caused by other factors. Therefore, when the mismatch index in a given area is positive, it indicates that, at a given population level, the actual land value is higher than the predicted value, implying that the area is relatively underpopulated compared to its land value, whereas a negative mismatch index indicates that the actual land value is lower than the model prediction, possibly reflecting an excessive concentration of population in low land value areas. Traditional methods typically compute the location quotient of population density to land value or directly calculate the difference between the two; however, due to the power law relationship between population and land value, this often results in a mismatch index that correlates with one of the two variables, making it difficult to accurately reflect the spatial mismatch. In contrast, the regression residual method, by removing the correlation between the two variables, significantly reduces the dependency of the mismatch index on the variables themselves, thereby providing a more accurate measure of spatial mismatch.
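The regression residual method can be sketched as a simple log-log least squares fit followed by a subtraction. The function name and the toy values are hypothetical, and the sketch assumes strictly positive population and land value in every cell so that the logarithms are defined.

```python
import math

def mismatch_index(land_values, populations):
    """Fit log(Value) = a + b*log(Pop) by least squares and return the residuals.

    Returns (a, b, residuals); residuals[i] is the mismatch index of cell i.
    Assumes every land value and population is strictly positive.
    """
    lx = [math.log(p) for p in populations]
    ly = [math.log(v) for v in land_values]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
         / sum((x - mx) ** 2 for x in lx))
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(lx, ly)]
    return a, b, residuals

# Hypothetical cells: the third has a high value for its population,
# the fourth a low value for its population.
populations = [10.0, 50.0, 100.0, 200.0, 400.0]
land_values = [1000.0, 4000.0, 20000.0, 9000.0, 30000.0]
a, b, mismatch = mismatch_index(land_values, populations)
```

By construction the residuals sum to zero and are uncorrelated with log population, which is the property that distinguishes this index from a simple ratio or difference of the two variables.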

3. Results

3.1. Population Density Distribution

The 2020 Seventh Population Census data of Shanghai were used to perform zonal correction on the WorldPop constrained population raster data, yielding a more accurate spatial distribution of the population. The ratio of census population to raster-counted population measures the consistency between the two sources; the closer the value is to 1, the better the match. Before correction, the raster counted approximately 31.251 million people within the study area, while the census total was 24.233 million. Raster counts in each district were generally higher than the census values, and the overall consistency of 0.775, clearly below 1, indicates that the raster data overestimate the population. Among the districts, Changning District had the lowest consistency at only 0.685, signifying a particularly severe overestimation; the consistencies in Hongkou District and Baoshan District were 0.706 and 0.739, respectively, while Jiading District achieved 0.836, the smallest error of any district. After correction, the total raster population of each district was consistent with the census data. Detailed population correction information is given in Table 2, and the corrected population distribution is shown in Figure 3.
After correction, the overall population raster exhibits a gradient distribution of “high in the center and low at the periphery.” Before correction, the raster values ranged from 6 to 479 (people), whereas after correction, they were reduced to 5–361 (people). Following calibration, while maintaining the overall distribution trend unchanged, the raster values in each district more closely match the data from the Seventh Population Census. The central urban area still shows a high population density, while the suburban areas far from the city center exhibit significantly lower population density—reflecting the agglomeration effect of various urban elements and demonstrating the intensity of land use and the level of economic development in the central region, in accordance with fundamental economic principles.
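The zonal correction can be sketched as a per-district rescaling. The sketch below assumes that every cell in a district is multiplied by that district's census-to-raster ratio, which makes district totals match the census while preserving the within-district pattern; the text does not spell out the exact procedure, so this proportional scheme is an assumption. The function name `zonal_correction` and the toy districts "A" and "B" are hypothetical and do not reproduce Table 2.

```python
def zonal_correction(cells, census):
    """Scale raster cell counts so each district total equals its census total.

    cells  : list of (district, raster_population) tuples, one per grid cell
    census : dict mapping district -> census population
    """
    raster_totals = {}
    for district, count in cells:
        raster_totals[district] = raster_totals.get(district, 0.0) + count
    # Consistency ratio = census population / raster population (1.0 = perfect match)
    ratios = {d: census[d] / raster_totals[d] for d in raster_totals}
    corrected = [(d, count * ratios[d]) for d, count in cells]
    return ratios, corrected

# Hypothetical toy numbers
cells = [("A", 400.0), ("A", 100.0), ("B", 250.0), ("B", 150.0)]
census = {"A": 350.0, "B": 320.0}
ratios, corrected = zonal_correction(cells, census)
```

Applied to the totals reported above, the same ratio gives 24.233 / 31.251 ≈ 0.775 for the study area as a whole.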

3.2. Land Value Distribution

For the composite model's land value predictions, we evaluated the importance of 50 independent variables in each of the three sub-models (ElasticNet, LightGBM, and GWR) and ranked the variables accordingly. Table 3 presents the ten most important variables in each sub-model. The ElasticNet model uses regression coefficients to measure each variable's marginal contribution; because of its L1 regularization, many variables that contribute little to land value are eliminated automatically. The LightGBM model calculates feature importance by accumulating the reduction in prediction error at each node split, which enables it to capture nonlinear relationships and interaction effects; higher values indicate a greater role of the variable in nonlinear modeling. For the GWR model, the overall impact of each variable on land value across regions is measured by averaging the absolute values of that variable's regression coefficients over all local regression models.
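Two of the importance measures described above reduce to simple arithmetic on fitted coefficients, as the following sketch shows. It assumes standardized predictors so that coefficient magnitudes are comparable. The function names, variable names, and coefficient values are hypothetical and are not taken from Table 3.

```python
def rank_elasticnet(coefs):
    """Rank variables by |coefficient|, dropping those shrunk to exactly zero by the L1 penalty."""
    kept = [(name, c) for name, c in coefs.items() if c != 0.0]
    return sorted(kept, key=lambda kv: abs(kv[1]), reverse=True)

def rank_gwr(local_coefs, names):
    """Rank variables by the mean absolute value of their local GWR coefficients.

    local_coefs : list of rows, one per local regression, one coefficient per variable
    """
    n = len(local_coefs)
    scores = {
        name: sum(abs(row[j]) for row in local_coefs) / n
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical ElasticNet coefficients (closeness_3000 eliminated by L1)
en_rank = rank_elasticnet(
    {"function_type": 0.8, "transaction_year": -0.5, "closeness_3000": 0.0}
)

# Hypothetical local coefficients from three local regressions
names = ["schools", "hospital_dist", "betweenness_9000"]
local_coefs = [[0.2, -0.1, 0.9], [-0.3, 0.1, -1.1], [0.1, -0.2, 1.0]]
gwr_rank = rank_gwr(local_coefs, names)
```

Averaging absolute values matters for GWR because a coefficient that changes sign across space would otherwise average toward zero and appear unimportant.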
In the final composite model, the LightGBM model holds the highest weight, and its five most important variables are the number of schools around the parcel, the distance from the parcel to the nearest hospital, the year of the parcel transaction, the closeness centrality within a 3000 m radius, and the global betweenness centrality indicator. The results indicate that space syntax indicators representing network centrality, as a whole, play an indispensable role in the model. The model effectively captures the impact of closeness centrality on land values—especially the closeness centrality indicator within a roughly 3000 m radius—while among the betweenness centrality indicators, only global betweenness centrality shows a relatively significant influence on land value. In the ElasticNet model, the best performance is observed for non-spatial indicators, such as parcel function type, surrounding housing prices, and transaction year, which have a clear linear relationship with land value. However, the performance of various centrality indicators in the ElasticNet model is subpar, possibly indicating that the association between these indicators and land value is difficult to capture through a simple linear relationship, thereby reflecting the limitations of linear models. In the GWR model, the betweenness centrality indicators perform extremely well overall, with the betweenness centrality at a 9000 m radius being the most important—indicating that this indicator has a strong spatial local variation in its effect on land value. In contrast, such variation is smoothed out in both linear and nonlinear models, making it difficult to capture the importance of betweenness centrality. 
In summary, these findings demonstrate the importance of space syntax indicators in land value prediction: nonlinear models capture the impact of closeness centrality on land prices more effectively, whereas geographically weighted regression models excel at capturing the effects of betweenness centrality. Moreover, combining multiple models integrates each approach's strengths, resulting in higher predictive accuracy and greater stability.
The three sub-models, the combination of the linear and nonlinear models, and the final composite model integrating all three sub-models were evaluated separately; their performance is presented in Table 4. Among the three sub-models, the nonlinear model (LightGBM) performs best. Combining the ElasticNet model with LightGBM increases the adjusted R² to 0.788. Fusing the predictions of all three sub-models on the validation set, with linear regression determining the optimal combination weights, further improves the adjusted R² to 0.795. The fusion coefficients of the final composite model for the ElasticNet model, LightGBM, and the GWR model are 0.11, 0.68, and 0.22, respectively, with an intercept of −0.10. This outcome not only demonstrates the dominant role of nonlinear factors in land value prediction but also highlights the unique value of GWR in explaining local spatial heterogeneity. Overall, the composite model explains approximately 79.5% of the variability in land value on the validation set and shows higher stability and generalization ability than each individual model; it is therefore chosen for land value prediction across the entire study area.
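The fusion step is a form of stacking: the sub-model predictions become the regressors of a second-stage linear regression. A minimal sketch with NumPy follows. The helper names `fit_fusion` and `fuse` are hypothetical, and the data are synthetic, generated without noise from the reported coefficients purely so that the fit can be checked; they are not the study's validation set.

```python
import numpy as np

def fit_fusion(sub_preds, y):
    """OLS fit of y on the sub-model predictions plus an intercept.

    sub_preds : array of shape (n_samples, n_models)
    Returns (intercept, weights).
    """
    X = np.column_stack([np.ones(len(y)), sub_preds])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0], beta[1:]

def fuse(sub_preds, intercept, weights):
    """Combine sub-model predictions into the composite prediction."""
    return intercept + sub_preds @ weights

# Synthetic stand-ins for ElasticNet, LightGBM, and GWR validation predictions
rng = np.random.default_rng(0)
sub_preds = rng.normal(size=(200, 3))
reported_w = np.array([0.11, 0.68, 0.22])   # coefficients reported in the text
y = -0.10 + sub_preds @ reported_w          # noise-free target for checking only
intercept, weights = fit_fusion(sub_preds, y)
fused = fuse(sub_preds, intercept, weights)
```

Because the weights are fitted on held-out predictions rather than on the training data, a sub-model that overfits tends to receive a smaller weight than its in-sample accuracy would suggest.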
In this section, we extend the previously trained composite model to predict land values over the entire study area. The specific steps are as follows: first, convert the population raster data into point data and, using the same method as in the training phase, calculate the variable values corresponding to each point; next, employ the best parameters determined from the 5th fold for each of the three sub-models to predict the full dataset, ensuring that the prediction results are representative and consistent. For the ElasticNet model, the new data are first standardized using the pre-saved normalization parameters, and then the intercept and coefficients obtained in the 5th-fold training are loaded to calculate the predicted value for each point. For the LightGBM model, the model saved during the 5th-fold training is directly called; this model quantifies the importance of each variable by calculating the reduction in prediction error each time a node is split during the construction of each decision tree and accumulating the error reduction contributions of all variables across all trees. For the GWR model, the key challenge is how to extend the small-sample training results to a larger scale. Therefore, the study first computes the Euclidean distance between the features of each point involved in the final land value prediction and all training points, and then uses the bisquare kernel function with a preset bandwidth to obtain weights reflecting spatial proximity (with closer distances receiving higher weights). Subsequently, local regression coefficients are solved using weighted least squares, and the inner product of the point features’ independent variables with the obtained regression coefficients is computed to derive the GWR model’s predicted value. 
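The GWR extrapolation step can be sketched as follows. For each prediction point, the sketch weights the training points with a bisquare kernel, solves a weighted least squares problem, and applies the local coefficients to the point's own variables. It assumes that distances are measured between point coordinates and that a single fixed bandwidth is used. The function names and the synthetic data are hypothetical, not the study's implementation.

```python
import numpy as np

def bisquare(d, bandwidth):
    """Bisquare kernel: (1 - (d/h)^2)^2 inside the bandwidth, 0 outside."""
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)

def gwr_predict(train_xy, train_X, train_y, new_xy, new_X, bandwidth):
    """Predict at new points with locally weighted least squares."""
    Xd = np.column_stack([np.ones(len(train_X)), train_X])  # add intercept column
    preds = np.empty(len(new_X))
    for i, (loc, feats) in enumerate(zip(new_xy, new_X)):
        d = np.linalg.norm(train_xy - loc, axis=1)   # distance to every training point
        sw = np.sqrt(bisquare(d, bandwidth))         # sqrt weights turn WLS into OLS
        beta, *_ = np.linalg.lstsq(Xd * sw[:, None], train_y * sw, rcond=None)
        preds[i] = np.concatenate([[1.0], feats]) @ beta  # inner product with local coefficients
    return preds

# Synthetic check: data from one global linear model, so every local fit recovers it
rng = np.random.default_rng(1)
train_xy = rng.uniform(0.0, 1.0, size=(50, 2))
train_X = rng.normal(size=(50, 2))
train_y = 2.0 + 3.0 * train_X[:, 0] - 1.0 * train_X[:, 1]
new_xy = np.array([[0.5, 0.5], [0.1, 0.9]])
new_X = np.array([[1.0, 2.0], [0.5, -1.0]])
preds = gwr_predict(train_xy, train_X, train_y, new_xy, new_X, bandwidth=2.0)
```

With real data the bandwidth controls how local each fit is: a small bandwidth follows spatial variation closely but leaves few training points with nonzero weight, which can make the local system ill-conditioned.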
Finally, the prediction results of the three sub-models across the study area are taken as inputs; using the coefficients and intercept previously obtained through linear regression fusion, the final land value prediction values are calculated. This method ensures consistency in data processing at each stage and achieves a smooth transition from small-sample training to large-sample extrapolation. Figure 4 shows the distribution of predicted land values.
The land value prediction map shows that the spatial distribution of predicted land values in Shanghai generally exhibits a center–periphery gradient. High-land-value areas are primarily concentrated in the Lujiazui financial district on the eastern bank of the Huangpu River and in the areas west of the river; the high-value belt is mainly distributed along a northeast–southwest direction, aligning well with Shanghai's spatial pattern. In numerical terms, the predicted land values in the study area range from 574 CNY/m² to 241,044 CNY/m². The high land values in the central area are not only closely related to economic activities and commercial agglomeration but also reflect advantages in transportation accessibility and public service provision; in contrast, land values in the urban periphery and far suburbs are relatively lower, consistent with their lower development intensity and facility provision levels. In summary, these results provide reliable, high-precision land value distribution data for the subsequent spatial mismatch analysis. The 100 m resolution, which is difficult to achieve with traditional data and models, further helps to avoid the modifiable areal unit problem [55].

3.3. Spatial Mismatch Index Distribution

This study employs the regression residual method to measure the spatial mismatch between population density and land value, calculating the spatial mismatch index distribution across the study area and computing the average mismatch index for each subdistrict, as shown in Figure 5. Colors closer to deep blue indicate that population density is relatively low compared to land value, while colors closer to deep red indicate that population density is relatively high compared to land value; light-colored areas imply a relative balance between population density and land value. Overall, the urban center exhibits a phenomenon where land values are relatively high compared to the population scale, suggesting that although the central area has strong locational advantages, it does not accommodate a proportionate population. At the same time, lower values are mainly distributed in the western semicircular area of the city center, indicating that although this region has relatively poor locational advantages, it has absorbed an excessive number of residents.
After obtaining the subdistrict-level spatial mismatch index distribution, SKATER spatial clustering was used to divide the study area into different regions, as shown in Figure 6. The study progressively increased the number of clusters, and to ensure consistent cluster scales, further subdivision was halted when a cluster became too fragmented (i.e., when a cluster contained fewer than 10% of the total number of subdistrict-level administrative areas). Through testing, the study area was divided into four clusters. In general, Cluster 1 essentially overlaps with the core of the central urban area, while the remaining clusters are distributed around Cluster 1.
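The stopping rule for the number of clusters can be sketched independently of the clustering algorithm itself. The sketch below increases the cluster count until some cluster falls below 10% of all units and keeps the last count at which every cluster was large enough; this reading of the rule is an assumption. `cluster_fn` stands in for a SKATER run at a given number of clusters, and `toy_partition` with its cluster sizes is a hypothetical placeholder, not the study's result.

```python
from collections import Counter

def choose_cluster_count(cluster_fn, n_units, min_share=0.10, max_k=20):
    """Return the largest k (and its labels) reached before any cluster
    holds fewer than min_share of all units."""
    best_k, best_labels = 1, [0] * n_units
    for k in range(2, max_k + 1):
        labels = cluster_fn(k)
        smallest = min(Counter(labels).values())
        if smallest < min_share * n_units:
            break                      # a cluster became too fragmented
        best_k, best_labels = k, labels
    return best_k, best_labels

# Hypothetical cluster sizes for 100 subdistricts at each k
SIZES = {2: [60, 40], 3: [40, 35, 25], 4: [40, 30, 18, 12], 5: [40, 30, 18, 7, 5]}

def toy_partition(k):
    """Placeholder for a spatial clustering run; returns one label per unit."""
    return [c for c, size in enumerate(SIZES[k]) for _ in range(size)]

k, labels = choose_cluster_count(toy_partition, n_units=100, max_k=5)
```

In this toy setting the fifth cluster would contain only 5 of 100 units, so the procedure stops at four clusters.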
The central Cluster 1 overall exhibits a northeast–southwest orientation. It covers the central urban area west of the Huangpu River, extending north to Yangpu District and south to Xuhui District, and also includes parts of Lujiazui in the east. This cluster has an average spatial mismatch index of 1.1, with a very concentrated distribution of high mismatch values—notably, the 49 subdistricts with the highest mismatch indices are all situated within this cluster. Cluster 2 is located in the western part of Shanghai, adjacent to Suzhou and Jiaxing, and serves as the gateway between the central urban area and the outside. This cluster has an average spatial mismatch index of −0.4; among areas with negative mismatch values, the regions with the most pronounced mismatch are largely concentrated within this cluster. Cluster 3 mainly covers most of Songjiang District and Jinshan District, and has an average spatial mismatch index of −0.3, indicating that, at a given population level, the actual land value in this area is lower than the baseline level predicted by population. Cluster 4 includes most of Fengxian District and Pudong New Area, and also surrounds Cluster 1. This cluster has an average spatial mismatch index of 0.1, indicating that population density and land value are relatively balanced.
In summary, positive mismatch indices are concentrated in Cluster 1, which corresponds to the core of the central urban area; negative indices are predominantly found in Clusters 2 and 3, forming the western semicircular belt around the center; and Cluster 4 displays a relatively balanced relationship between population density and land value.
Finally, a comparative analysis of the cumulative distributions of population density and land value was conducted to further validate their separation. The detailed cumulative distribution curve is presented in Figure 7. First, the data were arranged in ascending order by land value, and then the cumulative percentages of land value and the corresponding population density were calculated in that order. By comparing the cumulative curves, the study found that the cumulative population curve consistently lies above the cumulative land value curve. This indicates that within the study area, as land values increase, the change in population density is relatively gradual, leading to a situation where high-land-value areas have a relatively lower population density compared to their land value, while low-land-value areas have a relatively higher population density. Overall, by analyzing the distribution of the spatial mismatch index and the cumulative curves, the study confirms that there is indeed a spatial mismatch between population density and land value within the study area, and that this mismatch index exhibits distinct regional clustering characteristics.
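The cumulative comparison can be sketched in a few lines: cells are sorted by land value in ascending order, and running shares of land value and population are accumulated in that order. The function name `cumulative_shares` and the four toy cells are hypothetical. With equal-area grid cells, population counts and population densities yield the same shares.

```python
def cumulative_shares(value, pop):
    """Sort cells by land value (ascending); return cumulative shares of value and population."""
    order = sorted(range(len(value)), key=lambda i: value[i])
    total_v = sum(value)
    total_p = sum(pop)
    cum_v, cum_p = [], []
    run_v = run_p = 0.0
    for i in order:
        run_v += value[i]
        run_p += pop[i]
        cum_v.append(run_v / total_v)
        cum_p.append(run_p / total_p)
    return cum_v, cum_p

# Hypothetical cells: land value (CNY/m2) and population (people per cell)
value = [50000.0, 2000.0, 10000.0, 5000.0]
pop = [150.0, 60.0, 120.0, 90.0]
cum_v, cum_p = cumulative_shares(value, pop)
```

In this toy example the population curve lies above the land value curve at every point, the same qualitative pattern as reported for Figure 7: population is spread more evenly across cells than land value is.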

4. Discussion

4.1. Interpretation of Results

This study’s findings confirm the decoupling of population density and land value distributions in Shanghai and, through the cumulative-percentage analysis, further reveal the deep drivers of this mismatch. According to the previous discussion of Mills’ formula [54], population density can be approximately regarded as a fixed power of land value, with the exponent reflecting the price elasticity of housing demand, defined as the percentage change in housing demand resulting from a 1% change in price. Related research has shown that typically, the land value gradient in cities is flatter than the population gradient [56], meaning that this exponent is generally positive. According to this conclusion, the cumulative land value curve should lie above the cumulative population curve; however, this study’s findings are exactly the opposite, with the calculated exponent between the two being negative. According to Mills’ formula, this indicates that the price elasticity of housing demand is relatively low, meaning that residents are relatively insensitive to changes in housing prices. This low elasticity stems from housing’s quasi-necessity and the scarcity of substitutes, as well as supply-side constraints imposed by land use policies and planning rigidity, making high demand difficult to alleviate through increased supply or reduced demand [57]. It also implies that rising housing prices will not significantly curb demand, thereby reinforcing expectations and behaviors that sustain high prices, which in turn feed into land values and manifest as relatively high land values in certain areas. To explore this issue in urban spatial practice, this study further analyzes four mismatch zones.
For Cluster 1, further analysis shows that the areas with high mismatch values fall into two types. The first consists of large, concentrated residential areas: extensive residential zones significantly elevate the mismatch index, suggesting that the spatial capacity of residential land is relatively low. On one hand, many central-area residential districts were built early and are aging and low-rise; as the city expanded, these plots became part of the urban core, but without urban renewal their capacity fails to match their locational value. On the other hand, areas with concentrations of new residential developments also exhibit high mismatch, reflecting over-regulation of development intensity in planning policies and intense competition for land adjacent to high-quality educational and medical facilities, which drives up land prices without a commensurate increase in housing capacity. The second type is located near major spatial barriers. A particularly striking example is the train depot on the east side of Shanghai Station, where the surrounding Beizhan Subdistrict, Baoshan Road Subdistrict, and North Sichuan Road Subdistrict rank among the top five in mismatch value across the city. This demonstrates that while these large surface transportation facilities can diminish locational value through impaired accessibility, the space they occupy causes an even greater capacity loss. Redevelopment projects that remove spatial barriers, such as railway overbuild development, are therefore needed to enhance spatial capacity in the area. Overall, the underlying cause of high mismatch in both types of areas lies in insufficient spatial capacity, which results from the combined effects of factors such as the age of construction, regulatory constraints, and spatial barriers.
Cluster 2 exhibits a negative mismatch index with a pronounced degree of imbalance. Through in-depth analysis, three interlinked mechanisms explain this pattern. First, the demand-filtering mechanism: as the built-up area most closely connected to the central city, high housing prices in the core and strict land supply constraints have diverted a large share of housing demand to this cluster, resulting in relatively high population density. Second, the accessibility-gain mechanism: major regional transportation routes run through this cluster, and three of Shanghai’s five new towns—namely Jiading New Town, Qingpu New Town, and Songjiang New Town—are located within it. Owing to its unique locational conditions and the associated new town planning, a large influx of new and migrant populations has entered this area. Third, the amenity-lag mechanism: the cluster is still under development, and the provision of public services such as education, healthcare, and retail lags behind population growth, leading to relatively low locational value despite high density. Overall, the excessive inflow of population combined with insufficient amenities leads to a significant disparity between population density and land value in this cluster.
Cluster 3 also exhibits a negative mismatch index, although its magnitude is slightly lower than that of Cluster 2. The mismatch in this cluster arises from three factors. First, the preferential pricing of industrial land: large tracts of industrial land north of the Huangpu River benefit from government incentives, driving down the cost of land supply. Second, the lack of major transportation arteries: provincial-level highways bypass this cluster and instead run through Cluster 2, while the Huangpu River creates a geographic barrier, both of which diminish its locational value. Third, the presence of extensive contiguous farmland and fragmented construction land—reinforced by farmland protection policies—prevents the emergence of large-scale urban development. Overall, the primary cause of mismatch in this cluster is its relatively poor locational conditions, compounded by the low average land prices driven by industrial land, resulting in locational values that underperform relative to its population levels.
Cluster 4 exhibits the lowest degree of mismatch, with population density and land value broadly balanced. In the peripheral riverfront and coastal areas, industrial land and logistics warehousing predominate, resulting in negative mismatch indices. In the central zone, population density and land value are generally aligned, although some newly developed subareas display positive mismatch due to preemptive planning and incomplete population settlement. Overall, the spatial mismatch remains minimal across this cluster.
Taken together, Clusters 1 and 2 best illustrate the two contrasting faces of spatial mismatch within the study area. Cluster 1, the high-density urban core, is hampered by a rigid and inelastic development capacity; the displaced middle- and low-income households therefore pour into Cluster 2, the zone with the greatest physical continuity to the center. This spill-over allows Cluster 2 to absorb an oversized population at relatively low land prices, producing a mirror image of “high price–low capacity” in Cluster 1 versus “low price–high population” in Cluster 2. The pattern corroborates the earlier argument on the breakdown of the institution–market nexus: excessive regulatory intervention and restrictive policies distort urban land supply and demand [58,59], while the inherent rigidity of the built environment further amplifies the imbalance. The result is soaring land and housing prices in the core and a population shift toward the periphery, which in turn aggravates job–housing separation and tidal traffic flows. By contrast, Clusters 2, 3 and 4 form a ring around the core yet display markedly different mismatch directions because of their divergent locational attributes and internal structures. Although Cluster 2 enjoys the most advantageous location in the outer ring, it shows a negative mismatch due to the sheer scale of population inflow. Cluster 3 also records a negative mismatch, but mainly because its land value is depressed by poor locational conditions. Cluster 4, despite sharing the absence of regional-scale transport corridors with Cluster 3, achieves a near balance between population and land value; its tighter connectivity to the central city, the presence of major infrastructure such as the airport, and intensive development in areas like Lingang collectively enhance its locational value.
Overall, spatial mismatch is not driven by a single factor; it emerges from the combined effects of development–capacity elasticity, transport accessibility and policy orientation. These forces jointly shape population movements and locational values, and only their relative equilibrium can eliminate mismatch. Moreover, a severe mismatch in one cluster can trigger cascading effects in adjacent clusters, further magnifying system-wide imbalance. Hence, an all-encompassing spatial perspective is essential for the efficient allocation of urban land resources.

4.2. Policy Recommendations

Based on an analysis of the mismatch in Shanghai, this study finds that overly strict limits on development capacity are the core cause of the problem, and the same mechanism is widespread across cities. For Shanghai, the mismatch in the central area stems from two factors. One is the high cost of renovating existing buildings, and the other, which is more important, is the excessive restriction on development intensity. The clearest evidence is that new residential projects, like older neighborhoods awaiting renewal, still contribute to spatial mismatch. Such market-distorting policies are common worldwide. Urban character conservation in cities such as Paris and Xi’an, for example, may reflect legitimate values but still restricts central city development, leading to a fragmented urban structure, high housing costs and long commuting times [4]. These less visible efficiency losses have become significant hidden costs in urban growth. Postwar Japanese danchi estates also obeyed strict sunlight rules [60], yet as cities expanded, these large, homogeneous, and low-intensity districts generated a series of urban problems [61]. Japan therefore took two steps. It promoted danchi regeneration to raise capacity and diversify functions, and it replaced rigid sunlight regulations with flexible local guidelines [62,63]. In short, China at this stage faces conditions similar to those once experienced in Japan and Korea, and worldwide the distortion of spatial capacity by policy remains a common issue.
This paper presents several recommendations for cities of all sizes to address spatial capacity mismatch. At the policy level, a flexible control model should replace one-size-fits-all or purely empirical limits: floor area ratios, daylight requirements, and height restrictions can be moderately relaxed, drawing on Japan’s shift from rigid standards to adaptable guidelines and setting tailored rules at subdistrict or grid scales to avoid unnecessary market distortions; planning guidance must also rest on simulation experiments that test various capacity scenarios, evaluating impacts on traffic congestion, public service loads, and spatial mismatch indices to identify and regularly update the combination of parameters that best balance efficiency, equity, and health. At the spatial optimization level, inner districts where population lags behind land prices—such as aging neighborhoods—should be revitalized through targeted redevelopment, mixed-use upgrades, and vertical infill to boost supply and urban quality, while peripheral areas with excessive population concentration are steered toward more balanced growth. Finally, in urban governance, the mismatch identification framework should be embedded in the smart city platform: high-frequency data feeds from mobile phone signals, nighttime imagery, and transaction prices keep mismatch maps continuously updated to flag capacity imbalances, and a digital twin sandbox allows real-time comparison of development intensities and regulatory settings to equip decision-makers with evidence-based guidance for optimizing spatial capacity.

4.3. Research Limitations and Future Directions

This study still has some limitations. First, the scope of the research is limited to Shanghai. Although the mismatch identification model performed well in Shanghai, its applicability in other cities remains to be validated. In preliminary experiments, cities such as Beijing and Shenzhen were selected and traditional linear models were used for land price prediction; however, the resulting goodness-of-fit was unsatisfactory. Although the land price prediction model developed herein substantially improved accuracy across these cities, the initially low baseline meant that the final accuracy was still insufficient for reliable forecasting. This may be due to the significant impact of policy-related factors on land prices, which are difficult to integrate into the prediction model. The main drawback of a single-case study is its inability to capture the natural relationship between population density and land price distribution—for example, a high mismatch index in Shanghai might represent only a low level of mismatch elsewhere. Therefore, the mismatch index proposed here signifies a relative relationship: it still reveals the imbalance between population density and land prices, but only through comparative studies of a large number of cases can a method for accurately measuring a given city’s overall mismatch level be established. Second, although the regression residual method was employed, it cannot completely eliminate all correlations between the mismatch index and land prices. Subsequent research should further optimize the model—for example, by incorporating exogenous variables such as policy stringency and land supply into the regression framework to more comprehensively control for omitted-variable bias. Additionally, machine learning-based orthogonalization techniques could be adopted to extract mismatch signals unrelated to land prices. 
Finally, this study focuses on intra-city issues, emphasizing the distribution of spatial capacity rather than its total quantity. If sub-regional mismatches are corrected, the total urban population could undergo significant change, and such external shocks to spatial capacity may have a greater impact on urban performance than the effects of correcting internal mismatches.
Overall, the current study has quantified the decoupling between population density and land prices, revealing an imbalance in spatial capacity allocation. Subsequent research will focus primarily on three key aspects. First, through comparative studies across multiple cities, the universality and specificity of the mismatch phenomenon will be examined, establishing a benchmark evaluation system for formulating targeted regulatory policies. Second, a simplified urban analysis model will be developed to explore the impact mechanism of spatial capacity on urban operational efficiency and spatial mismatch, thereby providing scientifically based policy recommendations. Third, by integrating the spatial mismatch index with multidimensional data such as urban productivity, residents’ well-being, and environmental indicators, a novel indicator system reflecting overall urban efficiency and sustainability will be constructed, which will in turn support research on spatial capacity optimization to promote the rational allocation of spatial resources and facilitate more refined and intelligent urban governance.

5. Conclusions

This study quantitatively identifies and analyzes the degree and spatial pattern of mismatch between population density and land value distribution within Shanghai, reveals the uneven allocation of urban spatial capacity, and provides a foundation for subsequent optimization of spatial resource allocation. First, by performing zonal correction on the population raster using census data, we obtained a high-resolution population distribution within the study area. Next, 50 variables for land value prediction were calculated and used in a composite model that combines the strengths of the ElasticNet model, LightGBM, and GWR, thereby deriving a high-precision land value distribution throughout the area. These two high-precision datasets strongly support a detailed study of intra-urban mismatch phenomena. Finally, we used the regression residual method to compute the spatial mismatch index. Overall, the distribution of the mismatch index exhibits distinct zonal characteristics. In Shanghai's urban core, land values are generally high relative to population density, whereas the western periphery with lower locational value experiences excessive population concentration. Further interpretation of the experimental results indicates that excessive constraints on spatial capacity have predominantly driven the formation of this mismatch.
In the context of intelligent and precise urban governance, rational distribution of spatial resources is the premise and core of solving numerous urban issues. This is because spatial resources constitute the foundation of urban functions and their distribution directly influences the efficiency of urban operations and thus economic performance. At the same time, spatial resources materialize as the city’s physical form, and their arrangement shapes urban morphology. Hence the allocation of spatial resources becomes the link between overarching economic drivers and the tangible urban environment. Its pivotal role demands that any approach to urban challenges consider the impact on these resources. Traditional restrictive policy measures have clearly overlooked this principle. Such measures often originate from positive values such as heritage conservation, daylight access or green space provision, but when applied without thorough examination, they collectively impose excessive constraints on spatial capacity and inflict significant hidden costs by undermining urban benefits. With technological progress, it has become possible to accurately identify distortions in spatial resource distribution. Furthermore, as smart city development advances, dynamic monitoring and simulation based on big data and artificial intelligence will further enhance the optimization of spatial resource allocation and realize maximal integrated urban performance.

Author Contributions

Conceptualization, P.W. and L.Z.; methodology, P.W. and M.Z.; software, P.W.; validation, P.W.; formal analysis, P.W.; investigation, P.W. and M.Z.; resources, L.Z.; data curation, P.W.; writing—original draft preparation, P.W. and M.Z.; writing—review and editing, L.Z.; visualization, P.W.; supervision, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to the large file size of the original datasets for land value prediction and the involvement of high-precision urban information. The gridded population data are from WorldPop (https://hub.worldpop.org/geodata/summary?id=49730, accessed on 24 March 2025), and the census data are from the Shanghai Municipal Government website (https://tjj.sh.gov.cn/tjgb/20210517/2d1d4f05a2cc42ea94f991c9f19e6d4f.html, accessed on 24 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

GWR: Geographically Weighted Regression
sDNA: Spatial Design Network Analysis
LASSO: Least Absolute Shrinkage and Selection Operator
GDP: Gross Domestic Product
GDPpc: Gross Domestic Product Per Capita
DEM: Digital Elevation Model
MHD: Mean Hybrid Distance
BtH: Betweenness Hybrid
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
RMSE: Root Mean Square Error
R²: Coefficient of Determination
Adjusted R²: Adjusted Coefficient of Determination
SKATER: Spatial ‘K’luster Analysis by Tree Edge Removal

References

  1. Wang, H.; Qiu, Y. Effect of new urbanization on cities’ innovation in China: Evidence from a quasi-natural experiment of a comprehensive pilot. PLoS ONE 2023, 18, e0284772. [Google Scholar] [CrossRef] [PubMed]
  2. Xiao, W.; Wei, D.; Li, H. Understanding Jobs-Housing Imbalance in Urban China: A Case Study of Shanghai. J. Transp. Land Use 2021, 14, 389–415. [Google Scholar] [CrossRef]
  3. Prud’homme, R.; Lee, C.-W. Size, sprawl, speed and the efficiency of Cities. Urban Stud. 1999, 36, 1849–1858. [Google Scholar] [CrossRef]
  4. Bertaud, A. Order Without Design: How Markets Shape Cities; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  5. Cheshire, P.; Sheppard, S. The Introduction of Price Signals into Land Use Planning Decision-making: A Proposal. Urban Stud. 2005, 42, 647–663. [Google Scholar] [CrossRef]
  6. Glaeser, E.L.; Gyourko, J.; Saks, R.E. Why Have Housing Prices Gone Up? Am. Econ. Rev. 2005, 95, 329–333. [Google Scholar] [CrossRef]
  7. Beswick, J.; Alexandri, G.; Byrne, M.; Vives-Miró, S.; Fields, D.; Hodkinson, S.; Janoschka, M. Speculating on London’s housing future. City 2016, 20, 321–341. [Google Scholar] [CrossRef]
  8. Hsieh, C.-T.; Moretti, E. Housing Constraints and Spatial Misallocation. Am. Econ. J. Macroecon. 2019, 11, 1–39. [Google Scholar] [CrossRef]
  9. Chapple, K.; Song, T. Can New Housing Supply Mitigate Displacement and Exclusion? J. Am. Plan. Assoc. 2025, 91, 1–15. [Google Scholar] [CrossRef]
  10. Serrano-Martínez, J.M.; García-Marín, R.; Lagar-Timón, D. Housing, population and region in Spain: A currently saturated property market with marked regional differences. Geogr. J. 2017, 183, 126–139. [Google Scholar] [CrossRef]
  11. Li, B.; Guan, M.; Zhan, L.; Liu, C.; Zhang, Z.; Jiang, H.; Zhang, Y.; Dong, G. Urban Comprehensive Carrying Capacity and Development Order: A “Pressure-Capacity-Potential” Logical Framework. Front. Environ. Sci. 2022, 10, 935498. [Google Scholar] [CrossRef]
  12. Füss, R.; Koller, J.A.; Weigand, A. Determining Land Values from Residential Rents. Land 2021, 10, 336. [Google Scholar] [CrossRef]
  13. Bertaud, A. The Spatial Distribution of Land Prices and Densities: The Models Developed by Economists. Working Paper #23, Marron Institute, New York University, 19 February 2015. Available online: https://marroninstitute.nyu.edu/uploads/content/Bertaud_-_The_Spatial_Distribution_of_Land_Prices_and_Densities.pdf (accessed on 24 March 2025).
  14. Clark, C. Urban Population Densities. J. R. Stat. Society. Ser. A (Gen.) 1951, 114, 490. [Google Scholar] [CrossRef]
  15. Alonso, W. Location and Land Use: Toward a General Theory of Land Rent; Harvard Univeristy Press: Cambridge, MA, USA, 1964. [Google Scholar]
  16. Mills, E.S. An Aggregative Model of Resource Allocation in a Metropolitan Area. Am. Econ. Rev. 1967, 57, 197–210. [Google Scholar]
  17. Muth, R.F. Cities and Housing: The Spatial Pattern of Urban Residential Land Use; University of Chicago Press: Chicago, IL, USA, 1969. [Google Scholar]
  18. Brueckner, J.K. Lectures on Urban Economics; MIT Press: Cambridge, MA, USA, 2011. [Google Scholar]
  19. Ottensmann, J.R. Urban Sprawl, Land Values and the Density of Development. Land Econ. 1977, 53, 389–400. [Google Scholar] [CrossRef]
  20. Hansen, J.D.; Kristensen, G. Price Profiles for Land in Danish Urban Areas. Urban Stud. 1991, 28, 277–287. [Google Scholar] [CrossRef]
  21. Grengs, J. Job Accessibility and the Modal Mismatch in Detroit. J. Transp. Geogr. 2010, 18, 42–54. [Google Scholar] [CrossRef]
  22. Sun, W.; Jin, H.; Chen, Y.; Hu, X.; Li, Z.; Kidd, A.; Liu, C. Spatial mismatch analyses of school land in China using a spatial statistical approach. Land Use Policy 2021, 108, 105543. [Google Scholar] [CrossRef]
  23. Xu, G.; Su, J.; Xia, C.; Li, X.; Xiao, R. Spatial mismatches between nighttime light intensity and building morphology in Shanghai, China. Sustain. Cities Soc. 2022, 81, 103851. [Google Scholar] [CrossRef]
  24. Li, J.; Geneletti, D.; Wang, H. Understanding supply-demand mismatches in ecosystem services and interactive effects of drivers to support spatial planning in Tianjin metropolis, China. Sci. Total Environ. 2023, 895, 165067. [Google Scholar] [CrossRef]
  25. Shen, L.; Zhang, L.; Bao, H.; Wong, S.; Du, X.; Wei, X. An Empirical Study on the Mismatch Phenomenon in Utilizing Urban Land Resources in China. Land 2023, 12, 1196. [Google Scholar] [CrossRef]
  26. Liotta, C.; Viguié, V.; Lepetit, Q. Testing the monocentric standard urban model in a global sample of cities. Reg. Sci. Urban Econ. 2022, 97, 103832. [Google Scholar] [CrossRef]
  27. Combes, P.-P.; Duranton, G.; Gobillon, L. The Costs of Agglomeration: House and Land Prices in French Cities. Rev. Econ. Stud. 2019, 86, 1556–1589. [Google Scholar] [CrossRef]
  28. Duranton, G.; Puga, D. The Economics of Urban Density. J. Econ. Perspect. 2020, 34, 3–26. [Google Scholar] [CrossRef]
  29. Wang, X.; Zhang, X. A Regional Comparative Study on the Mismatch between Population Urbanization and Land Urbanization in China. PLoS ONE 2023, 18, e0287366. [Google Scholar] [CrossRef]
  30. Egidi, G.; Cividino, S.; Quaranta, G.; Alhuseen, A.; Salvati, L. Land mismatches, urban growth and spatial planning: A contribution to metropolitan sustainability. Environ. Impact Assess. Rev. 2020, 84, 106439. [Google Scholar] [CrossRef]
  31. Ben-Moshe, D.; Genesove, D. Regulation and Frontier Housing Supply. arXiv 2022, arXiv:2208.01969. [Google Scholar] [CrossRef]
  32. Zhou, X.; Chen, X.; Zhang, T. Impact of Megacity Jobs-Housing Spatial Mismatch on Commuting Behaviors: A Case Study on Central Districts of Shanghai, China. Sustainability 2016, 8, 122. [Google Scholar] [CrossRef]
  33. Xiao, W.; Wei, Y.D.; Li, H. Spatial Inequality of Job Accessibility in Shanghai: A Geographical Skills Mismatch Perspective. Habitat Int. 2021, 115, 102401. [Google Scholar] [CrossRef]
  34. Stevens, F.R.; Gaughan, A.E.; Linard, C.; Tatem, A.J. Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data. PLoS ONE 2015, 10, e0107042. [Google Scholar] [CrossRef]
  35. Reed, F.J.; Gaughan, A.E.; Stevens, F.R.; Yetman, G.; Sorichetta, A.; Tatem, A.J. Gridded Population Maps Informed by Different Built Settlement Products. Data 2018, 3, 33. [Google Scholar] [CrossRef]
  36. Stevens, F.R.; Gaughan, A.E.; Nieves, J.J.; King, A.; Sorichetta, A.; Linard, C.; Tatem, A.J. Comparisons of Two Global Built Area Land Cover Datasets in Methods to Disaggregate Human Population in Eleven Countries from the Global South. Int. J. Digit. Earth 2019, 13, 78–100. [Google Scholar] [CrossRef]
  37. Thomson, D.R.; Leasure, D.R.; Bird, T.; Tzavidis, N.; Tatem, A.J. How accurate are WorldPop-Global-Unconstrained gridded population data at the cell-level? A simulation analysis in urban Namibia. PLoS ONE 2022, 17, e0271504. [Google Scholar] [CrossRef] [PubMed]
  38. Shanghai Public Resource Trading Center. Available online: https://biz.ghzyj.sh.gov.cn/shtdsc/wz/ (accessed on 24 March 2025).
  39. Liu, Y.; Zheng, B.; Turkstra, J.; Huang, L. A Hedonic Model Comparison for Residential Land Value Analysis. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, S181–S193. [Google Scholar] [CrossRef]
  40. Morales, J.; Stein, A.; Flacke, J.; Zevenbergen, J. Predictive Land Value Modelling in Guatemala City Using a Geostatistical Approach and Space Syntax. Int. J. Geogr. Inf. Sci. 2020, 34, 1451–1474. [Google Scholar] [CrossRef]
  41. Giannopoulou, M.; Vavatsikos, A.P.; Lykostratis, K. A Process for Defining Relations between Urban Integration and Residential Market Prices. Procedia—Soc. Behav. Sci. 2016, 223, 153–159. [Google Scholar] [CrossRef]
  42. Xiao, Y.; Orford, S.; Webster, C.J. Urban Configuration, Accessibility, and Property Prices: A Case Study of Cardiff, Wales. Environ. Plan. B Plan. Des. 2015, 43, 108–129. [Google Scholar] [CrossRef]
  43. Morales, J.; Flacke, J.; Zevenbergen, J. Modelling Residential Land Values Using Geographic and Geometric Accessibility in Guatemala City. Environ. Plan. B Urban Anal. City Sci. 2017, 46, 751–776. [Google Scholar] [CrossRef]
  44. Cooper, C.H.V.; Chiaradia, A.J.F. SDNA: 3-D Spatial Network Analysis for GIS, CAD, Command Line & Python. SoftwareX 2020, 12, 100525. [Google Scholar] [CrossRef]
  45. Hillier, B.; Iida, S. Network and Psychological Effects in Urban Movement. In Proceedings of the Spatial Information Theory, Ellicottville, NY, USA, 14–18 September 2005; pp. 475–490. [Google Scholar] [CrossRef]
  46. Turner, A. From Axial to Road-Centre Lines: A New Representation for Space Syntax and a New Model of Route Choice for Transport Network Analysis. Environ. Plan. B Plan. Des. 2007, 34, 539–555. [Google Scholar] [CrossRef]
  47. Zhang, L.; Chiaradia, A.J. Walking in the Cities without Ground, How 3D Complex Network Volumetrics Improve Analysis. Environ. Plan. B Urban Anal. City Sci. 2022, 49, 1857–1874. [Google Scholar] [CrossRef]
  48. Cooper, C. Spatial Design Network Analysis (sDNA) Version 4.1 Manual; Cardiff University: Cardiff, UK, 2024; Available online: http://sdna.cardiff.ac.uk/sdna/software/documentation (accessed on 24 March 2025).
  49. Des Rosiers, F.; Thériault, M.; Villeneuve, P. Sorting Out Access and Neighbourhood Factors in Hedonic Price Modelling. J. Prop. Invest. Financ. 2000, 18, 291–315. [Google Scholar] [CrossRef]
  50. Law, S. Defining Street-Based Local Area and Measuring Its Effect on House Price Using a Hedonic Price Approach: The Case Study of Metropolitan London. Cities 2017, 60, 166–179. [Google Scholar] [CrossRef]
  51. Hengl, T.; Heuvelink, G.B.M.; Stein, A. A Generic Framework for Spatial Prediction of Soil Variables Based on Regression-Kriging. Geoderma 2004, 120, 75–93. [Google Scholar] [CrossRef]
  52. Breiman, L. Stacked Regressions. Mach. Learn. 1996, 24, 49–64. [Google Scholar] [CrossRef]
  53. Sill, J.; Takács, G.; Mackey, L.; Lin, D. Feature-Weighted Linear Stacking. arXiv 2009, arXiv:0911.0460. [Google Scholar]
  54. Mills, E.S. Urban Economics; Scott, Foresman: Glenview, IL, USA, 1972. [Google Scholar]
  55. Openshaw, S. The Modifiable Areal Unit Problem; Geo Books: Norwich, UK, 1983. [Google Scholar]
  56. World Bank. Land is Perspective: Its Role in the Structure of Cities; World Bank: Washington, DC, USA, 1980; Available online: https://documents1.worldbank.org/curated/en/588491468913783287/pdf/Land-is-perspective-its-role-in-the-structure-of-cities.pdf (accessed on 24 March 2025).
  57. Glaeser, E.L.; Gyourko, J.; Saks, R.E. Why Is Manhattan So Expensive? Regulation and the Rise in Housing Prices. J. Law Econ. 2005, 48, 331–369. [Google Scholar] [CrossRef]
  58. Bertaud, A.; Renaud, B. Socialist Cities without Land Markets. J. Urban Econ. 1997, 41, 137–151. [Google Scholar] [CrossRef]
  59. Bertaud, A. Government Intervention and Urban Land Markets: The Case of China. J. Archit. Plan. Res. 2012, 29, 335–346. [Google Scholar]
  60. Koga, Y.; Nakamura, H.; Matsuura, K. Daylighting Codes, Standards and Policies Mainly in Japan. In Proceedings of the Daylighting ’98, Ottawa, ON, Canada, 13–15 May 1998. [Google Scholar]
  61. Sunikka-Blank, M.; Kiyono, Y. Why do you need more towers? Four approaches to sustainable urban regeneration in Japan. arq Environ. Des. 2021, 25, 372–383. [Google Scholar] [CrossRef]
  62. Sorensen, A.; Okata, J.; Fujii, S. Urban Renaissance as Intensification: Building Regulation and the Rescaling of Place Governance in Tokyo’s High-rise Manshon Boom. Urban Stud. 2010, 47, 556–583. [Google Scholar] [CrossRef]
  63. Council for Promotion of Regulatory Reform, Cabinet Office, Government of Japan. First Report by the Council for Promotion of Regulatory Reform—Opening the Door to Tomorrow 23 May 2017. Available online: https://www8.cao.go.jp/kisei-kaikaku/english/pdf/170523/item1.pdf (accessed on 24 March 2025).
Figure 1. Initial population raster.
Figure 2. Distribution of land price data used for training.
Figure 3. Population raster after correction.
Figure 4. Land value distribution.
Figure 5. (a) Spatial mismatch index represented by raster; (b) average spatial mismatch index by sub-district.
Figure 6. Four mismatch clusters.
Figure 7. Cumulative curves of land value and population.
Table 1. Variable information.

| Group | Variable | Description | Source |
| --- | --- | --- | --- |
| Facility density | Parks, Clinics, Schools, Banks, Groceries, Restaurants, Scenic Spots, Entertainment Venues | Number of facilities of each type within 5 km | AMAP open platform |
| Facility distance | Malls, Hospitals, Universities, Museums, Metro Stations, Train Stations, CBD, District Center, Major Roads | Nearest distance to each type of facility | AMAP open platform |
| Street accessibility | MHD1000, MHD2000, MHD3000, MHD4000, MHD5000, MHD6000, MHD7000, MHD8000, MHD9000, MHD10000, MHDn | MHD values within radii from 1000 m to 10,000 m and in global analysis, reflecting the street network’s closeness centrality | Baidu Map open platform |
| Street accessibility | BtH1000, BtH2000, BtH3000, BtH4000, BtH5000, BtH6000, BtH7000, BtH8000, BtH9000, BtH10000, BtHn | BtH values within radii from 1000 m to 10,000 m and in global analysis, reflecting the street network’s betweenness centrality | Baidu Map open platform |
| Non-spatial factors | Year | Year of the land parcel transaction | shtdsc.com |
| Non-spatial factors | Type_R | Residential land or not | shtdsc.com |
| Non-spatial factors | Type_B | Commercial land or not | shtdsc.com |
| Non-spatial factors | Type_I | Industrial land or not | shtdsc.com |
| Non-spatial factors | Point_X | Longitude of the land parcel | shtdsc.com |
| Non-spatial factors | Point_Y | Latitude of the land parcel | shtdsc.com |
| Non-spatial factors | House Price | Average surrounding housing price | HomeLink |
| Non-spatial factors | GDP | GDP of the district | Shanghai Government |
| Non-spatial factors | GDPpc | Per capita GDP of the district | Shanghai Government |
| Non-spatial factors | DEM | Elevation (DEM) of the location | Geospatial Data Cloud |
| Non-spatial factors | Slope | Slope of the location | Geospatial Data Cloud |
Table 2. Comparison of district raster population and census population.

| District | Raster Population | Census Population | Deviation Ratio (Census/Raster) |
| --- | --- | --- | --- |
| Jiading | 2,195,164 | 1,834,258 | 0.836 |
| Fengxian | 1,517,773 | 1,140,872 | 0.752 |
| Baoshan | 3,023,463 | 2,235,218 | 0.739 |
| Xuhui | 1,418,731 | 1,113,075 | 0.785 |
| Putuo | 1,634,082 | 1,239,800 | 0.759 |
| Yangpu | 1,696,927 | 1,242,548 | 0.732 |
| Songjiang | 2,318,162 | 1,909,713 | 0.824 |
| Pudong | 6,985,321 | 5,681,512 | 0.813 |
| Hongkou | 1,073,232 | 757,498 | 0.706 |
| Jinshan | 1,037,991 | 822,776 | 0.793 |
| Changning | 1,011,985 | 693,051 | 0.685 |
| Minhang | 3,333,866 | 2,653,489 | 0.796 |
| Qingpu | 1,500,528 | 1,271,424 | 0.847 |
| Jing’an | 1,550,025 | 975,707 | 0.629 |
| Huangpu | 953,322.9 | 662,030 | 0.694 |
| Total | 31,250,572 | 24,232,971 | 0.775 |
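The deviation ratios in Table 2 equal the census population divided by the raster population of each district. A minimal sketch of how such ratios could drive a zonal correction follows: every cell is multiplied by its district's ratio so that district sums match the census. The function name `zonal_correction`, the toy 2 × 2 grid, and the split of each district's total across two cells are hypothetical; only the district totals are taken from Table 2, and the study's actual correction procedure may differ.

```python
import numpy as np

def zonal_correction(pop_raster, zone_ids, census_totals):
    """Rescale raster cells so that each zone's sum equals its census total."""
    pop = np.asarray(pop_raster, dtype=float)
    zones = np.asarray(zone_ids)
    corrected = pop.copy()
    ratios = {}
    for zone, census in census_totals.items():
        mask = zones == zone
        raster_total = pop[mask].sum()
        if raster_total == 0:
            continue  # an empty zone has nothing to rescale
        ratios[zone] = census / raster_total
        corrected[mask] = pop[mask] * ratios[zone]
    return corrected, ratios

# Toy grid: row 0 is Jiading, row 1 is Hongkou (cell split is hypothetical;
# row sums equal the raster populations reported in Table 2).
pop = np.array([[1_000_000.0, 1_195_164.0],
                [500_000.0, 573_232.0]])
zones = np.array([["Jiading", "Jiading"],
                  ["Hongkou", "Hongkou"]])
census = {"Jiading": 1_834_258, "Hongkou": 757_498}

corrected, ratios = zonal_correction(pop, zones, census)
```

Proportional scaling preserves the relative distribution of population within a district while forcing the district total to agree with the census.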
Table 3. Top 10 best-performing variables in the three sub-models.

| Rank | LightGBM | ElasticNet | GWR |
| --- | --- | --- | --- |
| 1 | Schools | Type_R | BtH9000 |
| 2 | Hospitals | Type_B | Type_R |
| 3 | Year | Type_I | Type_B |
| 4 | MHD3000 | House Price | BtH10000 |
| 5 | BtHn | Year | Type_I |
| 6 | Metro Stations | Museums | BtH3000 |
| 7 | Museums | Parks | BtH7000 |
| 8 | Universities | GDP | BtH4000 |
| 9 | MHD1000 | Banks | BtH1000 |
| 10 | POINT_Y | District Center | BtH8000 |
Table 4. Performance comparison between the composite model and each sub-model.

| Model | MAE | MAPE | RMSE | R² | Adjusted R² |
| --- | --- | --- | --- | --- | --- |
| ElasticNet | 0.605 | 6.604 | 0.739 | 0.763 | 0.753 |
| LightGBM | 0.531 | 5.776 | 0.699 | 0.787 | 0.734 |
| GWR | 0.588 | 18.916 | 0.740 | 0.761 | 0.701 |
| ElasticNet + LightGBM | 0.532 | 5.791 | 0.699 | 0.789 | 0.788 |
| Composite Model | 0.536 | 5.824 | 0.688 | 0.795 | 0.795 |
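For reference, the five measures reported in Table 4 can be computed from observed and predicted values using their standard definitions, as in the sketch below. The function name `regression_metrics` and the sample values are hypothetical, and expressing MAPE as a percentage is an assumption; the table does not state its unit.

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_predictors):
    """Standard definitions of MAE, MAPE, RMSE, R2, and adjusted R2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    n = y_true.size
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {
        "MAE": np.mean(np.abs(err)),
        # Assumption: MAPE is expressed in percent.
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "R2": r2,
        # Penalizes R2 for the number of predictors in the model.
        "Adjusted R2": 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1),
    }

# Hypothetical observations and predictions for a one-predictor model.
m = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5], n_predictors=1)
```

Because the adjusted R² depends on the number of predictors, two models with similar R² can differ noticeably in adjusted R² when their variable counts differ.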
