Article

Building-Level Population Estimation Method Using a Bayesian-Informed Hierarchical Learning Model

1
School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
2
School of Civil Engineering, Beijing Jiaotong University, Beijing 100044, China
3
Beijing Urban Construction and Transport Planning & Design Institute Co., Ltd., Beijing 100037, China
4
Beijing Urban Construction Design & Development Group Co., Ltd., Beijing 100037, China
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2026, 15(6), 264; https://doi.org/10.3390/ijgi15060264
Submission received: 30 January 2026 / Revised: 2 April 2026 / Accepted: 7 June 2026 / Published: 12 June 2026

Abstract

Although fine-grained spatial knowledge of the urban population distribution is fundamental for effective urban management, traditional census data lack sufficient resolution. Current disaggregation methods often struggle to probabilistically fuse heterogeneous data, such as noisy mobile signaling and building attributes, while ensuring hierarchical consistency between micro-level predictions and macro-level ground truth. To address these gaps, this study proposes a Bayesian-informed hierarchical learning (BIHL) model framework for building-level population estimation. The methodology integrates three distinct layers: (1) a data-driven prior model using a LightGBM ensemble to generate initial probabilistic estimates and uncertainty weights; (2) an enhanced neural network posterior estimator featuring a multi-branch architecture—incorporating Zone Bias Embedding and Zone Interaction networks—to capture non-linear urban dynamics and spatial heterogeneity; and (3) a constrained optimization layer utilizing a hierarchical loss function that enforces strict consistency between aggregated building estimates and official census data through dynamic curriculum learning. Through empirical validation in Haidian District, Beijing, it is demonstrated that the BIHL framework significantly outperforms baseline models (MLR, Random Forest, and LightGBM), achieving a Mean Absolute Percentage Error (MAPE) of 11.36%. This study confirms that incorporating building-level spatial locations and residential categories is vital for mitigating “spatial smoothing” and systematic under-prediction in high-density areas. This framework provides a robust, high-fidelity solution for generating residential population layers, which are essential for city planning.

1. Introduction

Accurate and fine-grained spatial knowledge of urban population distribution is fundamental to effective urban management and transportation planning [1]. Modern transportation systems, including optimized public transit networks and precise Travel Demand Models (TDMs), rely heavily on population density information [2,3]. However, the demographic data traditionally provided by national census reports are aggregated to coarse administrative or statistical boundaries (e.g., census tracts or townships), thus failing to capture the significant population heterogeneity within these zones [4,5]. This fundamental limitation leads to suboptimal traffic analysis, inaccurate demand forecasting, and ineffective emergency response strategies that cannot account for population at the micro-level [6]. Therefore, the task of spatializing coarse census counts into highly resolved population layers, specifically at the scale of individual buildings or residential units, remains a persistent and high-value frontier in urban transport research [6,7].
The most widely adopted technique for population disaggregation is dasymetric mapping, which uses ancillary spatial data to guide the reallocation of population from large source areas to smaller target units [8,9,10]. Early dasymetric approaches relied on simple land cover classifications or large-area indicators like Nighttime Light (NTL) imagery [10,11,12]. More recently, the field has advanced significantly by leveraging high-resolution imagery, LiDAR, and Point-of-Interest (POI) data to capture the urban form at the micro-scale. Specifically, models integrating building characteristics (e.g., footprint area, height, volume) and urban functional zoning (derived from POI/AOI data) have demonstrated superior accuracy by establishing direct, data-driven relationships between residential structures and population counts [13]. Machine learning approaches, such as Random Forest (RF) and ensemble methods, have become state-of-the-art tools for capturing the complex, non-linear correlations between these geospatial features and population distributions [14].
However, traditional machine learning models often rely on shallow architectures that may struggle to extract high-level semantic representations from high-dimensional urban features [15]. Consequently, deep learning (DL) techniques, such as Convolutional Neural Networks (CNNs) and multi-layer perceptrons (MLPs), have recently gained traction due to their superior capability in modeling non-linear urban dynamics [16]. Despite their predictive power, a critical limitation in most standard DL-based population models is the neglect of spatial dependency and spatial non-stationarity [17]. This oversight often leads to ‘spatial smoothing’ issues where local population spikes (e.g., high-density apartments) are under-predicted because the model fails to encode the specific spatial context or neighborhood effects [17].
While building-based models excel at capturing the static residential population, the growing need to understand the dynamic ambient population distribution, which is essential for real-time traffic management and dynamic OD generation, has led to the introduction of mobile phone signaling data (or Location-Based Service data) as a critical, high-frequency data source [8,18,19]. However, fusing noisy, spatiotemporally sparse mobile sensing data with static, structural building information presents a major methodological challenge [20]. Current fusion practices often involve deterministic overlay rules or rely on non-probabilistic machine learning models [21,22]. A key limitation of these non-probabilistic methods is that they can neither formally quantify and propagate the uncertainty inherent in each heterogeneous data source nor naturally integrate domain knowledge (such as a building’s likely function) as a probabilistic prior [23,24]. This rigidity often leads to suboptimal allocation and limited interpretability in fine-scale predictions. A further challenge in multi-source fusion is ensuring hierarchical consistency between micro-scale predictions and macro-scale ground truth [25]. Most existing disaggregation methods operate in a ‘bottom-up’ manner without a feedback mechanism that strictly constrains the sum of estimated building populations to match the official census data [26], which undermines the optimality of the model and prevents error gradients from propagating back to the feature learning stage. Developing a framework that can intrinsically learn micro-level features while satisfying macro-level aggregate constraints remains an open problem.
To address the critical research gap in robust, probabilistic data fusion for micro-scale residential population estimation, the major contributions of this study are twofold.
1. Methodological innovation: We propose a Bayesian-informed hierarchical learning model framework that probabilistically fuses heterogeneous data (mobile signaling, POI, building features) [27]. Unlike deterministic approaches, this model is rigorously optimized based on both micro-level feature fidelity and macro-level statistical consistency.
2. Practical application: By effectively incorporating building functional types as probabilistic priors, the framework significantly enhances the discrimination of residential capacity at the individual building level. This yields superior prediction accuracy over state-of-the-art baselines, providing high-fidelity spatial data that are essential for city planning.
The remainder of this paper is organized as follows. Section 2 describes the data sources and data preprocessing, while Section 3 details the proposed model framework. Section 4 presents the results, including a comparative analysis against several benchmark models, and discusses the findings. Finally, Section 5 provides the conclusions and suggests avenues for future research.

2. Data Description and Preprocessing

For clarity, Table 1 summarizes the main symbols used in this study.

2.1. Data Description

This study utilized multiple heterogeneous data sources, including building footprint (BF) data, Area of Interest (AOI) data, Point of Interest (POI) data, public transit route and stop data, grid-based population data derived from mobile signaling, and subdistrict-level census data. These datasets differ substantially in spatial scale and information type, and were used to jointly support the task of fine-scale population estimation. A comprehensive summary of their sources, spatial and temporal coverage, key attributes, and roles in this study is provided in Table 2.
BF data define the fundamental spatial units at the building level and provide associated structural attributes. AOI and POI data capture urban functional characteristics, which are used to infer building usage and construct prior information. Public transit route and stop data reflect accessibility features, while grid-based population data derived from mobile signaling data provide information on population distribution and activity intensity. Subdistrict-level census data serve as macro-level statistical constraints to ensure the consistency of the estimation results.
These heterogeneous data sources are jointly integrated within the proposed framework. On one hand, functional and spatial attributes (e.g., AOI, POI, and building features) are used to construct prior information; on the other hand, mobile signaling data provide observational support, while census data impose constraint conditions. Together, they enable fine-grained population allocation at the building level.

2.2. Data Preprocessing

2.2.1. Building Type Inference

To acquire a more precise building typology of BF data, this research fused the AOI and POI datasets to construct a building-type inference algorithm. First, we leveraged geospatial features to integrate the BF data with the AOI and POI datasets using a geographic information system (GIS) tool. Subsequently, we calculated the intersection characteristics (Ra,i and Rp,i) between the footprint of each building i and the AOI and POI datasets, as shown in Equation (1).
$$R_{a,i} = \frac{S_{a,i}}{S_i}, \qquad R_{p,i} = \frac{N_{p,i}}{N_i} \tag{1}$$
where Sa,i is the spatial overlap area between building i and the AOI of type a, Np,i is the number of POIs of type p in the area of building i, Si is the area of building i, and Ni is the total number of POIs in the area of building i.
Finally, two inference models were developed for buildings characterized by different feature parameters: an AOI-based method and an AOI-POI-based method.
The AOI-based classification method is applied to buildings satisfying the condition Ra,i > threshold. With this method, the building is directly assigned the type attribute of the AOI with maximum Ra,i. These types include general residential housing, apartment complexes, villas, dormitories, and mixed-use (commercial–residential) buildings. Here, the threshold represents the critical value for the spatial overlap ratio parameter between the building footprint and the AOI, which was set to 0.5 in this study [28].
The AOI-POI-based inference approach is used for buildings satisfying the condition Ra,i ≤ threshold. We count the number (Np,i) of POIs of type p in the area of building i and calculate the POI count ratio Rp,i using Equation (1). The AOI or POI type k with the maximum Rk,i is then assigned as the type of building i.
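The two inference rules can be sketched as a single function. This is a minimal illustration rather than the authors' implementation: the function name, the dictionary inputs, and the "unknown" fallback for buildings with no AOI overlap and no POIs are assumptions introduced here.

```python
from typing import Dict


def infer_building_type(
    aoi_overlap_area: Dict[str, float],  # S_{a,i}: overlap area per AOI type
    building_area: float,                # S_i: footprint area of building i
    poi_counts: Dict[str, int],          # N_{p,i}: POI count per POI type
    threshold: float = 0.5,              # overlap threshold stated in the text
) -> str:
    """Infer a type label for one building from AOI overlap and POI counts."""
    # R_{a,i} = S_{a,i} / S_i for every AOI type that overlaps the building.
    aoi_ratio = {a: s / building_area for a, s in aoi_overlap_area.items()}

    # AOI-based rule: take the dominant AOI type if its ratio exceeds the threshold.
    best_aoi = max(aoi_ratio, key=aoi_ratio.get, default=None)
    if best_aoi is not None and aoi_ratio[best_aoi] > threshold:
        return best_aoi

    # AOI-POI-based rule: R_{p,i} = N_{p,i} / N_i, then pick the largest ratio
    # among all AOI and POI types.
    total_pois = sum(poi_counts.values())
    candidates = dict(aoi_ratio)
    if total_pois > 0:
        for p, n in poi_counts.items():
            candidates[p] = max(candidates.get(p, 0.0), n / total_pois)

    # Assumption: a building with no AOI overlap and no POIs is labeled "unknown".
    return max(candidates, key=candidates.get, default="unknown")
```

For example, a building with 80% of its footprint inside a residential AOI is labeled by the AOI rule, whereas one with only 30% overlap falls through to the ratio comparison across AOI and POI types.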

2.2.2. Residential Area Calculation

After obtaining the building types, the residential floor area Ai of each building i is calculated. In this study, buildings with residential floor area comprise residential buildings (general residential housing, apartments, villas, and dormitories) and mixed-use buildings.
For residential buildings, the residential floor area is calculated using the number of floors multiplied by the GFA. For mixed-use buildings, it is assumed that the first floor is designated for commercial purposes and the upper floors are residential; therefore, the residential floor area is calculated as GFA × (number of floors − 1). For other types of buildings, the residential floor area is assigned a value of 0.
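The three cases above can be written as a short function. This sketch follows the formulas exactly as stated in the text (the `gfa` argument is multiplied by the floor count); the type labels and the guard against a negative floor count for single-storey mixed-use buildings are assumptions made for illustration.

```python
# Hypothetical type labels; the actual category names in the dataset may differ.
RESIDENTIAL_TYPES = {"residential", "apartment", "villa", "dormitory"}


def residential_floor_area(building_type: str, gfa: float, floors: int) -> float:
    """Residential floor area A_i of one building, following the stated rules."""
    if building_type in RESIDENTIAL_TYPES:
        # Residential buildings: number of floors multiplied by GFA.
        return gfa * floors
    if building_type == "mixed_use":
        # Mixed-use buildings: the first floor is assumed to be commercial.
        # max(..., 0) is an added guard for single-storey buildings.
        return gfa * max(floors - 1, 0)
    # All other building types carry no residential floor area.
    return 0.0
```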

2.2.3. Pseudo-Residential Population

To obtain an initial value for each building, we disaggregate the grid-level mobile user count to the building level with an area-weighted approach. This method assumes a uniform distribution of mobile users across each grid cell. Consequently, the pseudo-mobile user count (Mi) for building i can be calculated with Equation (2), as follows:
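$$M_i = \frac{A_i}{\sum_{j \in G(i)} A_j} \times M_{G(i)} \tag{2}$$
where G(i) is the grid cell in which building i is located, Ai is the residential floor area of building i, the sum runs over all buildings j located in G(i), and $M_{G(i)}$ is the grid-level mobile user count of that cell.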
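As a small worked sketch of Equation (2), the function below splits each grid cell's mobile user count across its buildings in proportion to residential floor area. The tuple layout of `buildings` and the zero fallback for cells without residential floor area are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pseudo_mobile_users(
    buildings: List[Tuple[str, str, float]],  # (building_id, grid_id, A_i)
    grid_users: Dict[str, float],             # grid_id -> M_G, grid-level user count
) -> Dict[str, float]:
    """Area-weighted disaggregation of grid-level counts to buildings."""
    # Denominator of Equation (2): total residential floor area per grid cell.
    area_by_grid: Dict[str, float] = defaultdict(float)
    for _, grid_id, area in buildings:
        area_by_grid[grid_id] += area

    result: Dict[str, float] = {}
    for building_id, grid_id, area in buildings:
        total_area = area_by_grid[grid_id]
        # Assumption: a cell with zero residential floor area allocates nothing.
        share = area / total_area if total_area > 0 else 0.0
        result[building_id] = share * grid_users.get(grid_id, 0.0)
    return result
```

By construction, the building-level values within a cell sum back to the cell's count whenever the cell contains residential floor area.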

2.2.4. Other Features

Prior research has demonstrated that both the density of POIs and transport accessibility around buildings exhibit a significant correlation with population size [3,12,29]. Specifically, POI density reflects the intensity of human activities and the distribution of service facilities in cities, and has been widely used as a core proxy variable for fine-scale residential population estimation [12,29]. Transport accessibility (e.g., bus stop density, road network connectivity) is closely related to residential attractiveness and population distribution patterns, and this conclusion has been fully validated in multi-source data-driven population mapping studies [3]. Accordingly, the numbers of POIs (Pi) and bus stops (Bi) within a 500 m buffer of each building were counted. To mitigate the impact of extreme values on the model training process, we truncated these continuous features (i.e., POI count and bus stop count): any value above the 99th percentile was set to the value of the 99th percentile. The features were then standardized using Z-score normalization, so that each feature has a mean of 0 and a standard deviation of 1.
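The truncation and standardization steps can be illustrated with the standard library alone. This is a sketch under assumptions: the percentile is computed with linear interpolation (`method="inclusive"`), and the population standard deviation is used so that the output has a standard deviation of exactly 1; the original pipeline may use different conventions.

```python
import statistics
from typing import List, Sequence


def clip_and_standardize(values: Sequence[float], upper_pct: int = 99) -> List[float]:
    """Cap values at the given upper percentile, then apply Z-score scaling."""
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    cap = cut_points[upper_pct - 1]

    # Truncation: anything above the cap is set to the cap.
    clipped = [min(v, cap) for v in values]

    mean = statistics.fmean(clipped)
    std = statistics.pstdev(clipped)  # population standard deviation
    if std == 0:
        # Constant feature: return zeros rather than dividing by zero.
        return [0.0] * len(clipped)
    return [(v - mean) / std for v in clipped]
```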
In addition, because different residential types—such as standard residential buildings, dormitories, apartments, and villas—exhibit substantial variation in population density, this study further extracted residential sub-categories as input features. Moreover, since Beijing’s central districts developed earlier while the peripheral areas developed later, even standard residential buildings show significant differences in per capita living space across different locations. Therefore, building location information is also incorporated as an input feature. Considering the ring road development pattern of Beijing, ring roads are used as spatial boundaries to assign locational attributes to individual buildings.

3. Methodology

This study proposes a Bayesian-informed hierarchical learning (BIHL) model framework to estimate individual building populations, as shown in Figure 1, which enables fine-grained disaggregation of a population from coarse-grained census data to a building-level distribution with three layers. The first layer is a data-driven prior model that employs a LightGBM ensemble to generate initial probabilistic estimates and quantify the uncertainty of building-level populations. The second layer is a neural network-based estimator that learns the non-linear relationship between building attributes and population. The third layer acts as a deterministic proxy of Bayesian updating. It optimizes a heuristic multi-objective loss function, where the prior uncertainty from the first layer dynamically modulates the micro-level feature learning using confidence weights (wi), while macroscopic census totals and spatial smoothing act as robust regularizing constraints.

3.1. First Layer: Data-Driven Prior Model

This layer formulates a regression model that generates prior values of the building-level population from key building features. The regression model is trained with LightGBM, an efficient machine learning framework based on Gradient Boosting Decision Trees. LightGBM offers fast training and low memory consumption and captures non-linear relationships between variables, making it well-suited for large-scale data analytics. Before establishing the model, however, it is crucial to determine the underlying statistical distribution of the target variable to ensure the theoretical validity of the prior formulation.

3.1.1. Exploratory Data Analysis of Prior Distribution

To identify a suitable prior distribution for the pseudo-residential population (Mi) derived from area-weighted mobile signaling, exploratory data analysis was conducted. Building-level population counts typically exhibit extreme right-skewness and heavy-tail characteristics, as illustrated in Figure 2A. To mitigate this, a natural logarithmic transformation, log(1 + Mi), was applied. Figure 2B shows that this transformation effectively suppresses the heavy tail, yielding a roughly symmetric, bell-shaped distribution (with empirical fit parameters μ = 5.41 and σ = 1.83). The corresponding Q–Q plot (Figure 2C) further confirms that the central quantiles of the transformed data align tightly with the theoretical normal distribution, despite minor deviations at the extreme tails caused by the deterministic outlier clipping during preprocessing.
Moreover, goodness-of-fit tests were performed to compare the adopted log-normal assumption against the log-skew-normal distribution with log-transformed data, as well as to evaluate Poisson and Negative Binomial (NB) distributions fitted to the original pseudo-residential population data. Poisson and NB models are commonly used for count data [30]. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were calculated to evaluate model fit, as summarized in Table 3.
As the initial proxy targets (Mi) in this study are continuous real numbers derived from spatial area-weighted disaggregation, the values were rounded so that the discrete Poisson and NB distributions could be fitted. However, these distributions yield higher AIC and BIC values, indicating a fundamentally poor model fit, as demonstrated in Table 3. Of the two continuous alternatives, the log-skew-normal distribution accounts for residual skewness and achieves a better statistical fit (with the lowest AIC and BIC). However, within the proposed BIHL framework, the prior distribution needs to facilitate the propagation of the LightGBM ensemble’s mean and variance into the neural network’s loss function. Introducing complex skewness integration parameters would compromise the stability and differentiability of the deep learning optimization process. Given the negligible difference in goodness-of-fit between the two continuous models, this study adopts the standard log-normal distribution for the population prior, balancing symmetric error assumptions with computational efficiency.

3.1.2. Prior Model Formulation

Informed by the log-normal distribution assumption and existing research [14,31,32], the residential area, POI count, bus stop count, residential type, building location, and mobile phone user amount are utilized as independent variables, and the log-transformation of the pseudo-residential population log(1 + Mi) is utilized as the target variable.
To quantify the uncertainty associated with this initial estimation, an ensemble of K models was trained (K = 5 in this study, chosen based on multiple simulations), with each model initialized using a different random seed. The ensemble’s predictive mean and variance are defined in Equation (3).
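$$\mu_i^{\mathrm{prior}} = \frac{1}{K}\sum_{k=1}^{K} \hat{M}_{i,k}, \qquad \left(\sigma_i^{\mathrm{prior}}\right)^2 = \frac{1}{K}\sum_{k=1}^{K}\left(\hat{M}_{i,k} - \mu_i^{\mathrm{prior}}\right)^2 \tag{3}$$
where $\hat{M}_{i,k}$ is the value for building i predicted by the kth LightGBM model, and $\mu_i^{\mathrm{prior}}$ and $\sigma_i^{\mathrm{prior}}$ are the prior mean and prior standard deviation for building i, respectively. To align with the log-normal assumption, the prior mean ($\mu_{\log,i}$) and variance ($\sigma_{\log,i}^2$) in the logarithmic space are derived from the ensemble statistics with Equation (4), as follows:
$$\mu_{\log,i} = \log\left(1 + \mu_i^{\mathrm{prior}}\right) - \frac{\sigma_{\log,i}^2}{2}, \qquad \sigma_{\log,i}^2 = \log\left(1 + \frac{\left(\sigma_i^{\mathrm{prior}}\right)^2}{\left(1 + \mu_i^{\mathrm{prior}}\right)^2}\right) \tag{4}$$
Consequently, the prior distribution of the log-transformed building population is defined as shown in Equation (5).
$$\log\left(1 + p_i\right) \sim \mathcal{N}\left(\mu_{\log,i},\ \sigma_{\log,i}^2\right) \tag{5}$$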
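Equations (3) and (4) amount to moment matching: the log-space parameters are chosen so that 1 + p has the ensemble mean and variance. The sketch below illustrates this under the assumption that the K ensemble predictions have already been back-transformed to the population scale, which the form of Equation (4) implies; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def lognormal_prior(predictions: Sequence[float]) -> Tuple[float, float]:
    """Return (mu_log, var_log) for one building from K ensemble predictions."""
    k = len(predictions)
    # Equation (3): ensemble mean and (population) variance.
    mu_prior = sum(predictions) / k
    var_prior = sum((m - mu_prior) ** 2 for m in predictions) / k

    # Equation (4): log-space variance first, because the mean depends on it.
    var_log = math.log(1.0 + var_prior / (1.0 + mu_prior) ** 2)
    mu_log = math.log(1.0 + mu_prior) - var_log / 2.0
    return mu_log, var_log
```

A quick consistency check is that exp(mu_log + var_log / 2) − 1 recovers the ensemble mean, which is the defining property of the log-normal mean.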

3.2. Second Layer: Enhanced Neural Network Posterior Estimator

To address the spatial heterogeneity and feature sparsity inherent in building-level population estimation, this layer includes an enhanced neural network posterior estimator, as illustrated in Figure 3. This model overcomes the limitations of conventional single-path fully connected networks and introduces an innovative multi-branch fusion architecture, which takes as input a high-dimensional building feature vector and corresponding census zone information. These inputs are processed through three parallel subnetworks.

3.2.1. MLP Base Network

This branch network extracts fundamental feature representations. It consists of three hidden layers with dimensions of 256, 128, and 64, respectively. Each linear transformation is followed by batch normalization to accelerate convergence and an ReLU activation function to introduce non-linearity. To alleviate overfitting, a progressively decreasing dropout scheme is applied. The final layer adopts layer normalization, ensuring stable output distributions and enabling the extraction of generalizable building–population mapping patterns.

3.2.2. Zone Bias Embedding

To capture systematic differences across census zones (e.g., baseline density differences between core urban areas and peripheral areas), the model incorporates an embedding layer to learn a zone-specific prior intercept, as shown in Figure 4.
Let Z be the total number of unique census zones within the study area. For any building located in a census zone z, the layer extracts a unique scalar bias term bz from the embedding matrix E based on the discrete zone ID. This bias bz is then added to the output of the base feature extraction MLP network, allowing the model to shift the baseline of the population density estimation for that specific spatial unit. From an implementation perspective, the weights of the matrix E are randomly initialized prior to training using a Gaussian distribution with a small standard deviation, which was set as N(0, 0.05) in this study, and the final values are jointly learned by minimizing the final hierarchical loss function.

3.2.3. Zone Interaction Network

To model the interaction between building features and spatial context, such as the differing population capacities of buildings with identical floor areas in core urban areas and peripheral areas, an interaction branch is introduced. This branch consists of two linear layers with a ReLU activation and applies a scaling factor α, which modulates the contribution of this branch to the final prediction, enhancing the model’s adaptability to complex spatial environments. In this study, α was set to 0.1 based on multiple simulations.
The outputs of the three branches are combined additively, as shown in Equation (6).
$$\hat{y}_i^{\mathrm{std}} = f_{\mathrm{MLP}}(x_i) + \mathrm{Embed}(z) + \alpha\, f_{\mathrm{interact}}(x_i) \tag{6}$$
where $\hat{y}_i^{\mathrm{std}}$ is the standardized prediction for building i, and α is the weight coefficient for the interaction term.
The final output represents the predicted population in the logarithmic space, which is defined in Equation (7). This fusion strategy effectively decouples global structural patterns (captured by the MLP branch) from local spatial deviations (captured by the embedding and interaction branches). As a result, the model maintains strong generalization capability while substantially improving estimation accuracy for specific spatial zones.
$$\hat{y}_i^{\mathrm{std}} = \frac{\log\left(1 + p_i\right) - \mu_{\log}}{\sigma_{\log}} \tag{7}$$
where pi is the true population of building i, which is a latent variable in this study, and μlog and σlog are the mean and standard deviation of the log-transformed population, which are derived from the LightGBM prior. Therefore, the prediction on the original population scale can be obtained through an inverse transformation, as shown in Equation (8).
$$\hat{p}_i = \exp\left(\sigma_{\log}\,\hat{y}_i^{\mathrm{std}} + \mu_{\log}\right) - 1 \tag{8}$$
where $\hat{p}_i$ is the predicted population for building i.
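The additive fusion in Equation (6) and the transformations in Equations (7) and (8) can be illustrated with scalar arithmetic. In this sketch the three branch outputs are passed in as plain numbers, standing in for the network outputs, and the values μlog = 5.41 and σlog = 1.83 in the usage note are only the empirical fit parameters reported earlier, used here as example inputs.

```python
import math


def combine_branches(mlp_out: float, zone_bias: float, interact_out: float,
                     alpha: float = 0.1) -> float:
    """Equation (6): additive fusion of the three branch outputs."""
    return mlp_out + zone_bias + alpha * interact_out


def to_standardized(population: float, mu_log: float, sigma_log: float) -> float:
    """Equation (7): map a population count to the standardized log space."""
    return (math.log1p(population) - mu_log) / sigma_log


def to_population(y_std: float, mu_log: float, sigma_log: float) -> float:
    """Equation (8): inverse transform back to the population scale."""
    # expm1(x) computes exp(x) - 1 with better precision for small x.
    return math.expm1(sigma_log * y_std + mu_log)
```

Applying `to_standardized` and then `to_population` with the same parameters (for instance 5.41 and 1.83) returns the original count up to floating-point error, which confirms that Equation (8) is the exact inverse of Equation (7).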

3.3. Third Layer: Hierarchical Loss Function and Spatial Constraints

To address the challenge of disaggregating population data from the coarse census level to the fine-grained building level, we propose a multi-objective optimization framework that integrates micro-level feature learning and macro-level consistency constraints into a unified differentiable loss function. The total objective function Ltotal is composed of two distinct components, as shown in Equation (9).
$$L_{\mathrm{total}} = L_{\mathrm{micro}} + \lambda(t)\, L_{\mathrm{macro}} \tag{9}$$
where Lmicro and Lmacro are the loss functions at the micro- and macro-level, respectively, and λ(t) is a time-dependent dynamic weight, which is formulated to prevent optimization instability caused by the conflicting gradients of micro- and macro-objectives. It is designed as a linear warm-up and cosine annealing strategy, calculated with Equation (10) as follows.
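$$\lambda(t) = \begin{cases} \lambda_{\max}\,\dfrac{t}{T_{\mathrm{warm}}}, & \text{if } t \le T_{\mathrm{warm}} \\[2ex] \lambda_{\max} \cdot 0.5\left[1 + \cos\left(\dfrac{t - T_{\mathrm{warm}}}{T - T_{\mathrm{warm}}}\,\pi\right)\right], & \text{otherwise} \end{cases} \tag{10}$$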
where λmax is the maximum penalty weight for the macroscopic constraint, Twarm is the designated number of warm-up epochs, T is the total number of training epochs, and t is the current training epoch.
During the initial phase (t ≤ Twarm), a linear warm-up is employed. Applying a macro-penalty from the first epoch may cause gradient shocks to the uninitialized network. The linear ramp-up gradually introduces the macroscopic constraint, allowing the network to first prioritize learning fundamental micro-level feature representations from the proxy labels before heavily incorporating regional boundaries.
Subsequently (t > Twarm), a cosine annealing schedule is applied to smoothly decay the weight. In the context of spatial population disaggregation, once the initial macro-penalty successfully constrains the optimization space—ensuring that the model’s aggregated predictions broadly align with the regional census total—maintaining a rigid penalty can over-constrain the network. As widely recognized in spatial analysis, regional census data inherently suffer from spatial aggregation biases (e.g., the Modifiable Areal Unit Problem) [33], which can conflict with the fine-grained physical realities. By smoothly annealing the macroscopic weight, the network realizes high-precision micro-level fine-tuning in the later stages of training. This dynamic relaxation ensures that the model converges to an optimal state, thus achieving fine-grained spatial micro-heterogeneity without escaping the established macroscopic demographic bounds.
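The warm-up and annealing schedule of Equation (10) can be sketched as a single function. The values used in the usage note (λmax = 1.0 and Twarm = 10) are placeholders chosen for illustration; only the total of 80 epochs is taken from the implementation settings reported in this paper.

```python
import math


def macro_weight(t: int, lam_max: float, t_warm: int, t_total: int) -> float:
    """Equation (10): linear warm-up followed by cosine annealing."""
    if t <= t_warm:
        # Linear ramp from 0 at t = 0 to lam_max at t = t_warm.
        return lam_max * t / t_warm
    # Fraction of the annealing phase completed, in (0, 1].
    progress = (t - t_warm) / (t_total - t_warm)
    # Cosine decay from lam_max at t = t_warm to 0 at t = t_total.
    return lam_max * 0.5 * (1.0 + math.cos(progress * math.pi))
```

With `lam_max=1.0`, `t_warm=10`, and `t_total=80`, the weight rises linearly to its peak at epoch 10, passes through half of the peak at epoch 45, and reaches zero at epoch 80. The two branches agree at t = Twarm, so the schedule is continuous.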

3.3.1. Micro-Level Proxy Loss (Lmicro)

Since ground-truth population data at the building level are typically not available, this study utilized the soft labels generated by the ensemble-LightGBM model as a proxy supervision signal. To ensure robustness against noise and outliers in these proxy labels, the smooth L1 loss (SmoothL1Loss), a piecewise loss function that combines the advantages of both L1 and L2 loss and improves training stability, was employed rather than the mean squared error (MSE).
Let N be the number of buildings in a mini-batch, $\hat{y}_i$ the predicted value, and $y_{\mathrm{proxy},i}$ the standardized proxy label for building i. The micro-level proxy loss is defined in Equation (11).
$$L_{\mathrm{proxy}} = \frac{1}{N}\sum_{i=1}^{N} w_i\, \mathrm{SmoothL1Loss}\left(\hat{y}_i,\ y_{\mathrm{proxy},i}\right) \tag{11}$$
where Lproxy corresponds to the micro-level term Lmicro in Equation (9), and wi is the confidence weight derived from the ensemble variance, ensuring that the network prioritizes learning from high-confidence samples while down-weighting ambiguous predictions.
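A plain-Python sketch of the weighted loss in Equation (11) is given below. The smooth L1 form uses a transition point of β = 1, which matches the default of PyTorch's `SmoothL1Loss`; whether the original implementation keeps that default is an assumption. The confidence weights are taken as inputs because the text does not give their exact formula.

```python
from typing import Sequence


def smooth_l1(diff: float, beta: float = 1.0) -> float:
    """Quadratic near zero (L2-like), linear for large errors (L1-like)."""
    d = abs(diff)
    if d < beta:
        return 0.5 * d * d / beta
    return d - 0.5 * beta


def micro_proxy_loss(y_pred: Sequence[float], y_proxy: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Equation (11): confidence-weighted mean of per-building smooth L1 terms."""
    n = len(y_pred)
    total = sum(w * smooth_l1(p - t) for p, t, w in zip(y_pred, y_proxy, weights))
    return total / n
```

Because the loss grows only linearly for large residuals, a single badly mislabeled proxy value contributes far less than it would under the MSE, and a low weight reduces its influence further.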

3.3.2. Macro-Level Census Constraint (Lmacro)

The core of the proposed hierarchical framework is the differentiable aggregation constraint. This study enforces a constraint which ensures that the sum of predicted populations for all buildings within a specific census zone must approximate the true census count.
Let Z be the number of census zones. For a given zone z, let Bz denote the set of buildings belonging to the zone. The aggregated prediction $\hat{P}_z$ is calculated by summing the exponentiated outputs of the network, as shown in Equation (12).
$$\hat{P}_z = \sum_{i \in B_z}\left[\exp\left(\sigma_y\, \hat{y}_i + \mu_y\right) - 1\right] \tag{12}$$
where σy and μy are the normalization parameters. To handle the large variance in population magnitudes across different zones, this study minimizes the mean squared relative error (MSRE) rather than the absolute error, as shown in Equation (13).
$$L_{\mathrm{macro}} = \frac{1}{Z}\sum_{z=1}^{Z}\left(\frac{\hat{P}_z - P_{\mathrm{census},z}}{P_{\mathrm{census},z}}\right)^2 \tag{13}$$
where Lmacro is the macro-level loss, Z is the number of zones, and Pcensus,z is the census population for zone z.
This constraint acts as a posterior regularization term, correcting the bias in the micro-level predictions by grounding them to the official census statistics.
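Equations (12) and (13) can be illustrated without a deep learning framework. The sketch below computes the zone totals and the MSRE for plain Python lists; in the actual model the same computation would operate on differentiable tensors. The argument layout and function name are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def macro_census_loss(y_std: Sequence[float], zone_ids: Sequence[str],
                      census: Dict[str, float],
                      mu_y: float, sigma_y: float) -> float:
    """Mean squared relative error between zone totals and census counts."""
    # Equation (12): back-transform each building and sum within its zone.
    zone_totals = {z: 0.0 for z in census}
    for y, z in zip(y_std, zone_ids):
        zone_totals[z] += math.expm1(sigma_y * y + mu_y)

    # Equation (13): relative error per zone, squared and averaged over zones.
    squared_errors = [
        ((zone_totals[z] - census[z]) / census[z]) ** 2 for z in census
    ]
    return sum(squared_errors) / len(census)
```

Using the relative rather than the absolute error means that a zone of 10,000 residents that is off by 10% is penalized the same as a zone of 100,000 residents that is off by 10%.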
The training process employs an alternating optimization strategy. In each training epoch, mini-batch data are used to optimize the micro-level proxy loss. Subsequently, the full-batch dataset is utilized to optimize the macro-level census constraint.

3.4. Evaluation Metrics

Based on the BIHL model constructed above, the population of each building was estimated. As the ground-truth residential population at the building level is difficult to obtain, the predicted building-level population was aggregated at the subdistrict level. The aggregated results are then compared with the census population of each subdistrict. The evaluation metrics include the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Symmetric Mean Absolute Percentage Error (SMAPE), as shown in Equations (14)–(16).
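$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - f_i\right| \tag{14}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - f_i}{y_i}\right| \tag{15}$$
$$\mathrm{SMAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{2 \times \left|y_i - f_i\right|}{y_i + f_i} \tag{16}$$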
where n is the number of communities in the test set, yi denotes the true population of community i, and fi denotes its estimated population.
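The three metrics can be computed as follows. This is a minimal sketch that assumes strictly positive true and estimated populations, so that the denominators in MAPE and SMAPE are non-zero; the function name and the dictionary return type are illustrative choices.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, MAPE (%), and SMAPE (%) for paired true/estimated values."""
    n = len(y_true)
    abs_err = [abs(y - f) for y, f in zip(y_true, y_pred)]

    mae = sum(abs_err) / n
    # MAPE: absolute error relative to the true value.
    mape = 100.0 / n * sum(e / y for e, y in zip(abs_err, y_true))
    # SMAPE: absolute error relative to the mean of true and estimated values.
    smape = 100.0 / n * sum(
        2.0 * e / (y + f) for e, y, f in zip(abs_err, y_true, y_pred)
    )
    return {"MAE": mae, "MAPE": mape, "SMAPE": smape}
```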

3.5. Model Implementation

The proposed BIHL framework was implemented on a workstation equipped with an Intel Xeon W-2235 CPU (3.80 GHz), 32.0 GB of RAM, and an NVIDIA RTX A2000 GPU (12 GB). The deep learning framework was implemented using PyTorch 2.4.1; the complete configuration and hyperparameter settings are detailed in Table 4. The implementation pseudo-code of the BIHL model framework is provided in Appendix A.
Regarding the network architecture, the micro-level feature extraction module was constructed with three hidden layers, progressively decreasing in dimensionality [256, 128, 64]. To mitigate overfitting and ensure stable gradient propagation across deep layers, a combination of Batch Normalization and Layer Normalization was utilized. Furthermore, to regularize the network while maintaining the precision required for continuous population estimation, we implemented a layer-decaying dropout strategy [34]. The dropout probability starts at 0.2 for the first hidden layer, to effectively break the co-adaptation of raw input features, and decays by a factor of $0.8^i$ for subsequent layers (where i is the layer index). This progressive decay ensures a smooth transition from robust feature extraction at the bottom layer to stable, deterministic representations at the final regression head.
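The layer-decaying dropout rates can be listed explicitly. The sketch below assumes a zero-based layer index, so that the first hidden layer keeps the base rate of 0.2; if the index were one-based, every rate would be multiplied by a further factor of 0.8.

```python
from typing import List


def dropout_schedule(base: float = 0.2, decay: float = 0.8,
                     n_layers: int = 3) -> List[float]:
    """Dropout probability per hidden layer: base * decay**i for i = 0, 1, ..."""
    return [base * decay ** i for i in range(n_layers)]
```

Under this assumption, the three hidden layers of widths 256, 128, and 64 would receive dropout probabilities of 0.2, 0.16, and 0.128, respectively.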
During the training phase, the model was optimized using the AdamW optimizer with a batch size of 128. To prevent the previously discussed gradient shocks—especially when the macroscopic penalty is introduced—gradient clipping was performed with a maximum norm empirically set to 0.5 to ensure stable parameter updating. The maximum number of training epochs was set to 80; to prevent potential overfitting, a standard early stopping mechanism was employed with a patience of 30 epochs.
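The effect of clipping at a maximum norm of 0.5 can be sketched in plain Python as follows. This is a simplified illustration of global-norm clipping, the same idea implemented by PyTorch's `torch.nn.utils.clip_grad_norm_`; whether the authors used that exact call is an assumption, and the gradient values are hypothetical.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[float], max_norm: float = 0.5) -> List[float]:
    """Rescale a flat gradient vector so that its L2 norm does not exceed max_norm.

    Gradients whose norm is already within the limit are returned unchanged.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# Hypothetical gradient with L2 norm 5.0, e.g. right after the macro penalty kicks in.
large_grad = [3.0, 4.0]
clipped = clip_by_global_norm(large_grad, max_norm=0.5)
print(clipped)

# A small gradient passes through untouched.
small_grad = [0.1, 0.2]
print(clip_by_global_norm(small_grad, max_norm=0.5))
```

Clipping preserves the direction of the update and only limits its length, which is why it dampens gradient shocks without biasing the descent direction.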

4. Results and Discussions

4.1. Study Area Overview

To empirically validate the proposed BIHL model framework, a comprehensive case study was conducted in Haidian District, Beijing, with a specific focus on the Beixiaguan Subdistrict, as shown in Figure 5. Located in the northwestern part of Beijing's urban core, Haidian District is the city’s primary academic and technological hub. The selected study area comprises a dense mixture of building typologies, including old residential compounds, modern high-rise apartment complexes, student dormitories, and mixed-use commercial–residential structures. This architectural diversity, coupled with a remarkably high population density, makes it an ideal example for micro-scale population estimation.
The modeled region includes a total of 2763 individual buildings. The official macroscopic census population for this target area is 146,366 residents, with an average regional population density of approximately 24,340 persons per square kilometer. The initial grid-based mobile signaling data used to formulate the probabilistic priors have a spatial resolution of 250 m × 250 m. A comprehensive statistical summary of the dataset is presented in Table 5, which details the distributions of key building-level variables—such as footprint area, gross floor area (GFA), and number of floors—reporting their mean values, standard deviations, and extreme deciles (10th and 90th percentiles).
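The figures quoted above give a rough sense of the resolution gap that the model has to bridge. The short calculation below is only back-of-the-envelope arithmetic on the reported numbers; it assumes that the census total, the density, and the building count all refer to the same target area, and the derived quantities are approximations rather than values reported by the study.

```python
# Values quoted in the text.
census_population = 146_366        # residents in the target area
density_per_km2 = 24_340           # persons per square kilometer (approximate)
n_buildings = 2763                 # individual buildings in the modeled region
grid_cell_km = 0.25                # 250 m grid cell edge, in kilometers

# Derived, approximate quantities (assumption: all figures share the same extent).
implied_area_km2 = census_population / density_per_km2
cell_area_km2 = grid_cell_km * grid_cell_km
approx_cells = implied_area_km2 / cell_area_km2
buildings_per_cell = n_buildings / approx_cells
residents_per_building = census_population / n_buildings

print(f"implied area            ~ {implied_area_km2:.2f} km^2")
print(f"equivalent grid cells   ~ {approx_cells:.0f}")
print(f"buildings per grid cell ~ {buildings_per_cell:.1f}")
print(f"residents per building  ~ {residents_per_building:.1f}")
```

Under these assumptions, each signaling grid cell covers a few dozen buildings on average, which illustrates why the grid-based counts can serve only as a prior and must be disaggregated with building-level features.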

4.2. Ablation Experiment

Given that this study introduces additional variables (such as building-level spatial location and residential category information), which have received limited attention in prior work, an ablation experiment was conducted to assess their incremental contribution to the predictive performance. The prediction metrics are reported in Table 6.
The results show that omitting building-level spatial information leads to a substantial rise in MAPE, reaching 23.40%, along with a pronounced increase in the discrepancy between MAPE and SMAPE. This indicates that spatial location is a key determinant of prediction accuracy and stability in building-level population estimation. This effect is consistent with Beijing’s ring-structured urban form: the central districts were developed earlier and are dominated by compact, high-density residential stock, whereas peripheral areas were developed later and typically feature larger dwelling units to meet higher living-quality expectations. Such spatial heterogeneity directly influences population density patterns, and thus affects model performance when spatial variables are excluded.
Similarly, removing residential category information results in a significant increase in MAPE to 16.42%, again accompanied by a notable widening of the MAPE–SMAPE gap. This underscores the strong contribution of residential category information to reducing prediction errors and improving output stability. The underlying mechanism is that different residential types—such as standard apartments, villas, and dormitories—exhibit substantial variation in per capita floor area, which directly affects population allocation at the building scale. Incorporating residential type therefore provides essential structural information that enhances the predictive accuracy of building-level population models.
Moreover, to justify the proposed dynamic weighting strategy for the macroscopic constraint λ(t)—which combines a linear warm-up phase with a cosine annealing schedule—an ablation experiment was conducted by comparing it against a constant λ baseline. Figure 6 illustrates the training loss convergence trajectories for both strategies.
As observed for the constant weighting strategy (yellow dots), applying the maximum macroscopic penalty from the very first epoch induces gradient shock. The uninitialized network is forced to resolve conflicting gradients between micro-feature learning and macro-aggregation. This results in a high initial loss spike (exceeding 0.3) and a highly unstable, inefficient early descent trajectory. Furthermore, the conflicting gradients trap the model in a suboptimal state, converging to a noticeably higher final loss.
In contrast, our proposed dynamic weighting strategy (blue dots) effectively eliminates this optimization bottleneck. During the linear warm-up phase, the near-zero initial penalty allows the network to undergo a robust initialization, rapidly and smoothly minimizing the loss to capture essential spatial micro-heterogeneity. As the training progresses into the cosine annealing phase, the network calibrates the multi-objective optimization without disrupting the already learned feature representations.
Consequently, the dynamic strategy not only converges significantly faster but also achieves a lower and more stable final loss. This ablation test demonstrates that the curriculum learning approach (linear warm-up followed by cosine annealing) is a necessary mechanism to ensure stable and efficient gradient descent in our BIHL framework.
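The dynamic weight λ(t) can be sketched as a small function that follows the warm-up and cosine annealing rules listed in Algorithm A1 (Appendix A). The maximum of 80 epochs comes from Section 3.5; the warm-up length of 10 epochs and λmax = 1.0 are hypothetical values chosen only for this example, and the boundary condition t ≤ T_warm is an assumption.

```python
import math


def lambda_schedule(t: int, total_epochs: int, warmup_epochs: int, lambda_max: float) -> float:
    """Macro-constraint weight at epoch t (1-based).

    Linear warm-up up to warmup_epochs, then cosine annealing until total_epochs.
    """
    if t <= warmup_epochs:
        return lambda_max * t / warmup_epochs
    progress = (t - warmup_epochs) / (total_epochs - warmup_epochs)
    return lambda_max * 0.5 * (1.0 + math.cos(math.pi * progress))


T = 80          # maximum epochs (Section 3.5)
T_WARM = 10     # hypothetical warm-up length
LAMBDA_MAX = 1.0  # hypothetical maximum weight

for epoch in (1, 5, 10, 45, 80):
    print(epoch, round(lambda_schedule(epoch, T, T_WARM, LAMBDA_MAX), 4))
```

A constant-λ baseline corresponds to returning `lambda_max` for every epoch, which applies the full macroscopic penalty to a network that has not yet learned any micro-level structure.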

4.3. Model Verification

To verify the performance of the proposed BIHL model framework, several approaches commonly used in the existing literature, including multiple linear regression (MLR) and random forest (RF) models, were implemented for comparison; the prediction metrics are shown in Table 7.
The results reveal that the MLR model performs the worst, with a MAPE as high as 30.20%. Moreover, it shows the largest gap between MAPE and SMAPE, indicating poor stability in the error distribution and the presence of extreme prediction values. In contrast, the BIHL model proposed in this study demonstrates superior performance across all three subdistrict-level error metrics, achieving a MAPE of 11.36% and an SMAPE of 11.26%. It also exhibits the smallest gap between the two, suggesting a more stable error distribution.
To validate the fine-grained disaggregation performance of the proposed model, an independent manual survey was conducted. Recognizing that absolute building-level ground truth is inherently unavailable, we established an estimated proxy ground truth for validation. Specifically, a stratified random sampling strategy was employed to select 20 residential buildings within each subdistrict. This stratification ensured that buildings of varying residential types (e.g., general residential housing, mixed-use buildings) and building locations (e.g., core urban area or peripheral area) were proportionally represented. For each selected building, the exact number of households was determined through manual field surveys. The building-level residential population was then estimated by multiplying the household count by the average household size reported in the Haidian District Statistical Yearbook (2023). This surveyed dataset was used only as an independent hold-out validation set and was completely excluded from the training phase of all evaluated models.
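The construction of the proxy ground truth amounts to a single multiplication per surveyed building, sketched below. The household counts, building identifiers, and the average household size of 2.5 are hypothetical placeholders; the actual yearbook value is not stated in this section.

```python
from typing import Dict

# Hypothetical placeholder; the real value comes from the statistical yearbook.
AVG_HOUSEHOLD_SIZE = 2.5


def proxy_population(households: int, avg_household_size: float) -> float:
    """Estimate a building's residential population from its surveyed household count."""
    return households * avg_household_size


# Hypothetical field-survey results: building identifier -> number of households.
surveyed_households: Dict[str, int] = {"building_A": 120, "building_B": 48}

proxy_truth = {
    building: proxy_population(count, AVG_HOUSEHOLD_SIZE)
    for building, count in surveyed_households.items()
}
print(proxy_truth)
```

Because a single district-wide household size is applied to every building, the proxy values inherit any variation in household size between residential types, which is one reason they are treated as estimates rather than exact ground truth.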
These estimated validation values were compared with the building-level predictions produced by the four methods (MLR, RF, LightGBM, and the proposed framework), as visually presented in the scatter plots in Figure 7. To quantify the performance and address the uncertainty of building-level estimations, we calculated the Coefficient of Determination ($R^2$), Mean Bias Error (MBE), and 95% Confidence Intervals (95% CIs) of the absolute errors for each model, which are given in Table 8.
As illustrated in Figure 7, the blue scatter points represent the predicted versus actual estimated populations, with the orange diagonal line denoting perfect alignment. Scatter points closer to this reference line indicate higher consistency. The MLR model exhibits highly dispersed scatter points and a low $R^2$, indicating a poor capture of micro-level spatial heterogeneity. The RF and LightGBM models show similar, improved patterns; however, they exhibit a noticeable negative bias (MBE < 0), consistently underestimating populations, particularly for medium-to-high density buildings (e.g., actual populations of 400–500).
In contrast, the proposed framework demonstrates superior disaggregation fidelity. By integrating dynamic curriculum weighting and total population constraints, the proposed method effectively mitigates the underestimation bias observed in the tree-based models, yielding the MBE closest to zero and the highest $R^2$. Furthermore, the narrower 95% CI of the proposed method indicates significantly reduced predictive uncertainty at the micro-level. This quantitative and visual evidence solidifies the effectiveness of the proposed multi-objective optimization approach in maintaining spatial micro-heterogeneity without violating macro-level boundaries.
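The three building-level diagnostics can be sketched as follows. The sign convention for MBE (predicted minus actual, so that negative values indicate underestimation) matches the interpretation given above. The confidence interval is computed here with a normal approximation around the mean absolute error, which is an assumption, since the exact procedure is not specified. All data values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def r_squared(actual: List[float], predicted: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - centre) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_bias_error(actual: List[float], predicted: List[float]) -> float:
    """Mean of (predicted - actual); negative values indicate underestimation."""
    return mean([p - a for a, p in zip(actual, predicted)])


def abs_error_ci95(actual: List[float], predicted: List[float]) -> Tuple[float, float]:
    """Normal-approximation 95% CI for the mean absolute error (assumed procedure)."""
    errors = [abs(p - a) for a, p in zip(actual, predicted)]
    half_width = 1.96 * stdev(errors) / math.sqrt(len(errors))
    return mean(errors) - half_width, mean(errors) + half_width


# Hypothetical proxy ground truth and predictions for five buildings.
actual = [100.0, 200.0, 300.0, 400.0, 500.0]
predicted = [110.0, 190.0, 280.0, 370.0, 450.0]

print("R^2 =", round(r_squared(actual, predicted), 4))
print("MBE =", mean_bias_error(actual, predicted))
print("95% CI of absolute error =", abs_error_ci95(actual, predicted))
```

In this toy example the predictions fall increasingly short of the larger buildings, so the MBE is negative even though $R^2$ remains high, which mirrors the pattern described for the tree-based baselines.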

4.4. Result Visualization

To intuitively evaluate the spatial disaggregation performance of the proposed BIHL model framework, we visualize the building-level population estimation results for the Beixiaguan Subdistrict in Haidian District, Beijing. Figure 8 presents the fine-grained residential population distribution from a 2D perspective, providing a clear overview of the population density variations across the urban fabric. Figure 9 displays the results from a 3D perspective, in which buildings are drawn at their real heights and the color gradient is proportional to the estimated population count.
As illustrated in Figure 8, the BIHL framework successfully captures the high degree of spatial heterogeneity inherent in complex urban environments, effectively discriminating between different building typologies. High-rise apartment complexes and student dormitories, depicted in deep red, are accurately identified as population hotspots. Even within the same census block, Figure 8 reveals substantial variations among adjacent buildings, as the model integrates building geometry and POI-based typology to assign higher values to residential buildings while maintaining low values for mixed-use buildings. Figure 8 highlights the Jiaoda community (comprising the academic and residential areas of Beijing Jiaotong University), delineated by a green dotted line, and the China Meteorological Administration (CMA) community (encompassing its working and residential zones), enclosed by a blue dashed line. Driven by the high concentration of high-rise apartment blocks and dense dormitories, the population distribution in the Jiaoda community exhibits a dark red color. Conversely, the residential areas within the CMA community are dominated by low-rise residential buildings, resulting in a more orange representation of its population distribution. These findings are highly consistent with the 3D building characteristics depicted in Figure 9.

5. Conclusions

In this study, a Bayesian-informed hierarchical learning (BIHL) model framework was designed to fill the gap between data-driven deep learning and formal probabilistic constraints for building-level population estimation. By treating residential population as a latent variable constrained by mobile signaling priors and hard macro-level census data, the proposed method provides a robust solution for fine-grained urban modeling.
To validate the model’s performance, a case study was conducted using data from Haidian District, Beijing. The experimental results demonstrated that the proposed framework significantly outperforms traditional models (MLR, RF), achieving superior predictive accuracy and robustness with a MAPE of 11.36%. The minimal divergence observed between the MAPE and Symmetric MAPE (SMAPE) metrics further corroborates the model’s stability in handling complex error distributions compared to deterministic approaches. Through rigorous ablation studies, this research confirmed the critical role of micro-spatial features, underscoring that the integration of building-level spatial coordinates with residential category priors is indispensable for mitigating ‘spatial smoothing’ effects and enhancing model generalization. Furthermore, unlike standard machine learning approaches that typically exhibit pronounced under-prediction in high-density residential areas, the BIHL model framework effectively eliminates systematic bias and ensures physical consistency by employing a dynamic curriculum learning strategy coupled with hierarchical consistency constraints. This mechanism successfully balances micro-level feature extraction with macro-level census regularization. Ultimately, this work not only provides a robust methodology for generating high-fidelity, residential-level population layers but also establishes a solid foundation for next-generation Travel Demand Models (TDMs), offering significant theoretical and practical implications for optimizing urban public transit networks and formulating precise emergency response strategies in complex urban environments.
While the proposed BIHL model framework demonstrated significant accuracy in the empirical validation within Haidian District, certain limitations remain. First, the model is relatively sensitive to the quality of multi-source inputs; specifically, the inherent noise and spatiotemporal sparsity of mobile signaling data may constrain the stability of micro-level predictions. Second, the model’s reliance on site-specific Zone Embeddings hinders its direct transferability across different cities or regions, typically necessitating parameter recalibration when deployed in new environments. Third, from a methodological perspective, the current neural estimator provides deterministic point predictions optimized by gradient descent, lacking full posterior probability sampling to quantify the absolute uncertainty of the final outputs.
To address these challenges, future research could integrate multi-temporal mobile signaling data to extend the current static estimation into dynamic spatiotemporal forecasting. Simultaneously, to enable robust model transferability, future work should leverage richer urban spatial datasets and employ advanced methods, such as graph neural networks, to establish a generalized mapping between urban spatial features and population biases. Finally, a critical direction is the development of a fully probabilistic Bayesian inference system—such as one integrating Bayesian Neural Networks or Markov Chain Monte Carlo sampling—to rigorously provide posterior uncertainty bounds alongside the micro-level population estimates.

Author Contributions

Conceptualization, Jin Deng, Jianfeng Liu, and Yadi Zhu; methodology, Jin Deng and Yadi Zhu; software, Jin Deng; validation, Guanhua Yang and Zhou Hu; investigation, Guanhua Yang; data curation, Zhou Hu and Ying Deng; writing—original draft, Ying Deng; visualization, Jianfeng Liu; supervision, Jianfeng Liu; project administration, Jianfeng Liu; funding acquisition, Yadi Zhu. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Fundamental Research Funds for the Central Universities [Grant No.: 2024JBMC021], and the National Key Research and Development Programme of China under the title of ‘Intelligent Assessment and Dynamic Prediction of Spatial Operational Characteristics of Station-City Converged Stereoscopic Networks’ [Grant No. 2023YFC3807503].

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The original dataset includes mobile phone records and census data, and confidentiality agreements were signed with the data providers. Requests for access will require a clear description of the requester’s background and the intended use of the data to ensure compliance with these agreements.

Conflicts of Interest

Authors Jianfeng Liu and Guanhua Yang were employed by the Beijing Urban Construction and Transport Planning & Design Institute Co., Ltd., Beijing 100037, China. Zhou Hu was employed by the Beijing Urban Construction Design & Development Group Co., Ltd., Beijing 100037, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Algorithm A1: Optimization Workflow of the BIHL Framework
Input: Building features $X$, regional census populations $P_{\mathrm{census}}$, zone identifiers $Z$, maximum epochs $T$, warm-up epochs $T_{\mathrm{warm}}$, dynamic weight parameters $\lambda_{\max}$, ensemble size $K$
Output: Fine-grained building-level population estimates $\hat{P}_{\mathrm{final}}$
1: // Layer 1: Proxy Label Generation & Uncertainty Weighting
2: Normalize features $X \rightarrow X_{\mathrm{norm}}$
3: Train baseline LightGBM model on $X_{\mathrm{norm}}$ to obtain proxy labels $y_{\mathrm{proxy}}$
4: Train ensemble of $K$ LightGBM models with varying hyperparameters
5: Calculate prediction variance $\sigma_i^2$ for each building $i$ from ensemble outputs
6: Compute confidence weights $w_i \propto 1/(\sigma_i^2 + \epsilon)$ and normalize them
7: // Layer 2: Enhanced Neural Network Posterior Estimator Architecture
8: Initialize Base MLP network parameters and Interaction network parameters
9: Initialize Zone Bias Embedding matrix $E$
10: Define prediction function $\hat{y}_b = f_\theta(X_b, Z_b)$
11:     $\hat{y}_{\mathrm{base}} = \mathrm{MLP}(X_b)$
12:     $b_{z_b} = E[z_b]$ // Extract specific scalar bias for zone $z_b$
13:     $\hat{y}_{\mathrm{interact}} = \mathrm{Interaction}(X_b)$
14:     return $\hat{y}_b = \hat{y}_{\mathrm{base}} + b_{z_b} + \alpha \cdot \hat{y}_{\mathrm{interact}}$
15: // Layer 2: Multi-Objective Neural Network Training
16: Initialize neural network parameters $\theta$
17: for $t = 1$ to $T$ do
18:     // Micro-level Optimization (Mini-batch)
19:     for each mini-batch $(X_b, y_b, Z_b, w_b)$ do
20:         Predict standardized population log-values using Layer 2
21:         Compute unweighted Smooth L1 Loss: $L_{\mathrm{unweighted}}$
22:         Compute weighted micro-loss: $L_{\mathrm{micro}}$
23:         Update $\theta$ by descending gradient $\nabla_\theta L_{\mathrm{micro}}$ with gradient clipping
24:     end for
25:     // Macro-level Spatial Constraint (Full-batch)
26:     Predict full-batch values $\hat{y}_{\mathrm{all}} = f_\theta(X_{\mathrm{all}}, Z_{\mathrm{all}})$
27:     Transform predictions to original population scale: $\hat{P}_{\mathrm{all}} = \exp(\hat{y}_{\mathrm{all}} \cdot \sigma_y + \mu_y) - 1$
28:     Aggregate building populations by zone: $\hat{P}_{\mathrm{zone}} = \mathrm{Agg}(\hat{P}_{\mathrm{all}}, Z_{\mathrm{all}})$
29:     Calculate macro-level loss (Relative MSE): $L_{\mathrm{macro}}$
30:     // Dynamic Curriculum Weighting
31:     if $t \le T_{\mathrm{warm}}$ then
32:         $\lambda_t = \lambda_{\max} \cdot (t / T_{\mathrm{warm}})$  // Linear warm-up
33:     else
34:         $\lambda_t = \lambda_{\max} \cdot 0.5 \cdot [1 + \cos(\pi (t - T_{\mathrm{warm}}) / (T - T_{\mathrm{warm}}))]$  // Cosine annealing
35:     end if
36:     Compute total constraint loss: $L_{\mathrm{total}} = \lambda_t \cdot L_{\mathrm{macro}}$
37:     Update $\theta$ by descending gradient $\nabla_\theta L_{\mathrm{macro}}$
38:     Update Learning Rate using Advanced Scheduler (Warm-up + Cosine Annealing)
39:     Check Early Stopping condition based on Validation Loss
40: end for
41: // Post-Calibration Stage
42: Load $\theta_{\mathrm{best}}$ and generate raw predictions $\hat{P}_{\mathrm{raw}}$
43: for each zone $j$ do
44:     Scale factor $S_j = P_{\mathrm{census}}^{(j)} / \sum_{i \in \mathrm{zone}\ j} \hat{P}_{\mathrm{raw}}^{(i)}$
45:     Apply calibration: $\hat{P}_{\mathrm{final}}^{(i)} = \hat{P}_{\mathrm{raw}}^{(i)} \cdot S_j$
46: end for
47: return $\hat{P}_{\mathrm{final}}$
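The post-calibration stage at the end of Algorithm A1 can be illustrated with a short Python sketch. It rescales the raw building-level predictions within each zone so that they sum to the zone's census total. The zone labels, populations, and function name are hypothetical, and the handling of zones with a zero raw total is an added safeguard that the algorithm listing does not describe.

```python
from collections import defaultdict
from typing import Dict, List


def calibrate_to_census(
    raw: List[float], zones: List[str], census: Dict[str, float]
) -> List[float]:
    """Scale raw predictions so each zone's total matches its census population."""
    zone_totals: Dict[str, float] = defaultdict(float)
    for population, zone in zip(raw, zones):
        zone_totals[zone] += population

    scale: Dict[str, float] = {}
    for zone, total in zone_totals.items():
        if total <= 0.0:
            # Safeguard (assumption): a zone with no predicted population cannot be rescaled.
            raise ValueError(f"zone {zone!r} has a non-positive raw total")
        scale[zone] = census[zone] / total

    return [population * scale[zone] for population, zone in zip(raw, zones)]


# Hypothetical raw predictions for three buildings in two zones.
raw_predictions = [40.0, 60.0, 30.0]
zone_ids = ["A", "A", "B"]
census_totals = {"A": 120.0, "B": 27.0}

final_predictions = calibrate_to_census(raw_predictions, zone_ids, census_totals)
print(final_predictions)
```

The rescaling is proportional, so the relative ranking of buildings inside a zone is unchanged while the aggregated total is forced to agree with the census figure.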

References

  1. Chen, M.; Xian, Y.; Huang, Y.; Zhang, X.; Hu, M.; Guo, S.; Chen, L.; Liang, L. Fine-scale population spatialization data of China in 2018 based on real location-based big data. Sci. Data 2022, 9, 624.
  2. Wang, L.; Fan, H.; Wang, Y. Improving population mapping using Luojia 1-01 nighttime light image and location-based social media data. Sci. Total Environ. 2020, 730, 139148.
  3. Zhou, Y.; Ma, M.G.; Shi, K.F.; Peng, Z.Y. Estimating and Interpreting Fine-Scale Gridded Population Using Random Forest Regression and Multisource Data. ISPRS Int. J. Geo-Inf. 2020, 9, 369.
  4. Azar, D.; Engstrom, R.; Graesser, J.; Comenetz, J. Generation of fine-scale population layers using multi-resolution satellite imagery and geospatial data. Remote Sens. Environ. 2013, 130, 219–232.
  5. Szarka, N.; Biljecki, F. Population estimation beyond counts-Inferring demographic characteristics. PLoS ONE 2022, 17, e0266484.
  6. Jing, C.; Zhou, W.; Qian, Y.; Yan, J. Mapping the Urban Population in Residential Neighborhoods by Integrating Remote Sensing and Crowdsourcing Data. Remote Sens. 2020, 12, 3235.
  7. Xie, Y.; Weng, A.; Weng, Q. Population Estimation of Urban Residential Communities Using Remotely Sensed Morphologic Data. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1111–1115.
  8. Tomás, L.; Fonseca, L.; Almeida, C.; Leonardi, F.; Pereira, M. Urban population estimation based on residential buildings volume using IKONOS-2 images and lidar data. Int. J. Remote Sens. 2016, 37, 1–28.
  9. Ural, S.; Hussain, E.; Shan, J. Building population mapping with aerial imagery and GIS data. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 841–852.
  10. Stevens, F.; Gaughan, A.; Linard, C.; Tatem, A. Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data. PLoS ONE 2015, 10, e0107042.
  11. Li, G.; Weng, Q. Fine-scale population estimation: How Landsat ETM plus imagery can improve population distribution mapping. Can. J. Remote Sens. 2010, 36, 155–165.
  12. Qing, Y.X.; Wu, H.Y.; Qi, K.L.; Gui, Z.P.; Liu, Y.H.; Li, Z.Q.; Li, R. Integrating street-view imagery and points of interest for refining population spatialization: A case study in Wuhan City. Sustain. Cities Soc. 2024, 115, 105883.
  13. Wu, B.; Yang, C.; Wu, Q.; Wang, C.; Wu, J.; Yu, B. A building volume adjusted nighttime light index for characterizing the relationship between urban population and nighttime light intensity. Comput. Environ. Urban Syst. 2023, 99, 101911.
  14. Wang, M.; Wang, Y.; Li, B.; Cai, Z.; Kang, M. A Population Spatialization Model at the Building Scale Using Random Forest. Remote Sens. 2022, 14, 1811.
  15. Robinson, C.; Hohman, F.; Dilkina, B.; Machinery, A.C. (Eds.) A Deep Learning Approach for Population Estimation from Satellite Imagery. In Proceedings of the 1st ACM SIGSPATIAL Workshop on Geospatial Humanities (GeoHumanities), Los Angeles, CA, USA, 7–10 November 2017; Association for Computing Machinery: New York, NY, USA, 2017.
  16. Tiecke, T.G.; Liu, X.; Zhang, A.; Gros, A.; Li, N.; Yetman, G.; Kilic, T.; Murray, S.; Blankespoor, B.; Prydz, E.B.; et al. Mapping the world population one building at a time. arXiv 2017, arXiv:1712.05839.
  17. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
  18. Cai, B.; Shao, Z.; Huang, X.; Zhou, X.; Fang, S. Deep learning-based building height mapping using Sentinel-1 and Sentinel-2 data. Int. J. Appl. Earth Obs. Geoinf. 2023, 122, 103399.
  19. Li, M.; Zhang, H.; Chen, J. (Eds.) Fine-grained Dynamic Population Mapping Method Based on Large-scale Sparse Mobile Phone Data. In Proceedings of the 20th IEEE International Conference on Mobile Data Management (IEEE MDM), Hong Kong, China, 10–13 June 2019; IEEE Computer Society: Los Alamitos, CA, USA, 2019.
  20. Liu, G.; Li, R.; Xia, J.; Liu, Z.; Cai, J.; Wu, H.; Peng, M. Dual-environment feature fusion-based method for estimating building-scale population distributions. Geo-Spat. Inf. Sci. 2024, 27, 1943–1958.
  21. Lu, W.; Weng, Q. An ANN-based method for population Dasymetric mapping to avoid the scale heterogeneity: A case study in Hong Kong, 2016–2021. Comput. Environ. Urban Syst. 2024, 108, 102072.
  22. Wang, Y.; Zhang, J. Integrating BP and MGWR-SL Model to Estimate Village-Level Poor Population: An Experimental Study from Qianjiang, China. Soc. Indic. Res. 2018, 138, 639–663.
  23. Cao, Y.; Liu, J.; Wang, Y.; Wang, L.; Wu, W.; Su, F. A study on the method for functional classification of urban buildings by using POI data. J. Geo-Inf. Sci. 2020, 22, 1339–1348.
  24. Liang, S.; Jiang, J. Analysis of Commercial Space Pattern and Driving Factors Based on POI Data and Nighttime Lighting Data--Taking Xuzhou Main City as an Example. Geospat. Inf. 2025, 1–6. Available online: https://link.cnki.net/urlid/42.1692.P.20251126.1111.010 (accessed on 24 March 2026).
  25. Tobler, W. Smooth Pycnophylactic Interpolation for Geographical Regions. J. Am. Stat. Assoc. 1979, 74, 519–530.
  26. Zandbergen, P.A. Dasymetric mapping using high resolution address point datasets. Trans. GIS 2011, 15, 5–27.
  27. Liu, X.; Clarke, K.; Herold, M. Population density and image texture: A comparison study. Photogramm. Eng. Remote Sens. 2006, 72, 187–196.
  28. Longley, P.A.; Goodchild, M.F.; Maguire, D.J.; Rhind, D.W. Geographic Information Science and Systems; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  29. Bakillah, M.; Liang, S.; Mobasheri, A.; Arsanjani, J.J.; Zipf, A. Fine-resolution population mapping using OpenStreetMap points-of-interest. Int. J. Geogr. Inf. Sci. 2014, 28, 1940–1963.
  30. Zhu, Y.; Chen, F.; Wang, Z.; Deng, J. Spatio-temporal analysis of rail station ridership determinants in the built environment. Transportation 2019, 46, 2269–2289.
  31. Wang, S.; Li, R.; Jiang, J.; Meng, Y. Fine-scale population estimation based on building classifications: A case study in Wuhan. Future Internet 2021, 13, 251.
  32. Liu, Z.; Gui, Z.; Wu, H.; Qin, K.; Wu, J.; Mei, Y.; Zhao, J. Fine-scale population spatialization by synthesizing building data and POI data. J. Geomat. 2021, 46, 102–106.
  33. Du, K.; Song, J.; Chen, D.; Li, M.; Zhu, Y. A Novel Traffic Analysis Zone Division Methodology Based on Individual Travel Data. Appl. Sci. 2025, 15, 156.
  34. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
Figure 1. BIHL model framework.
Figure 2. Exploratory data analysis of pseudo-residential population: (A) lightcoral-colored histogram showcasing the heavy-tailed distribution of the original population data; (B) skyblue-colored density histogram of the log-transformed data, overlaid with a fitted normal density curve (black solid line), where μ and σ denote the fitted mean and standard deviation, respectively; (C) Quantile-Quantile (Q-Q) plot of the log-transformed data against a theoretical normal distribution, where skyblue dots represent empirical data points.
Figure 3. Enhanced neural network posterior model architecture.
Figure 4. Zone bias embedding architecture.
Figure 5. Location of study area.
Figure 6. Training loss convergence trajectories for dynamic and constant weighting strategies.
Figure 7. Scatter plot comparison of proxy ground truth versus predicted building-level residential population across MLR, RF, LightGBM, and BIHL frameworks. Blue dots represent individual building-level data points. The orange dashed line represents the perfect alignment reference (y = x), while the green dash-dotted line denotes the empirical linear fit surrounded by a semi-transparent 95% confidence interval (CI) band. Insets display the performance metrics for each model: $R^2$ (coefficient of determination), MBE (mean bias error), and the 95% CI of absolute error.
Figure 8. Two-dimensional spatial distribution of estimated building-level population in Beixiaguan Subdistrict.
Figure 9. Three-dimensional visualization of estimated building-level population in Beixiaguan Subdistrict.
Table 1. Notation used in this study.
| Symbol | Description | Unit |
| --- | --- | --- |
| $i$, $a$, $z$ | Indices of buildings, AOIs, and zones | - |
| $N$, $K$, $Z$ | Number of buildings, samples, and zones | - |
| $S_{a,i}$, $S_i$ | Overlap area and building area | m² |
| $R_{a,i}$ | Overlap ratio between building and AOI | - |
| $A_i$ | Residential floor area of building $i$ | m² |
| $M_{G(i)}$ | Population in grid containing building $i$ | persons |
| $x_i$ | Feature vector of building $i$ | - |
| $p_i$, $\hat{p}_i$ | True and estimated population | persons |
| $\hat{P}_z$, $P_{\mathrm{census},z}$ | Estimated and census population in zone $z$ | persons |
| $\mu_i^{\mathrm{prior}}$, $\sigma_i^{\mathrm{prior}}$ | Prior mean and standard deviation for building $i$ | - |
| $\mu_{\log}$, $\sigma_{\log}$ | Mean and standard deviation of the log-transformed population | - |
| $\hat{y}_i^{\mathrm{std}}$ | Standardized prediction | - |
| $f_{\mathrm{MLP}}(\cdot)$ | Multi-layer perceptron (MLP) mapping function | - |
| $w_i$ | Confidence weight | - |
| $L_{\mathrm{total}}$, $L_{\mathrm{micro}}$, $L_{\mathrm{macro}}$ | Total, micro-level, and macro-level loss functions | - |
| $\lambda(t)$ | Dynamic loss weight | - |
Table 2. Summary of datasets used in this study.
| Dataset Name | Data Source | Key Attributes | Spatial Scale | Numerical Range |
| --- | --- | --- | --- | --- |
| Building Footprint (BF) Data | \ | building ID | Building level | 1~2763 |
| | | geometry | | Polygon |
| | | floor number | | 1~40 |
| | | GFA | | 7~115,168 |
| Area of Interest (AOI) Data | \ | area ID | Area level | 1~3890 |
| | | type | | Commercial and Residential, Science/Education/Culture, Healthcare, Government, Accommodation, Corporate, Scenic, etc. |
| | | boundaries | | Spatial extent of AOI parcels (polygon geometry) |
| Point of Interest (POI) Data | Amap.com | POI ID | Point level | 1~7650 |
| | | category | | Commercial Services, Residential, Office, Science/Education/Culture, Transportation, Government, Finance, Catering, Entertainment, etc. |
| | | location | | (Longitude, Latitude) |
| Public Transit Route and Stop Data | Amap.com | routes | Line/point level | Polyline geometry (line features for transit routes) |
| | | stop location | | (Longitude, Latitude) |
| Grid-based Population Data | Telecommunication operator | grid ID | Grid level | 1~7046 |
| | | population count | | 79,222 |
| Subdistrict-level Census Data | National Bureau of Statistics | subdistrict ID | Subdistrict level | 1–29 |
| | | population count | | 27,614–226,315 |
Table 3. Goodness-of-fit comparison for alternative prior distribution assumptions.
Distribution | AIC | BIC
Log-Normal | 107,718.82 | 107,735.20
Log-Skew-Normal | 107,470.53 | 107,495.10
Poisson (Rounded) | 47,711,715.36 | 47,711,723.55
Negative Binomial (Rounded) | 405,760.21 | 405,776.59
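The two criteria in Table 3 follow the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, so their difference depends only on the number of parameters k and the sample size n. The short Python sketch below uses that identity as a consistency check on the tabulated values. The parameter counts (2, 3, 1, 2) are assumptions based on the usual parameterizations of these distributions; they are not reported in the table.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


def implied_log_n(aic_value, bic_value, k):
    """Solve BIC - AIC = k (ln n - 2) for ln n."""
    return (bic_value - aic_value) / k + 2


# (AIC, BIC, k): AIC and BIC are from Table 3; k is an assumed parameter count.
table3 = {
    "Log-Normal": (107718.82, 107735.20, 2),
    "Log-Skew-Normal": (107470.53, 107495.10, 3),
    "Poisson (Rounded)": (47711715.36, 47711723.55, 1),
    "Negative Binomial (Rounded)": (405760.21, 405776.59, 2),
}

implied = {}
for name, (a, b, k) in table3.items():
    implied[name] = implied_log_n(a, b, k)
    print(f"{name}: ln n = {implied[name]:.2f}, n = {math.exp(implied[name]):.0f}")
```

Under the assumed parameter counts, all four rows imply the same ln n of about 10.19, which corresponds to a fitting sample on the order of 2.7 × 10^4 observations. Because the tabulated values are rounded to two decimals, this should be read as an approximate cross-check rather than a reported sample size.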
Table 4. Hardware configuration and hyperparameters for the proposed BIHL framework.
Items | Parameter | Value
Hardware | GPU Configuration | NVIDIA RTX A2000 (12 GB)
 | CPU and RAM | Intel(R) Xeon(R) W-2235 CPU @ 3.80 GHz, 32.0 GB RAM
Data | Feature Dimensionality | 5 (area, POI count, bus stops, mobile users, location category)
Network Architecture | Hidden Layers | [256, 128, 64]
 | Regularization | Batch Normalization, Layer Normalization, Layer-decaying Dropout (initial p = 0.2, decaying factor $0.8^{i}$)
Training Hyperparameters | Batch Size | 128
 | Maximum Epochs | 80 (with early stopping patience = 30)
 | Optimizer | AdamW
 | Gradient Clipping | Max norm = 0.5
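As a small worked example of the layer-decaying dropout entry in Table 4, the sketch below computes a rate for each hidden layer as the initial rate multiplied by the decay factor raised to the layer index. Indexing the first hidden layer as i = 0 is an assumption, as is the function name; the table gives only the initial rate and the decay factor.

```python
def dropout_schedule(num_layers, initial_p=0.2, decay=0.8):
    """Return dropout rates p_i = initial_p * decay**i for i = 0..num_layers-1.

    Starting the layer index at 0 is an assumption made for illustration.
    """
    return [initial_p * decay ** i for i in range(num_layers)]


hidden_layers = [256, 128, 64]  # hidden widths from Table 4
rates = dropout_schedule(len(hidden_layers))
for width, p in zip(hidden_layers, rates):
    print(f"hidden width {width}: dropout p = {p:.3f}")
```

Under this indexing the three hidden layers would receive rates of 0.200, 0.160, and 0.128, so deeper and narrower layers are regularized less aggressively.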
Table 5. Descriptive statistics of key building-level variables.
Variable | Mean | Standard Deviation | P10 | P90
footprint area | 580.80 | 933.64 | 65.45 | 1203.41
gross floor area (GFA) | 2943.67 | 6395.67 | 78.09 | 7338.09
number of floors | 3.76 | 4.42 | 1 | 7
Table 6. Ablation experiment results: impact of dynamic weighting in loss function, spatial location, and residential category variables on building-level population estimation accuracy.
Model | MAE | MAPE (%) | SMAPE (%)
BIHL | 12,271.33 | 11.36 | 11.26
BIHL (no spatial location) | 22,533.95 | 23.40 | 26.76
BIHL (no residential category) | 15,365.65 | 16.42 | 18.63
Note: Bold text denotes the model with the best performance.
Table 7. Performance comparison of BIHL model and baseline models for subdistrict-level population estimation.
Model | MAE | MAPE (%) | SMAPE (%)
MLR | 29,392.96 | 30.20 | 34.61
RF | 12,368.75 | 12.40 | 12.65
LightGBM | 12,533.14 | 11.58 | 11.74
BIHL | 12,271.33 | 11.36 | 11.26
Note: Bold text denotes the model with the best performance.
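For readers who want to reproduce the three error measures reported in Tables 6 and 7, the following Python sketch implements common definitions of MAE, MAPE, and SMAPE. SMAPE has several variants in the literature; the form used here, with the mean of the absolute true and predicted values in the denominator, is an assumption and may differ from the one used by the authors. The two subdistrict values are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (persons)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); requires non-zero true values."""
    return 100.0 * sum(
        abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE (%), assumed variant: |p - t| / ((|t| + |p|) / 2)."""
    return 100.0 * sum(
        2.0 * abs(p - t) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical census and estimated populations for two subdistricts.
census_pop = [100.0, 200.0]
estimated_pop = [110.0, 180.0]

print(mae(census_pop, estimated_pop))
print(mape(census_pop, estimated_pop))
print(smape(census_pop, estimated_pop))
```

MAE is scale-dependent, which is why it sits in the tens of thousands for subdistrict totals, while MAPE and SMAPE normalize each error by the population of the subdistrict and are therefore comparable across zones of different size.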
Table 8. R², MBE, and 95% CI of the absolute errors for MLR, RF, LightGBM, and BIHL frameworks.
Model | R² | MBE | MAE | 95% CI Lower | 95% CI Upper
MLR | 0.9556 | 4.54 | 16.55 | 14.25 | 18.85
RF | 0.9555 | −9 | 12.84 | 10.26 | 15.43
LightGBM | 0.9495 | −8.1 | 13.95 | 11.21 | 16.7
BIHL | 0.9835 | −2.88 | 8.57 | 7.05 | 10.1
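The quantities in Table 8 can be sketched in a few lines of Python. The sign convention for MBE (prediction minus truth, so that negative values indicate under-prediction) and the normal-approximation interval for the mean absolute error are assumptions; the intervals in Table 8 are symmetric about the MAE, which is consistent with such an interval but does not confirm how the authors computed it. The sample values are hypothetical.

```python
import math
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_bias_error(y_true, y_pred):
    """Mean of (prediction - truth); negative means under-prediction."""
    return statistics.fmean([p - t for t, p in zip(y_true, y_pred)])


def abs_error_ci(y_true, y_pred, z=1.96):
    """Assumed normal-approximation 95% CI for the mean absolute error."""
    errors = [abs(p - t) for t, p in zip(y_true, y_pred)]
    mean_err = statistics.fmean(errors)
    half_width = z * statistics.stdev(errors) / math.sqrt(len(errors))
    return mean_err - half_width, mean_err + half_width


# Hypothetical true and predicted values.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

print(r_squared(y_true, y_pred))
print(mean_bias_error(y_true, y_pred))
print(abs_error_ci(y_true, y_pred))
```

In this toy case the over- and under-predictions cancel, so the MBE is zero even though the mean absolute error is 2.5; this is why Table 8 reports bias and absolute error side by side.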

MDPI and ACS Style

Deng, J.; Deng, Y.; Liu, J.; Zhu, Y.; Yang, G.; Hu, Z. Building-Level Population Estimation Method Using a Bayesian-Informed Hierarchical Learning Model. ISPRS Int. J. Geo-Inf. 2026, 15, 264. https://doi.org/10.3390/ijgi15060264
