Building-Level Population Estimation Method Using a Bayesian-Informed Hierarchical Learning Model

Deng, Jin; Deng, Ying; Liu, Jianfeng; Zhu, Yadi; Yang, Guanhua; Hu, Zhou

doi:10.3390/ijgi15060264

by Jin Deng ¹, Ying Deng ², Jianfeng Liu ³, Yadi Zhu ²,*, Guanhua Yang ³ and Zhou Hu ⁴

¹ School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China

² School of Civil Engineering, Beijing Jiaotong University, Beijing 100044, China

³ Beijing Urban Construction and Transport Planning & Design Institute Co., Ltd., Beijing 100037, China

⁴ Beijing Urban Construction Design & Development Group Co., Ltd., Beijing 100037, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2026, 15(6), 264; https://doi.org/10.3390/ijgi15060264

Submission received: 30 January 2026 / Revised: 2 April 2026 / Accepted: 7 June 2026 / Published: 12 June 2026

Abstract

Although fine-grained spatial knowledge of the urban population distribution is fundamental for effective urban management, traditional census data lack sufficient resolution. Current disaggregation methods often struggle to probabilistically fuse heterogeneous data, such as noisy mobile signaling and building attributes, while ensuring hierarchical consistency between micro-level predictions and macro-level ground truth. To address these gaps, this study proposes a Bayesian-informed hierarchical learning (BIHL) model framework for building-level population estimation. The methodology integrates three distinct layers: (1) a data-driven prior model using a LightGBM ensemble to generate initial probabilistic estimates and uncertainty weights; (2) an enhanced neural network posterior estimator featuring a multi-branch architecture, which incorporates Zone Bias Embedding and Zone Interaction networks, to capture non-linear urban dynamics and spatial heterogeneity; and (3) a constrained optimization layer utilizing a hierarchical loss function that enforces strict consistency between aggregated building estimates and official census data through dynamic curriculum learning. Empirical validation in Haidian District, Beijing, demonstrates that the BIHL framework significantly outperforms baseline models (MLR, Random Forest, and LightGBM), achieving a Mean Absolute Percentage Error (MAPE) of 11.36%. This study confirms that incorporating building-level spatial locations and residential categories is vital for mitigating “spatial smoothing” and systematic under-prediction in high-density areas. The framework provides a robust, high-fidelity solution for generating residential population layers, which are essential for city planning.

Keywords:

building-level population estimation; Bayesian-informed hierarchical learning model; multi-source data; spatial heterogeneity

1. Introduction

Accurate and fine-grained spatial knowledge of urban population distribution is fundamental to effective urban management and transportation planning [1]. Modern transportation systems, including optimized public transit networks and precise Travel Demand Models (TDMs), rely heavily on population density information [2,3]. However, the demographic data traditionally provided by national census reports are aggregated to coarse administrative or statistical boundaries (e.g., census tracts or townships), thus failing to capture the significant population heterogeneity within these zones [4,5]. This fundamental limitation leads to suboptimal traffic analysis, inaccurate demand forecasting, and ineffective emergency response strategies that cannot account for population at the micro-level [6]. Therefore, the task of spatializing coarse census counts into highly resolved population layers, specifically at the scale of individual buildings or residential units, remains a persistent and high-value frontier in urban transport research [6,7].

The most widely adopted technique for population disaggregation is dasymetric mapping, which uses ancillary spatial data to guide the reallocation of population from large source areas to smaller target units [8,9,10]. Early dasymetric approaches relied on simple land cover classifications or large-area indicators like Nighttime Light (NTL) imagery [10,11,12]. More recently, the field has advanced significantly by leveraging high-resolution imagery, LiDAR, and Point-of-Interest (POI) data to capture the urban form at the micro-scale. Specifically, models integrating building characteristics (e.g., footprint area, height, volume) and urban functional zoning (derived from POI/AOI data) have demonstrated superior accuracy by establishing direct, data-driven relationships between residential structures and population counts [13]. Machine learning approaches, such as Random Forest (RF) and ensemble methods, have become state-of-the-art tools for capturing the complex, non-linear correlations between these geospatial features and population distributions [14].

However, traditional machine learning models often rely on shallow architectures that may struggle to extract high-level semantic representations from high-dimensional urban features [15]. Consequently, deep learning (DL) techniques, such as Convolutional Neural Networks (CNNs) and multi-layer perceptrons (MLPs), have recently gained traction due to their superior capability in modeling non-linear urban dynamics [16]. Despite their predictive power, a critical limitation in most standard DL-based population models is the neglect of spatial dependency and spatial non-stationarity [17]. This oversight often leads to ‘spatial smoothing’ issues where local population spikes (e.g., high-density apartments) are under-predicted because the model fails to encode the specific spatial context or neighborhood effects [17].

While building-based models excel at capturing the static residential population, the growing need to understand the dynamic ambient population distribution, which is essential for real-time traffic management and dynamic OD generation, has led to the introduction of mobile phone signaling data (or Location-Based Service data) as a critical, high-frequency data source [8,18,19]. However, fusing noisy, spatiotemporally sparse mobile sensing data with static, structural building information presents a major methodological challenge [20]. Current fusion practices often involve deterministic overlay rules or rely on non-probabilistic machine learning models [21,22]. A key limitation of these non-probabilistic methods is that they can neither formally quantify and propagate the uncertainty inherent in each heterogeneous data source nor naturally integrate domain knowledge (such as a building’s likely function) as a probabilistic prior [23,24]. This rigidity often leads to suboptimal allocation and limited interpretability in fine-scale predictions. Another challenge in multi-source fusion is ensuring hierarchical consistency between micro-scale predictions and macro-scale ground truth [25]. Most existing disaggregation methods operate in a ‘bottom-up’ manner without a feedback mechanism that strictly constrains the sum of estimated building populations to match the official census data [26], which breaks the optimality of the model and prevents error gradients from propagating back to the feature learning stage. Developing a framework that can intrinsically learn micro-level features while satisfying macro-level aggregate constraints remains an open problem.

To address the research gap in robust, probabilistic data fusion for micro-scale residential population estimation, this study makes two main contributions.

1. Methodological innovation: We propose a Bayesian-informed hierarchical learning model framework that probabilistically fuses heterogeneous data (mobile signaling, POI, building features) [27]. Unlike deterministic approaches, this model is rigorously optimized based on both micro-level feature fidelity and macro-level statistical consistency.

2. Practical application: By effectively incorporating building functional types as probabilistic priors, the framework significantly enhances the discrimination of residential capacity at the individual building level. This yields superior prediction accuracy over state-of-the-art baselines, providing high-fidelity spatial data that are essential for city planning.

The remainder of this paper is organized as follows. Section 2 describes the data sources and data preprocessing, while Section 3 details the proposed model framework. Section 4 presents the results, including a comparative analysis against several benchmark models, and discusses the findings. Finally, Section 5 provides the conclusions and suggests avenues for future research.

2. Data Description and Preprocessing

For clarity, Table 1 summarizes the main symbols used in this study.

2.1. Data Description

This study utilized multiple heterogeneous data sources, including building footprint (BF) data, Area of Interest (AOI) data, Point of Interest (POI) data, public transit route and stop data, grid-based population data derived from mobile signaling, and subdistrict-level census data. These datasets differ substantially in spatial scale and information type, and were used to jointly support the task of fine-scale population estimation. A comprehensive summary of their sources, spatial and temporal coverage, key attributes, and roles in this study is provided in Table 2.

BF data define the fundamental spatial units at the building level and provide associated structural attributes. AOI and POI data capture urban functional characteristics, which are used to infer building usage and construct prior information. Public transit route and stop data reflect accessibility features, while grid-based population data derived from mobile signaling data provide information on population distribution and activity intensity. Subdistrict-level census data serve as macro-level statistical constraints to ensure the consistency of the estimation results.

These heterogeneous data sources are jointly integrated within the proposed framework. On one hand, functional and spatial attributes (e.g., AOI, POI, and building features) are used to construct prior information; on the other hand, mobile signaling data provide observational support, while census data impose constraint conditions. Together, they enable fine-grained population allocation at the building level.

2.2. Data Preprocessing

2.2.1. Building Type Inference

To acquire a more precise building typology for the BF data, this research fused the AOI and POI datasets to construct a building-type inference algorithm. First, we leveraged geospatial features to integrate the BF data with the AOI and POI datasets using a geographic information system (GIS) tool. Subsequently, we calculated the intersection characteristics (R_{a,i} and R_{p,i}) between the footprint of each building i and the AOI and POI datasets, as shown in Equation (1).

R_{a, i} = \frac{S_{a, i}}{S_{i}}, \quad R_{p, i} = \frac{N_{p, i}}{N_{i}}

(1)

where S_{a,i} is the spatial overlap area between building i and the AOI of type a, N_{p,i} is the number of POIs of type p in the area of building i, S_i is the area of building i, and N_i is the total number of POIs in the area of building i.

Finally, two inference models were developed for buildings characterized by different feature parameters: an AOI-based method and an AOI-POI-based method.

The AOI-based classification method is applied to buildings satisfying the condition R_{a,i} > threshold. With this method, the building is directly assigned the type attribute of the AOI with the maximum R_{a,i}. These types include general residential housing, apartment complexes, villas, dormitories, and mixed-use (commercial–residential) buildings. Here, the threshold is the critical value of the spatial overlap ratio between the building footprint and the AOI, which was set to 0.5 in this study [28].

The AOI-POI-based inference approach is used for buildings satisfying the condition R_{a,i} ≤ threshold. We count the number (N_{p,i}) of POIs of type p in the area of building i and calculate the POI ratio R_{p,i} using Equation (1). The AOI or POI type k with the maximum R_{k,i} is then assigned as the type of building i.
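
The two inference rules can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the dictionary inputs, the "unknown" fallback label, and the way AOI and POI ratios are merged in the second rule (taking the larger ratio when a type appears in both) are assumptions made only for illustration.

```python
from typing import Dict

AOI_THRESHOLD = 0.5  # overlap-ratio threshold reported in the text


def infer_building_type(
    footprint_area: float,
    aoi_overlap: Dict[str, float],   # AOI type a -> overlap area S_{a,i}
    poi_counts: Dict[str, int],      # POI type p -> count N_{p,i}
    threshold: float = AOI_THRESHOLD,
) -> str:
    """Infer a building type from AOI overlap ratios and POI count ratios."""
    # R_{a,i} = S_{a,i} / S_i for every AOI type overlapping the footprint.
    r_aoi = {a: s / footprint_area for a, s in aoi_overlap.items()}
    # R_{p,i} = N_{p,i} / N_i for every POI type inside the footprint.
    n_total = sum(poi_counts.values())
    r_poi = {p: n / n_total for p, n in poi_counts.items()} if n_total else {}

    best_aoi = max(r_aoi, key=r_aoi.get, default=None)
    if best_aoi is not None and r_aoi[best_aoi] > threshold:
        # AOI-based rule: the dominant AOI type is assigned directly.
        return best_aoi

    # AOI-POI-based rule: pick the type k with the largest ratio R_{k,i}.
    # Assumption: if a type occurs in both sets, its larger ratio is used.
    combined = dict(r_aoi)
    for p, r in r_poi.items():
        combined[p] = max(r, combined.get(p, 0.0))
    if not combined:
        return "unknown"  # hypothetical fallback label, not from the paper
    return max(combined, key=combined.get)
```

For example, a building whose footprint is 80% covered by an apartment AOI is labeled "apartment" regardless of its POIs, whereas a building with only 30% AOI coverage falls through to the ratio comparison.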

2.2.2. Residential Area Calculation

After obtaining the building types, the residential floor area A_i of each building i is calculated. In this study, buildings with residential floor area comprise residential buildings (such as residential housing, apartments, villas, and dormitories) and mixed-use buildings.

For residential buildings, the residential floor area is calculated using the number of floors multiplied by the GFA. For mixed-use buildings, it is assumed that the first floor is designated for commercial purposes and the upper floors are residential; therefore, the residential floor area is calculated as GFA × (number of floors − 1). For other types of buildings, the residential floor area is assigned a value of 0.
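
A short Python sketch of these three cases follows. The type labels and the function name are illustrative assumptions, `gfa` is used exactly as in the text (the area figure multiplied by the floor count), and clamping a one-storey mixed-use building to zero is an added safeguard rather than something stated in the paper.

```python
# Illustrative type labels; the paper's actual category names may differ.
RESIDENTIAL_TYPES = {"residential", "apartment", "villa", "dormitory"}


def residential_floor_area(building_type: str, gfa: float, floors: int) -> float:
    """Residential floor area A_i following the three cases in the text."""
    if building_type in RESIDENTIAL_TYPES:
        # Residential buildings: every floor counts.
        return gfa * floors
    if building_type == "mixed_use":
        # First floor assumed commercial; clamp so one-storey buildings give 0
        # (the clamp is an assumption, not part of the paper).
        return gfa * max(floors - 1, 0)
    # All other building types carry no residential floor area.
    return 0.0
```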

2.2.3. Pseudo-Residential Population

To obtain an initial value for each building, we disaggregate the grid-level mobile user count to the building level with an area-weighted approach. This method assumes a uniform distribution of mobile users across each grid cell. Consequently, the pseudo-mobile user count (M_i) for building i can be calculated with Equation (2), as follows:

M_{i} = \frac{A_{i}}{\sum_{j \in G (i)} A_{j}} \times M_{G (i)}

(2)

where G(i) is the grid cell where building i is located and A_i is the residential floor area of building i.
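
Equation (2) can be sketched in a few lines of Python. The data layout (dictionaries keyed by hypothetical building and grid identifiers) and the handling of grid cells without any residential floor area are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Tuple


def pseudo_population(
    buildings: Dict[str, Tuple[str, float]],  # building id -> (grid id, A_i)
    grid_users: Dict[str, float],             # grid id -> mobile user count M_G
) -> Dict[str, float]:
    """Allocate grid-level user counts to buildings by residential floor area."""
    # Denominator of Equation (2): total residential floor area per grid cell.
    area_per_grid = defaultdict(float)
    for grid_id, area in buildings.values():
        area_per_grid[grid_id] += area

    result = {}
    for b_id, (grid_id, area) in buildings.items():
        total = area_per_grid[grid_id]
        # Assumption: a grid cell with no residential floor area allocates nothing.
        share = area / total if total > 0 else 0.0
        result[b_id] = share * grid_users.get(grid_id, 0.0)
    return result
```

With two buildings of 300 and 100 units of residential floor area sharing a grid cell of 200 users, the sketch allocates 150 and 50 users, respectively.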

2.2.4. Other Features

Prior research has demonstrated that both the density of POIs and transport accessibility around buildings exhibit a significant correlation with population size [3,12,29]. Specifically, POI density reflects the intensity of human activities and the distribution of service facilities in cities, and has been widely used as a core proxy variable for fine-scale residential population estimation [12,29]. Transport accessibility (e.g., bus stop density, road network connectivity) is closely related to residential attractiveness and population distribution patterns, as validated in multi-source data-driven population mapping studies [3]. Accordingly, the numbers of POIs (P_i) and bus stops (B_i) within a 500 m buffer of each building were counted. To mitigate the impact of extreme values on model training, the continuous features (i.e., POI count and bus stop count) were truncated: any value above the 99th percentile was set to the 99th-percentile value. The features were then standardized using Z-score normalization, so that each feature has a mean of 0 and a standard deviation of 1.
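
The truncation and standardization steps can be sketched with the Python standard library. The percentile interpolation rule (the "inclusive" method) and the use of the population standard deviation are assumptions, since the text does not specify either.

```python
import statistics
from typing import List


def truncate_and_standardize(values: List[float]) -> List[float]:
    """Clip values at the 99th percentile, then apply Z-score normalization."""
    # 99th percentile cut point; the interpolation method is an assumption.
    p99 = statistics.quantiles(values, n=100, method="inclusive")[98]
    clipped = [min(v, p99) for v in values]

    mean = statistics.fmean(clipped)
    std = statistics.pstdev(clipped)  # population standard deviation (assumed)
    if std == 0:
        # Constant feature: return zeros instead of dividing by zero.
        return [0.0 for _ in clipped]
    return [(v - mean) / std for v in clipped]
```

Clipping before standardizing matters here: a single extreme count would otherwise inflate the standard deviation and compress all other values toward zero.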

In addition, because different residential types—such as standard residential buildings, dormitories, apartments, and villas—exhibit substantial variation in population density, this study further extracted residential sub-categories as input features. Moreover, since Beijing’s central districts developed earlier while the peripheral areas developed later, even standard residential buildings show significant differences in per capita living space across different locations. Therefore, building location information is also incorporated as an input feature. Considering the ring road development pattern of Beijing, ring roads are used as spatial boundaries to assign locational attributes to individual buildings.

3. Methodology

This study proposes a Bayesian-informed hierarchical learning (BIHL) model framework to estimate individual building populations, as shown in Figure 1, which enables fine-grained disaggregation of a population from coarse-grained census data to a building-level distribution with three layers. The first layer is a data-driven prior model that employs a LightGBM ensemble to generate initial probabilistic estimates and quantify the uncertainty of building-level populations. The second layer is a neural network-based estimator that learns the non-linear relationship between building attributes and population. The third layer acts as a deterministic proxy of Bayesian updating. It optimizes a heuristic multi-objective loss function, where the prior uncertainty from the first layer dynamically modulates the micro-level feature learning using confidence weights (w_i), while macroscopic census totals and spatial smoothing act as robust regularizing constraints.

3.1. First Layer: Data-Driven Prior Model

This layer formulates a regression model that generates prior values of building-level population from key building features. The regression model is trained with LightGBM, an efficient machine learning framework based on Gradient Boosting Decision Trees. LightGBM offers fast training, low memory consumption, and the ability to capture non-linear relationships between variables, making it well-suited for large-scale data analytics. However, before establishing the model, it is crucial to determine the underlying statistical distribution of the target variable to ensure the theoretical validity of the prior formulation.

3.1.1. Exploratory Data Analysis of Prior Distribution

To identify a suitable prior distribution for the pseudo-residential population (M_i) derived from area-weighted mobile signaling, exploratory data analysis was conducted. Building-level population counts typically exhibit extreme right-skewness and heavy-tail characteristics, as illustrated in Figure 2A. To mitigate this, a natural logarithmic transformation, log(1 + M_i), was applied. Figure 2B shows that this transformation effectively suppresses the heavy tail, yielding a roughly symmetric, bell-shaped distribution (with empirical fit parameters μ = 5.41 and σ = 1.83). The corresponding Q–Q plot (Figure 2C) further confirms that the central quantiles of the transformed data align tightly with the theoretical normal distribution, despite minor deviations at the extreme tails caused by the deterministic outlier clipping during preprocessing.

Moreover, goodness-of-fit tests were performed to compare the adopted log-normal assumption against the log-skew-normal distribution with log-transformed data, as well as to evaluate Poisson and Negative Binomial (NB) distributions fitted to the original pseudo-residential population data. Poisson and NB models are commonly used for count data [30]. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were calculated to evaluate model fit, as summarized in Table 3.

As the initial proxy targets (M_i) in this study are continuous real numbers derived from spatial area-weighted disaggregation, artificial rounding was applied to force discrete fitting for Poisson and NB distributions. However, these distributions yield higher AIC and BIC values, indicating a fundamentally poor model fit, as demonstrated in Table 3. In comparing the two continuous alternatives, the log-skew-normal distribution accounts for residual skewness and achieves a better statistical fit (with the lowest AIC and BIC). However, within the proposed BIHL framework, the prior distribution needs to facilitate the propagation of the LightGBM ensemble’s mean and variance into the neural network’s loss function. Introducing complex skewness integration parameters would compromise the stability and differentiability of the deep learning optimization process. Given the negligible difference in goodness-of-fit between the two continuous models, this study utilizes the standard log-normal distribution to formulate the population prior distribution for balancing symmetric error assumptions with computational efficiency.

3.1.2. Prior Model Formulation

Informed by the log-normal distribution assumption and existing research [14,31,32], the residential area, POI count, bus stop count, residential type, building location, and mobile phone user count are used as independent variables, and the log-transformed pseudo-residential population, log(1 + M_i), is used as the target variable.

To quantify the uncertainty associated with this initial estimation, an ensemble of K models was trained (in this study, K = 5 was set based on multiple simulations), with each model initialized using a different random seed. The ensemble’s predictive mean and variance are defined in Equation (3).

\begin{array}{l} μ_{i}^{prior} = \frac{1}{K} \sum_{k = 1}^{K} {\hat{M}}_{i, k} \\ {(σ_{i}^{prior})}^{2} = \frac{1}{K} \sum_{k = 1}^{K} {({\hat{M}}_{i, k} - μ_{i}^{prior})}^{2} \end{array}

(3)

where \hat{M}_{i,k} is the value for building i predicted by the k-th LightGBM model, and μ_i^{prior} and σ_i^{prior} are the prior mean and prior standard deviation for building i, respectively. To align with the log-normal assumption, the prior mean (μ_{log,i}) and variance (σ_{log,i}^2) in the logarithmic space are derived from the ensemble statistics with Equation (4), as follows:

\begin{array}{l} μ_{\log, i} = \log (1 + μ_{i}^{prior}) - \frac{σ_{\log, i}^{2}}{2} \\ σ_{\log, i}^{2} = \log [1 + \frac{{(σ_{i}^{prior})}^{2}}{{(1 + μ_{i}^{prior})}^{2}}] \end{array}

(4)

Consequently, the prior distribution of the log-transformed building population is defined as shown in Equation (5).

\log (1 + p_{i}) \sim N (μ_{\log, i}, σ_{\log, i}^{2})

(5)
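
Equations (3) and (4) amount to a moment-matching step, which the following Python sketch makes explicit for a single building. The function name is hypothetical, and the sketch assumes that the K ensemble predictions have already been converted back to the population scale before the statistics are taken; the text does not state this explicitly.

```python
import math
from typing import List, Tuple


def lognormal_prior(predictions: List[float]) -> Tuple[float, float]:
    """Map K ensemble predictions for one building to (mu_log, sigma_log^2)."""
    k = len(predictions)
    # Equation (3): ensemble mean and (1/K) variance.
    mu_prior = sum(predictions) / k
    var_prior = sum((m - mu_prior) ** 2 for m in predictions) / k
    # Equation (4): log-space variance first, because the mean depends on it.
    sigma_log_sq = math.log(1.0 + var_prior / (1.0 + mu_prior) ** 2)
    mu_log = math.log(1.0 + mu_prior) - sigma_log_sq / 2.0
    return mu_log, sigma_log_sq
```

A useful sanity check is that exp(μ_{log,i} + σ_{log,i}^2 / 2) recovers 1 + μ_i^{prior}, which is the mean of the log-normal variable 1 + p_i in Equation (5).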

3.2. Second Layer: Enhanced Neural Network Posterior Estimator

To address the spatial heterogeneity and feature sparsity inherent in building-level population estimation, this layer includes an enhanced neural network posterior estimator, as illustrated in Figure 3. This model overcomes the limitations of conventional single-path fully connected networks and introduces an innovative multi-branch fusion architecture, which takes as input a high-dimensional building feature vector and corresponding census zone information. These inputs are processed through three parallel subnetworks.

3.2.1. MLP Base Network

This branch extracts fundamental feature representations. It consists of three hidden layers with dimensions of 256, 128, and 64, respectively. Each linear transformation is followed by batch normalization to accelerate convergence and a ReLU activation function to introduce non-linearity. To alleviate overfitting, a progressively decreasing dropout scheme is applied. The final layer adopts layer normalization, ensuring stable output distributions and enabling the extraction of generalizable building–population mapping patterns.

3.2.2. Zone Bias Embedding

To capture systematic differences across census zones (e.g., baseline density differences between core urban areas and peripheral areas), the model incorporates an embedding layer to learn a zone-specific prior intercept, as shown in Figure 4.

Let Z be the total number of unique census zones within the study area. For any building located in census zone z, the layer looks up a scalar bias term b_z in the embedding matrix E (one entry per zone) using the discrete zone ID. This bias b_z is then added to the output of the base feature extraction MLP network, allowing the model to shift the baseline of the population density estimate for that specific spatial unit. From an implementation perspective, the weights of E are randomly initialized from a small-standard-deviation Gaussian distribution prior to training, set as N(0, 0.05) in this study, and the final values are learned jointly by minimizing the hierarchical loss function.

3.2.3. Zone Interaction Network

To model the interaction between building features and spatial context, such as the differing population capacities of buildings with identical floor areas in core urban areas and peripheral areas, an interaction branch is introduced. This branch consists of two linear layers with a ReLU activation, and its output is multiplied by a scaling factor α, which modulates the contribution of the interaction term to the final prediction and enhances the model’s adaptability to complex spatial environments. In this study, α was set to 0.1 based on multiple simulations.

The outputs of the three branches are combined additively, as shown in Equation (6).

{\hat{y}}_{i}^{std} = f_{MLP} (x_{i}) + Embed (z) + α f_{interact} (x_{i})

(6)

where \hat{y}_i^{std} is the standardized prediction for building i, and α is the weight coefficient for the interaction term.

The final output represents the predicted population in the logarithmic space, which is defined in Equation (7). This fusion strategy effectively decouples global structural patterns (captured by the MLP branch) from local spatial deviations (captured by the embedding and interaction branches). As a result, the model maintains strong generalization capability while substantially improving estimation accuracy for specific spatial zones.

{\hat{y}}_{i}^{std} = \frac{\log (1 + p_{i}) - μ_{\log}}{σ_{\log}}

(7)

where p_i is the true population of building i, which is a latent variable in this study, and μ_log and σ_log are the mean and standard deviation of the log-transformed population, which are derived from the LightGBM prior. The prediction on the original population scale can therefore be obtained through an inverse transformation, as shown in Equation (8).

{\hat{p}}_{i} = \exp (σ_{\log} \cdot {\hat{y}}_{i}^{std} + μ_{\log}) - 1

(8)

where \hat{p}_i is the predicted population for building i.
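
The additive fusion in Equation (6) and the transformations in Equations (7) and (8) can be sketched as plain Python functions. The branch outputs are passed in as numbers rather than computed by a network, and all function names are illustrative assumptions.

```python
import math


def fuse_branches(f_mlp: float, zone_bias: float, f_interact: float,
                  alpha: float = 0.1) -> float:
    """Equation (6): additive fusion of the three branch outputs."""
    return f_mlp + zone_bias + alpha * f_interact


def standardize(p: float, mu_log: float, sigma_log: float) -> float:
    """Equation (7): population -> standardized log space."""
    return (math.log1p(p) - mu_log) / sigma_log


def to_population(y_std: float, mu_log: float, sigma_log: float) -> float:
    """Equation (8): standardized log space -> population scale."""
    # expm1(x) computes exp(x) - 1, matching the inverse of log(1 + p).
    return math.expm1(sigma_log * y_std + mu_log)
```

Applying `to_population` to the output of `standardize` with the same parameters returns the original value, which confirms that Equation (8) is the exact inverse of Equation (7).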

3.3. Third Layer: Hierarchical Loss Function and Spatial Constraints

To address the challenge of disaggregating population data from the coarse census level to the fine-grained building level, we propose a multi-objective optimization framework that integrates micro-level feature learning and macro-level consistency constraints into a unified differentiable loss function. The total objective function L_total is composed of two distinct components, as shown in Equation (9).

L_{total} = L_{micro} + λ (t) \cdot L_{macro}

(9)

where L_micro and L_macro are the loss functions at the micro- and macro-level, respectively, and λ(t) is a time-dependent dynamic weight, which is formulated to prevent optimization instability caused by the conflicting gradients of micro- and macro-objectives. It is designed as a linear warm-up and cosine annealing strategy, calculated with Equation (10) as follows.

λ (t) = \begin{cases} λ_{\max} \frac{t}{T_{warm}} & \text{if } t \leq T_{warm} \\ 0.5 \, λ_{\max} (1 + \cos (\frac{t - T_{warm}}{T - T_{warm}} π)) & \text{otherwise} \end{cases}

(10)

where λ_max is the maximum penalty weight for the macroscopic constraint, T_warm is the designated number of warm-up epochs, T is the total number of training epochs, and t is the current training epoch.

During the initial phase (t ≤ T_warm), a linear warm-up is employed. Applying a macro-penalty from the first epoch may cause gradient shocks to the uninitialized network. The linear ramp-up gradually introduces the macroscopic constraint, allowing the network to first prioritize learning fundamental micro-level feature representations from the proxy labels before heavily incorporating regional boundaries.

Subsequently (t > T_warm), a cosine annealing schedule is applied to smoothly decay the weight. In the context of spatial population disaggregation, once the initial macro-penalty successfully constrains the optimization space—ensuring that the model’s aggregated predictions broadly align with the regional census total—maintaining a rigid penalty can over-constrain the network. As widely recognized in spatial analysis, regional census data inherently suffer from spatial aggregation biases (e.g., the Modifiable Areal Unit Problem) [33], which can conflict with the fine-grained physical realities. By smoothly annealing the macroscopic weight, the network realizes high-precision micro-level fine-tuning in the later stages of training. This dynamic relaxation ensures that the model converges to an optimal state, thus achieving fine-grained spatial micro-heterogeneity without escaping the established macroscopic demographic bounds.
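
The schedule in Equation (10) can be written as a small Python function. The function name is hypothetical, and the sketch assumes 0 < T_warm < T so that neither branch divides by zero.

```python
import math


def macro_weight(t: int, total_epochs: int, warm_epochs: int,
                 lam_max: float) -> float:
    """Equation (10): linear warm-up followed by cosine annealing."""
    if t <= warm_epochs:
        # Linear ramp from 0 at t = 0 to lam_max at t = T_warm.
        return lam_max * t / warm_epochs
    # Fraction of the annealing phase completed, in (0, 1].
    progress = (t - warm_epochs) / (total_epochs - warm_epochs)
    return lam_max * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The two branches agree at t = T_warm, where both give λ_max, so the weight is continuous; it then falls to half of λ_max midway through the annealing phase and reaches zero at t = T.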

3.3.1. Micro-Level Proxy Loss (L_micro)

Since ground-truth population data at the building level are typically not available, this study uses the soft labels generated by the LightGBM ensemble as a proxy supervision signal. To ensure robustness against noise and outliers in these proxy labels, the Smooth L1 loss (SmoothL1Loss), a piecewise loss function that combines the advantages of the L1 and L2 losses and improves training stability, was employed rather than the mean squared error (MSE).

Let N be the number of buildings in a mini-batch, \hat{y}_i the predicted value, and y_{proxy,i} the standardized proxy label for building i. The micro-level proxy loss is defined in Equation (11).

L_{micro} = \frac{1}{N} \sum_{i = 1}^{N} w_{i} \cdot \mathrm{SmoothL1Loss} ({\hat{y}}_{i}, y_{proxy, i})

(11)

where w_i is the confidence weight derived from the ensemble variance, ensuring that the network prioritizes learning from high-confidence samples while down-weighting ambiguous predictions.
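
A pure-Python sketch of the weighted loss in Equation (11) is given below. The transition point `beta = 1.0` follows the common Smooth L1 definition and is an assumption, as the paper does not report it; the confidence weights are taken as given inputs because their exact mapping from the ensemble variance is not specified in this section.

```python
from typing import Sequence


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def micro_loss(preds: Sequence[float], targets: Sequence[float],
               weights: Sequence[float], beta: float = 1.0) -> float:
    """Equation (11): confidence-weighted mean of per-building Smooth L1 terms."""
    n = len(preds)
    return sum(w * smooth_l1(p, y, beta)
               for p, y, w in zip(preds, targets, weights)) / n
```

Because the loss grows only linearly for large residuals, a badly mislabeled proxy target contributes a bounded gradient, and a small w_i reduces that contribution further.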

3.3.2. Macro-Level Census Constraint (L_macro)

The core of the proposed hierarchical framework is the differentiable aggregation constraint. This study enforces a constraint which ensures that the sum of predicted populations for all buildings within a specific census zone must approximate the true census count.

Let Z be the number of census zones. For a given zone z, let B_z denote the set of buildings belonging to the zone. The aggregated prediction \hat{P}_z is calculated by summing the exponentiated outputs of the network, as shown in Equation (12).

{\hat{P}}_{z} = \sum_{i \in B_{z}} [\exp (σ_{y} \cdot {\hat{y}}_{i} + μ_{y}) - 1]

(12)

where σ_y and μ_y are the normalization parameters. To handle the large variance in population magnitudes across different zones, this study minimizes the mean squared relative error (MSRE) rather than the absolute error, as shown in Equation (13).

L_{macro} = \frac{1}{Z} \sum_{z = 1}^{Z} {(\frac{{\hat{P}}_{z} - P_{census, z}}{P_{census, z}})}^{2}

(13)

where L_macro is the macro-level loss, Z is the number of zones, and P_census,z is the census population for zone z.

This constraint acts as a posterior regularization term, correcting the bias in the micro-level predictions by grounding them to the official census statistics.
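
Equations (12) and (13) can be sketched in plain Python as follows. The function names and the dictionary-based zone bookkeeping are assumptions; in the actual framework the same computation would have to be expressed with differentiable tensor operations.

```python
import math
from typing import Dict, Sequence


def zone_totals(y_std: Sequence[float], zone_ids: Sequence[str],
                mu_y: float, sigma_y: float) -> Dict[str, float]:
    """Equation (12): sum back-transformed building predictions per zone."""
    totals: Dict[str, float] = {}
    for y, z in zip(y_std, zone_ids):
        # expm1 applies the "- 1" to every building, not to the zone sum.
        totals[z] = totals.get(z, 0.0) + math.expm1(sigma_y * y + mu_y)
    return totals


def macro_loss(totals: Dict[str, float], census: Dict[str, float]) -> float:
    """Equation (13): mean squared relative error over census zones."""
    return sum(((totals.get(z, 0.0) - c) / c) ** 2
               for z, c in census.items()) / len(census)
```

Using the relative error means that a 10% miss in a small zone and a 10% miss in a large zone are penalized equally, which is the stated motivation for preferring MSRE over an absolute error.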

The training process employs an alternating optimization strategy. In each training epoch, mini-batch data are used to optimize the micro-level proxy loss. Subsequently, the full-batch dataset is utilized to optimize the macro-level census constraint.

3.4. Evaluation Metrics

Based on the BIHL model constructed above, the population of each building was estimated. As the ground-truth residential population at the building level is difficult to obtain, the predicted building-level population was aggregated at the subdistrict level. The aggregated results are then compared with the census population of each subdistrict. The evaluation metrics include the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Symmetric Mean Absolute Percentage Error (SMAPE), as shown in Equations (14)–(16).

MAE = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - f_{i}|

(14)

MAPE = \frac{100 \%}{n} \sum_{i = 1}^{n} \frac{|y_{i} - f_{i}|}{y_{i}}

(15)

SMAPE = \frac{100 \%}{n} \sum_{i = 1}^{n} \frac{2 \times |y_{i} - f_{i}|}{|y_{i}| + |f_{i}|}

(16)

where n is the number of communities in the test set, y_i denotes the true population of community i, and f_i denotes its estimated population.
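
The three metrics in Equations (14)–(16) can be computed with a few lines of Python. The function names are illustrative, and the sketch assumes that every true value y_i is strictly positive, as MAPE is undefined otherwise.

```python
from typing import Sequence


def mae(y: Sequence[float], f: Sequence[float]) -> float:
    """Equation (14): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)


def mape(y: Sequence[float], f: Sequence[float]) -> float:
    """Equation (15): mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - b) / a for a, b in zip(y, f)) / len(y)


def smape(y: Sequence[float], f: Sequence[float]) -> float:
    """Equation (16): symmetric mean absolute percentage error, in percent."""
    return 100.0 * sum(2.0 * abs(a - b) / (abs(a) + abs(b))
                       for a, b in zip(y, f)) / len(y)
```

For true values (100, 200) and estimates (110, 180), the MAE is 15 and the MAPE is 10%, while the SMAPE differs slightly because its denominator averages the true and estimated values.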

3.5. Model Implementation

The proposed BIHL framework was implemented on a workstation equipped with an Intel Xeon W-2235 CPU (3.80 GHz), 32.0 GB of RAM, and an NVIDIA RTX A2000 GPU (12 GB). The model was built with PyTorch 2.4.1; the complete configuration and hyperparameter settings are detailed in Table 4. The implementation pseudo-code of the BIHL model framework is given in Appendix A.

Regarding the network architecture, the micro-level feature extraction module was constructed with three hidden layers, progressively decreasing in dimensionality [256, 128, 64]. To mitigate overfitting and ensure stable gradient propagation across deep layers, a combination of Batch Normalization and Layer Normalization was utilized. Furthermore, to regularize the network while maintaining the precision required for continuous population estimation, we implemented a layer-decaying dropout strategy [34]. The dropout probability starts at 0.2 for the first hidden layer—to effectively break the co-adaptation of raw input features—and decays by a factor of 0.8ⁱ for subsequent layers (where i is the layer index). This progressive decay ensures a smooth transition from robust feature extraction at the bottom layer to stable, deterministic representations at the final regression head.
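The layer-decaying dropout rule can be illustrated with a short calculation. This sketch assumes that the layer index i starts at 0 for the first hidden layer, so that p_i = 0.2 × 0.8ⁱ; the text does not state the indexing convention, and the helper name `dropout_schedule` is hypothetical.

```python
def dropout_schedule(n_layers, p0=0.2, decay=0.8):
    """Dropout probability per hidden layer: p_i = p0 * decay**i (i = 0, 1, ...)."""
    return [p0 * decay ** i for i in range(n_layers)]


hidden_sizes = [256, 128, 64]  # hidden-layer widths reported in Table 4
probs = dropout_schedule(len(hidden_sizes))
```

Under this assumption the three hidden layers receive dropout probabilities of roughly 0.2, 0.16, and 0.128.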

During the training phase, the model was optimized using the AdamW optimizer with a batch size of 128. To prevent the previously discussed gradient shocks—especially when the macroscopic penalty is introduced—gradient clipping was performed with a maximum norm empirically set to 0.5 to ensure stable parameter updating. The maximum number of training epochs was set to 80; to prevent potential overfitting, a standard early stopping mechanism was employed with a patience of 30 epochs.
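The two safeguards described above, norm-based gradient clipping and patience-based early stopping, can be sketched without any deep learning library. The helper `clip_by_global_norm` and the class `EarlyStopping` are hypothetical names written for illustration; they mimic the usual behaviour of such utilities and are not the authors' code.

```python
import math


def clip_by_global_norm(grads, max_norm=0.5):
    """Rescale a flat list of gradients so that its L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads), total
    scale = max_norm / total
    return [g * scale for g in grads], total


class EarlyStopping:
    """Stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Reset the counter on improvement, otherwise count a bad epoch.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=0.5)
stopper = EarlyStopping(patience=2)
decisions = [stopper.step(v) for v in [1.0, 0.9, 0.95, 0.96]]
```

Under this counting convention, a cap of 80 epochs with a patience of 30 means that early stopping can only trigger if the best validation loss was reached at or before epoch 50.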

4. Results and Discussions

4.1. Study Area Overview

To empirically validate the proposed BIHL model framework, a comprehensive case study was conducted in Haidian District, Beijing, with a specific focus on the Beixiaguan Subdistrict, as shown in Figure 5. Located in the northwestern part of Beijing’s urban core, Haidian District is the city’s primary academic and technological hub. The selected study area comprises a dense mixture of building typologies, including old residential compounds, modern high-rise apartment complexes, student dormitories, and mixed-use commercial–residential structures. This architectural diversity, coupled with a remarkably high population density, makes it an ideal test case for micro-scale population estimation.

The modeled region includes a total of 2763 individual buildings. The official macroscopic census population for this target area is 146,366 residents, with an average regional population density of approximately 24,340 persons per square kilometer. The initial grid-based mobile signaling data used to formulate the probabilistic priors have a spatial resolution of 250 m × 250 m. A comprehensive statistical summary of the dataset is presented in Table 5, which details the distributions of key building-level variables—such as footprint area, gross floor area (GFA), and number of floors—reporting their mean values, standard deviations, and extreme deciles (10th and 90th percentiles).

4.2. Ablation Experiment

Given that this study introduces additional variables (such as building-level spatial location and residential category information), which have received limited attention in prior work, an ablation experiment was conducted to assess their incremental contribution to the predictive performance. The prediction metrics are reported in Table 6.

The results show that omitting building-level spatial information leads to a substantial rise in MAPE, reaching 23.40%, along with a pronounced increase in the discrepancy between MAPE and SMAPE. This indicates that spatial location is a key determinant of prediction accuracy and stability in building-level population estimation. This effect is consistent with Beijing’s ring-structured urban form: the central districts were developed earlier and are dominated by compact, high-density residential stock, whereas peripheral areas were developed later and typically feature larger dwelling units to meet higher living-quality expectations. Such spatial heterogeneity directly influences population density patterns, and thus affects model performance when spatial variables are excluded.

Similarly, removing residential category information results in a significant increase in MAPE to 16.42%, again accompanied by a notable widening of the MAPE–SMAPE gap. This underscores the strong contribution of residential category information to reducing prediction errors and improving output stability. The underlying mechanism is that different residential types—such as standard apartments, villas, and dormitories—exhibit substantial variation in per capita floor area, which directly affects population allocation at the building scale. Incorporating residential type therefore provides essential structural information that enhances the predictive accuracy of building-level population models.

Moreover, to justify the proposed dynamic weighting strategy for the macroscopic constraint λ(t)—which combines a linear warm-up phase with a cosine annealing schedule—an ablation experiment was conducted by comparing it against a constant λ baseline. Figure 6 illustrates the training loss convergence trajectories for both strategies.

As observed for the constant weighting strategy (yellow dots), applying the maximum macroscopic penalty from the very first epoch induces gradient shock. The uninitialized network is forced to resolve conflicting gradients between micro-feature learning and macro-aggregation. This results in a high initial loss spike (exceeding 0.3) and a highly unstable, inefficient early descent trajectory. Furthermore, the conflicting gradients trap the model in a suboptimal state, converging to a noticeably higher final loss.

In contrast, our proposed dynamic weighting strategy (blue dots) effectively eliminates this optimization bottleneck. During the linear warm-up phase, the near-zero initial penalty allows the network to undergo a robust initialization, rapidly and smoothly minimizing the loss to capture essential spatial micro-heterogeneity. As the training progresses into the cosine annealing phase, the network calibrates the multi-objective optimization without disrupting the already learned feature representations.

Consequently, the dynamic strategy not only converges significantly faster but also achieves a lower and more stable final loss. This ablation test demonstrates that the curriculum learning approach (linear warm-up followed by cosine annealing) is a necessary mechanism to ensure stable and efficient gradient descent in our BIHL framework.
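The schedule for λ(t) follows steps 31–35 of Algorithm A1 in Appendix A: a linear ramp up to λ_max during warm-up, followed by a cosine decay. The sketch below is a direct transcription of those two formulas; the values T_warm = 10 and λ_max = 1.0 are placeholders chosen for illustration, since the text does not report them, while T = 80 is the epoch cap given in Section 3.5.

```python
import math


def lambda_t(t, T, T_warm, lam_max):
    """Macro-constraint weight at epoch t (1-based), per Algorithm A1."""
    if t <= T_warm:
        # Linear warm-up from near zero to lam_max.
        return lam_max * t / T_warm
    # Cosine annealing from lam_max down to zero at t = T.
    return lam_max * 0.5 * (1.0 + math.cos(math.pi * (t - T_warm) / (T - T_warm)))


T, T_WARM, LAM_MAX = 80, 10, 1.0  # T_WARM and LAM_MAX are hypothetical
schedule = [lambda_t(t, T, T_WARM, LAM_MAX) for t in range(1, T + 1)]
```

With these formulas the weight peaks at λ_max at the end of warm-up and returns to zero at the final epoch, so the macro penalty is strongest in the middle of training.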

4.3. Model Verification

To verify the advantages of the proposed BIHL model framework, commonly used approaches from the existing literature, namely multiple linear regression (MLR), random forest (RF), and LightGBM, were implemented for comparison; the prediction metrics are shown in Table 7.

The results reveal that the MLR model performs the worst, with a MAPE as high as 30.20%. Moreover, it shows the largest gap between MAPE and SMAPE, indicating poor stability in the error distribution and the presence of extreme prediction values. In contrast, the BIHL model proposed in this study demonstrates superior performance across all three subdistrict-level error metrics, achieving a MAPE and SMAPE of 11.36% and 11.26%, respectively. It also exhibits the smallest gap between the two, suggesting a more stable error distribution.

To validate the fine-grained disaggregation performance of the proposed model, an independent manual survey was conducted. Recognizing that absolute building-level ground truth is inherently unavailable, we established an estimated proxy ground truth for validation. Specifically, a stratified random sampling strategy was employed to select 20 residential buildings within each subdistrict. This stratification ensured that buildings of varying residential types (e.g., general residential housing, mixed-use building) and building location (e.g., core urban area or peripheral area) were proportionally represented. For each selected building, the exact number of households was determined through manual field surveys. The building-level residential population was then estimated by multiplying the household count by the average household size reported in the Haidian District Statistical Yearbook (2023). This surveyed dataset was only utilized as an independent hold-out validation set and was completely excluded from the training phase of all evaluated models.

These estimated validation values were compared with the building-level predictions produced by the four methods (MLR, RF, LightGBM, and the proposed framework), as visually presented in the scatter plots in Figure 7. To quantify the performance and address the uncertainty of building-level estimations, we calculated the Coefficient of Determination (R²), Mean Bias Error (MBE), and 95% Confidence Intervals (95% CIs) of the absolute errors for each model, which are given in Table 8.

As illustrated in Figure 7, the blue scatter points represent the predicted versus actual estimated populations, with the orange diagonal line denoting perfect alignment. Scatter points closer to this reference line indicate higher consistency. The MLR model exhibits highly dispersed scatter points and a low R², indicating a poor capture of micro-level spatial heterogeneity. The RF and LightGBM models show similar, improved patterns; however, they exhibit a noticeable negative bias (MBE < 0), consistently underestimating populations, particularly for medium-to-high density buildings (e.g., actual populations of 400–500).

In contrast, the proposed framework demonstrates superior disaggregation fidelity. By integrating dynamic curriculum weighting and total population constraints, the proposed method effectively mitigates the underestimation bias observed in the tree-based models, yielding the MBE closest to zero and the highest R². Furthermore, the narrower 95% CI of the proposed method indicates significantly reduced predictive uncertainty at the micro-level. This quantitative and visual evidence solidifies the effectiveness of the proposed multi-objective optimization approach in maintaining spatial micro-heterogeneity without violating macro-level boundaries.
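The building-level statistics reported in Table 8 can be computed as in the sketch below. Two points are assumptions rather than facts from the text: MBE is taken as the mean of (predicted − reference), which matches the reading that a negative value means underestimation, and the 95% CI is taken as a normal-approximation interval around the mean absolute error. The intervals in Table 8 are symmetric about the MAE, which is consistent with this form but does not confirm it.

```python
import math
from statistics import mean, stdev


def building_level_metrics(y_true, y_pred, z=1.96):
    """Return R², MBE, MAE, and an approximate 95% CI of the mean absolute error."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Negative MBE means the model underestimates on average (assumed sign).
    mbe = mean([p - t for t, p in zip(y_true, y_pred)])
    abs_err = [abs(p - t) for t, p in zip(y_true, y_pred)]
    mae = mean(abs_err)
    # Normal-approximation half-width (assumed CI construction).
    half = z * stdev(abs_err) / math.sqrt(len(abs_err))
    return {"r2": r2, "mbe": mbe, "mae": mae, "ci": (mae - half, mae + half)}


# Hypothetical surveyed (proxy) and predicted building populations.
surveyed = [100.0, 200.0, 300.0, 400.0]
predicted = [90.0, 210.0, 280.0, 390.0]
stats = building_level_metrics(surveyed, predicted)
```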

4.4. Result Visualization

To intuitively evaluate the spatial disaggregation performance of the proposed BIHL model framework, we visualize the building-level population estimation results for the Beixiaguan Subdistrict in Haidian District, Beijing. Figure 8 presents the fine-grained residential population distribution from a 2D perspective, providing a clear overview of the population density variations across the urban fabric, while Figure 9 displays the results from a 3D perspective, in which the color gradient encodes the estimated population count and each building is rendered at its real height.

As illustrated in Figure 8, the BIHL framework successfully captures the high degree of spatial heterogeneity inherent in complex urban environments, effectively discriminating between different building typologies. High-rise apartment complexes and student dormitories, depicted in deep red, are accurately identified as population hotspots. Even within the same census block, Figure 8 reveals substantial variations among adjacent buildings, as the model integrates building geometry and POI-based typology to assign higher values to residential buildings while maintaining low values for mixed-use buildings. Figure 8 highlights the Jiaoda community (comprising the academic and residential areas of Beijing Jiaotong University), delineated by a green dotted line, and the China Meteorological Administration (CMA) community (encompassing its working and residential zones), enclosed by a blue dashed line. Driven by the high concentration of high-rise apartment blocks and dense dormitories, the population distribution in the Jiaoda community exhibits a dark red color. Conversely, the residential areas within the CMA community are dominated by low-rise residential buildings, resulting in a more orange representation of its population distribution. These findings are highly consistent with the 3D building characteristics depicted in Figure 9.

5. Conclusions

In this study, a Bayesian-informed hierarchical learning (BIHL) model framework was designed to fill the gap between data-driven deep learning and formal probabilistic constraints for building-level population estimation. By treating residential population as a latent variable constrained by mobile signaling priors and hard macro-level census data, the proposed method provides a robust solution for fine-grained urban modeling.

To validate the model’s performance, a case study was conducted using data from Haidian District, Beijing. The experimental results demonstrated that the proposed framework outperforms the baseline models (MLR, RF, and LightGBM), achieving the best predictive accuracy and robustness with a subdistrict-level MAPE of 11.36%. The minimal divergence observed between the MAPE and Symmetric MAPE (SMAPE) metrics further corroborates the model’s stability in handling complex error distributions compared to deterministic approaches. Through ablation studies, this research confirmed the critical role of micro-spatial features, underscoring that the integration of building-level spatial coordinates with residential category priors is indispensable for mitigating ‘spatial smoothing’ effects and enhancing model generalization. Furthermore, unlike standard machine learning approaches that typically exhibit pronounced under-prediction in high-density residential areas, the BIHL model framework substantially reduces this systematic bias and ensures physical consistency by employing a dynamic curriculum learning strategy coupled with hierarchical consistency constraints. This mechanism successfully balances micro-level feature extraction with macro-level census regularization. Ultimately, this work not only provides a robust methodology for generating high-fidelity, residential-level population layers but also establishes a solid foundation for next-generation Travel Demand Models (TDMs), offering significant theoretical and practical implications for optimizing urban public transit networks and formulating precise emergency response strategies in complex urban environments.

While the proposed BIHL model framework demonstrated significant accuracy in the empirical validation within Haidian District, certain limitations remain. First, the model is relatively sensitive to the quality of multi-source inputs; specifically, the inherent noise and spatiotemporal sparsity of mobile signaling data may constrain the stability of micro-level predictions. Second, the model’s reliance on site-specific Zone Embeddings hinders its direct transferability across different cities or regions, typically necessitating parameter recalibration when deployed in new environments. Third, from a methodological perspective, the current neural estimator provides deterministic point predictions optimized by gradient descent, lacking full posterior probability sampling to quantify the absolute uncertainty of the final outputs.

To address these challenges, future research could integrate multi-temporal mobile signaling data to extend the current static estimation into dynamic spatiotemporal forecasting. Simultaneously, to enable robust model transferability, future work should leverage richer urban spatial datasets and employ advanced methods, such as graph neural networks, to establish a generalized mapping between urban spatial features and population biases. Finally, a critical direction is the development of a fully probabilistic Bayesian inference system—such as one integrating Bayesian Neural Networks or Markov Chain Monte Carlo sampling—to rigorously provide posterior uncertainty bounds alongside the micro-level population estimates.

Author Contributions

Conceptualization, Jin Deng, Jianfeng Liu, and Yadi Zhu; methodology, Jin Deng and Yadi Zhu; software, Jin Deng; validation, Guanhua Yang and Zhou Hu; investigation, Guanhua Yang; data curation, Zhou Hu and Ying Deng; writing—original draft, Ying Deng; visualization, Jianfeng Liu; supervision, Jianfeng Liu; project administration, Jianfeng Liu; funding acquisition, Yadi Zhu. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Fundamental Research Funds for the Central Universities [Grant No.: 2024JBMC021], and the National Key Research and Development Programme of China under the title of ‘Intelligent Assessment and Dynamic Prediction of Spatial Operational Characteristics of Station-City Converged Stereoscopic Networks’ [Grant No. 2023YFC3807503].

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The original dataset includes mobile phone records and census data, and confidentiality agreements were signed with the data providers. Requests for access will require a clear description of the requester’s background and the intended use of the data to ensure compliance with these agreements.

Conflicts of Interest

Authors Jianfeng Liu and Guanhua Yang were employed by the Beijing Urban Construction and Transport Planning & Design Institute Co., Ltd., Beijing 100037, China. Zhou Hu was employed by the Beijing Urban Construction Design & Development Group Co., Ltd., Beijing 100037, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Algorithm A1: Optimization Workflow of the BIHL Framework

Input: Building features X, Regional census populations P_census, Zone identifiers Z, Maximum epochs T, Warm-up epochs T_warm, Dynamic weight parameter λ_max, Ensemble size K
Output: Fine-grained building-level population estimates \hat{P}_{final}

1: // Layer 1: Proxy Label Generation & Uncertainty Weighting
2: Normalize features X → X_norm
3: Train baseline LightGBM model on X_norm to obtain proxy labels y_proxy
4: Train ensemble of K LightGBM models with varying hyperparameters
5: Calculate prediction variance σ_i² for each building i from ensemble outputs
6: Compute confidence weights w_i ∝ 1/(σ_i² + ϵ) and normalize them
7: // Layer 2: Enhanced Neural Network Posterior Estimator Architecture
8: Initialize Base MLP network parameters and Interaction network parameters
9: Initialize Zone Bias Embedding matrix E
10: Define prediction function \hat{y}_{b} = f_{θ}(X_{b}, Z_{b})
11:    \hat{y}_{base} = MLP(X_{b})
12:    b_{z_{b}} = E[z_b] // Extract specific scalar bias for zone z_b
13:    \hat{y}_{interact} = Interaction(X_{b})
14:    return \hat{y}_{b} = \hat{y}_{base} + b_{z_{b}} + α \cdot \hat{y}_{interact}
15: // Layer 3: Multi-Objective Neural Network Training
16: Initialize neural network parameters θ
17: for t = 1 to T do
18:    // Micro-level Optimization (Mini-batch)
19:    for each mini-batch (X_b, y_b, Z_b, w_b) do
20:       Predict standardized population log-values using Layer 2
21:       Compute unweighted Smooth L1 Loss: L_unweighted
22:       Compute weighted micro-loss: L_micro
23:       Update θ by descending gradient ∇_θ L_micro with gradient clipping
24:    end for
25:    // Macro-level Spatial Constraint (Full-batch)
26:    Predict full-batch values \hat{y}_{all} = f_{θ}(X_{all}, Z_{all})
27:    Transform predictions to original population scale: \hat{P}_{all} = \exp(\hat{y}_{all} \cdot σ_{y} + μ_{y}) - 1
28:    Aggregate building populations by zone: \hat{P}_{zone} = Agg(\hat{P}_{all}, Z_{all})
29:    Calculate macro-level loss (Relative MSE): L_macro
30:    // Dynamic Curriculum Weighting
31:    if t ≤ T_warm then
32:       λ_t = λ_max · (t/T_warm) // Linear warm-up
33:    else
34:       λ_t = λ_max · 0.5 · [1 + cos(π (t − T_warm)/(T − T_warm))] // Cosine annealing
35:    end if
36:    Compute total constraint loss: L_total = λ_t · L_macro
37:    Update θ by descending gradient ∇_θ L_total
38:    Update Learning Rate using Advanced Scheduler (Warm-up + Cosine Annealing)
39:    Check Early Stopping condition based on Validation Loss
40: end for
41: // Post-Calibration Stage
42: Load θ_best and generate raw predictions \hat{P}_{raw}
43: for each zone j do
44:    Scale factor S_{j} = P_{census}^{j} / \sum_{i \in zone j} \hat{P}_{raw}^{i}
45:    Apply calibration: \hat{P}_{final}^{i} = \hat{P}_{raw}^{i} \cdot S_{j}
46: end for
47: return \hat{P}_{final}
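The back-transformation in step 27 and the post-calibration in steps 41–47 of Algorithm A1 can be sketched in plain Python as follows. The function names, the toy values, and the assumption that every zone has a non-zero raw total are illustrative and do not come from the authors' code.

```python
import math
from collections import defaultdict


def to_population(y_std, mu_y, sigma_y):
    """Step 27: undo standardization and the log(1 + P) transform."""
    return [math.exp(y * sigma_y + mu_y) - 1.0 for y in y_std]


def post_calibrate(p_raw, zone_ids, census):
    """Steps 43-46: rescale buildings so each zone total matches its census count."""
    totals = defaultdict(float)
    for p, z in zip(p_raw, zone_ids):
        totals[z] += p
    # One scale factor S_j per zone (assumes every zone total is non-zero).
    scale = {z: census[z] / totals[z] for z in totals}
    return [p * scale[z] for p, z in zip(p_raw, zone_ids)]


# Hypothetical raw predictions for four buildings in two zones.
raw = [100.0, 300.0, 50.0, 150.0]
zone_ids = [1, 1, 2, 2]
census = {1: 500.0, 2: 180.0}
final = post_calibrate(raw, zone_ids, census)
```

After calibration the building estimates in each zone sum to the census total, while the relative proportions among buildings within a zone are unchanged.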

References

Chen, M.; Xian, Y.; Huang, Y.; Zhang, X.; Hu, M.; Guo, S.; Chen, L.; Liang, L. Fine-scale population spatialization data of China in 2018 based on real location-based big data. Sci. Data 2022, 9, 624.
Wang, L.; Fan, H.; Wang, Y. Improving population mapping using Luojia 1-01 nighttime light image and location-based social media data. Sci. Total Environ. 2020, 730, 139148.
Zhou, Y.; Ma, M.G.; Shi, K.F.; Peng, Z.Y. Estimating and Interpreting Fine-Scale Gridded Population Using Random Forest Regression and Multisource Data. ISPRS Int. J. Geo-Inf. 2020, 9, 369.
Azar, D.; Engstrom, R.; Graesser, J.; Comenetz, J. Generation of fine-scale population layers using multi-resolution satellite imagery and geospatial data. Remote Sens. Environ. 2013, 130, 219–232.
Szarka, N.; Biljecki, F. Population estimation beyond counts-Inferring demographic characteristics. PLoS ONE 2022, 17, e0266484.
Jing, C.; Zhou, W.; Qian, Y.; Yan, J. Mapping the Urban Population in Residential Neighborhoods by Integrating Remote Sensing and Crowdsourcing Data. Remote Sens. 2020, 12, 3235.
Xie, Y.; Weng, A.; Weng, Q. Population Estimation of Urban Residential Communities Using Remotely Sensed Morphologic Data. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1111–1115.
Tomás, L.; Fonseca, L.; Almeida, C.; Leonardi, F.; Pereira, M. Urban population estimation based on residential buildings volume using IKONOS-2 images and lidar data. Int. J. Remote Sens. 2016, 37, 1–28.
Ural, S.; Hussain, E.; Shan, J. Building population mapping with aerial imagery and GIS data. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 841–852.
Stevens, F.; Gaughan, A.; Linard, C.; Tatem, A. Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data. PLoS ONE 2015, 10, e0107042.
Li, G.; Weng, Q. Fine-scale population estimation: How Landsat ETM plus imagery can improve population distribution mapping. Can. J. Remote Sens. 2010, 36, 155–165.
Qing, Y.X.; Wu, H.Y.; Qi, K.L.; Gui, Z.P.; Liu, Y.H.; Li, Z.Q.; Li, R. Integrating street-view imagery and points of interest for refining population spatialization: A case study in Wuhan City. Sustain. Cities Soc. 2024, 115, 105883.
Wu, B.; Yang, C.; Wu, Q.; Wang, C.; Wu, J.; Yu, B. A building volume adjusted nighttime light index for characterizing the relationship between urban population and nighttime light intensity. Comput. Environ. Urban Syst. 2023, 99, 101911.
Wang, M.; Wang, Y.; Li, B.; Cai, Z.; Kang, M. A Population Spatialization Model at the Building Scale Using Random Forest. Remote Sens. 2022, 14, 1811.
Robinson, C.; Hohman, F.; Dilkina, B.; Machinery, A.C. (Eds.) A Deep Learning Approach for Population Estimation from Satellite Imagery. In Proceedings of the 1st ACM SIGSPATIAL Workshop on Geospatial Humanities (GeoHumanities), Los Angeles, CA, USA, 7–10 November 2017; Association for Computing Machinery: New York, NY, USA, 2017.
Tiecke, T.G.; Liu, X.; Zhang, A.; Gros, A.; Li, N.; Yetman, G.; Kilic, T.; Murray, S.; Blankespoor, B.; Prydz, E.B.; et al. Mapping the world population one building at a time. arXiv 2017, arXiv:1712.05839.
Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
Cai, B.; Shao, Z.; Huang, X.; Zhou, X.; Fang, S. Deep learning-based building height mapping using Sentinel-1 and Sentinel-2 data. Int. J. Appl. Earth Obs. Geoinf. 2023, 122, 103399.
Li, M.; Zhang, H.; Chen, J. (Eds.) Fine-grained Dynamic Population Mapping Method Based on Large-scale Sparse Mobile Phone Data. In Proceedings of the 20th IEEE International Conference on Mobile Data Management (IEEE MDM), Hong Kong, China, 10–13 June 2019; IEEE Computer Society: Los Alamitos, CA, USA, 2019.
Liu, G.; Li, R.; Xia, J.; Liu, Z.; Cai, J.; Wu, H.; Peng, M. Dual-environment feature fusion-based method for estimating building-scale population distributions. Geo-Spat. Inf. Sci. 2024, 27, 1943–1958.
Lu, W.; Weng, Q. An ANN-based method for population Dasymetric mapping to avoid the scale heterogeneity: A case study in Hong Kong, 2016–2021. Comput. Environ. Urban Syst. 2024, 108, 102072.
Wang, Y.; Zhang, J. Integrating BP and MGWR-SL Model to Estimate Village-Level Poor Population: An Experimental Study from Qianjiang, China. Soc. Indic. Res. 2018, 138, 639–663.
Cao, Y.; Liu, J.; Wang, Y.; Wang, L.; Wu, W.; Su, F. A study on the method for functional classification of urban buildings by using POI data. J. Geo-Inf. Sci. 2020, 22, 1339–1348.
Liang, S.; Jiang, J. Analysis of Commercial Space Pattern and Driving Factors Based on POI Data and Nighttime Lighting Data--Taking Xuzhou Main City as an Example. Geospat. Inf. 2025, 1–6. Available online: https://link.cnki.net/urlid/42.1692.P.20251126.1111.010 (accessed on 24 March 2026).
Tobler, W. Smooth Pycnophylactic Interpolation for Geographical Regions. J. Am. Stat. Assoc. 1979, 74, 519–530.
Zandbergen, P.A. Dasymetric mapping using high resolution address point datasets. Trans. GIS 2011, 15, 5–27.
Liu, X.; Clarke, K.; Herold, M. Population density and image texture: A comparison study. Photogramm. Eng. Remote Sens. 2006, 72, 187–196.
Longley, P.A.; Goodchild, M.F.; Maguire, D.J.; Rhind, D.W. Geographic Information Science and Systems; John Wiley & Sons: Hoboken, NJ, USA, 2015.
Bakillah, M.; Liang, S.; Mobasheri, A.; Arsanjani, J.J.; Zipf, A. Fine-resolution population mapping using OpenStreetMap points-of-interest. Int. J. Geogr. Inf. Sci. 2014, 28, 1940–1963.
Zhu, Y.; Chen, F.; Wang, Z.; Deng, J. Spatio-temporal analysis of rail station ridership determinants in the built environment. Transportation 2019, 46, 2269–2289.
Wang, S.; Li, R.; Jiang, J.; Meng, Y. Fine-scale population estimation based on building classifications: A case study in Wuhan. Future Internet 2021, 13, 251.
Liu, Z.; Gui, Z.; Wu, H.; Qin, K.; Wu, J.; Mei, Y.; Zhao, J. Fine-scale population spatialization by synthesizing building data and POI data. J. Geomat. 2021, 46, 102–106.
Du, K.; Song, J.; Chen, D.; Li, M.; Zhu, Y. A Novel Traffic Analysis Zone Division Methodology Based on Individual Travel Data. Appl. Sci. 2025, 15, 156.
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.

Figure 1. BIHL model framework.

Figure 2. Exploratory data analysis of pseudo-residential population: (A) lightcoral-colored histogram showcasing the heavy-tailed distribution of the original population data; (B) skyblue-colored density histogram of the log-transformed data, overlaid with a fitted normal density curve (black solid line), where μ and σ denote the fitted mean and standard deviation, respectively; (C) Quantile-Quantile (Q-Q) plot of the log-transformed data against a theoretical normal distribution, where skyblue dots represent empirical data points.

Figure 3. Enhanced neural network posterior model architecture.

Figure 4. Zone bias embedding architecture.

Figure 5. Location of study area.

Figure 6. Training loss convergence trajectories for dynamic and constant weighting strategies.

Figure 7. Scatter plot comparison of proxy ground truth versus predicted building-level residential population across MLR, RF, LightGBM, and BIHL frameworks. Blue dots represent individual building-level data points. The orange dashed line represents the perfect alignment reference (y = x), while the green dash-dotted line denotes the empirical linear fit surrounded by a semi-transparent 95% confidence interval (CI) band. Insets display the performance metrics for each model: R² (coefficient of determination), MBE (mean bias error), and the 95% CI of absolute error.

Figure 8. Two-dimensional spatial distribution of estimated building-level population in Beixiaguan Subdistrict.

Figure 9. Three-dimensional visualization of estimated building-level population in Beixiaguan Subdistrict.

Table 1. Notation used in this study.

Symbol	Description	Unit
i, a, z	Indices of buildings, AOIs, and zones	-
N, K, Z	Number of buildings, samples, and zones	-
S_a,i, S_i	Overlap area and building area	m²
R_a,i	Overlap ratio between building and AOI	-
A_i	Residential floor area of building i	m²
M_G(i)	Population in grid containing building i	persons
x_i	Feature vector of building i	-
p_i, \hat{p}_i	True and estimated population	persons
\hat{P}_z, P_census,z	Estimated and census population in zone z	persons
μ_i^prior, σ_i^prior	Prior mean and standard deviation for building i	-
μ_log, σ_log	Mean and standard deviation of the log-transformed population	-
\hat{y}_i^std	Standardized prediction	-
f_MLP(·)	Multi-layer perceptron (MLP) mapping function	-
w_i	Confidence weight	-
L_total, L_micro, L_macro	Total, micro-level, and macro-level loss functions	-
λ(t)	Dynamic loss weight	-

Table 2. Summary of datasets used in this study.

Dataset Name	Data Source	Key Attributes	Spatial Scale	Numerical Range
Building Footprint (BF) Data	\	building ID	Building level	1~2763
		geometry		Polygon
		floor number		1~40
		GFA		7~115,168
Area of Interest (AOI) Data	\	area ID	Area level	1~3890
		type		Commercial and Residential, Science/Education/Culture, Healthcare, Government, Accommodation, Corporate, Scenic, etc.
		boundaries		Spatial extent of AOI parcels (polygon geometry)
Point of Interest (POI) Data	Amap.com	POI ID	Point level	1~7650
		category		Commercial Services, Residential, Office, Science/Education/Culture, Transportation, Government, Finance, Catering, Entertainment, etc.
		location		(Longitude, Latitude)
Public Transit Route and Stop Data	Amap.com	routes	Line/point level	Polyline geometry (line features for transit routes)
Public Transit Route and Stop Data	Amap.com	stop location	Line/point level	(Longitude, Latitude)
Grid-based Population Data	Telecommunication operator	grid ID	Grid level	1~7046
Grid-based Population Data	Telecommunication operator	population count	Grid level	79,222
Subdistrict-level Census Data	National Bureau of Statistics	subdistrict ID	Subdistrict level	1–29
Subdistrict-level Census Data	National Bureau of Statistics	population count	Subdistrict level	27,614–226,315

Table 3. Goodness-of-fit comparison for alternative prior distribution assumptions.

Distribution	AIC	BIC
Log-Normal	107,718.82	107,735.20
Log-Skew-Normal	107,470.53	107,495.10
Poisson (Rounded)	47,711,715.36	47,711,723.55
Negative Binomial (Rounded)	405,760.21	405,776.59

Table 4. Hardware configuration and hyperparameters for the proposed BIHL framework.

Items	Parameter	Value
Hardware	GPU Configuration	NVIDIA RTX A2000 (12 GB)
Hardware	CPU and RAM	Intel(R) Xeon(R) W-2235 CPU @ 3.80 GHz, 32.0 GB RAM
Data	Feature Dimensionality	5 (area, POI count, bus stops, mobile users, location category)
Network Architecture	Hidden Layers	[256, 128, 64]
Network Architecture	Regularization	Batch Normalization, Layer Normalization, Layer-decaying Dropout (initial p = 0.2, decaying factor 0.8ⁱ)
Training Hyperparameters	Batch Size	128
Training Hyperparameters	Maximum Epochs	80 (with early stopping patience = 30)
Training Hyperparameters	Optimizer	AdamW
Training Hyperparameters	Gradient Clipping	Max norm = 0.5

Table 5. Descriptive statistics of key building-level variables.

Variable	Mean	Standard Deviation	P10	P90
Footprint area	580.80	933.64	65.45	1203.41
Gross floor area (GFA)	2943.67	6395.67	78.09	7338.09
Number of floors	3.76	4.42	1	7

Table 6. Ablation experiment results: impact of dynamic weighting in loss function, spatial location, and residential category variables on building-level population estimation accuracy.

Model	MAE	MAPE (%)	SMAPE (%)
BIHL	12,271.33	11.36	11.26
BIHL (no spatial location)	22,533.95	23.40	26.76
BIHL (no residential category)	15,365.65	16.42	18.63

Note: The best-performing model is the full BIHL model, which has the lowest value on all three metrics.

Table 7. Performance comparison of BIHL model and baseline models for subdistrict-level population estimation.

Model	MAE	MAPE (%)	SMAPE (%)
MLR	29,392.96	30.20	34.61
RF	12,368.75	12.40	12.65
LightGBM	12,533.14	11.58	11.74
BIHL	12,271.33	11.36	11.26

Note: The best-performing model is BIHL, which has the lowest value on all three metrics.
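
For readers who wish to reproduce the three error measures reported in Tables 6 and 7, the following sketch computes MAE, MAPE, and SMAPE from paired observed and estimated values. It assumes the common definitions, in particular the SMAPE variant whose denominator is the mean of the absolute observed and estimated values; the variant used in the study may differ, and the function names are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (persons)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined for zero observations."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100 * sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE (%), assuming the denominator (|y| + |y_hat|) / 2."""
    terms = [
        abs(p - t) / ((abs(t) + abs(p)) / 2)
        for t, p in zip(y_true, y_pred)
    ]
    return 100 * sum(terms) / len(terms)


# Toy subdistrict totals for illustration only; these are not the study data.
observed = [100_000, 200_000]
estimated = [110_000, 180_000]
print(mae(observed, estimated), mape(observed, estimated), smape(observed, estimated))
```

MAE is scale-dependent and is therefore dominated by populous subdistricts, whereas MAPE and SMAPE weight each subdistrict equally, which is why the tables report all three.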

Table 8. R², MBE, and 95% CI of the absolute errors for MLR, RF, LightGBM, and BIHL frameworks.

Model	R²	MBE	MAE	95% CI Lower	95% CI Upper
MLR	0.9556	4.54	16.55	14.25	18.85
RF	0.9555	−9.00	12.84	10.26	15.43
LightGBM	0.9495	−8.10	13.95	11.21	16.70
BIHL	0.9835	−2.88	8.57	7.05	10.10
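
The quantities in Table 8 can be illustrated with the short sketch below, which computes R², the mean bias error (MBE), and a 95% confidence interval for the mean absolute error. It assumes that MBE is defined as estimated minus observed, so that negative values indicate under-prediction, and that the interval uses the normal approximation, MAE ± 1.96 · s/√n, with s the sample standard deviation of the absolute errors. The intervals in Table 8 are symmetric about the MAE, which is consistent with this form, but the exact procedure used by the authors is not stated here, and the function names are hypothetical.

```python
import math
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_bias_error(y_true, y_pred):
    """Mean of (estimated - observed); the sign convention is an assumption."""
    return statistics.fmean([p - t for t, p in zip(y_true, y_pred)])


def abs_error_ci(y_true, y_pred, z=1.96):
    """(lower, MAE, upper) using a normal approximation for the mean absolute error."""
    errors = [abs(p - t) for t, p in zip(y_true, y_pred)]
    mean_err = statistics.fmean(errors)
    half_width = z * statistics.stdev(errors) / math.sqrt(len(errors))
    return mean_err - half_width, mean_err, mean_err + half_width


# Toy values for illustration only; these are not the study data.
observed = [10, 20, 30, 40]
estimated = [12, 18, 33, 39]
print(r_squared(observed, estimated))
print(mean_bias_error(observed, estimated))
print(abs_error_ci(observed, estimated))
```

With only a handful of samples, a Student-t quantile or a bootstrap interval would be a more cautious choice than z = 1.96.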

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
