Mapping Fine-Scale Urban Spatial Population Distribution Based on High-Resolution Stereo Pair Images, Points of Interest, and Land Cover Data

: Fine-scale population distribution is increasingly becoming a research hotspot owing to its high demand in many applied ﬁelds. It is of great signiﬁcance in urban emergency response, disaster assessment, resource allocation, urban planning, market research, and transportation route design. This study employed land cover, building address, and housing price data, and high-resolution stereo pair remote sensing images to simulate ﬁne-scale urban population distribution. We ﬁrstly extracted the residential zones on the basis of land cover and Google Earth data, combined them with building information including address and price. Then, we employed the stereo pair analysis method to obtain the building height on the basis of ZY3-02 high-resolution satellite data and transform the building height into building ﬂoors. After that, we built a sophisticated, high spatial resolution model of population density. Finally, we evaluated the accuracy of the model using the survey data from 12 communities in the study area. Results demonstrated that the proposed model for spatial ﬁne-scale urban population products yielded more accurate small-area population estimation relative to high-resolution gridded population surface (HGPS). The approach proposed in this study holds potential to improve the precision and automation of high-resolution population estimation.


Introduction
Population distribution data on a fine scale are of great significance in many areas including public health, emergency evacuation, disaster management, resource allocation, urban planning, market research, and transportation route design [1]. Such data was once only presented as choropleth maps, where the population numbers, normally derived from demographic surveys (e.g., census data), were aggregated over enumeration units represented by irregular polygons (e.g., census blocks). Although this method obtains accurate population information, it takes a long time, requires large workloads, and high costs, and presents the information of population distribution in an aggregated way, sometimes due to confidentiality, which may hinder more accurate population-based analyses. Choropleth maps have some other limitations [2], including the inaccuracy caused by periodic changes of the boundary of data [18]. Results obtained through object-based image analysis (OBIA) of QuickBird imagery are used for a subset of a highly populated area in western Kenya, and shows that using houses as classified from very high spatial resolution remote sensing (VHR) imagery for a study area subset works well for redistributing human population at the regional level [19]. However, when the targeted area or parcel is complex (not only does it include various land use and cover, but also the buildings have different residential attributions), the single land-cover data, socioeconomic, or high-resolution remote sensing data is not enough as ancillary data to model the fine-scale population distribution.
To fill this gap, this study explores the frontier by presenting an approach to modeling urban nighttime population distribution at the building level with both the footprint and height of buildings considered as ancillary data. First, we classify the study area as residential and non-residential on the basis of land use, building address, housing price, and Google Earth data. Then, in residential zones we extract building footprint and height from high-resolution stereo pair remote sensing images and a digital terrain model (DTM). The building height is further transformed into the number of building floors on the basis of building standards adopted in the study area. Finally, weighted area metric and volumetric models are employed to disaggregate census population to individual residential buildings. The approach developed in this study can be used in the urban population survey and community emergency management. Findings of this study were assessed in 12 communities in Chaoyang district of Beijing, China. Through integrated high-resolution stereo pair images with points of interest and land cover data, a hierarchical classification and simulation approach is proposed to solve the mapping of fine-scale spatial population distribution in a complex urban area.

Study Area
The study area is the Olympic Village in Chaoyang District in Beijing, China, with an area of 19.6 km 2 on a flat terrain and a population of about 130,000 as of 2015 living in 12 neighborhoods. The land use types of the study area include residential area, school, hospital, stadium, river, lake, and forest. The Olympic Village was initially built to accommodate athletes during the Beijing 2008 Olympic Games and was converted into residential areas afterwards. Compared to most conventional residential areas in Beijing, the population density in the Olympic Village is more homogeneous among buildings with the same physical properties, which makes it a suitable area for this study in which we focus on the relationship only between population density and the physical properties of buildings, with the hope of minimizing the effects of other factors on population density. The location of the study area is shown in Figure 1.

Data
The administrative boundary of the Olympic Village was acquired from the National Geomatics Center of China [20]. The census data were collected from the local Bureau of Statistics, surveyed in 2015, which recorded the total population in each neighborhood within the Olympic Village. The high-resolution stereo pair remote sensing images covering the Olympic Village in a spatial resolution of 2.1 m (nadir) and 2.5 m (forward and backward) for the panchromatic band were derived from the ZY3-02 satellite, which is the second satellite of the ZiYuan3 satellite series (China's first civil three-line array stereo mapping satellite) [21]. Phased 180° apart in the same orbit, the ZY3-02 has increased the efficiency of in-orbit data acquisition and enhanced the acquisition of spatial geographic information, and hence could support mapping services for land resource surveys and monitoring.
The digital elevation model (DEM) data, with a spatial resolution of 10 m, was collected from Geographical Information Monitoring Cloud Platform [22]. The housing price data in the Olympic Village at the end of 2015 was collected from Homelink [23], which is the largest real estate trading platform in China. The points of interest (POI) data were obtained from Baidu Map [24], the most widely used web map service provider in China. The land cover data in 2010 were obtained at a spatial resolution of 30 m from the GlobeLand30 that included 10 land cover classes: water bodies, wetland, artificial surfaces, tundra, permanent snow and ice, grass lands, barren lands, cultivated land, shrub lands, and forest [25].

Data
The administrative boundary of the Olympic Village was acquired from the National Geomatics Center of China [20]. The census data were collected from the local Bureau of Statistics, surveyed in 2015, which recorded the total population in each neighborhood within the Olympic Village. The high-resolution stereo pair remote sensing images covering the Olympic Village in a spatial resolution of 2.1 m (nadir) and 2.5 m (forward and backward) for the panchromatic band were derived from the ZY3-02 satellite, which is the second satellite of the ZiYuan3 satellite series (China's first civil three-line array stereo mapping satellite) [21]. Phased 180 • apart in the same orbit, the ZY3-02 has increased the efficiency of in-orbit data acquisition and enhanced the acquisition of spatial geographic information, and hence could support mapping services for land resource surveys and monitoring. The digital elevation model (DEM) data, with a spatial resolution of 10 m, was collected from Geographical Information Monitoring Cloud Platform [22]. The housing price data in the Olympic Village at the end of 2015 was collected from Homelink [23], which is the largest real estate trading platform in China. The points of interest (POI) data were obtained from Baidu Map [24], the most widely used web map service provider in China. The land cover data in 2010 were obtained at a spatial resolution of 30 m from the GlobeLand30 that included 10 land cover classes: water bodies, wetland, artificial surfaces, tundra, permanent snow and ice, grass lands, barren lands, cultivated land, shrub lands, and forest [25].

Extraction of Residential Buildings
Only the population in residential areas was considered in this study of nighttime population modeling. A three-step procedure was used to distinguish residential areas from other classes: residential zones extracted are divided into three steps through a classification of hierarchical decision. Firstly, separate the zones of rivers, lakes, and forests from residential areas on a coarse scale using the land cover data according to its type code for each type of land cover has a sole type code. Then, the zones of stadiums, hospitals, schools, and factories in the study area are separated from the residents' layers according to the building addresses obtained from Google Earth. Finally, the commercial zones are separated from the ordinary residential zones using the housing price data to achieve a fine classification of urban residential communities, for there is a distinct difference in price between commercial buildings and housing buildings in Beijing.
The residential buildings are extracted through image classification in the residential zones after the residential zones are extracted from the study area because buildings have spectral reflectance very similar to roads and streets. Urban areas are also often full of buildings with multicolor roofs, of different shapes and sizes [1]. In such complex urban environments, conventional pixel-based image classification methods have limited capabilities in processing high resolution data [1,26], and often result in misclassification and low accuracy [27]. Object-based classification methods can produce better separation among spectrally similar classes with high accuracy [28]. A rule-based, object-based image classification approach was used to extract buildings on the basis of the information of image spectrum, texture, and gradient.

Estimation of Building Height
The ZY3-02 satellite images were used to build a stereo image pair to extract the building height.
(1) Three sets of combinations of different pairs can be generated (i.e., nadir and forward (NF), nadir and backward (NB), and forward and backward (FB)). This study used the NF pair to extract the digital surface model (DSM). The DEM extraction module in the software of ENVI 5.0 was used based on the polynomial coefficients, ground control points (GCPs), and tie-points to calculate a mathematical model which relates the columns and rows of matched pixels with coordinates and elevations. (2) The rational polynomial coefficients (RPC) built on the WGS84 geocentric coordinate system were employed to establish the stereo image pair model and produce a DSM on the basis of the automatic matching method [29]. (3) The generated DSM has a spatial resolution of 2.1 m. It is then co-registered with the DEM layer and the value of it is subtracted and the height information in the residential zones is obtained.

Population Modeling
We overlapped the height layer with the polygon layer of building location and assigned the mean height value in the polygon of the building as the building height. Since there is no single family housing in the study area of the Olympic Village, all the buildings are multi-family housing, considering that the building floor has a more close relationship with the population than building height, and the use of the building floor in the model can reduce the impact of building height inversion error on the accuracy of population distribution simulation. The study uses building floor to calculate the population density instead of the frequently-used building height in the model. The simplified format can be expressed as the following.
where i is the number of buildings, N i represents the account of the ith building floors in the residential zones. H i represents the height of the ith building in the residential zones. F i represents the average floor height of the ith building in the residential zones. Since different buildings have different floor heights the study area varies from 2.8-3.2 m. The floor height for each building or floor is difficult to survey. Therefore, we set a fixed value of 3 m instead of the average floor height of building F i in Equation (1).
The estimated population per building is calculated according to the census data, the building type, area, and the account of building floors.
where j is the number of pixels or grids in the study area; n is the total pixels in the study area; C j represents the building type (=1 if residential, =0 otherwise); W is the total population in the study area acquired from census; N j represent the building floors of pixel j; A represents the area of a pixel or grid in the study area (size = 2.1 m × 2.1 m). N i and A i represent the building floors and the area of building i. The whole flowchart of methods for dasymetric modeling is shown in Figure 2.
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 15 The estimated population per building is calculated according to the census data, the building type, area, and the account of building floors.
where j is the number of pixels or grids in the study area; n is the total pixels in the study area; C represents the building type (=1 if residential , =0 otherwise); W is the total population in the study area acquired from census; N represent the building floors of pixel j; A represents the area of a pixel or grid in the study area (size = 2.1 m × 2.1 m). N and A represent the building floors and the area of building i. The whole flowchart of methods for dasymetric modeling is shown in Figure 2.

Validation
We compared our modeling results with true values in order to validate the model accuracy. There are no building scale demographic data in the study area. The minimum statistical demographic unit of the study area is community. Therefore, we took the population number of the modeling results in each community and then compared it with the demographics data of it to compute the model error in each community. Figure 3 shows the classified result of residential zones and non-residential zones in the study area. The residential zones are mainly distributed in the northeast and southwest of the Olympic

Validation
We compared our modeling results with true values in order to validate the model accuracy. There are no building scale demographic data in the study area. The minimum statistical demographic unit of the study area is community. Therefore, we took the population number of the modeling results in each community and then compared it with the demographics data of it to compute the model error in each community. Figure 3 shows the classified result of residential zones and non-residential zones in the study area. The residential zones are mainly distributed in the northeast and southwest of the Olympic Village. The area of the residential zones is 3.08 km 2 which accounts for one sixth of the whole study area. The 12 communities in the study area are separated into 21 blocks.   Figure 4 shows the extracted buildings in the residential zones using objected-oriented classification. There are overall 542 buildings identified in the residential zones. The mean size of the buildings is 953 m 2 , the maximum is 5059 m 2 and the minimum is 130 m 2 . The overall achieved accuracy and kappa are 94% and 0.92, respectively.   Village. The area of the residential zones is 3.08 km 2 which accounts for one sixth of the whole study area. The 12 communities in the study area are separated into 21 blocks.  Figure 4 shows the extracted buildings in the residential zones using objected-oriented classification. There are overall 542  The extracted building heights in the residential zones are shown in Figure 5. As a statistic, we graded them into five ranges including <20 m, 21-40 m, 41-60 m, 61-80 m, and >80 m. In total, 358 buildings have a height below 20 m, and account for 66% of the residential buildings. The highest building is 95 m, and only two buildings have a height that is over 80 m.

Results
Remote Sens. 2020, 12, x FOR PEER REVIEW 8 of 15 The extracted building heights in the residential zones are shown in Figure 5. As a statistic, we graded them into five ranges including <20 m, 21-40 m, 41-60 m, 61-80 m, and >80 m. In total, 358 buildings have a height below 20 m, and account for 66% of the residential buildings. The highest building is 95 m, and only two buildings have a height that is over 80 m. Currently, building height information is widely used in high-resolution population estimation, but building floors have a closer correlation with the population than building height, and the use of the building floor in the model reduces the building height inversion error influence on population distribution simulation accuracy, and also improves the model calculation efficiency. Therefore, we used the building floor instead of the common building height in the model. The extracted building floor in the residential zones is as shown in Figure 6. Currently, building height information is widely used in high-resolution population estimation, but building floors have a closer correlation with the population than building height, and the use of the building floor in the model reduces the building height inversion error influence on population distribution simulation accuracy, and also improves the model calculation efficiency. Therefore, we used the building floor instead of the common building height in the model. The extracted building floor in the residential zones is as shown in Figure 6.
According to Equation (2), the estimated building population was calculated based on total population and building floor and area data. We classified the population density into six types including 0, 1-10, 11-100, 101-500, 501-1000, and >1000. The result is shown in Figure 7. Overall, 542 buildings have a total population of 139,223, with an average population of 257, a minimum of 0, a maximum of 3322, and a standard deviation of 342.
The study area includes 12 communities (census blocks, see Figure 8); we used the census data of 12 communities in 2017 to validate the estimated population produced by the model. The population estimation errors are shown in Figure 9. Six census blocks were overestimated, and the other six blocks were underestimated. The maximum is block 1, with an error of 34%. The minimum is block 8, with an error of −3%. The errors in the study were mainly caused by the processes of building area and height extraction. The accuracy of the population simulation relies on the methods of object classification and height extraction. The object-oriented classification used in the study is a comprehensive analysis on the basis of the spectral, shape, texture, shadow, and spatial position of each object. When the size, shape, and distribution of the buildings are relatively regular, the classification accuracy is high. However, there is an absolute error mainly ranging from 2 m to 5 m while using the stereo pair photogrammetry. The bigger the building height, the smaller the relative error. Therefore, our method is more effective for urban areas where residential buildings have a regular size, shape, distribution, and high-rise. According to Equation (2), the estimated building population was calculated based on total population and building floor and area data. We classified the population density into six types including 0, 1-10, 11-100, 101-500, 501-1000, and >1000. The result is shown in Figure 7. Overall, 542 buildings have a total population of 139,223, with an average population of 257, a minimum of 0, a maximum of 3322, and a standard deviation of 342. The study area includes 12 communities (census blocks, see Figure 8); we used the census data of 12 communities in 2017 to validate the estimated population produced by the model. The population estimation errors are shown in Figure 9. Six census blocks were overestimated, and the other six blocks were underestimated. The maximum is block 1, with an error of 34%. The minimum is block 8, with an error of −3%. The errors in the study were mainly caused by the processes of building area and height extraction. The accuracy of the population simulation relies on the methods

Discussion
The study proposed a new model to estimate the building-scale spatial population in an urban area. Combined with the ZY3 stereo pair remote sensing images, the method can also be used to

Discussion
The study proposed a new model to estimate the building-scale spatial population in an urban area. Combined with the ZY3 stereo pair remote sensing images, the method can also be used to

Discussion
The study proposed a new model to estimate the building-scale spatial population in an urban area. Combined with the ZY3 stereo pair remote sensing images, the method can also be used to produce a new fine-scale population data based on global population products such as GPW, LansScan, and WorldPop instead of census data. The population distribution has the characteristics of spatial dynamic. In order to simplify the model, this research ignored the time dynamic, mainly studied the static night population spatial distribution simulation method. According to the existing research results, although there are many factors that could affect the population distribution in residential buildings, the volume is the optimal factor of fine scale population distribution simulation. The use of remote sensing images, address data, housing price data, land use maps, DEM date, and other publicly available supplementary data is effective in identifying residential buildings.
Estimating the building height is a key step in the study of fine-scale population simulation. The methods of building height retrieval using remote sensing can be divided into four types including building shadow-based method, radar image-based method, LiDAR point cloud data-based method, and stereo pairs-based method. The building shadow-based method is a widely application on building height estimating [30], but it has great limitations in that it cannot obtain an accurate building height in the case of terrain interference or dense buildings. The building height estimating method based on the synthetic aperture radar still faces problems such as difficulty in image registration and interpretation and low degree of automation [31]. The three-dimensional (3D) reconstruction of buildings on the basis of LiDAR data is a method that has grown rapidly in recent years; it mainly uses the segmentation method to obtain the contours and heights of buildings with high precision. However, the LiDAR data are expensive, cover a small area, and the data are difficult to acquire [32,33]. The stereo pairs-based method of building height inversion has a distinct principle and the data is easy to obtain. The method is relatively easy and has a high accuracy, which is a viable and reliable method for building height estimating, but it is mainly on the basis of aerial stereo pairs [34].
Points of interest data such as mobile phones, Baidu, Wechat, Twitter, or OpenStreetMap is a very useful source to estimate the population density in an urban area [35][36][37]. In particular, combined points of interest data and remote sensing images improve the accuracy of population modeling [38]. However, remote sensing data employed in the model were always limited by the spatial resolution which mainly ranged from 30 m to 1 km. Our study is the first to use the ZY3-02 satellite images of stereo pairs to retrieve building height for fine-scale population simulation. Although it may not have very high accuracy, it has a meter level spatial resolution and provides potential for the global fine-scale spatial population estimation.
There are also some limitations in this study. First, the current models for population estimation or distribution simulation are mostly on the basis of the statistical regression methods, namely using population-related indicators to establish a population estimation model to obtain the spatial distribution of the population. The statistical regression method relies on the samples and has poor generalizability in larger areas. Second, the accuracy of object classification and height extraction could influence the accuracy of population modeling. However, there is not a single method of classification or object extraction that can avoid all errors. Yet we can improve the model by incorporating spatial data with higher resolution (e.g., airborne LiDAR data) and even the modern big data (e.g., individual-level tracking data) [39]. Third, this model is only for simulating night population spatial distribution, because during the daytime the population is mainly located at work, study, or leisure places such as factories, schools, and malls. Finally, the model may be more suitable for urban areas than rural areas.

Conclusions
The aim of this work was to simulate the urban nighttime population distribution on a fine spatial scale. We firstly extracted the residential zones based on the data of LUCC and Google Earth data, combined with the building address, prices, and other information. Then we employed the stereo pair analysis method to calculate the building height on the basis of high spatial resolution ZY3-02 satellite remote sensing data, and transformed building height into building floors. After that, we built a sophisticated, high spatial resolution model of population density. Finally, we evaluated the accuracy of the model using the survey data in 12 communities of the study area. The accuracy of the model is mainly in the methods of object classification and height extraction. On the basis of the object-oriented classification and stereo pair photogrammetry, our method is more effective to urban areas where the residential buildings are more regular in size, shape, distribution and high-rise. In our future work, higher spatial resolution remote sensing imagery will used to improve the accuracy of the building area exaction, and LiDAR data will be added to estimate the building height. The fine-scale urban population spatial products are expected to serve as a more accurate input in various research fields, such as public health, emergency rescue, and climate change. The approach outlined provides an improved means of producing spatially-explicit population data. Also, the widely used world population data such as high resolution settlement layer (HRSL) from Center for International Earth Science Information Network (CIESIN) can be added in our model to improve the accuracy in future research [40].