An Updating System for the Gridded Population Database of China Based on Remote Sensing, GIS and Spatial Database Technologies

The spatial distribution of population is closely related to land use and land cover (LULC) patterns on both regional and global scales. Population can be redistributed onto geo-referenced square grids according to this relation. In the past decades, various approaches to monitoring LULC using remote sensing and Geographic Information Systems (GIS) have been developed, making efficient updating of geo-referenced population data possible. A Spatial Population Updating System (SPUS) has been developed for updating the gridded population database of China, at a spatial resolution of 1 km by 1 km, based on remote sensing, GIS and spatial database technologies. The SPUS processes standard Moderate Resolution Imaging Spectroradiometer (MODIS) L1B data with a Pattern Decomposition Method (PDM) and an LULC-Conversion Model to obtain land use and land cover patterns, which provide input parameters for a Population Spatialization Model (PSM). The PSM embedded in SPUS generates 1 km by 1 km gridded population data in each population distribution region based on natural and socio-economic variables. Validation against finer township-level census data of Yishui County suggests that the gridded population database produced by the SPUS is reliable.


Data Sources
The data used in this research include Chinese census data, land use data and the ancillary data listed in Table 1. Because the data sources and data types differ, their standards and precision also vary. All data layers have been preprocessed, including satellite image correction, projection transformation, and attribute data standardization.

Methodology for Land-use/Land-cover (LULC) Data Updating
The Chinese land use types were used as primary indices in the spatial population model of this paper. The Pattern Decomposition Method (PDM) is applied to MODIS data to obtain the land use/land cover patterns. The vegetation, water and soil coefficients are extracted by PDM for each MODIS image pixel to create an LULC-Conversion Model (LULC-CM). The MODIS images are classified into different land use types based on the relationship between the spectral coefficients and the land-use structure. To reduce noise in the MODIS bands, e.g. cloud effects, we merged the clearest images acquired between July and September, the period when vegetation flourishes. This made it easier to classify built-up areas and the bare soil areas caused by crop harvesting, which is sufficient for modeling the spatial distribution of population in annual increments.

(1) Pattern Decomposition Method (PDM)
Spectral response patterns for each pixel of an image can be decomposed into three components using three standard spectral shape patterns determined from the image data [19,20]. Zhang used the PDM to detect land cover in the Miyun District of Beijing, China, and found good correlations between the VIS bands and LULC type and area [21]. The LULC information can be represented by three standard spectral patterns (vegetation, water and soil) derived from the 7 VIS bands of MODIS data.
First, standard spectrum patterns are extracted from the MODIS L1B data. Surface albedos of the MODIS L1B data within the specified experimental area are normalized to avoid the disturbance of absolute spectrum values. The normalized albedo is calculated using:
$$B_i = \frac{A_i}{\sum_{j=1}^{7} A_j} \qquad (1)$$
where $B_i$ is the normalized albedo value of band $i$; $A_i$ is the original surface albedo value of band $i$; $A_j$ is the original surface albedo value of band $j$; and $j$ is the index of image bands. After choosing sample pixels of pure water, soil and vegetation in the test area, a 3 × 7 matrix $P$ is obtained by averaging the $B_i$ values of the sample pixels of each type in each of the 7 bands. These values represent the spectrum patterns of water, vegetation and soil in MODIS bands 1 to 7. Taking Shandong Province as an example, the standard spectrum patterns are shown in Table 2.
Second, pixel spectrum decomposition is conducted based on the standard spectrum patterns. The albedo of each pixel is expressed as a linear combination of the reflectance of each LULC unit:
$$A_i = C_w P_{iw} + C_v P_{iv} + C_s P_{is} \qquad (2)$$
where $A_i$ is the surface albedo of the pixel at band $i$; $P_{iw}$, $P_{iv}$, and $P_{is}$ are the standard spectrum patterns of water, vegetation and soil; and $C_w$, $C_v$, and $C_s$ are the decomposition coefficients (positive numbers). According to the PDM principle, the proportions of the three types of land use in each pixel can be obtained using:
$$r_w = \frac{C_w}{C_w + C_v + C_s}, \quad r_v = \frac{C_v}{C_w + C_v + C_s}, \quad r_s = \frac{C_s}{C_w + C_v + C_s} \qquad (3)$$
where $r_w$, $r_v$, and $r_s$ represent the three matrices of water, vegetation and soil proportions in a pixel of 500 m × 500 m.
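The three PDM steps (normalization, decomposition and proportions) can be illustrated with a minimal Python sketch. It is not the IDL implementation used in SPUS. The pattern values below are hypothetical placeholders rather than the Table 2 values, the coefficients are fitted by ordinary least squares, and negative fits are clipped to zero; the fitting method and the clipping are both assumptions made for this example.

```python
# Hypothetical 7-band standard patterns (each sums to 1); NOT the Table 2 values.
WATER = [0.30, 0.20, 0.25, 0.15, 0.05, 0.03, 0.02]
VEGETATION = [0.05, 0.40, 0.03, 0.07, 0.25, 0.15, 0.05]
SOIL = [0.10, 0.15, 0.08, 0.10, 0.20, 0.20, 0.17]


def normalize(albedo):
    """Scale band albedos so they sum to 1: B_i = A_i / sum_j A_j."""
    total = sum(albedo)
    return [a / total for a in albedo]


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def decompose(albedo, patterns):
    """Least-squares fit of albedo ~ C_w*P_w + C_v*P_v + C_s*P_s.

    patterns is [water, vegetation, soil], each a 7-band list.
    Solves the 3x3 normal equations by Cramer's rule and clips
    negative coefficients to zero (an assumption of this sketch).
    """
    gram = [[sum(p * q for p, q in zip(pa, pb)) for pb in patterns]
            for pa in patterns]
    rhs = [sum(p * a for p, a in zip(pa, albedo)) for pa in patterns]
    d = _det3(gram)
    coeffs = []
    for k in range(3):
        # Replace column k of the Gram matrix with the right-hand side.
        mk = [[rhs[r] if c == k else gram[r][c] for c in range(3)]
              for r in range(3)]
        coeffs.append(max(_det3(mk) / d, 0.0))
    return coeffs


def proportions(coeffs):
    """Convert (C_w, C_v, C_s) into shares r_w, r_v, r_s that sum to 1."""
    total = sum(coeffs)
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [c / total for c in coeffs]


# A synthetic mixed pixel: 20% water, 50% vegetation, 30% soil.
mixed = [0.2 * w + 0.5 * v + 0.3 * s
         for w, v, s in zip(WATER, VEGETATION, SOIL)]
r_w, r_v, r_s = proportions(decompose(mixed, [WATER, VEGETATION, SOIL]))
```

Because the synthetic pixel is an exact linear mixture of the three patterns, the fit recovers the mixing shares up to rounding. Real pixels contain noise, so a constrained solver may be preferable in practice.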

(2) LULC-Conversion Model (LULC-CM)
The results obtained from Equation (3) cannot be converted to land use types directly. However, the combination of $r_w$, $r_v$, and $r_s$ can be associated with land use types. This paper uses the decomposition results and the LULC-CM to derive land use types. The LULC-CM is created using the following equation: where $C_1$, $C_2$, $C_3$, $X_1$, and $Y$ are matrices, and $X_1 = [x_1, x_2, x_3, x_4, x_5, x_6]$ represents the percentages of cropland, forest, grassland, urban residential, rural residential, and water area in each pixel; $C_1$, $C_2$, and $C_3$ are the coefficients (moduli) of the model and the constant part; and $r_w$, $r_v$, $r_s$ are obtained from Equation (3).

Method of Population Spatialization
This paper adopts a population spatialization model based on the relationship between demographic data and land use types, and redistributes population onto 1 km by 1 km grids [18]. The Data Center for Resources and Environmental Sciences of the Chinese Academy of Sciences (RESDC, CAS) has applied this method to build the 2000 gridded population database of China. The general steps for redistributing population are shown in Figure 1.

(1) Regionalization of Spatial Population Distribution
China is a large country whose population density and land use patterns vary from west to east. According to the Fifth Census of China in 2000, the average population densities of eastern, central, and western China are 452.3, 262.2 and 51.4 persons/km², respectively. To obtain more reliable results, we constructed a three-dimensional feature space based on the provincial population density core and regionalization index, and divided the whole country into 8 regions using the minimum distance rule [21]. A population model was established for each region using the following steps: 1) Calculation of the spatial population characteristic index. Based on population density, economic development, land use structure, transportation network, and river density at county level, Equation 5 is used to calculate the spatial population characteristic index of the model.
where $I_p$ is the spatial population characteristic index; $P$ is the total population; $GDP$ is the gross domestic product; $S$ is the total land area; $I_c$ is the index of cultivation density; $I_{rd}$ is the road density; $I_{rl}$ is the railway density; and $I_r$ is the residential density. 2) Determining the center of provincial population distribution. Based on the provincial population distribution index ($z$) and the spatial distribution of population centers ($x$, $y$), a three-dimensional space is constructed in which the interprovincial population distribution distance ($d$) is calculated, and the minimum distance method is used to delineate eight first-level population regions. 3) Secondary population regions are divided by county-level terrain parameters and population density. Owing to space limitations, the details are given in our previous research [18].
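One possible reading of the minimum distance rule in step 2 is a nearest-center assignment in the ($x$, $y$, $z$) feature space. The Python sketch below follows that reading. The region centers, the province coordinates and the choice of Euclidean distance are all hypothetical, and the three features are assumed to have been scaled to comparable ranges beforehand.

```python
import math


def nearest_region(point, region_centers):
    """Return the name of the region whose center is closest to point.

    point and each center are (x, y, z) tuples: population-center
    coordinates plus the population distribution index, assumed to be
    pre-scaled so that Euclidean distance is meaningful.
    """
    return min(region_centers,
               key=lambda name: math.dist(point, region_centers[name]))


def regionalize(provinces, region_centers):
    """Assign every province to its nearest region center."""
    return {name: nearest_region(point, region_centers)
            for name, point in provinces.items()}


# Hypothetical, pre-scaled feature values for illustration only.
centers = {"east": (0.8, 0.5, 0.9), "west": (0.2, 0.5, 0.1)}
provinces = {"A": (0.75, 0.40, 0.80), "B": (0.10, 0.60, 0.20)}
assignment = regionalize(provinces, centers)
```

With eight centers instead of two, the same function would yield the eight first-level regions. How the centers themselves are chosen is described in [18] and is not reproduced here.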

(2) Population Spatialization Model (PSM)
Many methods for distributing census data have been proposed in recent years. The LandScan Global Population Dataset was produced according to a linear relationship between census counts (at sub-national level) and the distribution of roads, slope, land cover, and nighttime lights [16,17]. Mennis and Hultgren presented an "intelligent" dasymetric mapping technique (IDM), which uses the ratio of class densities to redistribute population to sub-source zone areas [23].
A population spatialization model is built to redistribute the population of a county among different land use types:
$$P_i = \sum_{j=1}^{n_f} a_j x_{fj} + B_i \qquad (6)$$
where $P_i$ stands for the total population of the $i$-th county, $a_j$ is the population density of the $j$-th land use type, $x_{fj}$ is the total area of the $j$-th land use type in the $f$-th section (km²), and $n_f$ is the number of land use types in the current section. According to the rule of "no residential area, no population", the intercept $B_i$ is set to zero.
To ensure that the predicted population equals the census count within each administrative unit, the population density of each land use type is adjusted by the ratio of the census count ($P_i^0$) to the predicted population ($P_i$). Starting from the initial coefficient $a_j$, the adjustment is defined as:
$$a_{ij} = a_j \cdot \frac{P_i^0}{P_i} \qquad (7)$$
where $a_{ij}$ is the modified population density for the $j$-th land use type within the $i$-th administrative unit, and $P_i$ and $P_i^0$ are the predicted population and census count of the $i$-th administrative unit, respectively. After these steps, population can be estimated cell by cell. To create spatialized population data of China, the population of each cell is calculated by linking the population density coefficient $a_{ij}$ to the land use grid using the following equation:
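The prediction, adjustment and cell-level steps of this subsection can be sketched in Python. The density and area values are invented for illustration, and the cell-level sum (adjusted density times the area of each land use type inside the cell) is an assumed form rather than the equation used in SPUS.

```python
def predict_population(densities, areas):
    """Zero-intercept linear model: sum over land use types of a_j * x_j."""
    return sum(a * x for a, x in zip(densities, areas))


def adjust_densities(densities, predicted, census):
    """Scale each a_j by census / predicted so that the unit total
    matches the census count (a_ij = a_j * P_i0 / P_i)."""
    scale = census / predicted
    return [a * scale for a in densities]


def cell_population(adjusted_densities, cell_areas):
    """ASSUMED cell-level form: adjusted density times the area (km^2)
    of each land use type inside the cell, summed over types."""
    return sum(a * x for a, x in zip(adjusted_densities, cell_areas))


# Hypothetical densities (persons/km^2) for cropland, forest, grassland,
# urban residential, rural residential and water.
densities = [100.0, 5.0, 2.0, 8000.0, 1500.0, 0.0]
# Hypothetical land use areas (km^2) of one county and its census count.
county_areas = [400.0, 300.0, 100.0, 10.0, 40.0, 50.0]
census = 200000.0

predicted = predict_population(densities, county_areas)
adjusted = adjust_densities(densities, predicted, census)

# One 1 km^2 cell that is 60% cropland and 40% rural residential.
cell = [0.6, 0.0, 0.0, 0.0, 0.4, 0.0]
cell_pop = cell_population(adjusted, cell)
```

Because every density in the unit is scaled by the same ratio, summing the cell populations over all cells of the county reproduces the census count, which is the total population control the model relies on.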

SPUS System Design and Development
SPUS is a software system combining PDM, LULC-CM, and PSM. Its functionalities include generation, analysis, visualization, and management of gridded population databases. The system is developed for the Microsoft Windows environment and is easy to use. Figure 2 shows the data processing flowchart of SPUS. The input data include statistical population (census) data at county level, remotely sensed data (MODIS/Terra Surface Reflectance 8-Day L3 Global 500 m SIN Grid), and ancillary data such as administrative boundary maps. All data layers are stored in the attribute and spatial databases. As an example of input data layers for SPUS, Figure 3 shows the land use data of Shandong Province in 2002. Figure 4 shows the schematic representation of system function modules.

(1) Management Module of Spatial Data
The module is a spatial database tool that manages all spatial data based on ArcSDE. It can be used to import, export and query spatial data (including raster and vector databases).

(2) Management Module of Statistical Data
The module uses Oracle 9i to manage all kinds of statistical data, including population data, socio-economic data and ancillary attribute data. Its functions include data input, storage, output in common formats (Excel, XML, MDB, DBF), data checking, quality control, and data querying.

(3) Spectral Pattern Decomposition Module
The module runs the PDM on standard MODIS data and generates ASCII files of coefficients, using IDL within the .NET environment. ArcEngine is then used to convert the ASCII files to GRID format. The module is thus a mixture of .NET, IDL and ArcEngine.

(4) LULC-CM Module
This functional module executes the LULC-CM in three steps: 1) Train the model on sample areas selected according to the regionalization to obtain the model coefficients, which are stored in SPUS as a repository of basic LULC information. 2) Apply the LULC-CM to convert the PDM results into six land use types, defined as cropland, forest, grassland, city, rural residential area and water, in each pixel of 500 by 500 meters. 3) Check the LULC data and import them into the spatial database. Given the strengths of IDL in scientific computation, the LULC-CM was developed in IDL. The IDL program generates ASCII files with the percentages of the six LULC types in each pixel, and ArcEngine is then used to create LULC grids for spatial analysis.

(5) Population Data Spatialization Module
The module is the core of SPUS. It executes the PSM to redistribute the statistical population data onto grids by combining the LULC grid, province vector data, county vector data and other ancillary data. It is a GIS application developed with ArcEngine and has three major functions: 1) Data display: the main interface adds and displays spatial data and provides querying, panning, zooming in, zooming out, selecting and saving. 2) Model processing: creates spatial population grids based on the PSM. 3) Results verification: adjusts the primary results according to total population control within county-level units to generate the final gridded spatial population dataset.

(6) Spatial Analysis Module
SPUS provides some common functions of GIS spatial analysis such as grid calculation, projection transformation, aggregation, buffering, overlaying (union, intersect, erase), zonal and neighborhood analysis. Users can combine these functions easily to obtain new spatial indices such as regional total population, population growth rate, and degree of population aggregation.
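As an illustration of the aggregation function applied to a population grid, the following Python sketch sums blocks of cells, for example 2 × 2 blocks of 500 m cells into 1 km cells. It is a plain-Python stand-in and does not use or reproduce the ArcEngine functions in SPUS; the grid values are hypothetical.

```python
def aggregate_sum(grid, factor=2):
    """Aggregate a 2D grid by summing factor x factor blocks.

    Population counts are additive, so summing (not averaging) keeps
    the regional total unchanged. Blocks at the right and bottom edges
    may be smaller when the grid size is not a multiple of factor.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out.append([
            sum(grid[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols)))
            for c in range(0, cols, factor)
        ])
    return out


# Hypothetical 4 x 4 grid of 500 m cells (persons per cell).
fine = [
    [10, 20, 0, 0],
    [30, 40, 0, 5],
    [0, 0, 100, 200],
    [0, 0, 300, 400],
]
coarse = aggregate_sum(fine, factor=2)
regional_total = sum(sum(row) for row in coarse)
```

The regional total population mentioned above is then simply the sum of all cells in the region, and it is the same at either resolution.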

Results
The most important functionality of the SPUS system is updating gridded population data. Figure 5 shows the gridded population data of Shandong Province, China, in 2002 based on MODIS data. The spatial resolution of the grid is 500 m by 500 m, and the maximum population of a 0.25 km² grid cell in Shandong Province is 3,094 persons. High-value areas within county-level administrative units correspond to urban areas with higher population density.
Verification of the redistributed population data of China is time-consuming, mainly because it is difficult to establish a suitable reference database for comparison. Actual census counts for 1 km × 1 km grids are rarely available. However, census data for sub-county units, i.e. townships in China, can be obtained in some provinces. A substitute verification approach has been designed based on these population data. The following steps were carried out for towns with census data: (1) Create vector boundary maps of towns with census data; (2) Overlay gridded population data with boundary maps; (3) Accumulate the population of all cells within these towns; and (4) Compare population estimates with census data. We collected census data for 19 towns in Yishui County, Shandong Province, and conducted the verification using 2002 statistical data. Table 2 shows that the relative errors between predicted population and actual census counts vary from 0.51% to 25.63%, with eight towns lower than 10%, eight higher than 20%, and the remaining three between 10% and 20%. The accuracies are acceptable for many applications at the county, province, or national levels.
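Steps (3) and (4) of the verification can be sketched in Python. The sketch assumes the overlay of step (2) has already produced a lookup from grid cell to town. The cell identifiers, town names and counts are hypothetical, and the relative error is taken as the absolute difference divided by the census count, which is an assumed definition.

```python
def town_totals(cell_population, cell_town):
    """Accumulate gridded population by town.

    cell_population maps a cell id to its estimated population;
    cell_town maps a cell id to the town containing it (the result of
    overlaying the grid with town boundaries). Cells outside any town
    with census data are skipped.
    """
    totals = {}
    for cell, pop in cell_population.items():
        town = cell_town.get(cell)
        if town is not None:
            totals[town] = totals.get(town, 0.0) + pop
    return totals


def relative_error(predicted, census):
    """Relative error in percent (assumed definition)."""
    return abs(predicted - census) / census * 100.0


# Hypothetical cells, towns and census counts for illustration only.
cells = {(0, 0): 100.0, (0, 1): 150.0, (1, 0): 50.0, (1, 1): 75.0}
overlay = {(0, 0): "T1", (0, 1): "T1", (1, 0): "T2"}
census = {"T1": 200.0, "T2": 50.0}

predicted = town_totals(cells, overlay)
errors = {town: relative_error(predicted[town], census[town])
          for town in census}
```

A cell that straddles a town boundary would need to be split by area in a real overlay; this sketch assigns each cell to a single town.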

Conclusions
This paper describes the design and implementation of the Spatial Population Updating System (SPUS), which integrates GIS, remote sensing, spatial database, and statistical methods. The system combines several modules seamlessly for efficient generation, analysis, visualization, and management of spatial population datasets. Preliminary verification of the 2002 gridded population datasets using township-level census data suggests that the datasets can be used for many applications at a regional or national scale. Accuracies of gridded population data derived from currently available global remote sensing data are inadequate for modeling population distribution and detecting changes in the spatial distribution of population in annual increments. Further research in two major directions may be important. The first is to improve model accuracy by including more factors related to the spatial distribution of population (especially in urban areas) and to validate population models with more accurate census counts at finer resolutions. The second is to extend the SPUS functions for more practical applications, such as adding new population indices to help users build spatial population databases.