Accuracy Assessment of Multi-Source Gridded Population Distribution Datasets in China

Abstract: Population is one of the core elements of sustainable development, and quantifying the accuracy of estimated population spatial distributions is recognized as a critical and challenging task. This study evaluates the accuracy of four gridded population datasets for China: three global datasets, namely the Gridded Population of the World (GPW), the Global Rural and Urban Mapping Project (GRUMP), and the WorldPop project (WorldPop), and one Chinese regional dataset, the China 1 km Gridded Population (CnPop) dataset. The datasets are assessed using a specific method based on a GIS-linked 2000 census dataset at the township level in China. The results indicate that WorldPop had the highest estimation accuracy, accurately estimating about 60% of the total population. CnPop accurately estimated about half of the total population and showed good mapping performance. The GPW had acceptable accuracy only in a few plain and basin areas, accounting for about 30% of the total population, whereas GRUMP accurately estimated about 40% of the total population. Analysis of the relative estimation error revealed weaknesses in the generation strategies of these datasets. The conclusions are expected to serve as a quality reference for potential users and producers of these datasets and to promote accuracy assessment of population datasets in other regions and globally.


Introduction
The world population increased dramatically from 1.6 billion in 1900 to 7.6 billion in mid-2017, and 59.66% of these people live in Asia [1,2]. Population is one of the central issues in building a sustainable society today. Adequate knowledge of population distributions has proven essential in many domains, such as environmental impact assessment, disaster prevention and mitigation, medical treatment, regional sustainable development, and climate change evaluation [3][4][5][6]. Commonly available information on population distribution and composition relies largely on demographic data, generally counted by census tracts, blocks, postcode zones, townships, and villages. In many cases, however, because of the predefined spatial units used for data collection and reporting, these statistical datasets have severe limitations. First, in statistical data, the population density of an administrative unit is a single value; hence, it does not reflect spatial distribution or internal variation [7]. Second, over time, census tracts change along with the gradual restructuring

Vector Boundary and Census Data at the Township Level
The administrative divisions of China are officially organized into five levels. From top to bottom, they are as follows: (1) provinces (Sheng), autonomous regions (Zizhiqu), municipalities (Zhixiashi), and special administrative regions (Tebiexingzhengqu); (2) prefectures (Shi, Diqu, Meng, and Zhou); (3) counties (Xian, Qu, and Qi); (4) townships (Xiang, Zhen, Jiedao, and Sumu); and (5) villages (Cun, Shequ, and Gacha). The accuracy assessment in this study is carried out at the township level, which is also the finest unit for which census data are publicly accessible in China. Xiang, Zhen, and Jiedao are three types of township: Xiang and Zhen are found in rural areas, whereas Jiedao are found in urban areas.
The vector boundary data at the township level were obtained from the Data Sharing Platform of Earth System Science of the National Science and Technology Infrastructure of China (http://www.geodata.cn/). The original boundary datasets were manually digitized from township division maps collected in 2000 at a scale of 1:250,000 in the different provinces of China. Data from Heilongjiang, Guangxi, Xinjiang, and Gansu provinces were unavailable; thus, the vector boundary data at the township level used in this study covered 27 provincial regions, including Liaoning, Jilin, Inner Mongolia Autonomous Region (part), Beijing, Tianjin, Shanghai, Hebei, Henan, Shaanxi, Shanxi, Ningxia, Shandong, Anhui, Jiangsu, Hunan, Hubei, Jiangxi, Zhejiang, Fujian, Guangdong, Hainan, Yunnan, Guizhou, Sichuan, Chongqing, Qinghai, and the Tibet Autonomous Region, as shown in Figure 1. The Hong Kong Special Administrative Region, Macao Special Administrative Region, and Taiwan province are not included in this study.
Census data were obtained from China's fifth census dataset for 2000 at the township level (Xiang, Zhen, and Jiedao), which was released by the National Bureau of Statistics (http://www.stats.gov.cn/). To ensure that the vector boundary data and census datasets could be joined spatially and correctly, several pre-processing steps were performed, such as geo-referencing, boundary adjustment, geocoding, and topology error checking. The final township-level population spatial dataset contained 33,631 spatial units with a total population of 1,103,880,920, which accounted for 87.21% of the total population of China in 2000. The total area was 6,018,171.42 km², representing 62.69% of the area of mainland China. The mean spatial resolution of the township units, defined as sqrt(area/number of units) [11], was 13.4 km.

Gridded Population Distribution Datasets and Pretreatment
Table 1 presents the general characteristics of the four gridded population distribution datasets assessed in this study. The CnPop dataset was obtained from the Data Sharing Platform of Earth System Science of the National Science and Technology Infrastructure of China. It is a national-scale dataset that covers mainland China at a spatial resolution of 1 km and uses county-level census data and 1:100,000 land use data as inputs [18]. CnPop was generated in two steps: (1) the population density of each land use type was estimated with the least squares method in each sub-region; and (2) the population of each grid cell was calculated from the population density and the area of each land use type within the cell.
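The two CnPop steps can be illustrated with a minimal sketch. It assumes, for illustration only, that each county's census population equals the sum over land use types of a per-type density times the county's area of that type; the densities are fitted by ordinary least squares through the normal equations, and a cell's population is then the density-weighted sum of its land use areas. The helper names and toy numbers are hypothetical, and the actual CnPop procedure (for example, constraints on the densities or how sub-regions are delimited) may differ [18].

```python
"""Illustrative sketch of a CnPop-style two-step allocation (hypothetical helpers)."""


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_class_densities(county_areas, county_populations):
    """Step 1: least-squares fit of P_c ~ sum_k d_k * A_ck within one sub-region.

    county_areas[c][k] is the area (km^2) of land use type k in county c.
    Returns one density (people per km^2) per land use type.
    """
    k = len(county_areas[0])
    ata = [[sum(row[i] * row[j] for row in county_areas) for j in range(k)] for i in range(k)]
    atp = [sum(row[i] * p for row, p in zip(county_areas, county_populations)) for i in range(k)]
    return solve_linear(ata, atp)


def cell_population(cell_areas, densities):
    """Step 2: population of one grid cell = sum over types of (area of type k) * d_k."""
    return sum(a * d for a, d in zip(cell_areas, densities))


if __name__ == "__main__":
    # Toy sub-region with three land use types: built-up, cropland, forest.
    areas = [[10, 200, 500], [2, 400, 100], [30, 50, 1000], [5, 300, 300]]
    pops = [32500, 42500, 40000, 36500]  # generated from densities 1000, 100, 5
    d = estimate_class_densities(areas, pops)
    print([round(x, 3) for x in d])
    print(round(cell_population([0.2, 0.8, 0.0], d), 3))
```

Fitting the model separately in each sub-region, as CnPop does, lets the same land use type carry different densities in different parts of the country.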
The GPWv3 and GRUMPv1 datasets were downloaded from the website of the Center for International Earth Science Information Network (http://sedac.ciesin.columbia.edu/). The GPWv3 has a spatial resolution of 2.5 arc-minutes [10]. Using population counts tabulated by administrative unit and spatially explicit administrative boundaries as its two basic inputs, it allocates the population of each administrative unit to the grid cells within that unit in proportion to their area, using a simple areal weighting algorithm.
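Simple areal weighting can be sketched as follows. The sketch assumes the overlap area between every grid cell and every administrative unit has already been computed (for example, by intersecting the boundary polygons with the grid); the function name and input layout are hypothetical, not the GPW production code.

```python
from collections import defaultdict


def areal_weighting(unit_population, overlaps):
    """Spread each unit's population over grid cells in proportion to overlap area.

    unit_population: {unit_id: census count}
    overlaps: iterable of (unit_id, cell_id, overlap_area_km2), i.e. the part of
              each grid cell that falls inside each administrative unit.
    Returns {cell_id: allocated population}. Cells on a unit boundary receive
    contributions from every unit they intersect.
    """
    overlaps = list(overlaps)
    unit_area = defaultdict(float)
    for unit, _cell, area in overlaps:
        unit_area[unit] += area
    cells = defaultdict(float)
    for unit, cell, area in overlaps:
        # Every km^2 of a unit receives the same share of its population.
        cells[cell] += unit_population[unit] * area / unit_area[unit]
    return dict(cells)


if __name__ == "__main__":
    overlaps = [("A", "c1", 2.0), ("A", "c2", 3.0), ("B", "c2", 1.0), ("B", "c3", 4.0)]
    print(areal_weighting({"A": 1000, "B": 500}, overlaps))
```

Because each unit's density is uniform, all variation inside a unit is lost, which is why the GPW map looks patched wherever the input units are large.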
The GRUMP dataset, which has a spatial resolution of 30 arc-seconds, is a 'lightly' modeled product based on the GPW [11]. Its allocation mechanism builds on the GPW approach but explicitly accounts for the populations of urban areas. In addition to data for the statistical reporting units, population estimates, point locations, and footprints of the urban centers of each country were collected, and night-time satellite images were used extensively to identify the settlement extents of major cities.
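The urban/rural split can be illustrated for a single administrative unit. This sketch assumes the unit's urban population is already known (for instance, from the urban center estimates) and that its cells are flagged as urban or rural (for instance, from night-time light footprints); the urban share is spread over urban cells and the remainder over rural cells, each by areal weighting. It is a simplification under those assumptions, not the published GRUMP algorithm, and the function name is hypothetical.

```python
def urban_rural_allocation(total_pop, urban_pop, cell_areas, is_urban):
    """Allocate one unit's population: urban share over urban cells, the rest over rural cells.

    Each share is spread by area within its own extent (areal weighting).
    """
    if not 0 <= urban_pop <= total_pop:
        raise ValueError("urban population must lie between 0 and the unit total")
    urban_area = sum(a for a, u in zip(cell_areas, is_urban) if u)
    rural_area = sum(a for a, u in zip(cell_areas, is_urban) if not u)
    rural_pop = total_pop - urban_pop
    if (urban_pop > 0 and urban_area == 0) or (rural_pop > 0 and rural_area == 0):
        raise ValueError("population assigned to an extent that has no area")
    out = []
    for a, u in zip(cell_areas, is_urban):
        share_pop, share_area = (urban_pop, urban_area) if u else (rural_pop, rural_area)
        out.append(share_pop * a / share_area if share_area > 0 else 0.0)
    return out


if __name__ == "__main__":
    # Two urban cells and two rural cells of different sizes (km^2).
    print(urban_rural_allocation(10000, 8000, [1, 1, 2, 4], [True, True, False, False]))
```

Rural cells still receive a uniform density, which is consistent with GRUMP improving on the GPW mainly where urban extents can be detected.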
The WorldPop dataset, which has a spatial resolution of 3 arc-seconds, was downloaded from the official WorldPop project website (http://www.worldpop.org.uk/). It is modeled from the MacDonald Dettwiler and Associates (MDA) GeoCover database, with auxiliary data such as OpenStreetMap used to correct the distribution of residential and built-up areas [33]. Population densities per land cover type were estimated from the refined land cover layer and the enumerated demographic data. Then, using these densities as weights, the demographic data were redistributed to the grid cells across the entire region [33,34].
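The weighted redistribution step is a form of dasymetric mapping and can be sketched for one enumeration unit. The sketch assumes per-class densities have already been estimated; each cell's weight is its class density times its area, and the unit's census total is split in proportion to these weights. Names and values are hypothetical, and the WorldPop implementation details may differ [33,34].

```python
def dasymetric_redistribute(unit_total, cell_classes, cell_areas, class_density):
    """Redistribute a census total to cells using density-times-area weights.

    cell_classes[i] is the land cover class of cell i and cell_areas[i] its area (km^2);
    class_density maps class -> estimated people per km^2 (used only as weights).
    Normalizing by the total weight makes the cells sum exactly to unit_total.
    """
    weights = [class_density[c] * a for c, a in zip(cell_classes, cell_areas)]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("no populated land cover in this unit")
    return [unit_total * w / total_weight for w in weights]


if __name__ == "__main__":
    density = {"urban": 1000, "cropland": 100, "bare": 0}
    print(dasymetric_redistribute(6000, ["urban", "cropland", "cropland", "bare"], [1, 1, 1, 1], density))
```

Because every cell of a class shares one density weight, the result inherits the class boundaries of the land cover map, which matches the abrupt density edges seen on the WorldPop map.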
The geographic references of the four gridded population distribution datasets and the township-level population dataset were unified to WGS 84, an internationally adopted geocentric coordinate system, and to the Albers equal-area projection, a conic equal-area map projection widely used in Asia and Europe. For the population count layers of the GPW, GRUMP, and WorldPop, the original raster layers were converted to point layers, which were then reprojected and clipped to the boundary of mainland China. The population density layers of the GPW, GRUMP, and WorldPop were masked by the boundary of mainland China and converted to the Albers equal-area projection using nearest neighbor resampling (NEAREST), with spatial resolutions of 5 km, 1 km, and 100 m, respectively. NEAREST performs a nearest neighbor assignment and is the fastest of the resampling methods.
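Why an equal-area projection matters for density and zonal counts can be checked numerically. The sketch below implements the spherical form of the Albers equal-area conic forward projection and compares the projected area of a small latitude-longitude cell with its exact area on the sphere. The parameters (standard parallels 25°N and 47°N, central meridian 105°E, a spherical Earth with the WGS 84 authalic radius) are common choices for China but are assumptions here; the study does not state its projection parameters, and production work would use the ellipsoidal form in GIS software.

```python
import math

AUTHALIC_RADIUS_M = 6371007.2  # radius of a sphere with the same surface area as WGS 84


def albers_forward(lon, lat, lon0=105.0, lat0=0.0, lat1=25.0, lat2=47.0,
                   radius=AUTHALIC_RADIUS_M):
    """Spherical Albers equal-area conic forward projection (degrees -> metres)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    n = (math.sin(phi1) + math.sin(phi2)) / 2.0
    c = math.cos(phi1) ** 2 + 2.0 * n * math.sin(phi1)
    rho0 = radius * math.sqrt(c - 2.0 * n * math.sin(math.radians(lat0))) / n
    rho = radius * math.sqrt(c - 2.0 * n * math.sin(math.radians(lat))) / n
    theta = n * math.radians(lon - lon0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


def polygon_area(points):
    """Shoelace area of a simple polygon given as [(x, y), ...]."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def spherical_cell_area(lon_w, lon_e, lat_s, lat_n, radius=AUTHALIC_RADIUS_M):
    """Exact area (m^2) of a latitude-longitude cell on the sphere."""
    return (radius ** 2 * math.radians(lon_e - lon_w)
            * (math.sin(math.radians(lat_n)) - math.sin(math.radians(lat_s))))


if __name__ == "__main__":
    corners = [(110.0, 30.0), (110.01, 30.0), (110.01, 30.01), (110.0, 30.01)]
    projected = polygon_area([albers_forward(lon, lat) for lon, lat in corners])
    print(projected, spherical_cell_area(110.0, 110.01, 30.0, 30.01))
```

The two areas agree closely for small cells anywhere in the map, so densities computed as population per projected km² are not distorted by latitude.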

Accuracy Assessment Method
The estimated population of each of the 33,631 township units was computed separately for each of the four datasets using Zonal Statistics, a tool in the ArcGIS software that calculates statistics on the values of a raster within the zones of another dataset. The absolute estimation error (AEE) and relative estimation error (REE) were then calculated using Formulas (1) and (2):

$$\mathrm{AEE}_{ij} = PE_{ij} - P_j \qquad (1)$$

$$\mathrm{REE}_{ij} = \frac{\mathrm{AEE}_{ij}}{P_j} \times 100\% \qquad (2)$$

where $\mathrm{AEE}_{ij}$ is the absolute estimation error of township $j$ based on dataset $i$, $P_j$ is the census count of township $j$, and $PE_{ij}$ is the estimated population of township $j$ based on dataset $i$; $i = 1, 2, 3, 4$ denotes CnPop, GPW, GRUMP, and WorldPop, respectively. In Formula (2), $\mathrm{REE}_{ij}$ is the relative estimation error of township $j$ based on dataset $i$; because it is normalized by the census count, it accounts for the population size of each township, with negative values indicating underestimation and positive values indicating overestimation. For convenience of mapping and in-depth analysis, $\mathrm{REE}_{ij}$ was classified into five categories (Table 2).

The estimation accuracy was evaluated in three steps. First, taking each township as an entity, the average population density of each of the 33,631 township units was calculated and used as the reference for visually inspecting and comparing the mapping performance of CnPop, GPW, GRUMP, and WorldPop. Second, scatter plots of $PE_{ij}$ against $P_j$ were constructed, and the correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) between $PE_{ij}$ and $P_j$ were calculated. Third, the $\mathrm{REE}_{ij}$ values were classified and mapped to identify the spatial distribution of the errors, and the population and area falling within each $\mathrm{REE}_{ij}$ range, together with their percentages of the totals, were summed for analysis and comparison.

Figure 3a-d show the CnPop, GPW, GRUMP, and WorldPop datasets, respectively, on the basis of their population density layers for China.
In comparison with Figure 2, the four maps in Figure 3 show a similar overall pattern: dense population in the east and sparse population in the west, although the details vary. The coastal zones, the North China Plain, the middle and lower reaches of the Yangtze River, the Sichuan Basin, the Songliao Plain, and the Weihe Valley are the main densely populated regions. The Qinghai-Tibet Plateau, the Inner Mongolia Plateau, the Loess Plateau, and most of the Xinjiang Uygur Autonomous Region are sparsely populated. Beyond these generally dense or sparse areas, the populations of the hilly areas, the ecotone between agriculture and animal husbandry, and the Yunnan-Guizhou Plateau are scattered at a moderate density. In terms of mapping performance, the GPW and GRUMP appear similar to Figure 2, whereas CnPop and WorldPop differ somewhat from it, mainly because of differences in spatial resolution. The spatial resolutions of the township-level average population density map for the 27 provincial regions, the GPW, GRUMP, CnPop, and WorldPop are approximately 13.4 km, 5 km, 1 km, 1 km, and 100 m, respectively. The closer the spatial resolutions, the more similar the maps appear.
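The approximate resolutions quoted above can be checked with a short calculation. The sketch converts the native grid spacings (2.5 arc-minutes, 30 arc-seconds, and 3 arc-seconds) to ground distance on a spherical Earth and recomputes the township mean spatial resolution as sqrt(area/number of units) from the totals reported for the census dataset. The mean Earth radius and the example latitude of 35°N are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def arc_seconds_to_km(arc_seconds, lat_deg=0.0):
    """Approximate ground size of a square geographic grid cell.

    Returns (north_south_km, east_west_km); the east-west extent shrinks with cos(latitude).
    """
    ns = EARTH_RADIUS_KM * math.radians(arc_seconds / 3600.0)
    return ns, ns * math.cos(math.radians(lat_deg))


def mean_spatial_resolution(total_area_km2, n_units):
    """Mean spatial resolution of a set of zones: sqrt(total area / number of units)."""
    return math.sqrt(total_area_km2 / n_units)


if __name__ == "__main__":
    for name, arcsec in [("GPW", 150), ("GRUMP", 30), ("WorldPop", 3)]:
        ns, ew = arc_seconds_to_km(arcsec, lat_deg=35.0)
        print(f"{name}: {ns:.3f} km N-S, {ew:.3f} km E-W at 35 N")
    print(f"townships: {mean_spatial_resolution(6018171.42, 33631):.1f} km")
```

The north-south spacings come out near 4.6 km, 0.93 km, and 93 m, which round to the 5 km, 1 km, and 100 m used in the text; east-west spacings are smaller at Chinese latitudes.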

Visual Inspection of Mapping Performance
In terms of mapping performance alone, the visual differences between the four population distribution maps are obvious. Both CnPop and WorldPop are modeled from land use or land cover data; thus, they show more spatial heterogeneity than the GPW and GRUMP. WorldPop and CnPop accurately characterize the differences in population distribution between rural and urban areas and across landscapes, both in the densely populated southeast and in the sparsely populated northwest. CnPop's grid cells show gradual transitions from high to low population density across the entire map. By contrast, the boundaries between population density ranges are abrupt on the WorldPop map because of the coarse classification of the MDA GeoCover dataset. The GPW map appears coarsely patched compared with the other three, mainly because of its simple areal weighting method and coarse resolution. GRUMP fails to represent the spatial pattern in the sparsely populated western and northern areas. However, in the eastern and southern areas, where the extents of urban and rural regions can be identified from night-time light images or ancillary data, the population is substantially redistributed and differences in population density are much better discriminated than in the GPW. In urban areas, GRUMP shows a ring pattern in which low-density areas surround high-density ones, similar to night-time light images. In sum, the ranking of overall mapping performance from fine to coarse is CnPop, WorldPop, GRUMP, and the GPW.
Figure 4 shows the relationships between the estimated population counts and the census data in the 27 provincial regions. Each point represents the estimated and census population of one township. The relationship between the estimated and census populations is more linear for WorldPop and CnPop than for the GPW and GRUMP. WorldPop has the highest correlation coefficient between the estimated and census values (0.906), followed by CnPop (0.772), GRUMP (0.740), and the GPW (0.499).
Table 3 shows the RMSE, MAE, and MAPE values between the estimated and census population counts for each dataset. For all three indicators, WorldPop has the lowest values, followed by CnPop, GRUMP, and the GPW. The MAPE of WorldPop (42.44%) is the smallest, only marginally smaller than that of CnPop (42.77%). This suggests that WorldPop is more accurate than CnPop, the GPW, and GRUMP, although the accuracy of CnPop is close to that of WorldPop and better than those of GRUMP and the GPW. GRUMP is more accurate than the GPW because it first allocates large populations to major urban areas and then uses areal weighting to allocate the remaining population to rural regions, whereas the GPW distributes the population uniformly across each county-level administrative unit. Notably, the RMSE was considerably larger than the MAE for all four datasets, which indicates that the individual errors varied widely; the size of this gap followed the same trend across datasets as the MAPE.
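The four indicators can be written out explicitly. The sketch below computes the Pearson correlation coefficient, RMSE, MAE, and MAPE between estimated and census township counts using their standard definitions; it assumes every census count is positive (MAPE divides by it), and the toy values are hypothetical.

```python
import math


def accuracy_metrics(estimated, census):
    """Return R, RMSE, MAE and MAPE (%) between estimated and census counts.

    Assumes equal-length, non-empty sequences and census counts > 0.
    """
    n = len(census)
    if n == 0 or n != len(estimated):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [e - c for e, c in zip(estimated, census)]
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    mae = sum(abs(d) for d in errors) / n
    mape = 100.0 * sum(abs(d) / c for d, c in zip(errors, census)) / n
    mean_e = sum(estimated) / n
    mean_c = sum(census) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(estimated, census))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_c = sum((c - mean_c) ** 2 for c in census)
    r = cov / math.sqrt(var_e * var_c)  # Pearson correlation coefficient
    return {"R": r, "RMSE": rmse, "MAE": mae, "MAPE": mape}


if __name__ == "__main__":
    print(accuracy_metrics([110, 190, 330], [100, 200, 300]))
```

Since RMSE² equals MAE² plus the variance of the absolute errors, RMSE is never smaller than MAE, and a large gap between them is a direct sign of widely varying individual errors.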

Mapping and Analysis of Relative Estimation Error
The spatial distributions of the relative estimation errors (REEs) of the four population datasets are shown in Figure 5. All four sub-figures contain several red and blue areas on the Qinghai-Tibet and Inner Mongolia Plateaus, indicating that the datasets greatly overestimate or underestimate the population in these areas. In Figure 5a, green patches corresponding to accurately estimated township units dominate the map, which suggests that CnPop has good accuracy. Non-green units mainly appear in regions with complex terrain, such as the Hengduan Mountains, the Qinling Mountains, the northern farming-pastoral ecotone, the karst mountain areas, and the hilly areas, indicating that CnPop's modeling strategy does not simulate the population distribution well in these areas. In Figure 5b, more than half of the map area is not green, and small green patches are scattered across large plain areas, such as the Huang-Huai-Hai region, the Jianghan Plain, and the middle and lower reaches of the Yangtze River. This suggests that the areal weighting interpolation method has acceptable accuracy only for parts of the plain and basin areas in China. The green areas in Figure 5c clearly exceed those in Figure 5b. However, the greatly underestimated or overestimated patches in sparsely populated northern China do not decrease, suggesting that the method used to generate GRUMP does not improve the estimation accuracy in areas without intense night-time light, such as the less-developed rural areas on the Qinghai-Tibet and Inner Mongolia Plateaus and some hilly regions. Compared with Figure 5a-c, Figure 5d has markedly more green patches across eastern and southern China. Meanwhile, the red patches in the Hengduan Mountains in Figure 5d cover more area than those in Figure 5a and almost as much as those in Figure 5b. This shows that WorldPop simulates the population distribution well for the vast majority of townships in eastern and southern China, but its errors remain large in hilly and mountainous areas such as the Hengduan Mountains.
Sustainability 2018, 10, x FOR PEER REVIEW 9 of 15
For a better understanding of the error structure of each dataset, we calculated the total population, total area, and corresponding percentages falling within each REE range (Table 4). Figure 6a shows percentage bar charts of the datasets in the different error ranges. Township units with REE values from −25% to 25% are considered to have relatively good estimation performance. More than half of the total population fell within this range for both CnPop (50.5%) and WorldPop (57.4%). In terms of total population percentage, the dominant range was the 'accurately estimated' category for GRUMP (37.54%) and the 'greatly underestimated' category for the GPW (30.43%). If the REE values are grouped into three simpler ranges, that is, accurately estimated (−25% to 25%), underestimated or overestimated (−50% to −25% or 25% to 50%), and greatly underestimated or overestimated (≤−50% or ≥50%), the percentages of the total population falling within the three ranges were, respectively, 50.5%, 26.85%, and 22.65% for CnPop; 29.23%, 24.25%, and 46.52% for the GPW; 37.54%, 28.78%, and 33.58% for GRUMP; and 57.4%, 27.04%, and 15.56% for WorldPop. Thus, the largest share of the population fell within the accurately estimated range for CnPop, GRUMP, and WorldPop, whereas for the GPW it fell within the greatly underestimated or overestimated range.
The second-largest share fell within the underestimated or overestimated range for CnPop and WorldPop, but within the greatly underestimated or overestimated range for GRUMP. The share of the population in the greatly underestimated or overestimated range was 15.56% for WorldPop, the smallest of the four datasets.
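The chain from gridded counts to the population percentages per REE range can be sketched end to end. The sketch sums a population raster within a township-id raster (a simplified stand-in for the ArcGIS Zonal Statistics tool used in the study), applies Formulas (1) and (2), assigns the five REE categories, and reports the share of census population in each. How values exactly at ±25% and ±50% are assigned is an assumption here (|REE| ≤ 25 is accurate, |REE| ≥ 50 is "greatly"), as are the toy grids and all function names.

```python
from collections import defaultdict

CLASSES = (
    "greatly underestimated",
    "underestimated",
    "accurately estimated",
    "overestimated",
    "greatly overestimated",
)


def zonal_sum(zone_grid, pop_grid):
    """Sum population cell values per township id; cells with zone None lie outside all townships."""
    sums = defaultdict(float)
    for zone_row, pop_row in zip(zone_grid, pop_grid):
        for zone, value in zip(zone_row, pop_row):
            if zone is not None:
                sums[zone] += value
    return dict(sums)


def ree_percent(estimated, census):
    """Formulas (1)-(2): REE = (PE - P) / P * 100; negative values mean underestimation."""
    return 100.0 * (estimated - census) / census


def classify_ree(ree):
    """Five REE categories; the handling of the exact boundaries is an assumption."""
    if ree <= -50:
        return CLASSES[0]
    if ree < -25:
        return CLASSES[1]
    if ree <= 25:
        return CLASSES[2]
    if ree < 50:
        return CLASSES[3]
    return CLASSES[4]


def population_share_by_class(estimates, census):
    """Percentage of the total census population whose township falls in each REE category.

    A township with no grid cells gets an estimate of 0 (REE = -100%).
    """
    totals = defaultdict(float)
    for township, p in census.items():
        totals[classify_ree(ree_percent(estimates.get(township, 0.0), p))] += p
    grand = sum(census.values())
    return {cls: 100.0 * totals[cls] / grand for cls in CLASSES}


if __name__ == "__main__":
    zones = [[1, 1, 2], [2, 3, None]]
    pops = [[50.0, 60.0, 10.0], [20.0, 300.0, 999.0]]
    est = zonal_sum(zones, pops)
    print(est)
    print(population_share_by_class(est, {1: 100, 2: 100, 3: 200}))
```

Collapsing the five categories into the three ranges used above only requires adding the two under/over shares and the two "greatly" shares.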
Because China's population is unevenly distributed, the area and population shares of each REE range are not proportional in the four datasets. WorldPop accurately estimated the population for 40.37% of the total area, compared with 37.81%, 26.63%, and 31.67% for CnPop, the GPW, and GRUMP, respectively. The greatly underestimated or overestimated townships accounted for more than one-third of the total area in each of the four datasets. In summary, WorldPop was the most accurate of the four datasets. Both WorldPop and CnPop accurately estimated townships containing more than half of the total population, although they showed low accuracy in sparsely populated areas, which accounted for approximately 40% of the total area. The GPW's estimation accuracy was barely satisfactory in most townships and was substantially improved by GRUMP, which increased the number of accurately estimated townships and decreased the number of greatly underestimated or overestimated townships. To identify the uneven distribution pattern of the REE, the average population density of each REE range was calculated for the four datasets, as shown in Figure 6b. For CnPop, the average population density was markedly lower in the greatly overestimated and greatly underestimated ranges, indicating that the land use/land cover model failed to redistribute the population well in sparsely populated areas. The values for the underestimated and accurately estimated ranges were approximately equal, which shows that CnPop's ability to discriminate heterogeneity in densely populated areas needs to be improved.
The values for the GPW decreased sharply from the greatly underestimated range to the greatly overestimated range, suggesting that the areal weighting interpolation method cannot redistribute the population well, because applying the average density of an administrative unit substantially inflates the population density of its sparsely populated parts and suppresses that of its densely populated parts. Compared with the GPW, the average population density of GRUMP's greatly underestimated units decreased by half, and those of its underestimated and greatly overestimated units increased marginally. This illustrates that auxiliary data such as night-time light images can identify some populated areas, but a few populated areas remained unidentified, because the average population density of the greatly underestimated units was still relatively high. The average population density of the greatly overestimated range was smallest for WorldPop, showing that this dataset greatly overestimated the population over a wide range of sparsely populated areas. Meanwhile, the average population densities of the underestimated, accurately estimated, and overestimated ranges progressively decreased, illustrating that fixed population densities estimated for different land cover types cannot exactly match the real population distribution across China. Interestingly, the values for the accurately estimated ranges of the four datasets were all about 200-260 people per km², which suggests that areas within this density range are easy to characterize with high accuracy.
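The area shares and per-range densities discussed above can be tabulated with a short aggregation. This sketch assumes each township already carries its census population, area, and REE category, and it defines the average density of a range as the total population divided by the total area of its townships; the study may instead have averaged the individual township densities, which weights small townships more heavily. Names and values are hypothetical.

```python
from collections import defaultdict


def summarize_by_class(townships):
    """townships: iterable of (census_population, area_km2, ree_class).

    Returns {ree_class: (area_share_percent, aggregate_density)}, where the
    aggregate density is total population / total area (people per km^2).
    """
    pop = defaultdict(float)
    area = defaultdict(float)
    for p, a, cls in townships:
        pop[cls] += p
        area[cls] += a
    total_area = sum(area.values())
    return {cls: (100.0 * area[cls] / total_area, pop[cls] / area[cls]) for cls in area}


if __name__ == "__main__":
    sample = [
        (1000, 10, "accurately estimated"),
        (500, 5, "accurately estimated"),
        (100, 50, "greatly overestimated"),
        (200, 35, "greatly underestimated"),
    ]
    for cls, (share, density) in summarize_by_class(sample).items():
        print(f"{cls}: {share:.1f}% of area, {density:.1f} people/km^2")
```

Comparing the area share with the population share of the same range makes the disproportion noted above explicit: sparsely populated ranges can cover large areas while holding little of the population.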

Discussion
The accuracy assessment and comparisons above illustrate that the simple areal weighting method used to generate the GPW and GRUMP can lead to serious estimation errors in the Chinese context, while the land use/land cover-based modeling strategy used to produce CnPop and WorldPop is precise across most of southeastern China but produces obvious estimation errors across northwestern China. As mentioned above, these errors are caused not only by the generation methods but also, fundamentally, by the complexity of China's population distribution. The four datasets failed to accurately characterize the spatial pattern of the population distribution in less-developed northern and western China, including the Qinghai-Tibet and Inner Mongolia Plateaus, as well as in the hilly area, karst region, and farming-pastoral ecotone. We speculate that there are three main causes for this. First, because small patches of farmland and rural settlements are sparsely dispersed across these areas, they are difficult to extract and include in land use or land cover data, which makes the distribution of the corresponding population uncertain [35]. Second, the assumption that population density is fixed within each land use type or county is too coarse to reflect the heterogeneity of the real population distribution [7]. A few studies have demonstrated that zoning before modeling and multi-source data fusion can effectively enhance modeling accuracy in these regions. The former first divides the research area into several sub-regions according to natural and cultural features using a zoning or clustering algorithm, and then constructs a prediction model for each sub-region. For example, Zeng et al. divided mainland China into eight zones using night-time light image clustering and the shortest path algorithm, which markedly improved the modeling performance [8]. Zhuo et al. classified Chinese counties into four types on the basis of their night-time light characteristics and modeled the population distribution both inside and outside light patches using regression and Coulomb's Law models. In the Coulomb's Law model, China's population distribution is treated as a 'field' analogous to an electric field. In this 'field', urban centres, where population and socioeconomic activities are highly concentrated, influence the surrounding regions in the same way that point charges exert a force on the charged objects around them. Based on Coulomb's Law, that study assumed that the magnitude of the impact of an urban centre on a given point (i.e., the force of attraction related to the population distribution) equals TDN/r², where TDN is the total digital number and r is the distance from the urban centre [19]. The latter approach uses multiple environmental and geographical factors, such as land use, topography, rural settlement points, traffic network, and water system data, to build a data fusion model that simulates the population distribution. For example, Dong et al. confirmed that elevation, slope, and aspect strongly influence the population distribution in Guizhou province, one of China's major karst areas [36]. Liao et al. accurately transformed the population on the Qinghai-Tibet Plateau into a grid with a spatial resolution of 1 km using a multi-data fusion approach based on elevation, village settlements, traffic network, land use, and water system data [37]. Third, it is worth pointing out that the ecological environment of these inaccurately estimated regions is fragile and responds sensitively to climate change and human activities [38,39]. For example, several infectious diseases, such as plague and tick-borne encephalitis, have mainly occurred and spread in these regions [40,41].
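The attraction term in the Coulomb's Law model described above, TDN/r², can be sketched as below. This is an illustrative sketch, not Zhuo et al.'s implementation. Several choices are assumptions made here: impacts from multiple urban centres are summed, distances are planar Euclidean in km, and the `min_r_km` guard against division by zero is hypothetical. The centre coordinates and TDN values are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class UrbanCentre:
    x_km: float
    y_km: float
    tdn: float  # total digital number of the centre's night-time light patch


def attraction(centre: UrbanCentre, x_km: float, y_km: float, min_r_km: float = 1.0) -> float:
    """Impact of one urban centre on a point: TDN / r^2.
    min_r_km is a hypothetical floor on r to avoid division by zero at the centre."""
    r = max(math.hypot(x_km - centre.x_km, y_km - centre.y_km), min_r_km)
    return centre.tdn / r ** 2


def total_attraction(centres, x_km: float, y_km: float) -> float:
    """Sum of per-centre impacts (summation is an assumption for illustration)."""
    return sum(attraction(c, x_km, y_km) for c in centres)


# Two hypothetical urban centres.
centres = [UrbanCentre(0.0, 0.0, 5000.0), UrbanCentre(30.0, 40.0, 20000.0)]
near = total_attraction(centres, 3.0, 4.0)   # 5 km from the first centre, 45 km from the second
far = total_attraction(centres, 15.0, 20.0)  # 25 km from both centres
```

The inverse-square decay makes the impact fall off quickly with distance, so a point close to a small centre can receive more attraction than a point midway between two larger ones. Turning these attraction values into population counts would require the calibration described in [19], which is not reproduced here.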
Both the four gridded population datasets evaluated and the township-level spatial population dataset used for the assessment in this study represent the year 2000. Although several gridded population distribution datasets for China have since been published for 2010 and 2015, this assessment and comparison for 2000 remains valuable for two reasons. First, since this study is the first to systematically and comprehensively evaluate the estimation accuracy of CnPop, the GPW, GRUMP, and WorldPop using township-level spatial population data covering a wide area of China, its conclusions can deepen the spatial understanding of the four gridded datasets and provide a clear quality reference for choosing suitable datasets in environment and resource research pertinent to 2000. Second, six nationwide censuses have been conducted in China since 1949. Of these, the fifth census in 2000 was the first to release statistical population data at the township level, while the previous four censuses only offered statistical datasets at the county level. Despite this improvement in the resolution of statistical data, the corresponding township-level vector boundary data remain difficult to access owing to policy restrictions and costs, so fine-scale population geography research in China has remained inadequate over the past decades. This study's pioneering construction of a GIS-linked 2000 census dataset at the township level covering 27 provinces can serve as a basic, updatable dataset for fine-scale population geography research in China [42].

Conclusions
This study assessed and compared the estimation accuracy of CnPop, the GPW, GRUMP, and WorldPop over 27 provinces in China using township-level boundary and census data. The results showed that the estimation accuracy, error structure, and mapping performance of these four gridded population datasets varied both within and between datasets. WorldPop had the highest estimation accuracy, with 60% of the population in the accurately estimated category. CnPop ranked second, accurately simulating the distribution of half of the population, and it also showed the best mapping performance. The GPW had the lowest estimation accuracy and the worst mapping performance, with acceptable estimation accuracy only in some plain and basin areas. The estimation accuracy and mapping performance of GRUMP were significantly better than those of the GPW, with a higher percentage in the accurately estimated category and a lower percentage in the greatly underestimated or overestimated categories. However, all four datasets failed to characterize the population distribution in northwestern China, mainly on the Qinghai-Tibet and Inner Mongolia Plateaus. In addition, all four gridded datasets showed relatively large estimation errors in regions with complex terrain, such as the hilly area, karst region, and farming-pastoral ecotone.
Future work should therefore focus on updating and constructing time-series township-level boundary and census datasets for China for 2000 and 2010. Using these fine-scale geospatial population data, two sequential high-resolution gridded population datasets will be produced with the areal weighting method and a data fusion method based on accessibility and night-time light data. Finally, village-level boundary and demographic data are expected to be collected for calibration and validation.