Mapping the Population Density in Mainland China Using NPP/VIIRS and Points-Of-Interest Data Based on a Random Forests Model

Understanding the spatial distribution of populations at a finer spatial scale has important value for many applications, such as disaster risk rescue operations, business decision-making, and regional planning. In this study, a random forest (RF)-based population density mapping method was proposed in order to generate high-precision population density data with a 100 m × 100 m grid in mainland China in 2015 (hereafter referred to as ‘Popi’). Besides the commonly used elevation, slope, Normalized Vegetation Index (NDVI), land use/land cover, roads, and National Polar Orbiting Partnership/Visible Infrared Imaging Radiometer Suite (NPP/VIIRS), 16,101,762 records of points of interest (POIs) and 2867 county-level censuses were used in order to develop the model. Furthermore, 28,505 township-level censuses (74% of the total number of townships) were collected in order to evaluate the accuracy of the Popi product. The results showed that the utilization of multi-source data (especially the combination of POIs and NPP/VIIRS data) can effectively improve the accuracy of population mapping at a finer scale. The feature importances of the POIs and NPP/VIIRS are 0.49 and 0.14, respectively, which are higher values than those obtained for other natural factors. Compared with the Worldpop population dataset, the Popi data exhibited a higher accuracy. The number of accurately-estimated townships was 19,300 (67.7%) in the Popi product and 16,237 (56.9%) in the Worldpop product. The Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) were 14,839 and 7218, respectively, for Popi, and 18,014 and 8572, respectively, for Worldpop. The research method in this paper could provide a reference for the spatialization of other socioeconomic data (such as GDP).


Introduction
Population density data, presenting a realistic description of the spatial distribution of a population, is of great significance for mapping human settlements [1], quantifying the number of populations threatened by infectious diseases [2], calculating the birth rate and mortality [3], quantifying the spatial distribution of poverty and hunger [4], and evaluating Sustainable Development Goals (SDGs) [5]. According to data from the National Bureau of Statistics of China (http://www.stats.gov.cn/tjsj/pcsj/), The remainder of this paper is structured as follows: Sections 2 and 3 introduce the data and methods, respectively; the results, accuracy verification, and feature importance of the modeling factors are outlined in Section 4; in the discussion section of Section 5, the differences between POIs and NPP/VIIRS, the response of POI data, and the error analysis in mapping the population density are analyzed; and finally, Section 6 outlines the conclusions.

Data
The data used in this paper include NPP/VIIRS, POI, LULC, NDVI, DEM, road, and census data ( Table 1). The NPP/VIIRS satellite imagery was obtained from a new generation of polar-orbiting satellite system preparation projects. The first version of the 'vcm-orm-ntl' annual synthetic product was used in this paper, which excludes stray light and filters out transient lights, background values, and abnormal values, forming cloudless average radiation value data.
POI is a type of social perception data, and the duplicate records need to be deleted. Within mainland China, a total of 16,101,762 records of POI data were obtained in 2015. All of the POIs were classified into 10 categories based on their type ( Table 2).
LULC data, representing one of the most accurate long-term products in China, was obtained by the visual interpretation of Landsat satellite images, and the accuracy exceeds 75% [31,32]. The types of LULC data used in this paper include four classifications: cultivated land, forest land, grassland, and construction land.
The NDVI data include the MOD13Q1 product, with a spatial resolution of 250 m. The data are 16-day synthetic data. It is necessary to mosaic all of the NDVI on the same day to generate a full map of China.
The DEM data were employed in order to obtain elevation data and slope data.
The road data includes seven types (railways, national roads, provincial roads, high-speed roads, urban roads, county roads, and village roads). A total of 13,429,160 road data sets for mainland China were obtained.
The NDVI data include the MOD13Q1 product, with a spatial resolution of 250 m. The data are 16-day synthetic data. It is necessary to mosaic all of the NDVI on the same day to generate a full map of China.
The DEM data were employed in order to obtain elevation data and slope data. The road data includes seven types (railways, national roads, provincial roads, high-speed roads, urban roads, county roads, and village roads). A total of 13,429,160 road data sets for mainland China were obtained. Additionally, 2867 county-level censuses, with a total of 1,374,620,000 people, were obtained and were used as the dependent variable in order to establish the RF model. The township level is the smallest unit of the census in China, and is a lower administrative unit than the county level. This study also obtained 28,505 township censuses (accounting for 74% of all of the townships in China), which were employed in order to analyze the model's accuracy ( Figure 1). The Worldpop global population data set was provided by the University of Southampton (https://www.worldpop.org/geodata/listing?id=16), with a spatial resolution of 100m. The Worldpop program provide high resolution, open and contemporary data in human population distributions, which makes the measurement of the population density more accurate.
In this paper, the data were resampled to 100 m × 100 m.

Mapping the Population Density with the RF Model
This paper first constructs a non-linear relationship between the modeling factors and censuses at the county level. Then, the non-linear relationship at the county level is applied to the grid level, in order to achieve the spatialization of the population data. As shown in Figure 2, Step 1 is the establishment of the RF model: the 10 NDVI, slope, elevation, NPP/VIIRS, POI density layer, road distance layer, and LULC (cultivated land, forest, grassland, and construction land) preprocessed layers are aggregated at the county level (the preprocessing of independent variables is described in Section 3.2). Taking the above-10 county-level summary values as independent variables, and the The Worldpop global population data set was provided by the University of Southampton (https://www.worldpop.org/geodata/listing?id=16), with a spatial resolution of 100 m. The Worldpop program provide high resolution, open and contemporary data in human population distributions, which makes the measurement of the population density more accurate.
In this paper, the data were resampled to 100 m × 100 m.

Mapping the Population Density with the RF Model
This paper first constructs a non-linear relationship between the modeling factors and censuses at the county level. Then, the non-linear relationship at the county level is applied to the grid level, Remote Sens. 2020, 12, 3645 6 of 22 in order to achieve the spatialization of the population data. As shown in Figure 2, Step 1 is the establishment of the RF model: the 10 NDVI, slope, elevation, NPP/VIIRS, POI density layer, road distance layer, and LULC (cultivated land, forest, grassland, and construction land) preprocessed layers are aggregated at the county level (the preprocessing of independent variables is described in Section 3.2). Taking the above-10 county-level summary values as independent variables, and the natural logarithm of the county-level census as the dependent variable, all of the variables need to be divided into train sets and test sets. A total of 70% of the variables are randomly selected as train sets for the training of the RF model. The remaining 30% form the test set, which is used to calculate the model's accuracy.
Step 2 involves predicting the population density. By taking 10 independent variable raster layers as prediction sets, the prediction sets are input into the RF model to obtain the population density data.
Step 3 includes the dasymetric population mapping, where the population density data are adjusted according to the relationship between the census and the population density calculated by the RF model. The final population density data (named Popi) is calculated by Equation (1): where p g is the Popi data, p g is the population density data calculated by the RF model, p c is the census data at the county level, and p c is the aggerated value of population density data calculated by the RF model at the county level.
Remote Sens. 2020, 07, x FOR PEER REVIEW 6 of 22 for the training of the RF model. The remaining 30% form the test set, which is used to calculate the model's accuracy.
Step 2 involves predicting the population density. By taking 10 independent variable raster layers as prediction sets, the prediction sets are input into the RF model to obtain the population density data.
Step 3 includes the dasymetric population mapping, where the population density data are adjusted according to the relationship between the census and the population density calculated by the RF model. The final population density data (named Popi) is calculated by Equation (1): where g p is the Popi data, g p′ is the population density data calculated by the RF model, c p is the census data at the county level, and c p′ is the aggerated value of population density data calculated by the RF model at the county level.   Due to the strong sensitivity of the sensor, it is easy to detect objects such as instant lights and aurora. Therefore, the NPP/VIIRS data requires the following corrections. First, the background noise needs to be eliminated: the non-zero DN value in 2013 the DMSP/OLS image is used as a mask and the area outside the mask is regarded as a noise area. Then, the 2015 NPP/VIIRS data are extracted by the mask. Additionally, the maximum value needs to be excluded: according to the assumption proposed by Shi et al. [33], the maximum DN value in NPP/VIIRS cannot exceed the maximum DN value in the most developed cities in mainland China. This paper takes the maximum DN value (400) of the most economically developed cities (Beijing, Shanghai, and Guangzhou) in the study area as the threshold. The eight-neighborhood algorithm is utilized for the smoothing of the pixels which are larger than the threshold.
It can be seen from Figure 3 that the corrected NPP/VIIRS data eliminates the background noise and DN values greater than 400. In addition, since the resolution of the NPP/VIIRS data is 750 m, it needs to be matched with other data. Considering that NPP/VIIRS is the current high-resolution global open source night light data, and that we also have other high-resolution data to provide detailed information, we use the nearest neighbor method to resample the NPP/VIIRS data to 100 m. Due to the strong sensitivity of the sensor, it is easy to detect objects such as instant lights and aurora. Therefore, the NPP/VIIRS data requires the following corrections. First, the background noise needs to be eliminated: the non-zero DN value in 2013 the DMSP/OLS image is used as a mask and the area outside the mask is regarded as a noise area. Then, the 2015 NPP/VIIRS data are extracted by the mask. Additionally, the maximum value needs to be excluded: according to the assumption proposed by Shi et al. [33], the maximum DN value in NPP/VIIRS cannot exceed the maximum DN value in the most developed cities in mainland China. This paper takes the maximum DN value (400) of the most economically developed cities (Beijing, Shanghai, and Guangzhou) in the study area as the threshold. The eight-neighborhood algorithm is utilized for the smoothing of the pixels which are larger than the threshold.
It can be seen from Figure 3 that the corrected NPP/VIIRS data eliminates the background noise and DN values greater than 400. In addition, since the resolution of the NPP/VIIRS data is 750m, it needs to be matched with other data. Considering that NPP/VIIRS is the current high-resolution global open source night light data, and that we also have other high-resolution data to provide detailed information, we use the nearest neighbor method to resample the NPP/VIIRS data to 100m.

Producing NDVI Annual Synthetic Data
The maximum value composite (MVC) method is used to generate annual synthetic NDVI images from multi-temporal NDVI images in order to separate human settlements from other land cover types, and to eliminate the effects of cloud pollution [2]. The MVC method is shown in Equation (2): where NDVI is the annual synthetic image; NDVImax is an unscaled annual synthetic image; and NDVI1, NDVI2, …, NDVI23 are images obtained after the mosaic process. Since the NDVImax value range is −10000 to 10000, Equation (3) can scale the NDVImax value between −1 and 1.

Generating the POI Density Layers
The POI density layer is generated from POI point data by the kernel density estimation (KDE) method. In order to ease the burden of the calculation and make full use of POI, the entropy method is used to merge 10 POI density layers into one layer.
First, the 10 types of discrete POI point data (vector) are utilized to generate 10 smooth surfaces (the raster layer) of 100 m × 100 m through the KDE with a 6500 m bandwidth, respectively. KDE can

Producing NDVI Annual Synthetic Data
The maximum value composite (MVC) method is used to generate annual synthetic NDVI images from multi-temporal NDVI images in order to separate human settlements from other land cover types, and to eliminate the effects of cloud pollution [2]. The MVC method is shown in Equation (2): where NDVI is the annual synthetic image; NDVI max is an unscaled annual synthetic image; and NDVI 1 , NDVI 2 , . . . , NDVI 23 are images obtained after the mosaic process. Since the NDVI max value range is −10000 to 10000, Equation (3) can scale the NDVI max value between −1 and 1.

Generating the POI Density Layers
The POI density layer is generated from POI point data by the kernel density estimation (KDE) method. In order to ease the burden of the calculation and make full use of POI, the entropy method is used to merge 10 POI density layers into one layer.
First, the 10 types of discrete POI point data (vector) are utilized to generate 10 smooth surfaces (the raster layer) of 100 m × 100 m through the KDE with a 6500 m bandwidth, respectively. KDE can calculate the density of a certain type of POI point within a given range. The optimal bandwidth is determined as follows: when setting the bandwidth, the step length is set to 100 m in the range of 1000-10,000 m, and the step length is set to 1000 m in the range of 10,000-24,000 m. Then, a total of 105 different bandwidths are set, and 1050 corresponding POI density layers (105 times × 10 layers) are generated in sequence. Next, the county-level 10 POI density layer is generated by different bandwidths, respectively. Finally, taking 10 POI county-level sums as the independent variables, and the county-level census as the dependent variable, an RF regression model is built. A total of 105 RF regression models were constructed in this study. The bandwidth corresponding to the minimum out-of-bag error value of the 105th RF model is the optimal bandwidth. 'Out-of-bag' is an unbiased estimate of the RF generalization error [30]. The smaller the out-of-bag value, the smaller the model error. The out-of-bag experienced a process of decreasing first and then increasing ( Figure 4). When the bandwidth is 6500, the out-of-bag value is the lowest.
Remote Sens. 2020, 07, x FOR PEER REVIEW 8 of 22 calculate the density of a certain type of POI point within a given range. The optimal bandwidth is determined as follows: when setting the bandwidth, the step length is set to 100 m in the range of 1000-10,000 m, and the step length is set to 1000 m in the range of 10,000-24,000m. Then, a total of 105 different bandwidths are set, and 1050 corresponding POI density layers (105 times × 10 layers) are generated in sequence. Next, the county-level 10 POI density layer is generated by different bandwidths, respectively. Finally, taking 10 POI county-level sums as the independent variables, and the county-level census as the dependent variable, an RF regression model is built. A total of 105 RF regression models were constructed in this study. The bandwidth corresponding to the minimum out-of-bag error value of the 105th RF model is the optimal bandwidth. 'Out-of-bag' is an unbiased estimate of the RF generalization error [30]. The smaller the out-of-bag value, the smaller the model error. The out-of-bag experienced a process of decreasing first and then increasing ( Figure 4). When the bandwidth is 6500, the out-of-bag value is the lowest. Then, the 10 different types of POI density layers need to be combined into one layer. The entropy value can be utilized to calculate the weight of the 10 POI density layers. The entropy method only depends on the data itself and is an objective weighting method [34]. The smaller the entropy value, the greater the weight of the indicator, and vice versa. Therefore, in order to eliminate the difference in the dimension and range between different indicators, this study utilized Equation (4) to normalize the 10 POI density layers, respectively. Then, Equations (5) and (6) were utilized to calculate the entropy value (ej) and weight value (wj) of the j index. The POI weight is shown in Table  3. Finally, Equation (7) was used to merge the 10 POI layers into one layer (POIsum).
where bij is the pixel value of the POI density layer after normalization; aij is the pixel value of the POI density layer; ej is the entropy value; wj is the weight value; POIsum is the merged POI layer; m is the number of layers, j = 1, ..., m, m = 10; and I = 1,..., n, n is the number of pixels in the raster layer, with a total of 2,047,689,100 pixels.  1000  1200  1400  1600  1800  2000  2200  2400  2600  2800  3000  3200  3400  3600  3800  4000  4200  4400  4600  4800  5000  5200  5400  5600  5800  6000  6200  6400  6600  6800  7000  7200  7400  7600  7800  8000  8200  8400  8600  8800  9000  9200  9400  9600  9800  10000  12000  14000  16000  18000  20000  22000  24000 Out-of-bag bandwidth Then, the 10 different types of POI density layers need to be combined into one layer. The entropy value can be utilized to calculate the weight of the 10 POI density layers. The entropy method only depends on the data itself and is an objective weighting method [34]. The smaller the entropy value, the greater the weight of the indicator, and vice versa. Therefore, in order to eliminate the difference in the dimension and range between different indicators, this study utilized Equation (4) to normalize the 10 POI density layers, respectively. Then, Equations (5) and (6) were utilized to calculate the entropy value (e j ) and weight value (w j ) of the j index. The POI weight is shown in Table 3. Finally, Equation (7) was used to merge the 10 POI layers into one layer (POI sum ). b ij = a ij − min a 1j , · · · , a nj max a 1j , · · · , a ij − min a 1j , · · · , a ij , where b ij is the pixel value of the POI density layer after normalization; a ij is the pixel value of the POI density layer; e j is the entropy value; w j is the weight value; POI sum is the merged POI layer; m is the number of layers, j = 1, ..., m, m = 10; and I = 1,..., n, n is the number of pixels in the raster layer, with a total of 2,047,689,100 pixels.

Calculating the Road Distance Layers
The distance from the center point of each grid to the nearest road of each type was calculated in order to generate seven road distance rasters. Then, to simplify the subsequent calculations, the entropy method was utilized to merge the seven road raster layers into one raster layer, and the process of merging the layers was the same as that in Section 3.2.3. The road data weights are shown in Table 4.

Accuracy Assessment
Indirect verification was utilized as a compromise, since the 100 m × 100 m population density data were inaccessible. Therefore, we compared the Popi data and Worldpop data with the census at the township level. The indicators, including the Relative Error (RE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), were used to measure the accuracy. The formulae are as follows: where P i is the sum of population density products at the township level, and C i is the township population.

Results of the Population Density Mapping
In order to improve the model's accuracy, the parameters of the RF model need to be optimized. Considering that the random search parameter is time-consuming, we preset the possible value range of the parameter and utilized the out-of-bag value to access the optimal parameters ( Figure 5). The range and optimal values are shown in Table 5. The goodness of fit (R 2 ) of the training set is 0.98, and the R 2 of the test set is 0.88.  The population density at the 100 m × 100 m resolution in 2015 is shown in Figure 6a. The population density shows a strong imbalance in mainland China. The population of China is densely distributed in the southeast and sparsely distributed in the northwest. In the east, the population distribution is concentrated. For example, in four of the most densely-populated cities, including Beijing, Shanghai, Guangzhou, and Shenzhen, the population density is more than 1000 people/km 2 . In the less densely populated cities, such as Lanzhou, Kunming, Shenyang, and Urumqi, which contain more rural areas compared with the densely populated cities, the population density is 200-300 people/km 2 . Due to natural factors, the population density in certain western regions is less than 10 people/km 2 .  The population density at the 100 m × 100 m resolution in 2015 is shown in Figure 6a. The population density shows a strong imbalance in mainland China. The population of China is densely distributed in the southeast and sparsely distributed in the northwest. In the east, the population distribution is concentrated. For example, in four of the most densely-populated cities, including Beijing, Shanghai, Guangzhou, and Shenzhen, the population density is more than 1000 people/km 2 . In the less densely populated cities, such as Lanzhou, Kunming, Shenyang, and Urumqi, which contain more rural areas compared with the densely populated cities, the population density is 200-300 people/km 2 . Due to natural factors, the population density in certain western regions is less than 10 people/km 2 .

Parameter Type Range Optimal Value
The spatial distribution of the population is affected by the altitude, landform, vegetation, land cover, transportation, and social economy, etc. The natural conditions of the river valleys, plains, and hills with middle and low altitudes provide superior land conditions for the development of human activities. For example, there are the population is more concentrated in marine plains (such as Shanghai), alluvial plains (such as central Shaanxi), and low-elevation hilly areas (such as Guangzhou). Next, the different combinations of terrain, climate, and vegetation lead to different land cover types. The population will first gather in areas with advantageous natural resource endowments. Since land is the carrier of the population distribution, the population is more likely to be gathered in areas with superior conditions (such as built-up areas). Moreover, the road accessibility affects the level of communication between regions, causing the flow of various resource elements, including the population. Since major traffic arteries become the axis of the population distribution, the population is concentrated in areas with developed transportation (such as Beijing and Shanghai). The intensity of the economic development represented by the level of urbanization redistributes the population density based on resources and natural conditions, forming a 'core-periphery' effect of the population's spatial distribution within the city, reflected in municipality cities, such as Shenzhen, or provincial capitals, including Shenyang and Kunming, etc. (Figure 6b). The spatial distribution of the population is affected by the altitude, landform, vegetation, land cover, transportation, and social economy, etc. The natural conditions of the river valleys, plains, and hills with middle and low altitudes provide superior land conditions for the development of human activities. For example, there are the population is more concentrated in marine plains (such as Shanghai), alluvial plains (such as central Shaanxi), and low-elevation hilly areas (such as Guangzhou). Next, the different combinations of terrain, climate, and vegetation lead to different land cover types. The population will first gather in areas with advantageous natural resource endowments. Since land is the carrier of the population distribution, the population is more likely to be gathered in areas with superior conditions (such as built-up areas). Moreover, the road accessibility affects the level of communication between regions, causing the flow of various resource elements, including the population. Since major traffic arteries become the axis of the population distribution, the population is concentrated in areas with developed transportation (such as Beijing and Shanghai). The intensity of the economic development represented by the level of urbanization redistributes the population density based on resources and natural conditions, forming a 'coreperiphery' effect of the population's spatial distribution within the city, reflected in municipality cities, such as Shenzhen, or provincial capitals, including Shenyang and Kunming, etc. (Figure 6b).  respectively, and the MAE is 8572 and 7218, respectively. The Worldpop and Popi products have a higher accuracy in mainland China, while the Popi data are slightly better than the Worldpop data. As seen from Figure 7, Popi products can reduce the overestimation and underestimation to a certain extent. For example, Popi products have reduced overestimation in the western provinces (such as southern Xinjiang). Meanwhile, Popi products have reduced underestimation in the central and eastern provinces (such as Shaanxi, Sichuan, Hunan, Anhui, and Zhejiang, etc.).  In order to analyze the accuracy of the Popi data, we calculated the RE values of the townships and divided them into five levels ( Figure 7). Overall, few townships are significantly overestimated and underestimated in these two products. Among them, the number of townships that are accurately estimated with the Popi product is 19,300 (67.7%), while the number of townships that are accurately estimated with the Worldpop product is 16,237 (56.9%). There are 1051 (3.6%) and 4996 (17.5%) townships in the Popi products that are slightly overestimated and slightly underestimated, respectively. The number of townships that are slightly overestimated in the Worldpop products is 2038 (7.1%), and the number of townships that are slightly underestimated is 5847 (20.5%). Meanwhile, the number of townships seriously overestimated and seriously underestimated in the Popi products are 1572 (5.5%) and 1586 (5.6%), respectively. The Worldpop product has 2092 (7.3%) seriously overestimated townships, and 2291 (8.1%) seriously underestimated townships. Above all, the sum of the proportions of the severely overestimated or severely underestimated townships is about 10%, indicating that both products have a good accuracy.

Accuracy Assessment
Furthermore, the underestimation ratio of the two products exceeds 20%, and is slightly higher than the overestimation ratio (about 10%). For example, there are many underestimations in the central and eastern provinces (such as Jiangsu, Anhui, Sichuan, and Zhejiang), where the total population of the province exceeds 50 million, with a higher density. In contrast, the western provinces (such as Xinjiang, Tibet, Qinghai, and Inner Mongolia) with small populations and low densities have many obvious overestimations for the two products. The township area of the western region is larger, but the township number is small, while the township area of the central and eastern regions is smaller, and the number is larger. Therefore, more townships are underestimated.
In order to analyze the accuracy of the different regions, we calculated the R 2 between the township census and the estimated township population in 30 provinces. As seen from Figure 8, the R 2 value of the Popi product in 22 provinces is higher than that of the Worldpop product, and the R 2 value of the Worldpop product in eight provinces is higher than that of the Popi product. and underestimated in these two products. Among them, the number of townships that are accurately estimated with the Popi product is 19,300 (67.7%), while the number of townships that are accurately estimated with the Worldpop product is 16,237 (56.9%). There are 1051 (3.6%) and 4996 (17.5%) townships in the Popi products that are slightly overestimated and slightly underestimated, respectively. The number of townships that are slightly overestimated in the Worldpop products is 2038 (7.1%), and the number of townships that are slightly underestimated is 5847 (20.5%). Meanwhile, the number of townships seriously overestimated and seriously underestimated in the Popi products are 1572 (5.5%) and 1586 (5.6%), respectively. The Worldpop product has 2092 (7.3%) seriously overestimated townships, and 2291 (8.1%) seriously underestimated townships. Above all, the sum of the proportions of the severely overestimated or severely underestimated townships is about 10%, indicating that both products have a good accuracy.
Furthermore, the underestimation ratio of the two products exceeds 20%, and is slightly higher than the overestimation ratio (about 10%). For example, there are many underestimations in the central and eastern provinces (such as Jiangsu, Anhui, Sichuan, and Zhejiang), where the total population of the province exceeds 50 million, with a higher density. In contrast, the western provinces (such as Xinjiang, Tibet, Qinghai, and Inner Mongolia) with small populations and low densities have many obvious overestimations for the two products. The township area of the western region is larger, but the township number is small, while the township area of the central and eastern regions is smaller, and the number is larger. Therefore, more townships are underestimated.
In order to analyze the accuracy of the different regions, we calculated the R 2 between the township census and the estimated township population in 30 provinces. As seen from Figure 8, the R 2 value of the Popi product in 22 provinces is higher than that of the Worldpop product, and the R 2 value of the Worldpop product in eight provinces is higher than that of the Popi product. From a regional perspective, the southeast regions, such as Guangdong, Fujian, Zhejiang, and Jiangsu, are mainly distributed in plains and hills with low altitudes, which are suitable for humans and have the highest accuracy (R 2 > 0.7). Similarly, the northwest, including Shaanxi and Ningxia, with medium-and low-altitude alluvial plains, is also suitable for humans, and has R 2 > 0.7. In the From a regional perspective, the southeast regions, such as Guangdong, Fujian, Zhejiang, and Jiangsu, are mainly distributed in plains and hills with low altitudes, which are suitable for humans and have the highest accuracy (R 2 > 0.7). Similarly, the northwest, including Shaanxi and Ningxia, with medium-and low-altitude alluvial plains, is also suitable for humans, and has R 2 > 0.7. In the central regions, such as Guangxi and Anhui, the landforms are mainly low-altitude mountains and hills, showing that the natural conditions are suitable, and exhibiting a higher accuracy (0.7 > R 2 > 0.6).
Next, the western region, including Xinjiang, Sichuan, and Guizhou, has a medium accuracy (0.6 > R 2 > 0.5) and a high altitude, and is dominated by mountains and hills, such as the Tianshan Mountains and the Hengduan Mountains. Henan and Shandong, with a total population exceeding 100 million, have a medium accuracy (0.6 > R 2 > 0.5). They are followed by the northwest and southwest regions, including Qinghai, Gansu, and Chongqing, where the landform is occupied by alluvial platforms, mountains, and denuded platforms, etc., illustrating the unsuitability of natural conditions with a lower accuracy (0.5 > R 2 > 0.4). The provinces with the lowest accuracy (R 2 < 0.4) are in the southwest and northeast, including Tibet, Heilongjiang, and Liaoning. Tibet is dominated by mountains with high altitudes, such as Tanggula Mountain, demonstrating uninhabitable characteristics. Heilongjiang and Liaoning are dominated by mountains and terraces with a low altitude, but the population is less affected by the low temperature.
Above all, the accuracy is affected by the altitude, landform, temperature, and total population, etc. The provinces with a poor accuracy are located in densely-populated or sparsely-populated areas, or areas with poor natural conditions.
In order to analyze the relationship between the population density of the townships and RE, we divided the population density of the townships into six classifications, as shown in Figure 9. Overall, the RE values of the two products are mostly distributed in the range of −0.25 to 0.25. The distribution of the RE values in the Popi data are more concentrated, with fewer extreme points, and the median of the RE values is closer to 0. The distribution of the RE values in the Worldpop dataset is more scattered, with more extreme values. 0.6).
Next, the western region, including Xinjiang, Sichuan, and Guizhou, has a medium accuracy (0.6 > R 2 > 0.5) and a high altitude, and is dominated by mountains and hills, such as the Tianshan Mountains and the Hengduan Mountains. Henan and Shandong, with a total population exceeding 100 million, have a medium accuracy (0.6 > R 2 > 0.5). They are followed by the northwest and southwest regions, including Qinghai, Gansu, and Chongqing, where the landform is occupied by alluvial platforms, mountains, and denuded platforms, etc., illustrating the unsuitability of natural conditions with a lower accuracy (0.5 > R 2 > 0.4). The provinces with the lowest accuracy (R 2 < 0.4) are in the southwest and northeast, including Tibet, Heilongjiang, and Liaoning. Tibet is dominated by mountains with high altitudes, such as Tanggula Mountain, demonstrating uninhabitable characteristics. Heilongjiang and Liaoning are dominated by mountains and terraces with a low altitude, but the population is less affected by the low temperature.
Above all, the accuracy is affected by the altitude, landform, temperature, and total population, etc. The provinces with a poor accuracy are located in densely-populated or sparsely-populated areas, or areas with poor natural conditions.
In order to analyze the relationship between the population density of the townships and RE, we divided the population density of the townships into six classifications, as shown in Figure 9.  There is a certain correlation between the RE value and the population density. We found that there may be an overestimation when the population density is less than 50 people/km 2 and the median of RE in this classification is close to 0.2. There tends to be an underestimation when the population density exceeds 1000 people/km 2 and the median of the RE in this classification is close to −0.15. Moreover, compared with a low population density (less than 50 people/km 2 ), the median of the RE value with a high population density (exceeds 1000 people/km 2 ) is closer to 0, with fewer discrete points. Hence, the accuracy of these two products is better in areas with a higher population density than it is in areas with a lower population density. When the population density is in the range of 50-1000 people/km 2 , the median value of the RE is close to 0, indicating that both products have good effects. There is a certain correlation between the RE value and the population density. We found that there may be an overestimation when the population density is less than 50 people/km 2 and the median of RE in this classification is close to 0.2. There tends to be an underestimation when the population density exceeds 1000 people/km 2 and the median of the RE in this classification is close to −0.15. Moreover, compared with a low population density (less than 50 people/km 2 ), the median of the RE value with a high population density (exceeds 1000 people/km 2 ) is closer to 0, with fewer discrete points. Hence, the accuracy of these two products is better in areas with a higher population density than it is in areas with a lower population density. When the population density is in the range of 50-1000 people/km 2 , the median value of the RE is close to 0, indicating that both products have good effects.

The Feature Importance of the Independent Variables
In order to analyze the importance of the independent variables, the feature importance indicator was calculated. The feature importance is the degree of influence on the accuracy of the RF model when an independent variable is replaced by randomly distributed data. If the independent variables do not participate in the model's training, the model's accuracy will decrease. The higher the feature's importance, the more important the independent variables, and vice versa. The feature importance of the RF model is shown in Figure 10.

The Feature Importance of the Independent Variables
In order to analyze the importance of the independent variables, the feature importance indicator was calculated. The feature importance is the degree of influence on the accuracy of the RF model when an independent variable is replaced by randomly distributed data. If the independent variables do not participate in the model's training, the model's accuracy will decrease. The higher the feature's importance, the more important the independent variables, and vice versa. The feature importance of the RF model is shown in Figure 10. Compared with the natural factors, such as the NDVI, elevation, slope, and LULC data, the socio-economic factors, including the POI data and NTL, are more important. Among them, the feature importance of the POI data is 0.49. If the POI data does not participate in the model training, the accuracy of the model will drop by approximately half, which indicates that the POI data are the most important. This is followed by the NTL data, where the feature importance of the NPP/VIIRS is 0.14. If neither the POI nor NTL data participate in the RF model training, the model's accuracy decreases by about 63%. Therefore, the method of combining the POI and NPP/VIIRS data is more conducive to the estimation of the population density.
The feature importance value of construction land is 0.16, while the values for cultivated land, forest land, and grassland are all below 0.1. The feature importance of construction land is significantly higher than that of the last three factors. The construction land includes urban construction land, rural construction land, industrial land, and mining land, while the population is usually distributed in built-up areas, rural areas, and workplaces. Only a small number of people are engaged in agriculture or forestry, and are therefore distributed in cultivated land or woodland, etc. Therefore, cultivated land, forest land, and grassland are less important in the spatialization of the population.
The population distribution is limited by natural conditions (NDVI, elevation, and slope) and road factors. Although the feature importance of these factors is less than 0.1, the model's accuracy will decrease when these factors are removed. For example, in Guizhou, as one of the main karst regions in China, Dong et al. [8] confirmed that the altitude, slope, and aspect have a great influence on the population distribution. The feature importance confirms that there is a certain relationship between these natural factors and the population. Compared with the natural factors, such as the NDVI, elevation, slope, and LULC data, the socio-economic factors, including the POI data and NTL, are more important. Among them, the feature importance of the POI data is 0.49. If the POI data does not participate in the model training, the accuracy of the model will drop by approximately half, which indicates that the POI data are the most important. This is followed by the NTL data, where the feature importance of the NPP/VIIRS is 0.14. If neither the POI nor NTL data participate in the RF model training, the model's accuracy decreases by about 63%. Therefore, the method of combining the POI and NPP/VIIRS data is more conducive to the estimation of the population density.
The feature importance value of construction land is 0.16, while the values for cultivated land, forest land, and grassland are all below 0.1. The feature importance of construction land is significantly higher than that of the last three factors. The construction land includes urban construction land, rural construction land, industrial land, and mining land, while the population is usually distributed in built-up areas, rural areas, and workplaces. Only a small number of people are engaged in agriculture or forestry, and are therefore distributed in cultivated land or woodland, etc. Therefore, cultivated land, forest land, and grassland are less important in the spatialization of the population.
The population distribution is limited by natural conditions (NDVI, elevation, and slope) and road factors. Although the feature importance of these factors is less than 0.1, the model's accuracy will decrease when these factors are removed. For example, in Guizhou, as one of the main karst regions in China, Dong et al. [8] confirmed that the altitude, slope, and aspect have a great influence on the population distribution. The feature importance confirms that there is a certain relationship between these natural factors and the population.

The Differences between POIs and NPP/VIIRS in Mapping the Population Density
As shown in Figure 10, the feature importance of the POI data is significantly higher than that of the NTL data. In order to analyze the cause of this phenomenon, we divided the NPP/VIIRS data from 2015 into a lighted area and an unlighted area, according to whether the DN value was greater than 0. The lighted area was shown to be only 2,163,658 km 2 (22.5%). Zhuo et al. [15] divided the DMSP/OLS data values within mainland China from 1998 into two types: with and without lights. The area with light only accounts for 503,400 km 2 (5.3%), while the area without light accounts for 8,040,000 km 2 (84.7%). There is also a population distribution in the areas without lights. The NPP/VIIRS and DMSP/OLS data have similar problems.
One limitation of NPP/VIIRS is that the overpass time is later than 12:00 PM, and most of the lights are off at that time. The urban functions differed in their temporal light dynamics [35,36]. For example, Li et al. [36] found that an outdoor sports field and an administrative building lost 97.28% and 4.56% of their measured brightness between 8:08 PM-4:05 AM, respectively, while the entire study area in Wuhan, China, lost 61.86% of its total brightness. Furthermore, the period between 9:06 PM and 10:05 PM was the period with the greatest amount of light loss in the study area.
Furthermore, the lighted areas are mostly distributed in urban areas, and are less covered in rural areas, while the POI data covers urban and rural areas. In particular, the type of village in the POI data can effectively simulate the spatial distribution of the rural population. In 2015, there were 3,567,162 records of villages, which can make up for the lack of NPP/VIIRS in rural areas.
Additionally, there are several differences between the NTL and POI data in the urban built-up areas and the rural areas. We utilized urban construction land and rural construction land in LULC data as mask data, and the NPP/VIIRS data and POI density layer were divided into built-up areas and rural areas by masks, respectively. The four layers (NPP/VIIRS (built-up area), POIs (built-up area), NPP/VIIRS (county), and POIs (county)) were aggregated at the city level, respectively. We took the city census as the dependent variable and the four aggregated values as the independent variable in order to calculate the correlation coefficient and the R 2 of the linear regression model between the independent and dependent variables, respectively ( Figure 11).

The Correlation between POIs and Censuses
Previous studies have proved that there is a correlation between POI and the population. For example, Li et al. [24] and Yang et al. [37] used POI data to estimate the population density, confirming that POI data can be used as a population spatialization modeling factor. Similarly, the feature importance of the POI calculated in this paper is 0.49, and Figure 12a shows that there is a positive correlation between POI data and the total population at the city level. Furthermore, Figure  12l shows that the correlation coefficient between the population density and POI density is 0.83, indicating a positive correlation between the population and POI. This fully illustrates the availability The NPP/VIIRS and POI data have a positive correlation with the city census in urban built-up areas and rural areas. Two independent variables perform better in urban built-up areas than in rural areas, while the POIs perform better than NPP/VIIRS in the same area. As shown in Figure 11, the city census has the highest correlation with the POIs within the built-up areas; the correlation coefficient is 0.73, and the R 2 is 0.54. Then, the correlation coefficient between the city census and the NPP/VIIRS value within the built-up area is 0.67. Apart from that, the correlation coefficient between the city census and the POIs in the rural area is 0.65, and the R 2 is 0.43. Finally, the correlation coefficient between the city census and the NPP/VIIRS values in the rural area is 0.64, and the R 2 is 0.41.

The Correlation between POIs and Censuses
Previous studies have proved that there is a correlation between POI and the population. For example, Li et al. [24] and Yang et al. [37] used POI data to estimate the population density, confirming that POI data can be used as a population spatialization modeling factor. Similarly, the feature importance of the POI calculated in this paper is 0.49, and Figure 12a shows that there is a positive correlation between POI data and the total population at the city level. Furthermore, Figure 12l shows that the correlation coefficient between the population density and POI density is 0.83, indicating a positive correlation between the population and POI. This fully illustrates the availability of POI data in the establishment of the model. There is a positive and strong correlation between the census and the POI density layer [38]. The correlation coefficient between the aggregated value of the POI density layer and the city census is 0.85, and the R 2 of the linear regression model is 0.72, which is significantly higher than the values for the other 10 POIs. Therefore, the method of integrating multiple POI layers into one POI density layer can effectively reduce the calculation, and can form the distribution of the population under the combined action of multiple POI types. In addition, there are many types of POI data, but not every type is related to the population distribution. For example, airports and railway stations, etc., are usually far away from urban (higher population density) areas, and have no obvious correlation with population density. The POI types selected in the article, including education, factories, finance, and villages, are more closely related to the population.
In order to explore the relationship between censuses and multiple POIs, we utilized the city administrative boundary data to aggregate the POI density layer and the 10 types of POI layers, respectively. Then, we took the city census as the dependent variable and the aggregated value at the city level as the independent variable, and the correlation coefficient and the R 2 were calculated, respectively ( Figure 12).
There is a positive and strong correlation between the census and the POI density layer [38]. The correlation coefficient between the aggregated value of the POI density layer and the city census is 0.85, and the R 2 of the linear regression model is 0.72, which is significantly higher than the values for the other 10 POIs. Therefore, the method of integrating multiple POI layers into one POI density layer can effectively reduce the calculation, and can form the distribution of the population under the combined action of multiple POI types.
The 10 POI data are positively correlated with the population, and the correlation coefficient is greater than 0.7. The population has the highest correlation with four types of POI: education, government, finance, and villages. Specifically, the correlation coefficients between the city censuses and education, government, finance, and villages are 0.83, 0.82, 0.81, and 0.81, respectively. These four types of POI data are closely related to human life. The correlation coefficients between the city censuses and medical services and transportation are 0.77 and 0.76, respectively, indicating that medical services and transportation are indispensable in people's daily lives. Moreover, the correlation coefficients between the city census and catering, entertainment, hotels, and workplaces are 0.73, 0.72, 0.7, and 0.7, respectively.

Error Analysis
These two products performed well in most areas of mainland China. However, due to the large differences in the natural resources and socioeconomic conditions in different regions, certain errors remain.
The Popi dataset is better than the Worldpop dataset for the following reasons: both datasets employ the RF model and similar modeling factors, including natural factors (elevation, slope, LULC, etc.) and socioeconomic factors (NTL, roads, etc.). It can be seen from Section 4.3 that the feature importance of POI and NTL is significantly higher than for other natural factors. The POI data applied in Worldpop are derived from the Open Street Map (OSM) dataset. The OSM data are an online map collaboration program. Any registered user can edit the map content. The lack of professional editing may lead to inaccurate positioning or classification, which reduces the credibility of the data [39]. Compared with OSM data, POIs are produced and edited by professionals, providing more accurate location information and attribute information. Especially in urban areas, POI data can accurately describe the relevant information of humanity, reflect the spatial heterogeneity, and improve the accuracy of the population spatial modeling.
In addition, the data sources used by Worldpop have a low resolution and poor timeliness. For example, the NTL data used by Worldpop is DMSP/OLS data, with a resolution of 1 km, which has a pixel saturation effect [16]. The resolutions of the LULC and NTL data used in this paper are 30 and 750 m, respectively, which are significantly better than those given in Worldpop. The LULC and NTL data of Worldpop use 2009 data and 1992-2013 time-series data, respectively [39]. Due to timeliness, there are certain shortcomings in the estimation of the spatial distribution of the population in 2015.
When the population density is higher (>1000 people/km 2 ) or lower (<50 people/km 2 ), more errors are likely to occur. Furthermore, the accuracy of these two products in low-density areas is not as high as it is in high-density areas. Areas with a higher population density include developed urban areas, such as Beijing and Shanghai, while areas with a lower population density may be found in rural regions. Gaughan et al. [6] believe that the modeling process does not concentrate people enough in densely-populated urban areas, and the estimates are scattered to less-populated areas. This is inherent in the dasymetric approach used in the mapping population redistribution, but it affects relatively few total census units. Additionally, the greater the city population, the richer the types and quantities of the POI data [26]. The rural population is relatively scattered, and the number of POIs in rural areas is small and scattered [40]. Next, the NTL in the urban area with a DN value greater than 0 is higher than that in rural areas, and the NTL information is more abundant in the city [15]. Furthermore, the urban landforms are relatively simple, mostly dominated by plains and basins, and the terrain is relatively flat. The rural landforms may be hills or mountains, etc. These complex topographies and landforms affect the results of the population spatialization. For example, Bai et al. [9] found that the Worldpop dataset has considerable errors in hilly areas, such as the Hengduan Mountains. Besides this, due to the sparse distribution of the cultivated land and residential areas in rural areas, it is difficult to achieve an accurate classification of the land cover data in rural areas, which is not conducive to the spatial modeling of the population.
Overall, the quality and appropriateness of the modeling factors will affect the accuracy of population density data. For example, the Twitter data used by our predecessors in population density mapping has achieved better results in Indonesia; however, the use of Twitter is not widespread in China [41]. In contrast, as the accuracy of the location and attributes of the POI data is reliable, and it is optimal to utilize the socioeconomic factors in the quantification of the population density in China. Furthermore, the anisotropic characteristic of the artificial light at night may also impact the NPP/VIIRS data [42]. The Luojia-1 satellite provided a new data source for nighttime light applications, which could be a future direction of study. In addition, the modeling method also affects accuracy of the population density data. For example, Yang et al. [37] used POI data to calculate the population density of Zhejiang Province, China, and applied the linear regression equation to build the model. The linear model needs to assume that there is a linear relationship between the population and the modeling factors. However, the Enhanced Vegetation Index (EVI) used by Yang et al. [37] was not shown to have to have a linear relationship with the population density.
Finally, although the accuracy of the Popi product is slightly higher than that of Worldpop in this research area, there are few shortcomings in this article. For example, the LULC data used in this paper is highly accurate, but is not open source data. Secondly, the Baidu POI data can only be used in China, and not the rest of the world. It is still difficult to promote to the world due to the lack of Baidu POI data. In addition, the coverage of the OSM data used by the Worldpop data in China has not been completed yet [43]. Moreover, OSM, as a volunteered geographic information (VGI)-based dataset, may have incorrect locations or attributes that will affect the accuracy of the population density products. However, Worldpop products also have their own advantages. This product is a global high-precision population density product, and is currently the mainstream population density product.

Conclusions
Taking the LULC, NDVI, NPP/VIIRS, POI, road, and DEM data as the independent variables, the RF regression model was applied to disaggregate the 2015 county-level census population, in order to map the population density in a 100 m × 100 m grid in mainland China. The population density map that was produced showed a higher accuracy than the Worldpop dataset. This study demonstrated that the utilization of multi-source data can effectively improve the accuracy of population mapping at a finer scale.
The combination of POIs and NPP/VIIRS data is more conducive to the estimation of the population density. The sum of the feature importance of the socioeconomic factors, including POI and NPP/VIIRS, exceeds 60%, which is significantly higher than the other natural factors. Without the POIs and NPP/VIIRS in the RF model training, the model accuracy will decrease by about 63%.
The accuracy of the model is affected by the altitude, landform, temperature, and total population, etc. The areas with a poor accuracy were mainly located in the densely-populated and sparsely-populated areas. Obvious overestimation and underestimation appeared in areas with a population density of less than 50 people/km 2 and more than 1000 people/km 2 , respectively. The method of population density mapping based on remote sensing and POI data utilized in this paper has a certain practical significance. However, POIs are concentrated in urban areas, and are less concentrated in rural areas (except for rural settlement types). In our future work, we will introduce socioeconomic data, such as building data [25], Tencent location data [44], and Twitter [45], etc. The combination of POIs and NTL with other socioeconomic data sources may improve the accuracy.