Estimating and Interpreting Fine-Scale Gridded Population Using Random Forest Regression and Multisource Data

: Gridded population results at a ﬁne resolution are important for optimizing the allocation of resources and researching population migration. For example, the data are crucial for epidemic control and natural disaster relief. In this study, the random forest model was applied to multisource data to estimate the population distribution in impervious areas at a 30 m spatial resolution in Chongqing, Southwest China. The community population data from the Chinese government were used to validate the estimation accuracy. Compared with the other regression techniques, the random forest regression method produced more accurate results (R 2 = 0.7469, RMSE = 2785.04 and p < 0.01). The points of interest (POIs) data played a more important role in the population estimation than the nighttime light images and natural topographical data, particularly in urban settings. Our results support the wide application of our method in mapping densely populated cities in China and other countries with similar characteristics.


Introduction
Gridded population data can reflect the explicit size and detailed distribution of the population. They are indispensable for scientific research and policy making, such as in city development planning [1], disaster prevention [2], public health including novel coronavirus disease 2019 (COVID-19) [3] and environmental conservation [4]. In the past decades, many cities have undergone rapid population growth and urban area expansion in the world, particularly in developing countries [5], which makes the increasing population become one of the major challenges for cities. The uncertainties of population information, such as the size, location and migration, have become urgent policy making issues that need to be solved for many governments [6]. Many countries have tried to collect detailed population data through censuses during certain periods of time, though collecting demographical data have high time consumption and high costs and the data may possibly be outdated after it is collected [7,8]. To protect privacy, the high resolution population distribution census data linked with small areas were restricted [9,10]. Consequently, urban population mapping data at a high resolution are often unavailable, which cause difficulties for studies. Nevertheless, thanks to the development of computer science and new models, there have been some studies that have used novel methods to obtain high spatial resolution population data without eroding privacy. ISPRS Int. J. Geo-Inf. 2020, 9,369 2 of 18 In the 1990 s, many scholars focused on the global population mapping methods and developed many global population datasets [11,12]. In 1995, for the first time, the Gridded Population of the World (GPW) collection disaggregated the global census data to a resolution of 2.5 arc-minutes (~4 km). The latest version (GPWv4) is at fine resolution of 30 arc-seconds (~1 km) using an areal weighting method and land cover datasets [12]. Similarly, the Gridded Rural Urban Mapping Project (GRUMP) succeeded in allocating the global population into 1 km grids considering the location and scale of the city center [13]. LandScan also provided global population data with a spatial resolution of approximately 1 km [14]. The Worldpop took advantage of the random forest model and multisource data, which made the 100 m resolution of global population data possible [15]. These population datasets employed different ancillary datasets with various approaches (Table 1). These datasets featured globally large scales with significance for the data-poor regions, but with concerns regarding accuracy. There are also some attempts that have used new data and methods [7] to conduct high-resolution population research at the national level [1,16,17]. However, the various urban fabric patterns and city complexity constrain the development of population mapping [8,18,19]. Satellite-derived products, e.g., topography [20], nighttime light images [21], built-up area [22] and land cover [23], were widely used in previous studies. Recently, mobile phones have become more prevalent and location-based services have appeared, such as online food delivery or car sharing, making city life more convenient. In the meantime, the rapid development of mobile location-based services has expanded the data sources for population spatialization [8,24]. Therefore, plenty of geospatial big data obtained from Internet websites became sources for population mapping [9,25,26]. Points of interest (POIs) data are typical geospatial vector points with detailed geographical coordinates provided by GPS and are usually shown on a digital map [17]. There are many kinds of amenities in cities with regards to the categories of the POIs. This social sensing data enriches the traditional approaches of remote sensing data in densely populated urban areas [26,27]. Therefore, integrating multisource data including POIs can improve the accuracy of gridded population data.
Areal interpolation [11,13,28,29] and dasymetric mapping [17,28,30] have been widely applied for the spatial decomposition of census data. To create the weight layer, some linear regression models have been proposed [5,31]. In particular, the random forest, one machine learning method, has been incorporated into models to produce the weighting layer of population data. For the first time, scholars obtained 100 m grid population mapping results in three undeveloped countries [20]. Due to the ease of training and interpretation, the random forest algorithm has received increasing attention in the population mapping research [7,8,20]. Despite some progress, the internal mechanism between the multisource data and the population estimation is still unclear [32]. Furthermore, in terms of the novel machine learning method, further research is required to cope with the impacts of auxiliary variables on population predictions [17]. This study applied the random forest regression model to produce the 30 m resolution gridded population data of Chongqing, Southwest China. The resolution was finer than those in many previous studies. The main aims of the study were (1) to spatially disaggregate subdistrict level census data into 30 m impervious grid cells and (2) to analyze the relation between geospatial features and the population.

Study Area
The research was conducted in Chongqing, China ( Figure 1). Located in the southwest of the country, Chongqing is the regional economic center for the upper reaches of the Yangtze River region, attracting a large sized migrant population. Since it was selected as one of directly controlled municipalities under China's State Council in 1997 [33], Chongqing has witnessed population and economic booms in the last two decades [34]. Nine central urban districts (Yuzhong, Yubei, Jiangbei, Shapingba, Nanan, Dadukou, Jiulongpo, Banan and Beibei) in Chongqing were selected for this research ( Figure 1). On one hand, the nine districts have more than 9.25 million permanent residents by the end of 2018, accounting for approximately 29.82% of the total resident population in Chongqing. On the other hand, the total area of the nine districts is approximately 5472.68 km 2 , which is only 6.64% of the whole Chongqing area. Consequently, the population density is very high at 1690 persons/km 2 . previous studies. The main aims of the study were (1) to spatially disaggregate subdistrict level census data into 30 m impervious grid cells and (2) to analyze the relation between geospatial features and the population.

Study Area
The research was conducted in Chongqing, China ( Figure 1). Located in the southwest of the country, Chongqing is the regional economic center for the upper reaches of the Yangtze River region, attracting a large sized migrant population. Since it was selected as one of directly controlled municipalities under China's State Council in 1997 [33], Chongqing has witnessed population and economic booms in the last two decades [34]. Nine central urban districts (Yuzhong, Yubei, Jiangbei, Shapingba, Nanan, Dadukou, Jiulongpo, Banan and Beibei) in Chongqing were selected for this research (Figure 1). On one hand, the nine districts have more than 9.25 million permanent residents by the end of 2018, accounting for approximately 29.82% of the total resident population in Chongqing. On the other hand, the total area of the nine districts is approximately 5472.68 km 2 , which is only 6.64% of the whole Chongqing area. Consequently, the population density is very high at 1690 persons/km 2 .
The central urban areas in Chongqing are famous for their mountainous and watery landscapes [35]. The average elevation is over 391 meters, with approximately 22.9% of the land covered by hills and mountains. Chongqing is also at the confluence of the Yangtze River and Jialing River. Its topographic features are extremely complicated. The region is in the subtropical monsoon climate with large seasonal variations in precipitation and temperature [36]. The relatively complex natural environment results in different levels of economic development and a heterogeneous population distribution, making Chongqing a typical place to research gridded populations.

Datasets
Six datasets were collected from government agencies, international organizations and research institutions and applied in this study ( Table 2). All the data were at the high spatial resolution in the The central urban areas in Chongqing are famous for their mountainous and watery landscapes [35]. The average elevation is over 391 meters, with approximately 22.9% of the land covered by hills and mountains. Chongqing is also at the confluence of the Yangtze River and Jialing River. Its topographic features are extremely complicated. The region is in the subtropical monsoon climate with large seasonal variations in precipitation and temperature [36]. The relatively complex natural environment results in different levels of economic development and a heterogeneous population distribution, making Chongqing a typical place to research gridded populations.

Datasets
Six datasets were collected from government agencies, international organizations and research institutions and applied in this study (Table 2). All the data were at the high spatial resolution in the most recent year of 2018, except for the ASTRE GDEM data that were from 2010. The administrative units of the research area include 9 districts and 162 subdistricts. The high-quality and recent 162 subdistricts' census data were the main data source for the disaggregation. The community level population survey data were used for the accuracy assessment. Given that the population distribution data showed a long-tail, the natural logarithm transformation was applied for the population density to get a normal and even distribution [20]. The rest of the five auxiliary spatial datasets, including the POIs, Luojia1-01 image, water, impervious surface and ASTER GDEM data, were used to guide the fine-resolution population mapping. The POI data contained different kinds of information, such as latitudes, longitudes, addresses, classifications and others, which can be regarded as tags. By using Baidu Map's application programming interface under each category, the POI data were retrieved from Baidu, Inc. (http://map.baidu.com), which is one of the most popular web map service providers in China. The categories of the POI data were identified based on the Chinese semantic phrase according to Baidu Incand published articles [8,17,26]. Finally, a total of 813253 effective POI records were classified into 14 categories (Table 3) [8]. It is worth noting that single residence POI presented a residential community, including hundreds of apartments. In terms of car POIs, they included the points of sale, repair and service for cars. Baidu, Inc. divided them into three categories for business reasons while they were combined in this study.
Kernel density estimation (KDE) has the capability to convert each separated point into a smooth and continuous density surface. Most notably, the bandwidth of the KDE has a significant effect on the result [29]. KDE has been widely used to calculate the density of features in a surrounding neighborhood while considering the distance decay [37]. In this study, the parameter was adjusted from 100 m to 3000 m to get the best estimation after fitting the model. All the categories of the POI records were converted into raster layers by KDE, and 1000 m is chosen as the proper bandwidth. Due to the various distributions of the different POI categories, it is allowed to change; however, considering the principles of reproducibility and comparability, all the POIs use the same KDE bandwidth [17]. A new generation nigh-time light (NTL) remote sensing satellite, Luojia1-01, was launched in June 2018. The spatial resolution of Luojia1-01 NTL imagery is 130 m at the nadir point, which is the finest spatial resolution at present. Luojia1-01 has a better capability to simulate socioeconomic parameters than the Suomi National Polar-orbiting Partnership-Visible Infrared Imaging Radiometer (NPP-VIIRS) NTL data [38][39][40]. After spatial adjustment, radiation calibration was conducted for the Luojia1-01 images on the data website (http://59.175.109.173:8888). The final NTL layer was obtained using the following Equation (1): where L is the natural logarithm transformation value of the amplified radiant brightness; and DN is the digital number, which is the gray value of the initial image.
In terms of the natural variables, the Digital Elevation Model (DEM) gives the elevation of study area. The DEM was extracted from the Advanced Space-borne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) version 2 at a spatial resolution of 30 m. A surface analysis tool was used to calculate the slope and topographic relief. Water and impervious area data were obtained from a field investigation by the Chongqing Planning and Design Institute. The water based on vector map data were converted to raster format by calculating the Euclidean distance. The impervious layer was transformed to raster as the mask of the population surface.
All spatial covariates were projected to WGS_1984_UTM_Zone_48 and resampled to raster layers with the spatial resolution of 30 m. The zonal mean values of each continuous dataset within the subdistrict administrative boundary were calculated. The preprocessing methods were implemented using ArcMap 10.3 (ESRI, Inc., Redlands, CA, USA).

Methods
The multisource geospatial data were input into the dasymetric model due to the spatial heterogeneity of population distributions. To produce the spatial weighting layer, 10-fold cross validation was adopted to build a random forest regression model. Figure 2 represented the procedures for gridded population mapping, including the data processing, model building and result analysis. The random forest (RF) is an ensemble learning methodology, which was first systematically proposed by Breiman [41]. Among the essential ingredients, both bagging and the Classification and Regression Trees (CART) criteria play critical roles [41,42]. This machine learning technique is made of a combination of randomly generated decision trees, which can be used for classification and regression and achieve a significant improvement [43]. A decision tree is a nonlinear model to calculate outcomes using many linear boundaries by asking a series of queries about the available feature until reaching the final decision [44]. Therefore, the random forest model based on the spatial segmentation tree is not sensitive to the scale. For the regression, the final output is the mean prediction of the discrete trees that correspond to the input variables. Because the response variable was consecutive, the random forest regression model was adopted to fit the data in this study.
In the process of machine learning, it is necessary to optimize the super parameters and select a group of optimal super parameters for the learners to improve the performance and effect of learning [44,45]. For example, the max number of features and the max depth of trees are the two super parameters in random forest model. Grid search is conducted to try every possibility through loop traversal in all candidate parameter selections. The best parameters are used in the later analysis. To improve the predictive accuracy and control overfitting, the maximum depth and features of the decision tree were applied to stop excessive splitting. Moreover, the assessment accuracy is sensitive to the number of decision trees. If the number of trees is too small, it is easy to underfit the model. Conversely, more decision trees are likely to increase the calculation load. At the proper number, the model reaches a near steady state.
To avoid the impact of the initial data partition on the results, cross validation was performed to reduce the occasionality. In general, grid search needs to be combined with cross validation to ensure the validity and accuracy of the test. Therefore, cross validation (CV) was used to obtain these 3 fitting parameters [46]. When the number of samples was small, K-fold CV was helpful [47]. The K-1-fold was used as the training data and the outcome was validated by the remaining fold. In this research, there were only 162 subdistricts involved in the model and therefore 10-fold CV was adopted. The model estimation was completed using the Python 3.5.2 statistical environment and the Scikit-learn 0.21.3 random forest package. The random forest (RF) is an ensemble learning methodology, which was first systematically proposed by Breiman [41]. Among the essential ingredients, both bagging and the Classification and Regression Trees (CART) criteria play critical roles [41,42]. This machine learning technique is made of a combination of randomly generated decision trees, which can be used for classification and regression and achieve a significant improvement [43]. A decision tree is a nonlinear model to calculate outcomes using many linear boundaries by asking a series of queries about the available feature until reaching the final decision [44]. Therefore, the random forest model based on the spatial segmentation tree is not sensitive to the scale. For the regression, the final output is the mean prediction of the discrete trees that correspond to the input variables. Because the response variable was consecutive, the random forest regression model was adopted to fit the data in this study.
In the process of machine learning, it is necessary to optimize the super parameters and select a group of optimal super parameters for the learners to improve the performance and effect of learning [44,45]. For example, the max number of features and the max depth of trees are the two super parameters in random forest model. Grid search is conducted to try every possibility through loop traversal in all candidate parameter selections. The best parameters are used in the later analysis. To improve the predictive accuracy and control overfitting, the maximum depth and features of the decision tree were applied to stop excessive splitting. Moreover, the assessment accuracy is sensitive to the number of decision trees. If the number of trees is too small, it is easy to underfit the model. Conversely, more decision trees are likely to increase the calculation load. At the proper number, the model reaches a near steady state.
To avoid the impact of the initial data partition on the results, cross validation was performed to reduce the occasionality. In general, grid search needs to be combined with cross validation to ensure the validity and accuracy of the test. Therefore, cross validation (CV) was used to obtain these 3 fitting parameters [46]. When the number of samples was small, K-fold CV was helpful [47]. The K-1-fold was used as the training data and the outcome was validated by the remaining fold. In this research, there were only 162 subdistricts involved in the model and therefore 10-fold CV was adopted. The model estimation was completed using the Python 3.5.2 statistical environment and the Scikit-learn 0.21.3 random forest package.
To predict the weight of the population density, the natural logarithm of the population density and all the zonal mean values of the features in each subdistrict administrative unit were used to fit the random forest model. The impervious mask was used to abstract the final weighting layer. The dasymetric population distribution map for central Chongqing was produced using the following Equation (2): where POP pixel is the predicted population for each pixel, POP sub−district is the census population of every subdistrict, W pixel is the population distribution weight of each grid and W sum is the weighted sum within the subdistrict boundary. Given that the latest census data from 2018 were used, the current datasets were unsuited to compare with this research since that most of datasets were based upon the 2010 census. Therefore, the community level population data from an authoritative investigation in the year of 2018 were used. Due to the population data confidentiality on the community scale, the assessment dataset only consisted of 200 communities in central Chongqing, which were randomly chosen by the Chongqing Bureau of Public Security. Three regression models were compared, including the multiple linear regression (MLR), the geographically weighted regression (GWR) and the random forest regression (RFR).
MLR models assume that the sample data are independent of each other while the population data have the geospatial dependence. Many attempts were made to make MLR models suitable for spatial data [48][49][50]. Brunsdon et al. (1996) first proposed the GWR [48]. Considering the spatial location, the regression coefficients of different auxiliary variables are not evaluated by using all sample data, but rather by using a certain range of the local subset of the data [49,50]. The GWR was used for demographic mapping [51] and the identification of households in potential economic distress [52].
Because different communities have distinct population densities, it was assumed that this model had different performances in communities with various population densities. The accuracy of these communities with different population densities were tested using the coefficient of determination (R 2 ), the root mean squared error (RMSE) and the mean absolute error (MAE), which are well-known accuracy criteria in the field of population estimation. Additionally, the Akaike information criterion (AIC) can estimate the relative amount of information lost by a statistical model. The smaller the value is, the better the model's fit [53,54]. In general, when the number of samples is small, the corrected Akaike information criterion (AICc) is more helpful than the AIC since it avoids potential overfitting [55]. Due to small samples of 162 subdistricts' populations, the AICc was adopted as the model fit criterion.
Feature importance plays an important role in the explanations of the random forests model and it measures the increase in the prediction error of the model after the values of the variable were randomly permuted [56]. Considering that shuffling feature values increases the model error, the feature is important [57]. Using the Python Scikit learn 0.21.3 package, the feature importance of the random forest regressions was calculated as the impurity value reduction using the mean squared error [58] in the following Equation (3): where y i is the size of the population in each subdistrict, N is the number of subdistricts and µ is the mean value of all subdistricts. The partial dependence plot (PDP) is one of the smartest ways to extract insights from complicated models [17,56,59]. The same as the function of the coefficients in the linear model, the PDP can be used to interpret machine learning models. After fitting the model on real data, the features were changed to make a series of predictions by averaging the impacts of all other variables. The PDP shows the change of the average estimation of the population density with the varying features. Using the Python Pdpbox package, the PDP was formed using the following Equation (4): where N is the number of observations of the response variables; x i,k denotes the covariates for features i = 1, 2, . . . , p and samples k = 1, 2, . . . , N; and F(x) is the mathematical function of the random forest regression model.

Population Estimation
Due to only 162 subdistricts being used to fit the model, 10-fold CV was chosen. 9-folds of samples were adopted into the model to obtain the proper parameters and the accuracy was assessed using the remaining samples. 10-fold cross validation and grid search were conducted to find the proper parameters in the RF (Figure 3). When the max depth of the decision tree was 4 and the max number of features was 12, the average correlation coefficient (R 2 ) of the CV score reached the maximum value of 0.8861.
where N is the number of observations of the response variables; , denotes the covariates for features i = 1, 2, …, p and samples k = 1, 2,…, N; and F(x) is the mathematical function of the random forest regression model.

Population Estimation
Due to only 162 subdistricts being used to fit the model, 10-fold CV was chosen. 9-folds of samples were adopted into the model to obtain the proper parameters and the accuracy was assessed using the remaining samples. 10-fold cross validation and grid search were conducted to find the proper parameters in the RF (Figure 3). When the max depth of the decision tree was 4 and the max number of features was 12, the average correlation coefficient (R 2 ) of the CV score reached the maximum value of 0.8861. The suitable number of estimators was tested in the random forest and it ranged from 1 to 300. When the number of estimators reached 50, the accuracy of the model was 88% (Figure 4). After that, the accuracy was stable with small fluctuations. Therefore, 50 trees were sufficient in the random forest regression model. The max depth of the decision trees and the max number of features were 4 and 12, respectively, when the final tuning parameters were adopted. The suitable number of estimators was tested in the random forest and it ranged from 1 to 300. When the number of estimators reached 50, the accuracy of the model was 88% (Figure 4). After that, the accuracy was stable with small fluctuations. Therefore, 50 trees were sufficient in the random forest regression model. The max depth of the decision trees and the max number of features were 4 and 12, respectively, when the final tuning parameters were adopted. ISPRS Int. J. Geo-Inf. 2020, 9, x FOR PEER REVIEW 9 of 18 After fitting the random forests regression model, the final distribution weight of each raster was produced (Figure 5a). The weights where only impervious surface cover existed were used to disaggregate the census population to each pixel at the subdistrict level (Figure 5b). In the final population estimated layer (Figure 6), the population presented a lumpy or banded distribution, concentrating around Yuzhong in central Chongqing. From the high-resolution photomicrograph disaggregation results (Figure 6a,b), more details were able to be captured. After fitting the random forests regression model, the final distribution weight of each raster was produced (Figure 5a). The weights where only impervious surface cover existed were used to disaggregate the census population to each pixel at the subdistrict level (Figure 5b).  After fitting the random forests regression model, the final distribution weight of each raster was produced (Figure 5a). The weights where only impervious surface cover existed were used to disaggregate the census population to each pixel at the subdistrict level (Figure 5b). In the final population estimated layer (Figure 6), the population presented a lumpy or banded distribution, concentrating around Yuzhong in central Chongqing. From the high-resolution photomicrograph disaggregation results (Figure 6a,b), more details were able to be captured. In the final population estimated layer (Figure 6), the population presented a lumpy or banded distribution, concentrating around Yuzhong in central Chongqing. From the high-resolution photomicrograph disaggregation results (Figure 6a,b), more details were able to be captured.

Accuracy Validation
When comparing the results obtained by the three different models, GWR had a smaller Akaike information criterion than the MLR and the RFR model had the highest R 2 value (0.7469) ( Table 4). The RMSE and MAE of the RFR were also smaller (2785.04 and 1889.70, respectively). Therefore, the RFR had better overall accuracy than the other two linear regression models. The RFR model did improve the estimation. To estimate the prediction precision in communities with different population densities, all the communities were divided into three groups according to the density of the community's population. In Figure 7, the green plus markers indicated the communities with low population densities below 150 persons per hectare. It is worth mentioning that the coefficient of determination (R 2 ) of this group was 0.8498, demonstrating the high accuracy of the results. The R 2 of the communities with medium population densities from 150 to 300 persons per hectare (the blue square markers) was 0.6729 and the R 2 of the communities with population densities above 300 persons per hectare (the red triangle markers) was 0.4903. For all the communities, the overall accuracy was acceptable (R 2 = 0.7469).

Accuracy Validation
When comparing the results obtained by the three different models, GWR had a smaller Akaike information criterion than the MLR and the RFR model had the highest R 2 value (0.7469) ( Table 4). The RMSE and MAE of the RFR were also smaller (2785.04 and 1889.70, respectively). Therefore, the RFR had better overall accuracy than the other two linear regression models. The RFR model did improve the estimation. To estimate the prediction precision in communities with different population densities, all the communities were divided into three groups according to the density of the community's population. In Figure 7, the green plus markers indicated the communities with low population densities below 150 persons per hectare. It is worth mentioning that the coefficient of determination (R 2 ) of this group was 0.8498, demonstrating the high accuracy of the results. The R 2 of the communities with medium population densities from 150 to 300 persons per hectare (the blue square markers) was 0.6729 and the R 2 of the communities with population densities above 300 persons per hectare (the red triangle markers) was 0.4903. For all the communities, the overall accuracy was acceptable (R 2 = 0.7469).

Interpretation of the Gridded Population
As marked in the green box in Figure 8, the densities of life service, restaurant and residence POIs were the top 3 most important features. Their importance values exceeded 0.15, reflecting that the model had learned to take advantage of those POI data. The density of life service POIs, such as post offices and barbershops, was the most important feature influencing the population mapping and had the highest feature importance score. Restaurants and residences were also of vital significance for everyday life and so they were closely related to population distribution. The feature importance values below 0.01 were shown in the red box. When the density of government POIs was high, it was usually the administrative office land in the city, corresponding to small populations. When the densities of car, transport and scenic spot POIs were high, they were industrial land, transportation facilities land and green land, respectively. Surprisingly, the brightness of the NTL corresponded poorly to the population distribution. The natural factors, such as the elevation, the slope and the distance to the river, were also of little influence, probably due to the large disturbances of anthropogenic activities on the natural environment.

Interpretation of the Gridded Population
As marked in the green box in Figure 8, the densities of life service, restaurant and residence POIs were the top 3 most important features. Their importance values exceeded 0.15, reflecting that the model had learned to take advantage of those POI data. The density of life service POIs, such as post offices and barbershops, was the most important feature influencing the population mapping and had the highest feature importance score. Restaurants and residences were also of vital significance for everyday life and so they were closely related to population distribution. The feature importance values below 0.01 were shown in the red box. When the density of government POIs was high, it was usually the administrative office land in the city, corresponding to small populations. When the densities of car, transport and scenic spot POIs were high, they were industrial land, transportation facilities land and green land, respectively. Surprisingly, the brightness of the NTL corresponded poorly to the population distribution. The natural factors, such as the elevation, the slope and the distance to the river, were also of little influence, probably due to the large disturbances of anthropogenic activities on the natural environment.   Figure 9 shows the complex influence of features on the predicted weight of the population distribution. Since the kernel density of POIs strongly affected predictions strongly, the top 3 features were analyzed, including the kernel density of the life service, restaurant and residence POIs. Then, another three variables were chosen, including the brightness of the NTL, the elevation and the distance to the river. The larger densities of these POIs indicated denser populations of these three kinds of POIs (life service, residence and restaurant) (Figure 9a-c, respectively). However, this pattern disappeared for the brightness of the NTL, the elevation and the distance to the river (Figure 9d-f, respectively). Regarding life service POIs, when the density increased from 0.06 to 7.25, the change of the population weight increased from 0 to 0.6, respectively (Figure 9a). Due to numerous kinds of life services, the city attracted many people who wanted to work and live. The more life services facilities that were constructed, the more convenient city life was. After 7.25, the change remained constant, indicating the limited impact of extra life service facilities after the threshold.
Regarding the density of residence POIs, when the density was below 14.62, the change was near to zero. First, the population density was relatively low in the suburbs with scattered residences. Second, people pursued healthy and comfortable living environments and the low density residential development provided a good trade-off. When the density of the points exceeded 14.62, the change increased with some stable values. The higher density of residence POIs implied more people lived around that point.
Regarding the high density of restaurant POIs, the model predicted a generally high population size. For one thing, there was no change when the density was below 7.4, indicating small numbers of restaurants in places with few people. For another, diverse businesses, including restaurants, were located close to the central business district (CBD), attracting many people to live near the CBD. Thus, when the density exceeded 7.4, the change rose rapidly.
Regarding the brightness of the NTL graph (Figure 9d), in agreement with previous studies [21], the light was so dim that little energy could be captured by the sensors in the sparsely populated countryside. Therefore, the difference was not notable when the brightness of the NTL was below 3.95. Afterwards, the high brightness of the NTL increased the probability of population aggregation, and it was saturated at 5.24. First, the NTL was closely linked to human activity. Second, the high brightness of the NTL reflected more details of transportation and industrial activities than those of the population distribution. Therefore, the change of the confidence was from −0.02 to 0.04, reflecting the dual characters.
The local natural environment had a very limited impact on the population distribution, thanks to the progress of technology. Thus, the predicted population size usually remained constant (Figure 9e,f). Interestingly, there was a modest rise when the elevation increased from 449.77 to 766.17 m. Meanwhile, the confidence interval was between −0.025 and 0.075, which implied both positive and negative effects. Because of the limited training data, the prediction accuracy of the random forest model for this variable may be affected.
In terms of the distance to the river, the prediction was nearly constant when the distance changed from 29.18 to 835.71 m. As the distance further increased, the change decreased below 0 and finally went down to -0.4. Heterogeneous effects may not be revealed since the PDP plots only showed the average marginal effects [46]. For one thing, people lived near to rivers to get beautiful natural views; thus, this feature had a positive association with the prediction. For another, people remained a distance from the river in order to avoid the threat of natural flood disasters. Henceforth, the PDP of the distance to the river was a horizontal line with both positive and negative confidence ranges since the two effects could counteract each other.

Principal Findings and Meaningful Implication
This study produced the latest 30 m gridded population data in central Chongqing using contemporary data publicly available at the time of the analysis in 2018. The results provide timely and important information to policy makers for city planning, natural disaster and disease control and environment protection [3,22,26,60,61]. The state-of-the-art random forest model was used to integrate multiple data and 10-fold cross validation was conducted to produce the weighting layer, which was used for the dasymetric population mapping. As some of the most popular social sensing data, POI data are helpful for getting high accuracy gridded population results (overall R 2 = 0.7469, RMSE = 2785.04 and p < 0.01), especially after the kernel density estimation was used to generate various kinds of POI surfaces [27]. With the help of the feature importance analysis and the PDP, the results indicate that POI data had a more crucial effect on population mapping than the remote sensing data, like the Luojia1-01 NTL images, in cities [26]. However, the natural variables had both negative and positive impacts on population spatialization. In terms of a large scale study, the NTL images were major auxiliary data for the estimation [62][63][64]. Despite the finest resolution of the NTL images being 150 meters [21], the complex urban fabric patterns and the uncertainty of the radiation calibration of Luojia1-01 constrain the application of the images in conducting accurate population estimations at a fine scale.

Limitations and Future Research
Similar to many studies, there are some limitations in the current study. In terms of data, our study mainly focused on POIs, and it may reflect the land use as well. In fact, the previous studies demonstrated that POI data could be used for the classification of urban land use [59,65,66]. Meanwhile, after being divided into 14 categories, all the data were fitted for the prediction model without integrating all the POIs into one layer. Consequently, the various kinds of POIs may be correlated with each other, which may cause some feature importance bias. Further studies must deal with the independence of each feature.
Because of the confidentiality of the population data at the fine spatiotemporal scale, this study only chose central Chongqing as a case study instead of applying the model to the whole China. The POI data in the current study were provided by Baidu, Inc. Outside China, POI data could be extracted from the open access website Open Street Map (OSM) (https://www.openstreetmap.org/ #map=6/-0.396/32.893) [67]. Open Street Map POIs are one of the most popular volunteered geographic information (VGI) data since they are open source and easy to access [17,25]. However, the data quality is uncertain. First, most of the POIs are reported in urban areas and many POIs in rural areas are not collected. Second, the positional and thematic accuracy also need to be improved [68]. To address these issues, we adopted commercial POI data, which were collected using rigorous processes and found to be reliable [8,17].
With regard to the random forest model, the results depended on the prediction of each tree, which added randomness to the calculation. When the model was repeated, the results may vary greatly [20]. In a future study, attention should be paid to averaging the results over repetitions in order to stabilize the measurements, as well as controlling the time of the modeling process, as computing power continues to develop.
The calibration and validation of the population mapping was tough work due to population mobility and inadequate data. The statistical population counts conducted by the government are generally considered to be high quality, although they still have inherent limitations [1]. We are unable to compare our results with China's national census data in 2010, but the local community investigation data from 2018 are available. The comparison indicated various accuracies in different parts of the city because of the changes of natural and social-economic situations. In the communities with relative larger errors, more efforts are needed to apply alternative ways to build new models with proper parameters.

Conclusions
This study provided a dasymetric model for downscaling the census population in central Chongqing, Southwest China. This model took advantage of the smart random forest algorithm to generate the predicted population layer at a 30 m resolution. It offered a reference for multisource data fusion and the proper interpretation of the output. The POI data were adopted to overcome the overestimation and underestimation of the remote sensing images. According to the accuracy validation at the community level, the proposed approach achieved acceptable overall accuracy (overall R 2 = 0.7469, RMSE =2785.04 and p < 0.01). Furthermore, social sensing data and the random forest regression can be used to disaggregate socioeconomic data and the results are helpful for policy makers.