Mapping China’s Electronic Power Consumption Using Points of Interest and Remote Sensing Data

Abstract: Producing gridded electric power consumption (EPC) maps at a fine geographic scale is critical for rational deployment and effective utilization of electric power resources. Brightness of nighttime light (NTL) has been extensively adopted to evaluate the spatial patterns of EPC at multiple geographical scales. However, the blooming effect and saturation issue of NTL imagery limit its ability to accurately map EPC. Moreover, limited sectoral separation in applying NTL leads to the inaccurate spatial distribution of EPC, particularly in the case of industrial EPC, which is often a dominant portion of the total EPC in China. This study pioneers the separate estimation of spatial patterns of industrial and nonindustrial EPC over mainland China by jointly using points of interest (POIs) and multiple remotely sensed data in a random forests (RF) model. The POIs provided fine and detailed information about the different socioeconomic activities and played a significant role in determining industrial and nonindustrial EPC distribution. Based on the RF model, we produced industrial, nonindustrial, and overall EPC maps at a 1 km resolution in mainland China for 2011. Compared against statistical data at the county level, our results showed a high accuracy (R² = 0.958 for nonindustrial EPC estimation, 0.848 for industrial EPC estimation, and 0.913 for total EPC). This study indicated that the proposed RF-based method, integrating POIs and multiple remote sensing data, can markedly improve the accuracy for estimating EPC. This study also revealed the great potential of POIs in mapping the distribution of socioeconomic parameters.


Introduction
As the most widely used secondary energy source, electricity is indispensable to modern society and plays a vital role in supporting socioeconomic activities and human life. Hence, the spatial pattern of electric power consumption (EPC) can be used as an essential indicator of socioeconomic development [1] and energy use, which, in turn, are closely associated with CO2 emissions and global warming [2]. Despite the importance of geospatial analysis of EPC, spatially explicit data available for such an exercise are very limited. Traditionally, EPC data are primarily obtained from statistical data based on administrative units (e.g., province, city, or county). Such coarse data, which lack spatial heterogeneity, cause great difficulties for interdisciplinary studies that integrate them with physical and environmental datasets in raster or grid formats. Thus, developing efficient approaches to estimate EPC at the pixel level that can be easily integrated with other spatial data has become a new research interest.
[...] are mainly located in suburban areas. This phenomenon cannot be captured well merely on the basis of NTL data and may cause substantial misdistribution in EPC over China.
To address these gaps, this study attempts to use an RF-based method to improve EPC estimation in China by integrating POIs and multisource remote sensing data. On the basis of the semantic features of POIs and remote sensing auxiliary datasets, we developed different RF models to estimate IEPC and NEPC separately to improve the rationality and accuracy of the EPC estimation. The results of this research were used to generate refined EPC maps for China for 2011, with a resolution of 1 km. To the best of our knowledge, this study is the first attempt to integrate POIs in EPC estimation.

Data and Preprocessing
For RF model training, we used the statistical EPC data in 2011 from the "China Energy Statistical Yearbook 2012", "China City Statistical Yearbook 2012", and China Statistical Yearbooks Database (http://tongji.cnki.net/) (accessed on 1 August 2019). The data included total electric power consumption (TEPC) and IEPC from 31 provinces/municipalities and 134 prefectures in mainland China in 2011. The NEPC value was calculated by subtracting IEPC from TEPC for each administrative unit. The statistical industrial and nonindustrial GDP data at the prefecture level in 2011 were also obtained from the "China City Statistical Yearbook 2012". For accuracy assessment of the EPC estimation, we collected statistical TEPC, IEPC, and GDP data in 2011 from 817 counties across China. Administrative boundary maps at the provincial, prefecture, and county levels in China (scale of 1:4,000,000) were acquired from the website of National Geomatics Center of China (http://ngcc.sbsm.gov.cn/) (accessed on 1 August 2019). Statistical EPC and GDP data were spatially joined to the corresponding administrative boundaries in ArcGIS 10.2.
We used a range of remote sensing and geospatial datasets relevant to IEPC or NEPC to build a stack of geographical covariates for RF fitting. Each geographical covariate was sourced at 1 km resolution, or a resampling method was used to convert the data to a 1 km resolution. The geographical covariates included the following: The global radiance calibrated NTL dataset (NTL_OLS) for 2010 at a 1 km resolution was obtained from the National Geophysical Data Center of National Oceanic and Atmospheric Administration (https://ngdc.noaa.gov/eog/dmsp/download_radcal.html) (accessed on 1 August 2019). The radiance-calibrated NTL product overcomes the saturation problem in ordinary DMSP/OLS NTL products [50].
The POI data of mainland China in 2010 were derived from an online map service platform, Baidu Map. A total of 5,006,053 POIs were obtained, which were classified into 20 types. Each POI record included the point's name, address, category, latitude, and longitude. The 20 types of POIs were divided into two broad categories, namely, IEPC-related POIs and NEPC-related POIs. Industrial enterprise was the only POI category used for IEPC estimation.
The road network vector dataset was acquired from the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (http://www.resdc.cn/) (accessed on 1 August 2019), which includes expressways, national highways, provincial highways, county highways, urban roads, railroads, and other categories of roads. Road density (Den_road) and distance to the nearest road (DtN_road) for each cell at the 1 km grid scale were calculated using spatial analyst tools in ArcGIS 10.2.
The normalized difference vegetation index (NDVI) products derived from the VEGETATION sensor on board the SPOT satellite platforms were downloaded from the website of Vlaamse Instelling voor Technologisch Onderzoek (https://www.vito-eodata.be) (accessed on 1 August 2019). These data, with a 1 km spatial resolution, were synthesized over consecutive 10-day segments via the maximum-value compositing method [51]. We generated the annual maximum NDVI (NDVI_max) by merging the NDVI time series data for 2011 using maximum-value compositing.
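The maximum-value compositing step can be illustrated with a minimal sketch. It assumes each 10-day NDVI product is a small in-memory grid (a list of rows) with `None` marking missing pixels; the function name `max_value_composite` and the sample values are hypothetical and are not taken from the study.

```python
def max_value_composite(stack):
    """Return the per-pixel maximum over a time series of equally shaped grids.

    Pixels that are None (missing) in every grid stay None in the composite.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    composite = [[None] * cols for _ in range(rows)]
    for grid in stack:
        for r in range(rows):
            for c in range(cols):
                value = grid[r][c]
                if value is None:
                    continue  # skip missing observations
                if composite[r][c] is None or value > composite[r][c]:
                    composite[r][c] = value
    return composite


# Three hypothetical 10-day NDVI grids (2 x 2 pixels).
ten_day_grids = [
    [[0.10, 0.30], [None, 0.05]],
    [[0.45, 0.20], [None, 0.15]],
    [[0.25, 0.60], [None, 0.10]],
]
ndvi_max = max_value_composite(ten_day_grids)
```

In practice the same per-pixel maximum would be taken over all 10-day products of 2011 with a raster library; the pure-Python loop is only meant to make the rule explicit.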
Elevation and terrain slope data were derived from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model version 2 (https://gdex.cr.usgs.gov/gdex/) (accessed on 1 August 2019) with a 1-arc-second spatial resolution, provided by the National Aeronautics and Space Administration's Land Processes Distributed Active Archive Center.
These geographic variables (NTL_OLS, NDVI_max, Elevation, and Slope) were uniformly clipped by the administrative boundaries of mainland China, and then reprojected to Albers Conical Equal Area projection.
The administrative region was masked by a waterbody map on the basis of the assumption that no EPC activity occurs on water. The global 3-arc-second waterbody dataset (~90 m) provided by Yamazaki and Trigg (2015) [52] was applied to generate a waterbody map. In addition, 19 scenes of cloud-free multispectral images from 2009 to 2011 were utilized to visually evaluate the EPC spatial distribution result. These images retrieved from Landsat-4/5 Thematic Mapper (TM) were downloaded from the website of United States Geological Survey; they cover four metropolises in mainland China, including Beijing, Shanghai, Guangzhou, and Chengdu.

Methodology
The proposed method for EPC estimation consists of three main procedures: (1) filling the missing EPC values at the prefecture level using the GDP data; (2) producing POIs imageries for IEPC and NEPC estimation; and (3) fitting RF regression models and generating EPC density maps (Figure 1).

Filling the Missing EPC Value at the Prefecture Level
Although a significant positive correlation exists between EPC and economic growth [28,53,54], most existing models on EPC mapping using NTL are oversimplified and ignore factors governing EPC [28]. When we built an RF model using EPC as the dependent variable and GDP and sum NTL at the prefecture level as the explanatory variables, GDP had greater importance than NTL. GDP was also more sensitive to EPC than the population [6,55]. In addition, statistical EPC data are incomplete even at the prefecture level in China. On the basis of the linear relationships between GDP and EPC, using samples of 134 prefectures (Figure 2) and statistical EPC data at the provincial level for an adjustment, we filled the missing NEPC and IEPC values for 226 prefectures across mainland China (Figure 3). This step can integrate the GDP information in the following EPC estimation and can satisfy the requirement of a dasymetric mapping approach. To capture regional differences, we divided the study area into four economy-geographic regions, namely, northeastern China (NEC), eastern China (EC), central China (CC), and western China (WC), with 115, 290, 165, and 247 counties, respectively, for accuracy assessment (Figure 3).
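The gap-filling idea can be sketched as follows. The text does not spell out the exact form of the provincial adjustment, so this sketch assumes a proportional rescaling: GDP-based estimates for the missing prefectures are scaled so that reported plus filled values add up to the provincial statistic. The function names and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fill_missing_epc(gdp, epc, province_total, slope, intercept):
    """Fill prefectures whose EPC is None from GDP, then rescale the filled
    values so the province sums to its statistical total (assumed adjustment)."""
    raw = {p: max(slope * gdp[p] + intercept, 0.0)
           for p, v in epc.items() if v is None}
    if not raw:
        return dict(epc)  # nothing to fill
    known_sum = sum(v for v in epc.values() if v is not None)
    residual = province_total - known_sum
    raw_sum = sum(raw.values())
    if raw_sum <= 0 or residual < 0:
        raise ValueError("cannot reconcile estimates with the provincial total")
    filled = dict(epc)
    for p, value in raw.items():
        filled[p] = value * residual / raw_sum
    return filled


# Hypothetical sample prefectures that report both GDP and EPC.
slope, intercept = fit_line([100, 200, 300, 400], [12, 22, 32, 42])

# Hypothetical province: "A" is reported, "B" and "C" are missing.
gdp = {"A": 280, "B": 100, "C": 200}
epc = {"A": 30.0, "B": None, "C": None}
filled = fill_missing_epc(gdp, epc, province_total=70.0,
                          slope=slope, intercept=intercept)
```

Rescaling keeps the relative GDP-based ranking of the filled prefectures while preserving the provincial total, which is the property a later dasymetric step relies on.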

Producing POIs Imageries
In accordance with the 20 POIs categories, the industrial enterprise category was used to refine the IEPC estimation, and the remaining 19 categories were used to improve the NEPC estimation. Kernel density estimation (KDE), a common method for converting discrete point features into a continuous raster surface, was used to analyze the spatial distribution of the POIs data of various categories. Previous studies have shown that the selection of bandwidth greatly affects the results of KDE [56][57][58]. For each POIs category, we generated KDE layers with different bandwidths (0.1, 0.2, 0.3, ..., 2.0 km). Pearson correlation coefficients (PCCs) were calculated between the prefecture-level sum of the KDE of each POIs category at each bandwidth and the statistical NEPC or IEPC, and the bandwidth with the highest PCC was taken as the optimal bandwidth (OB) for that category. As shown in Table 1, strong correlations were evident for most POIs categories with OB. Therefore, POIs can be useful to refine EPC estimation. The kernel density layers of 19 POIs categories with the OB were combined into one layer (KD_NEPC_POI), using a weighted sum method to reduce the computational load in the final RF model for NEPC estimation. The weights were determined by the percent increased mean square error (%IncMSE), which indicates the variable importance in an RF model. In this step, NEPC is used as the dependent variable, and the aggregated values of 19 POI kernel density layers at the prefecture level are used as explanatory variables [35]. The distance to the nearest POI of each category for each grid cell was calculated using the Euclidean distance tool in ArcGIS 10.2. The 19 layers of distance to the nearest NEPC-related POIs were also integrated into one layer (DtN_NEPC_POI), using the same weighting method.
Accordingly, the kernel density layer of the IEPC-related POIs (KD_IEPC_POI) and the layer of distance to the nearest IEPC-related POIs (DtN_IEPC_POI) were derived from the POIs category "industrial enterprise".
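The bandwidth selection and the weighted combination described above can be sketched as follows. The sketch assumes the prefecture-level KDE sums have already been computed for every candidate bandwidth, and that importance scores are normalized to sum to one before weighting (the normalization is an assumption). Function names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


def optimal_bandwidth(kde_sums_by_bandwidth, statistical_epc):
    """Pick the bandwidth whose prefecture-level KDE sums correlate best
    with the statistical EPC."""
    return max(kde_sums_by_bandwidth,
               key=lambda bw: pearson(kde_sums_by_bandwidth[bw], statistical_epc))


def weighted_sum(layers, importances):
    """Combine per-category layers into one layer, weighting each layer by
    its importance score normalized to sum to one (assumed normalization)."""
    total = sum(importances)
    weights = [score / total for score in importances]
    n_cells = len(layers[0])
    return [sum(w * layer[k] for w, layer in zip(weights, layers))
            for k in range(n_cells)]


# Hypothetical prefecture-level values for one POIs category.
statistical_nepc = [10.0, 20.0, 30.0, 40.0]
kde_sums = {
    0.5: [1.0, 3.0, 2.0, 4.0],   # bandwidth in km -> KDE sum per prefecture
    1.0: [1.0, 2.0, 3.0, 4.0],
    1.5: [4.0, 1.0, 3.0, 2.0],
}
best_bw = optimal_bandwidth(kde_sums, statistical_nepc)

# Two hypothetical category layers (flattened grids) with importance scores.
combined = weighted_sum([[1.0, 2.0], [3.0, 4.0]], importances=[30.0, 10.0])
```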

Building RF Regression Model
The RF models were used to generate gridded EPC density estimates that were subsequently used to dasymetrically disaggregate statistical EPC data at the prefecture level into grid cells following the RF-based dasymetric population mapping approach developed by Stevens et al. (2015) [33]. Initially, the mean EPC density (the dependent variable) and a suite of geographical covariates (the independent variables) were calculated at the prefecture level. The results were then used to fit an RF model for predicting EPC density at the grid cell level (i.e., to generate the dasymetric weighting layer) with those gridded covariates with a spatial resolution of 1 km. To reduce the processing time during the prediction phase, multistage RF estimation for covariate selection, according to percentage of variation explained and the variable importance of each covariate, was implemented to reduce the number of covariates in the final RF models [34,59]. In addition, to assess the added value of including POIs as covariates in the EPC estimation, the geographical ancillary data without the two POIs-related variables were used to fit an RF model for TEPC estimation. We compared the outputs and accuracy of the RF models with and without the POIs-related variables. The RF models were implemented using the randomForest Package [60] in the R environment [61].
Remote Sens. 2021, 13, 1058
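The dasymetric disaggregation step can be sketched in a few lines. The sketch assumes the predicted EPC density of each grid cell is already available as a non-negative weight (here hard-coded stand-ins for RF output) and redistributes each prefecture total in proportion to those weights. The function name and values are hypothetical.

```python
def dasymetric_disaggregate(cell_prefecture, cell_weight, prefecture_total):
    """Allocate each prefecture's statistical total to its grid cells in
    proportion to the cells' predicted density (the weighting layer)."""
    weight_sum = {}
    for pref, weight in zip(cell_prefecture, cell_weight):
        weight_sum[pref] = weight_sum.get(pref, 0.0) + weight
    allocated = []
    for pref, weight in zip(cell_prefecture, cell_weight):
        if weight_sum[pref] > 0:
            allocated.append(prefecture_total[pref] * weight / weight_sum[pref])
        else:
            allocated.append(0.0)  # no positive weight anywhere in the unit
    return allocated


# Hypothetical 1 km cells: prefecture label and predicted density per cell.
# The third cell has weight 0, e.g., because it was masked as water.
cell_prefecture = ["A", "A", "A", "B", "B"]
cell_weight = [1.0, 3.0, 0.0, 2.0, 2.0]
prefecture_total = {"A": 80.0, "B": 10.0}

cell_epc = dasymetric_disaggregate(cell_prefecture, cell_weight, prefecture_total)
```

Because the weights are only used as proportions within each prefecture, the gridded values always sum back to the statistical totals, regardless of the absolute scale of the predicted densities.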

Gridded EPC Maps for Mainland China
The IEPC and NEPC at the prefecture level were disaggregated into 1 km grid cells (Figure 4).
Based on the gridded NEPC, we reconstructed the spatial patterns of NEPC in 2011 at the county level and then calculated the per capita NEPC (Figure 5a). The local Moran's I was used to capture the spatial agglomeration of per capita NEPC at the county level in China (Figure 5b). In general, the three coastal urban agglomerations, namely Beijing-Tianjin-Hebei, the Yangtze River Delta, and the Pearl River Delta, had the highest per capita NEPC, with values of more than 3000 kWh (Figure 5a), and formed High-High clusters (Figure 5b). Compared with their surrounding counties, the provincial capital cities also had higher per capita NEPC. East of the Hu-Huanyong Line, per capita NEPC was above 500 kWh in most coastal counties, whereas inland counties generally had lower per capita NEPC. Low-Low clusters were identified in the Henan-Anhui and Sichuan-Chongqing-Guizhou-Guangxi regions. West of the Hu-Huanyong Line, the relatively large per capita NEPC was mainly caused by a very small population or relatively large prediction errors of NEPC; some High-High clusters were also found in Inner Mongolia, Gansu-Qinghai, and Xinjiang. Previous studies showed that energy consumption can be used to measure inequality in China [63,64], which is further supported by the distribution of reconstructed per capita NEPC at the county level across China.
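A minimal sketch of the local Moran's I computation is given below. It assumes row-standardized neighbor weights and uses one common form of the statistic, I_i = z_i * lag_i / m2, where z_i is the deviation from the mean, lag_i is the mean deviation of the neighbors, and m2 is the mean squared deviation. It omits the permutation-based significance test that cluster maps normally apply, so the labels are quadrant types only. The county values and neighbor lists are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Return (local Moran's I, quadrant label) for each unit.

    neighbors[i] lists the indices adjacent to unit i; weights are
    row-standardized, so the spatial lag is the neighbors' mean deviation.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local_i = z[i] * lag / m2
        if z[i] > 0 and lag > 0:
            label = "High-High"
        elif z[i] < 0 and lag < 0:
            label = "Low-Low"
        elif z[i] > 0 and lag < 0:
            label = "High-Low"
        elif z[i] < 0 and lag > 0:
            label = "Low-High"
        else:
            label = "Undefined"
        results.append((local_i, label))
    return results


# Hypothetical per capita NEPC (kWh) for six counties arranged in a chain.
per_capita_nepc = [3200.0, 3500.0, 3100.0, 400.0, 300.0, 350.0]
chain_neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

moran = local_morans_i(per_capita_nepc, chain_neighbors)
labels = [label for _, label in moran]
```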

Accuracy Assessment
A per-pixel evaluation of the EPC maps was impossible due to the lack of reference EPC data at the grid level. Therefore, the predicted IEPC, NEPC, and TEPC (IEPC + NEPC) at the grid cell level were aggregated to the county level and then compared with statistical EPC data from 817 counties to evaluate the performance of the proposed method. Two statistical indicators, namely, the coefficient of determination (R²) and the root mean square error (RMSE), were used to evaluate the accuracies of the estimated EPC maps. Figure 6 shows the results of the accuracy assessment of the estimated EPC. In summary, the proposed method exhibited high predictive performance on NEPC estimation (R² = 0.958, RMSE = 0.734 billion kWh) and IEPC estimation (R² = 0.848, RMSE = 2.375 billion kWh) in China. The TEPC predictions corresponded closely with the statistical data, with a slope of 0.936 and R² of 0.913 in China. The R² values for NEPC estimation ranged from 0.830 in NEC to 0.981 in EC, indicating the high accuracy of the proposed method for NEPC estimation. The energy consumption structures of different industrial sectors are relatively complicated [65]. Therefore, the R² values for the IEPC estimation, ranging from 0.635 to 0.896 for the four regions, were slightly lower than those for the NEPC estimation. In addition, the RMSE values of the IEPC estimation were larger than those of the NEPC estimation. These findings not only resulted from the variance in predictive performance among different regions but were also attributed to the fact that industry accounts for a substantially high proportion of the TEPC in most Chinese cities [28].
Regionally, the highest accuracy of EPC estimation was observed in EC, with the highest R² and slope. EC had the largest RMSE mainly because it consumed more electricity than the other regions due to its leading position in the urbanization and industrialization processes [66]. The NEC is the only region with an R² of IEPC estimation (R² = 0.886) higher than that of NEPC estimation (R² = 0.830). Two points (marked in blue) were far away from the fitting line, namely, the urban districts of Huludao and Liaoyang (Figure 5b). Excluding these records, the R² for the NEPC estimation in NEC reached 0.929 and exceeded that for the IEPC estimation of 0.893. The relatively low prediction accuracy in CC and WC could have been caused by more industry-oriented cities with complex energy consumption structures in the two regions [67]. In addition, renewable energy (e.g., biomass briquette, solar energy, wind energy, and hydropower) is a crucial component of energy consumption in the Qinghai-Tibet region, considering the special geographical condition and the fragile ecosystem and ecological environment, causing great uncertainty to EPC prediction in this region.
Without the two POI-related variables, the RF model for TEPC estimation yielded slightly worse prediction accuracies (Figure 6p-t). Similar R² and RMSE values were observed in WC for the results with and without the POIs covariates because of the relatively low urbanization level and lower POIs density. However, the slope was better for TEPC estimation when the POIs were integrated.
Two studies on EPC estimation over mainland China were selected for comparison (Table 2). The R² value of TEPC estimation for China in our study, which used more validation samples, was 0.913, substantially higher than the 0.490 reported by Cao et al. (2014) [3] and the 0.750 reported by Xie and Weng (2016) [8]. Our result also showed better performance in terms of the RMSE and the slope of the regressions. Even without POIs, the RF algorithm integrating multisource geographical covariates outperformed the two studies, with an R² of 0.893. Therefore, the EPC in China can be estimated with markedly higher precision by the RF-based method, and the integration of POIs and multisource remote sensing data can further improve and refine the EPC estimation.
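The county-level validation can be sketched as follows. The sketch assumes that R² is the squared Pearson correlation between statistical and estimated values and that the slope comes from regressing the estimates on the statistics; the text does not state either convention explicitly. Function names and numbers are hypothetical.

```python
from math import sqrt


def aggregate_to_county(cell_county, cell_epc):
    """Sum gridded EPC values by county label."""
    totals = {}
    for county, value in zip(cell_county, cell_epc):
        totals[county] = totals.get(county, 0.0) + value
    return totals


def accuracy(observed, predicted):
    """Return (R2, RMSE, slope) for predicted versus observed values."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    r2 = sxy ** 2 / (sxx * syy)          # squared Pearson correlation
    slope = sxy / sxx                    # estimates regressed on statistics
    rmse = sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return r2, rmse, slope


# Hypothetical gridded estimates (billion kWh) and their county labels.
cell_county = ["a", "a", "b", "c", "c", "d"]
cell_epc = [0.5, 0.6, 1.9, 3.0, 0.2, 3.8]
county_estimate = aggregate_to_county(cell_county, cell_epc)

# Hypothetical county statistics for the same four counties.
county_statistic = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
counties = sorted(county_statistic)
r2, rmse, slope = accuracy([county_statistic[c] for c in counties],
                           [county_estimate[c] for c in counties])
```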

Variable Importance
Without POIs, NTL and NDVI_max, which are significantly correlated with human settlements and impervious surfaces [68][69][70], are the most important predictors in the RF model for TEPC estimation. The two road-related variables and elevation were ranked low in their relative variable importance. The five geographical covariates could explain 89.7% of the TEPC variance within the RF model.
When the two POI-related variables were included in the modeling process, the RF models explained 96.01% of the NEPC variance and 84.08% of the IEPC variance. Notably, different covariates became the key contributors. For NEPC estimation, the kernel density of the NEPC-related POIs, NTL, and the density of the road network were the most important variables (Figure 7a), and the partial dependence plots showed that the NEPC density increased as the three variables increased (Figure 7b-d). Elevation, the distance to the nearest NEPC-related POIs, the distance to the nearest road, and NDVI_max ranked low in importance and were negatively correlated with NEPC density (Figure 7e-h). The RF model for the IEPC density estimation revealed that NDVI_max was the most important variable, followed by NTL. POIs-related and road network-related variables were ranked low in their relative variable importance (Figure 7i). The kernel density of the POIs and road network was positively correlated with IEPC density, whereas the distance to the nearest POIs and roads was negatively correlated with IEPC density (Figure 7l-o). NEPC and IEPC density sharply decreased as the NDVI exceeded 0.75 (Figure 7h,j).
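The idea behind permutation-based variable importance such as %IncMSE can be illustrated with a simplified sketch: permute one covariate, re-predict, and record how much the mean squared error increases. This is not the out-of-bag, per-tree procedure of the randomForest package; it is a generic version applied to an arbitrary prediction function. The model, the data, and the function names are hypothetical.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(row) - t) ** 2 for row, t in zip(rows, targets)) / len(targets)


def permutation_importance(predict, rows, targets, seed=0):
    """Increase in MSE after shuffling each covariate column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    increases = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # break the link between covariate j and the target
        permuted = [row[:j] + [value] + row[j + 1:]
                    for row, value in zip(rows, column)]
        increases.append(mse(predict, permuted, targets) - baseline)
    return increases


def toy_model(row):
    """Hypothetical fitted model: strong effect of covariate 0, weak effect
    of covariate 1, no effect of covariate 2."""
    return 2.0 * row[0] + 0.5 * row[1]


rows = [[float(i), float((i * 7) % 10), float((i * 3) % 5)] for i in range(20)]
targets = [toy_model(row) for row in rows]
importance = permutation_importance(toy_model, rows, targets)
```

A covariate the model never uses gets an importance of exactly zero here, whereas the covariate with the strongest effect shows the largest error increase.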

Discussion
Although previous studies have documented the varying degree of effectiveness of DMSP/OLS NTL data for EPC estimation at different spatial scales, several constraints still exist. Tremendous effort has been exerted to overcome the problems of DMSP/OLS data, especially for saturation and blooming [3,9,11,[71][72][73][74]. Compared with DMSP/OLS data, VIIRS data have a finer spatial resolution, greatly mitigated blooming effect, and larger quantization range to avoid the saturation issue [75]. Therefore, VIIRS data are more reliable for estimating EPC than DMSP/OLS data [23]. Despite the improvement in VIIRS data and considerable efforts in correcting DMSP/OLS data, the inherent problem of NTL data is their deficiency in distinguishing urban functions for various socioeconomic activities. For example, NTL data are deficient in distinguishing industrial zones and commercial centers with similar nighttime brightness, thereby resulting in the misdistribution of EPC from factories and commercial buildings.
Since the 1990s, many cities in China have experienced the decentralization of manufacturing from the urban centers and the formation of new industrial agglomerations in suburban industrial zones [76][77][78]. Moreover, most cities in China (125 of 134 prefectures with statistical EPC data in this study) have more IEPC than NEPC, and the two have markedly different spatial distributions. A recent study [67] classified Chinese cities into three types (service-oriented, industrial, and technology and education) to represent different EPC characteristics and found that the relationship between EPC and NTL is more complex in industrial cities. Therefore, disaggregating the EPC of different consumers, such as industrial and nonindustrial, merely on the basis of NTL data is difficult, which greatly affects the accuracy of the EPC estimation. Recent studies have shown that geo-tagged tweets can estimate EPC more accurately than DMSP/OLS NTL data [79] and have nearly the same ability to estimate EPC as VIIRS NTL data [80]. However, tweets also cannot distinguish the EPC of industrial areas from that of commercial buildings.
We strove to address the aforementioned problems by integrating POIs and remote sensing datasets. Compared with NTL data and tweet imagery, the advantage of POIs imagery lies in its unique ability to identify urban functions [37,38,40,81], which enables us to separately estimate industrial and nonindustrial EPC. POIs can be flexibly converted to raster with arbitrary spatial resolutions [82]. Therefore, they can be easily integrated with NTL and other remote sensing data. Figure 8 provides a visual comparison among radiance-calibrated DMSP/OLS NTL (Figure 8a), the kernel density of the NEPC-related POIs (Figure 8b), the kernel density of the IEPC-related POIs (Figure 8c), the estimated NEPC map (Figure 8d), the estimated IEPC map (Figure 8e), and the Landsat 4/5 TM images (Figure 8g) in four metropolises (i.e., Beijing, Shanghai, Guangzhou, and Chengdu) in China. Evidently, the DMSP/OLS NTL data could not uncover intracity functional zones (Figure 8a). By contrast, the kernel density of the NEPC-related and IEPC-related POIs showed markedly different spatial distribution patterns in the four metropolises (Figure 8b,c). The former shows a characteristic of spatial agglomeration, especially in the urban centers, whereas the latter represents the relatively discrete distribution of industrial enterprises or manufacturing industry clusters in suburban areas. Therefore, industrial and nonindustrial activities can be effectively distinguished by different categories of POIs. As a result of integrating the POIs-related variables in the RF models, the distribution of IEPC and NEPC was effectively distinguished. The urban cores of the four metropolises tended to consume considerably more NEPC (Figure 8d), but high IEPC areas were observed in suburban regions (Figure 8e).
The difference maps (Figure 8f), generated by subtracting the TEPC map produced without POIs from the TEPC map produced with POIs, showed higher TEPC in suburban areas, consistent with the high industrial POI densities there (Figure 8c). Therefore, incorporating POI data in the RF model can overcome the considerable underestimation of IEPC in suburban areas with industrial agglomeration and effectively predict high EPC densities in decentralized industrial zones. In addition, the use of RF enabled the modeling of the complex nonlinear relations between EPC and its determinants. Our results showed that RF regression is highly effective for EPC estimation. Even without POI-related variables, the RF model still performed well compared with previous studies based on linear regressions. Beyond its predictive capability, RF also provides useful information about variable importance. We found that NTL is not the most important predictor for either IEPC or NEPC. Including more predictors, especially POI-related variables, substantially improved the model performance. Moreover, GDP was more important than NTL in estimating EPC, and incorporating GDP information in the modeling process also contributed to the high accuracy of our results. The performance of the RF model for IEPC estimation was less satisfactory than that for NEPC estimation, owing to the more complex energy consumption structures in different industrial sectors [67]. Further research should focus on improving the accuracy of IEPC estimation.
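One common way to quantify how much a predictor contributes to a fitted model is permutation importance: shuffle one predictor column and record how much the prediction error increases. The sketch below illustrates the idea only. The importance measure actually used in this study is not specified here, and toy_model, the two synthetic features, and all parameter values are hypothetical.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of model predictions over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_features, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    # Hypothetical stand-in for a trained regressor: uses feature 0, ignores feature 1.
    return 3.0 * row[0]

rows = [(float(i), float((i * 7) % 5)) for i in range(8)]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets, n_features=2)
```

In this toy setting the ignored feature receives an importance of zero, while the feature the model depends on receives a positive score. With a real fitted model, the scores would rank predictors by how much the predictions rely on each one.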
Despite the marked improvements offered by the proposed method, the use of the RF algorithm and POIs for EPC estimation still has certain limitations. For RF regression, the predicted EPC density is restricted to the range covered by the training data [83], which were calculated at the prefecture level in this study. Therefore, RF cannot make EPC predictions beyond the training data range, resulting in conservative estimates of extremely high EPC values. The main advantage of POI data is that they explicitly represent the types and locations of human socioeconomic activities. However, POIs carry no information on the size of a facility or the magnitude of its EPC. For example, the POIs of small factories and large cement plants, which belong to the same category (industrial enterprise), were treated equally in the RF model, regardless of their distinct EPC. In addition, the distance to the nearest POI was measured as Euclidean distance, which is inadequate for most realistic analyses [37].
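The range restriction follows from how tree ensembles predict: each leaf returns the mean of the training targets that fall in it, so an average over trees can never exceed the largest training target. The sketch below demonstrates this with bagged one-split regression trees, a deliberately simplified stand-in for a full RF. The function names, the synthetic data, and the parameter values are assumptions made for the example.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data; each leaf predicts its mean target."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x in this bootstrap sample is identical
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]

def fit_forest(xs, ys, n_trees=50, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict(forest, x):
    """Average the leaf means selected by x across all stumps."""
    return sum(lm if x < t else rm for t, lm, rm in forest) / len(forest)

# Synthetic training data following y = 2x on x = 1..10, so the largest target is 20.
xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x for x in xs]
forest = fit_forest(xs, ys)
far_prediction = predict(forest, 100.0)  # the underlying trend would give 200
```

Here far_prediction cannot exceed the largest training target of 20, even though the underlying trend would give 200 at x = 100. The same mechanism leads to conservative estimates for pixels whose true EPC density exceeds the densities seen during training.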
More information on the spatial extent of POIs and on building height, as well as a deeper exploration of the semantic features of POIs, may further improve EPC modeling at the local scale. In addition, most POIs are concentrated in urban areas, which most likely limits the improvement in EPC estimation offered by our method mainly to urban regions. The POI density is much lower in rural areas, so POI data may not significantly improve the EPC estimation there. However, the EPC in rural areas is much lower than that in urban areas. Therefore, the spatial heterogeneity of POI data in rural areas will not significantly affect the results.

Conclusions
This study highlights the potential of a machine learning-based method and the joint application of POIs and remote sensing data in EPC estimation. For the first time, we revealed the distinct spatial distribution patterns of IEPC and NEPC in Chinese cities by integrating POIs of different categories in the RF models. The IEPC, NEPC, and TEPC maps for mainland China were produced at a 1 km spatial resolution for 2011. Apparent regional differences in EPC were observed owing to the large regional variations in socioeconomic development levels and physical environment in China. The EPC maps were calibrated using statistical data at the county level. Compared with previous EPC maps over China, our results showed higher accuracy and precision. Notably, NTL, which has been widely used in previous studies, ranked low in relative variable importance in the RF models. The POI density was the most influential variable for NEPC estimation, and NDVI ranked as the most important variable for IEPC prediction. We also found that the RF model performed better for NEPC estimation than for IEPC estimation because of the wide differences in energy use among industrial sectors. The proposed method represents a novel attempt to improve the rationality and accuracy of gridded EPC maps. The advantages of social sensing big data, such as POIs, and of machine learning-based methods are expected to become more prominent in spatially disaggregating census-based socioeconomic parameters to a fine geographic scale.