Mapping China’s Changing Gross Domestic Product Distribution Using Remotely Sensed and Point-of-Interest Data with Geographical Random Forest Model

: Accurate knowledge of the spatiotemporal distribution of gross domestic product (GDP) is critical for achieving sustainable development goals (SDGs). However, there are rarely continuous multitemporal gridded GDP datasets for China in small geographies, and less is known about the variable importance of GDP mapping. Based on remotely sensed and point-of-interest (POI) data, a geographical random forest model was employed to map China’s multitemporal GDP distribution from 2010 to 2020 and to explore the regional differences in the importance of auxiliary variables to GDP modeling. Our new GDP density maps showed that the areas with a GDP density higher than 0.1 million CNY/km 2 account for half of China, mainly distributed on the southeast side of the Hu-line. The proportion of the areas with GDP density lower than 0.05 million CNY/km 2 has decreased by 11.38% over the past decade and the areas with an increase of 0.01 million CNY/km 2 account for 70.73% of China. Our maps also showed that the GDP density of most nonurban areas in northeast China declined, especially during 2015–2020, and the barycenter of China’s GDP moved 128.80 km to the southwest. These results indicate China’s achievements in alleviating poverty and the widening gaps between the South and the North. Meanwhile, the number of counties with the highest importance score for POI density, population density, and nighttime lights in GDP mapping accounts for 52.76%, 23.66%, and 23.56%, respectively, which suggests that they play a crucial role in GDP mapping. Moreover, the relationship between GDP and auxiliary variables displayed obvious regional differences. Our results provide a reference for the formulation of a sustainable development strategy.


Introduction
Gross domestic product (GDP) is considered the most prevalent indicator for measuring national and regional sustainable economic development [1], and multitemporal GDP data are widely used for the spatiotemporal pattern of socioeconomic evolution [1][2][3].Accurate knowledge of the spatiotemporal distribution of GDP is also fundamental for various applications, including economic inequality assessment [4], poverty alleviation [5], environmental protection [6], and public health improvement [7], many of which are related to the UN 2030 sustainable development goals (SDGs) [8].Therefore, accurately mapping the multitemporal GDP and exploring its spatial changes are critical for achieving the SDGs, ranging from the global to the regional level.
GDP statistical data are generally collected at the administrative unit level (e.g., county and province), which not only makes it difficult to reflect the spatial differences of the economic development within administrative units but also has some problems, such as spatial mismatch with other environmental datasets [9].Generating gridded GDP data according to census data is thus an effective way to solve these problems [10,11].Evidence from previous studies showed that using the appropriate auxiliary data is an important prerequisite for GDP spatialization.Elvidge and Baugh et al. [12] pointed out that the satellite-derived nighttime lights (NTL) data are highly correlated to GDP, and then increasing studies verified that NTL could intuitively capture the spatial distribution of economic development to a certain extent.NTL from the Defense Meteorological Satellite Program's Operational Linescan System (DMSP/OLS) and the visible infrared imaging radiometer suite (VIIRS) is thus regarded as the key data in terms of GDP spatialization [5,[13][14][15]].An increasing number of studies that integrated NTL data with multiple geospatial big data, such as density of population [16], land cover type [3,17], human settlements [4], point of interest (POI) [18], and OpenStreetMap [8], have demonstrated that these integrations help depict detailed economy information and improve the GDP density mapping at both global and regional scales.The combination of NTL data and the density of the population can effectively address the saturation and blooming problems in NTL [16].Land cover type is an important complement to economic activities in sectors that do not generate NTL [3,17].Meanwhile, the POI records the spatial information of various geographic entities, which can mirror the spatial distribution of economic activities, and the role of POI has been receiving increasing attention [8,18].
Capturing the complex relationship between GDP and auxiliary variables is critical for enhancing the accuracy of GDP spatialization [8].Simple linear or polynomial models were used to generate global and regional gridded GDP data with limited auxiliary data [17,19].With the increase in auxiliary variables, these methods are susceptible to the multicollinearity of variables, which makes it difficult to capture more complex nonlinear relationships.Machine learning models such as random forest (RF) can overcome these shortcomings to some extent, and their simulation accuracy is better than that of ordinary linear regression models [18,20].Previous studies also pointed out that the relationship between GDP and auxiliary variables might exhibit spatial heterogeneity [21,22]; however, current machine learning models (e.g., RF) do not take into account the spatial difference in the relationship between GDP and auxiliary variables.Some studies have used an economic division model or geographical and time-weighted regression model to map China's GDP [22,23].Huang [20] also noted that the local nonlinear modeling method considering economic division could effectively improve the accuracy of GDP mapping.Recently, Georganos and Grippa et al. [24] proposed the geographical random forest (GRF) model by integrating the theory of geographically weighted regression and the RF model.The GRF model has been applied to the spatial disaggregation of socioeconomic statistical data, such as population [24], corn yield [25], and grazing intensity [26], and confirmed that it can better address spatial nonstationarity.When selecting an appropriate spatial scale to model the data, the GRF model is superior to other models [24][25][26].Moreover, GRF could effectively demonstrate the regional differences in the relationship between dependent and independent variables [27,28].Therefore, GRF provides a new opportunity to further accurately map the spatiotemporal distribution of GDP and identify the regional differences in the importance of related environmental variables.In this way, interventions can be designed and implemented based on regional differences, thus avoiding general guidelines addressed for the entire nation [29,30].
The rapid development of China's economy has affected global economic patterns and human society.In 2010, China became the world's second-largest economy after the United States, with a total GDP of USD 6087 billion, and then grew to USD 14,722 billion in 2020.China has completely eliminated absolute poverty and achieved the poverty reduction goal of "Transforming our World: The 2030 Agenda for Sustainable Development" ten years ahead of schedule [31], which also reduced the size of the world's poor population [32].Until now, several gridded GDP datasets, ranging from global to regional levels, have been generated using various models [8,18,33,34].Most long time series GDP density datasets are typically obtained based on a single NTL data source [33,34].However, the limitations of NTL data in reflecting the GDP of sectors without NTL generation and the differences in various types of socioeconomic activities make it difficult to analyze the spatial and temporal variation in GDP density based on these datasets [8,18].Other studies have employed multiple auxiliary variables to generate gridded GDP datasets, but most of them focus on a given year [8,18], paying little attention to the spatial patterns of GDP change over the last decade.In addition, most gridded GDP datasets were generated using a global model that did not account for spatial nonstationarity [35].
In summary, there are few continuous multitemporal gridded GDP datasets for China in small geographies over the past decade, and less is known about the variable importance in GDP mapping.We aim to map China's changing GDP during 2010-2020 and explore the regional differences in the importance of auxiliary variables for GDP mapping.The details are as follows: (1) to generate the GDP-gridded data in 2010, 2015, and 2020 using the GRF model with China's GDP statistical data at the county level and remote sensing and POI data; (2) to measure the spatiotemporal evolution of China's GDP; (3) to explore the regional differences of the importance of auxiliary variables for GDP modeling.Our results will provide a reference for spatial disaggregation of socioeconomic statistical data, and benefit the formulation of a sustainable development strategy.

Data Source and Preprogressing
The datasets used in this paper contain China's county-level GDP statistical data and auxiliary data.The auxiliary data mainly include NTL, land cover, population density, net primary productivity of vegetation (NPP), DEM, POIs, and road data (Table 1).Based on these data, the gridded datasets of twelve auxiliary variables were generated (Figure A1), and the generation methods are introduced below (seen in Sections 2.1.2and 2.1.3).All datasets were projected in an Albers equal-area projection [36].The GDP statistical data of counties in 2010, 2015, and 2020 in this paper are sourced from the China Statistical Yearbook for Regional Economy in 2011 and the China County Statistical Yearbook in 2016 and 2021, respectively.The GDP statistics of the municipal districts are from the municipal statistical yearbooks of 2011, 2016, and 2021.The GDP statistical data of Hong Kong, Macao, and Taiwan are missing, and the GDP statistics of Tianjin Binhai New Area were not included due to its abnormality.In total, 2767, 2835, and 2793 GDP statistical data at the county level were collected in 2010, 2015, and 2020, respectively.Due to the changes in a few county-level administrative units over the past decade, adjustments were made according to the information on the changes in administrative units in China from 2010 to 2020 (available at http://www.xzqh.org(accessed on 10 May 2023)).Finally, the GDP statistical data at the county level in 2010, 2015, and 2020 were linked to the corresponding administrative units (Figure A2).

Grid Data
Nighttime light intensity data are derived from the extended time series NPP-VIIRSlike NTL data [37].The dataset provides nighttime light information with a spatial resolution of 1 km × 1 km from 2000 to 2020, which has been widely employed to assess and analyze the dynamics of population and socioeconomic development [40].The China land cover dataset (CLCD) contains nine land cover types (farmland, forest, shrub, grassland, water, snow, barren, impervious, and wetland) with a spatial resolution of 30 m per year from 1990 to 2020 [38].Four land cover types were selected to model GDP distribution, including farmland, forest, grassland, and impervious areas.In order to reduce the information loss of data in the resampling process, we produced the percentage data of land cover types with a resolution of 1 km × 1 km according to a previous study [41].The digital elevation model (DEM) data and population density data were resampled to a spatial resolution of 1 km × 1 km using bilinear algorithms [42], and the slope data were obtained based on DEM data.Furthermore, the publicly published GDP-gridded data in 2010, 2015, and 2019, generated by Xu [43] (hereinafter referred to as Xu_GDP gridded data), were downloaded from the resource and environment database of the Chinese Academy of Sciences.These gridded data were developed by the multifactor weight distribution method with the data of land cover types, human settlements, and NTL.The multifactor weight distribution method calculates and standardizes the weight values of each factor, then calculates the total weight value [43].These gridded data were used for comparative analysis of model accuracy.

POI Data
Point-of-interest (POI) data represent typical geospatial big data and are considered important for studying economic activities.POI data record the geographic locations of companies, education services, commercial houses, governments, and other elements.We initially performed data cleaning on the POI data (e.g., screening, deduplication, and sorting) [44].A density analysis was then performed.The optimal bandwidth varies for different categories of POI.To determine the optimal bandwidth for each category of POI data, this paper selected 1~5 km as the bandwidth range, calculated the kernel density of each category of POI at 1 km intervals each time, and generated a gridded layer with a spatial resolution of 1 km × 1 km.After fitting the GDP census and the kernel density data of each POI with different bandwidths using the GRF model (introduced in Section 2.2), the optimal bandwidth was determined according to the optimal R 2 value.Based on the kernel density data of each POI category with the optimal bandwidth, the weight values were calculated according to the percentage increase in the meansquared error (%IncMSE) of each POI category obtained by the GRF model.Table 2 shows the optimal bandwidth, %IncMSE, and weight value of nine categories of POI data in 2010, 2015, and 2020.Finally, nine different categories of POI density gridded data were fused into new density gridded data [45], as shown in Formulas (1) and (2): where n is the number of POI categories, D c denotes the grid value of the fused POI density, D i (i = 1, 2, . . ., 9) denotes the grid value of the ith category of POI density, and W i represents the weight of the ith category of POI density.Furthermore, we also generated the distance of POIs according to the previous study [45].

Geographical Random Forest
A geographical random forest (GRF) is an extension of the classical RF [24].RF is a machine learning method that integrates multiple classification and regression decision trees [46] and has been widely used to solve nonlinear problems [47].RF can measure the performance of the model without cross-validation or setting up a test set, that is, 1/3 of the samples of each tree that are not involved in training are used to estimate the mean square error (MSE) of OOB, and the performance index of OOB R 2 .The model can calculate the %IncMSE of each auxiliary variable, which is used to evaluate the importance of auxiliary variables.The larger the value of %IncMSE, the greater the contribution of the auxiliary variable to the model [48].
The classical RF is a nonspatial model, which makes it difficult to deal with the problem of spatial heterogeneity.Based on the theory of the GWR model, Georganos et al. [24] proposed a GRF model by extending the random forest model to a combination of multiple local submodels in space.Specifically, the GRF model generates a spatial weight matrix by introducing the spatial information of observation points and then integrates it with RF into a local regression analysis framework [49].When dealing with spatial heterogeneity, GRF not only improves the prediction performance of traditional RF models but also helps to explore the coupling relationship between dependent variables and environmental variables in local space.Each local submodel calculates the %IncMSE of each environmental variable, which can reflect the spatial heterogeneity of the importance score of each variable.The maximum distance between the observation point and the kernel is called the bandwidth.Given the large differences in areas of county administrative units in China, the adaptive kernel is chosen [24], and the bi-square kernel function is selected as the weight kernel function, as shown in Formula (3): in which w ij is the weight of the observation point, d ij is the distance from the observation point to the submodel, and b represents the bandwidth.
The geographical random forest model employs the spatial weight matrix to adjust the sampling weight of samples in each local RF, as shown in Formula (4): The equation of the classical RF regression is simplified as shown in Formula (5): where Y i is the county GDP value of the ith observation point, ax i is a nonlinear prediction of RF based on a set of auxiliary variables, x, and e is the residual error.GRF extends Formula (5) to use a subset of the original dataset each time, as shown in Formula (6): where a(u i , v i ) is the prediction of the RF model calibrated at the position of sample i, and The basic steps of geographical random forest are as follows: (1) The distance matrix between samples is calculated, and then the weight matrix is calculated using the bi-square kernel function according to the bandwidth parameters.(2) We input the ith sample, the neighborhood samples with sampling weight larger than 0, and their respective sampling weights, then, we train the ith local RF and calculate the variable importance until all local RFs have been trained.(3) The environmental variables and coordinates are inputted to calculate the distance matrix based on the coordinates.We then select the nearest local RF model for prediction and output the predicted GDP weights.In this paper, the R package SpatialML is used, and more information can be found in Kalogirou [50] and Georganos et al. [24,51].

GDP Density Mapping
Census GDP data were then disaggregated using a dasymetric mapping approach based on these weighting layers [52], as shown in Formula (7): where GDP grid denotes the GDP pixel value, GDP county denotes the GDP statistic value of the county, W grid denotes the pixel value of GDP weight, and W county represents the sum of the pixel values of the county's GDP weight.

Accuracy Assessment
The determination coefficient (R 2 ), root-mean-square error (RMSE), and mean absolute error (MAE) are used to evaluate the simulation effect of the GDP density weight layer, as shown in Formulas ( 8)-(10): where n is the number of counties, Y c,i is the statistical value of the ith county, Y e,i is the weight value of the prediction of the ith county, and Y c and Y e are the averaged value of the GDP statistical data and the prediction.Due to the lack of GDP statistical indicators at the township level, this paper compared the predicted GDP density weight value with the GDP statistical data, and then compared the modified GDP density maps with the publicly available Xu_GDP maps, in order to evaluate the rationality of the new GDP-gridded data.

Spatial Barycenter Model
The spatial barycenter model is a useful tool in visually and accurately tracing the process of spatial change in geographic elements [53].We employed this model to explore the spatial changes in China's GDP.The economic barycenter is calculated as shown in Formula ( 11): where X and Y represent the coordinate of the economic barycenter, X i and Y i represent the coordinate of the ith grid of GDP density, respectively, and M i represents the value of the ith grid of GDP density.The moving distance of the economic barycenter is calculated as shown in Formula ( 12): where D s−k denotes the distance of the economic barycenter moving between two different years (s and k) and (X s , Y s ) and (X k , Y k ) denote the coordinate of economic barycenter in the sth year and kth year, respectively.In this study, we converted gridded data from three periods into point data using ArcGIS software and then conducted a spatial barycenter analysis of China's GDP density from 2010 to 2020.

Multitemporal GDP Density Maps
The GRF model, used to predict China's GDP density, has a consistently high variance explained for 2010, 2015, and 2020, with values of 95.73%, 95.86%, and 96.61%, respectively.Further assessments show that the GDP density weight and GDP census data at the county level in 2010, 2015, and 2020 are highly consistent, with R 2 values of 0.79, 0.79, and 0.75, and MAE values of 4.91 × 10 6 , 8.02 × 10 6 , and 12.97 × 10 6 , respectively (Figure 1).Our GDP density maps finely depict the spatial pattern of China's GDP distribution, which is high in the southeast and low in the northwest (Figure 2).According to the GDP density classification in previous studies [8,10], we found that the areas with a GDP density below 0.1 million CNY/km 2 account for about half of China (Table 3), mainly distrib- Our GDP density maps finely depict the spatial pattern of China's GDP distribution, which is high in the southeast and low in the northwest (Figure 2).According to the GDP density classification in previous studies [8,10], we found that the areas with a GDP density below 0.1 million CNY/km 2 account for about half of China (Table 3), mainly distributed in grassland and desert areas in the northwest side of the Hu-line.The areas with a GDP density higher than 5 million CNY/km 2 account for about one-tenth of China, mainly distributed in urban agglomerations such as the Pearl River Delta and Yangtze River Delta.The proportion of the areas below 0.05 million CNY/km 2 decreased from 43.82% in 2010 to 32.44% in 2020, with a decrease of 8.32% and 3.06% in the first and second five-year periods, respectively.The proportion of the areas above 1 million CNY/km 2 increased from 22.11% in 2010 to 32.22% in 2020, with an increase of 6.74% and 3.37% in the first and second five-year periods, respectively.Moreover, the proportion of the areas above 25 million CNY/km 2 increased from 2.21% in 2010 to 4.68% in 2020, with an increase of more than 100%.These results showed that China's GDP density shows a trend of continuous increase in the proportion of medium and rich areas and a rapid decrease in the proportion of poor areas.Our GDP density maps finely depict the spatial pattern of China's GDP distribution, which is high in the southeast and low in the northwest (Figure 2).According to the GDP density classification in previous studies [8,10], we found that the areas with a GDP density below 0.1 million CNY/km 2 account for about half of China (Table 3), mainly distributed in grassland and desert areas in the northwest side of the Hu-line.The areas with a GDP density higher than 5 million CNY/km 2 account for about one-tenth of China, mainly distributed in urban agglomerations such as the Pearl River Delta and Yangtze River Delta.The proportion of the areas below 0.05 million CNY/km 2 decreased from 43.82% in 2010 to 32.44% in 2020, with a decrease of 8.32% and 3.06% in the first and second five-year periods, respectively.The proportion of the areas above 1 million CNY/km 2 increased from 22.11% in 2010 to 32.22% in 2020, with an increase of 6.74% and 3.37% in the first and second five-year periods, respectively.Moreover, the proportion of the areas above 25 million CNY/km 2 increased from 2.21% in 2010 to 4.68% in 2020, with an increase of more than 100%.These results showed that China's GDP density shows a trend of continuous increase in the proportion of medium and rich areas and a rapid decrease in the proportion of poor areas.

Spatiotemporal Changes in GDP Density
China's GDP increased from 2010 to 2020 overall.The areas with an increase and decrease of 0.01 million CNY/km 2 account for 70.73% and 6.77% of China, respectively (Figure 3c and Table 4).Spatially, GDP density in central and southern China and nondesert regions in western China showed a significant increase, while GDP density in some areas in northeast China declined to a certain extent.

Spatiotemporal Changes in GDP Density
China's GDP increased from 2010 to 2020 overall.The areas with an increase a decrease of 0.01 million CNY/km 2 account for 70.73% and 6.77% of China, respectiv (Figure 3c and Table 4).Spatially, GDP density in central and southern China and no desert regions in western China showed a significant increase, while GDP density in som areas in northeast China declined to a certain extent.The changes in GDP density both during 2010-2015 and 2015-2020 display spatial heterogeneity (Figure 3a,b).The areas where GDP density increased by more than 0.1 million CNY/km 2 in the five years before and after account for 45.10% and 32.38% of China, respectively (Table 4), with the former being approximately 1.4 times the latter.The former is primarily distributed in the southeast side of the Hu-line, while the latter is mainly concentrated in southeast China and urban agglomerations.Furthermore, the area with GDP declining by more than 0.01 million CNY/km 2 in the five years before and after 2015 accounts for 4.03% and 23.58% of China, respectively.The latter is approximately six times the former, mainly in the rural areas of northeast China and parts of north China.
The result showed that barycenters of China's GDP moved 128.80 km to the southwest in the past decade (Figure 3d).The barycenters of China's GDP shifted 57.33 km to the southwest in 2010-2015 and continued to move 73.78 km to the south and west in 2015-2020, that is, from the northwestern edge of Anhui Province to the southwest to the border area between Henan Province and Hubei Province.The moving trajectory of barycenters of China's GDP was dominated by sustainable development in southeast China, the rise of central China, and the economic recession in northeast China, (Figure 3).These results indicated that the phenomenon of the shift from "strong north and weak south" to "strong south and weak north" was enhanced.

Regional Difference of Random Forest Variable Importance
Among the twelve auxiliary variables, the mean value of the %IncMSE in the GRF model showed that POI density, NLT, and population density ranked in the top three.The value of the %IncMSE of human environment variables was generally higher than that of the natural environment variables (Figure 4a).POI density, NLT, and population density reflect spatial aggregation and regional differences of labor, resources, and capital in detail, which is beneficial for capturing the spatial differences of GDP.The spatial distribution of the variables with the highest importance scores of each submodel is shown in Figure 4b.The number of counties with the highest importance score of POI density in GDP mapping accounts for 52.78% of total counties, while those for population density and NTL accounted for 23.66% and 23.56%, respectively.The areas with the highest importance score of NLT to GDP are concentrated in the coastal provinces from Hebei to Guangdong, while the areas with the highest importance score of population density are mainly located in southwest China (e.g., Yunnan Province and southern Sichuan Province) and central north China (e.g., southwestern Shanxi Province).These results suggest that POI density, NLT, and population density are the three key auxiliary variables that affect the spatial distribution of GDP density, and they show obvious spatial heterogeneity.
The changes in GDP density both during 2010-2015 and 2015-2020 display spatial heterogeneity (Figure 3a,b).The areas where GDP density increased by more than 0.1 million CNY/km 2 in the five years before and after account for 45.10% and 32.38% of China, respectively (Table 4), with the former being approximately 1.4 times the latter.The former is primarily distributed in the southeast side of the Hu-line, while the latter is mainly concentrated in southeast China and urban agglomerations.Furthermore, the area with GDP declining by more than 0.01 million CNY/km 2 in the five years before and after 2015 accounts for 4.03% and 23.58% of China, respectively.The latter is approximately six times the former, mainly in the rural areas of northeast China and parts of north China.
The result showed that barycenters of China's GDP moved 128.80 km to the southwest in the past decade (Figure 3d).The barycenters of China's GDP shifted 57.33 km to the southwest in 2010-2015 and continued to move 73.78 km to the south and west in 2015-2020, that is, from the northwestern edge of Anhui Province to the southwest to the border area between Henan Province and Hubei Province.The moving trajectory of barycenters of China's GDP was dominated by sustainable development in southeast China, the rise of central China, and the economic recession in northeast China, (Figure 3).These results indicated that the phenomenon of the shift from "strong north and weak south" to "strong south and weak north" was enhanced.

Regional Difference of Random Forest Variable Importance
Among the twelve auxiliary variables, the mean value of the %IncMSE in the GRF model showed that POI density, NLT, and population density ranked in the top three.The value of the %IncMSE of human environment variables was generally higher than that of the natural environment variables (Figure 4a).POI density, NLT, and population density reflect spatial aggregation and regional differences of labor, resources, and capital in detail, which is beneficial for capturing the spatial differences of GDP.The spatial distribution of the variables with the highest importance scores of each submodel is shown in Figure 4b.The number of counties with the highest importance score of POI density in GDP mapping accounts for 52.78% of total counties, while those for population density and NTL accounted for 23.66% and 23.56%, respectively.The areas with the highest importance score of NLT to GDP are concentrated in the coastal provinces from Hebei to Guangdong, while the areas with the highest importance score of population density are mainly located in southwest China (e.g., Yunnan Province and southern Sichuan Province) and central north China (e.g., southwestern Shanxi Province).These results suggest that POI density, NLT, and population density are the three key auxiliary variables that affect the spatial distribution of GDP density, and they show obvious spatial heterogeneity.The spatial distribution of the importance score of the twelve auxiliary variables is shown in Figure 5.The %IncMSE values of twelve variables show regional differences, indicating spatial heterogeneity in the relationship between the auxiliary variable and GDP density.Specifically, the importance scores of POI density, NLT, and population density were generally greater than 15.Among them, the importance score of POI was high in northwest China and low in southeast China, while the importance score of population density was high in western China and low in eastern China.The importance scores of natural environmental variables such as slope, elevation, NPP, and proportion of forest area are relatively low.The importance score of both the proportion of forest area and NPP were high in the southeast and low in the northwest.
The spatial distribution of the importance score of the twelve auxiliary variables is shown in Figure 5.The %IncMSE values of twelve variables show regional differences, indicating spatial heterogeneity in the relationship between the auxiliary variable and GDP density.Specifically, the importance scores of POI density, NLT, and population density were generally greater than 15.Among them, the importance score of POI was high in northwest China and low in southeast China, while the importance score of population density was high in western China and low in eastern China.The importance scores of natural environmental variables such as slope, elevation, NPP, and proportion of forest area are relatively low.The importance score of both the proportion of forest area and NPP were high in the southeast and low in the northwest.

Comparison with the Open Access Gridded GDP Datasets
The determination coefficient (R 2 ) for the GDP density simulated by GRF and the county statistical value reached or exceeded 0.75 (Figure 1).In previous studies of county population spatialization, an R 2 value of 0.70 or higher indicates that the spatial characteristics of population density can be described well [54,55], which to some extent indicates the feasibility of our spatialized GDP results.
This study generated multitemporal GDP density maps based on the geographical random forest and multisource data.Here, we compared them with the publicly available Xu_GDP gridded data, taking Beijing, Shanghai, and Guangzhou in 2015 as examples (Figure 6).In terms of the spatial distribution of GDP, the two gridded data show obvious clusters of GDP in urban areas, while our new maps depict more detailed differences within the administrative units at the county level, avoiding overestimation in nonurban areas.These phenomena benefit from the advantage of the geographical random forest's ability to capture the complex relationship between GDP and POI, NLT, land cover, and other auxiliary variables.Furthermore, in terms of the spatial pattern of the changes in GDP from 2010 to 2020, our new GDP maps also accurately depicted the spatial differences in changes in GDP density within the administrative units at the county level and were consistent with changes in NLT, POI density, and population density (Figure 7).

Comparison with the Open Access Gridded GDP Datasets
The determination coefficient (R 2 ) for the GDP density simulated by GRF and county statistical value reached or exceeded 0.75 (Figure 1).In previous studies of coun population spatialization, an R 2 value of 0.70 or higher indicates that the spatial char teristics of population density can be described well [54,55], which to some extent in cates the feasibility of our spatialized GDP results.
This study generated multitemporal GDP density maps based on the geographi random forest and multisource data.Here, we compared them with the publicly availa Xu_GDP gridded data, taking Beijing, Shanghai, and Guangzhou in 2015 as examp (Figure 6).In terms of the spatial distribution of GDP, the two gridded data show obvio clusters of GDP in urban areas, while our new maps depict more detailed differen within the administrative units at the county level, avoiding overestimation in nonurb areas.These phenomena benefit from the advantage of the geographical random fores ability to capture the complex relationship between GDP and POI, NLT, land cover, a other auxiliary variables.Furthermore, in terms of the spatial pattern of the changes GDP from 2010 to 2020, our new GDP maps also accurately depicted the spatial diff ences in changes in GDP density within the administrative units at the county level a were consistent with changes in NLT, POI density, and population density (Figure 7).

Change in GDP Density over the Past Decades
In 2020, the areas with a GDP density below 0.05 million CNY/km 2 comprised approximately 1/3 of China, mainly located in remote mountainous regions with sparse populations and vast wilderness.Compared to their extent in 2010, these areas experienced an 11.38% reduction (Figure 2 and Table 3).(Figure 2 and Table 3).Over the past decade, areas experiencing an increase of 0.01 million CNY/km 2 in GDP density account for more than 2/3 of China (Figure 3a and Table 4).These results are a great achievement, as China has insisted on comprehensive poverty alleviation and focused on the implementation of the rural revitalization strategy [56].Meanwhile, in 2020, areas with a GDP density higher than 25 million CNY/km 2 accounted for 4.68% of China.These areas mainly comprised urban agglomerations such as Beijing, Shanghai, Guangzhou, and Shenzhen, as well as prefecture-level cities, and their GDP density nearly doubled compared to that in 2010.This is related to China actively promoting the urbanization process, especially for the urban agglomerations [57].
The areas with a GDP density increase of more than 0.1 million CNY/km 2 during 2010-2015 are 1.4 times greater than those during 2015-2020, reflecting the general background of a continued slowdown in China's economic growth, that is, from approximately 10% to 6.1% in 2019 compared with the previous decade [53].Meanwhile, the GDP in some rural areas of northeast China and parts of northern China declined from 2015 to 2020, accounting for approximately a quarter of China.This phenomenon may be related to the changing demographics of China's rural population, as well as ecological environmental protection and construction efforts, such as reforestation projects [53,58,59].The 2020 census data showed that the northeastern region has experienced population shrinkage, especially in rural areas [60].In addition, the growth rate of the old industrial bases in northeast China has declined, and the radiating and leading roles of large cities in in-

Change in GDP Density over the Past Decades
In 2020, the areas with a GDP density below 0.05 million CNY/km 2 comprised approximately 1/3 of China, mainly located in remote mountainous regions with sparse populations and vast wilderness.Compared to their extent in 2010, these areas experienced an 11.38% reduction (Figure 2 and Table 3).(Figure 2 and Table 3).Over the past decade, areas experiencing an increase of 0.01 million CNY/km 2 in GDP density account for more than 2/3 of China (Figure 3a and Table 4).These results are a great achievement, as China has insisted on comprehensive poverty alleviation and focused on the implementation of the rural revitalization strategy [56].Meanwhile, in 2020, areas with a GDP density higher than 25 million CNY/km 2 accounted for 4.68% of China.These areas mainly comprised urban agglomerations such as Beijing, Shanghai, Guangzhou, and Shenzhen, as well as prefecture-level cities, and their GDP density nearly doubled compared to that in 2010.This is related to China actively promoting the urbanization process, especially for the urban agglomerations [57].
The areas with a GDP density increase of more than 0.1 million CNY/km 2 during 2010-2015 are 1.4 times greater than those during 2015-2020, reflecting the general background of a continued slowdown in China's economic growth, that is, from approximately 10% to 6.1% in 2019 compared with the previous decade [53].Meanwhile, the GDP in some rural areas of northeast China and parts of northern China declined from 2015 to 2020, accounting for approximately a quarter of China.This phenomenon may be related to the changing demographics of China's rural population, as well as ecological environmental protection and construction efforts, such as reforestation projects [53,58,59].The 2020 census data showed that the northeastern region has experienced population shrinkage, especially in rural areas [60].In addition, the growth rate of the old industrial bases in northeast China has declined, and the radiating and leading roles of large cities in influencing surrounding small and medium-sized towns are limited [53].The potential reasons for the decline in GDP density in these regions should be further explored, to prevent poverty from returning.
Over the past decade, the economic gravity center shifted southwestward by 128.80 km (Figure 3d).This is consistent with the existing result based on provincial GDP in terms of direction [53]; however, the distance moved is approximately twice as far as the latter, which is likely due to the use of refined GDP-gridded data in our study.As China's economy continues to grow, the widening gaps between the South and the North [53] suggest that regional development imbalances might become a pressing issue for China in the future [61].

Regional Differences in Random Forest Variable Importance
POI density ranks first in the importance score in the GRF model, and the proportion of counties with the highest importance score of POI density to GDP mapping is 52.78% (Figure 4), which reflects the key role of POI data in GDP spatialization [18].This is followed by NLT and population density, accounting for 23.56% and 23.66%, respectively.Accelerating the flow of production factors and concentrating production factors help promote economic development.The former is concentrated in the eastern coastal provinces of China.This concentration is related to the development and opening policy tilted toward the coastal areas in the last 40 years, which is likely related to the obvious difference between urban and nonurban power facilities [61].These results show that the relationship between GDP density and key auxiliary variables exhibits spatial heterogeneity, necessitating the integration of multisource data in GDP spatialization.

Limitation and Future Work
The reliability of spatialized GDP results depends heavily on the quality of statistical data, while a little GDP statistical data at the county level may contain errors, likely due to intentional manipulation [5].Meanwhile, we did not take into account the Black Swan event in 2020-the COVID-19 pandemic-which led to slower GDP growth rates compared to other years, especially for Hubei Province.Hence, it might have a negative effect on the result of the spatial pattern of the change in GDP density during 2010-2020.The selected indicators may not apply to some resource-based cities, which brings uncertainty to the spatial distribution of GDP density in these cities.Additionally, to the best of our knowledge, the POI dataset can be traced back to 2012, and no earlier dataset has been published publicly.Meanwhile, the POI dataset might not be complete in remote areas in the year 2012, while it might be more abundant in the year 2020, which also brings uncertainties to the analysis of the change in GDP density.
A recent study noted that new geospatial big data (e.g., Tencent user density (TUD) data) is a promising data source for mapping GDP density [8]; more attention should be paid to integrating multisource geospatial big data to further improve the precision of multitemporal GDP mapping in future work.Furthermore, even though our maps are seasonable and acceptable, the GRF is a shallow machine learning method.Recent studies pointed out that deep learning performs better in generating population grid data than shallow machine learning [62].A geographically deep learning model was used to downscale land surface temperature and water quality [63,64], suggesting the potential to map GDP density using a geographical deep learning model.

Conclusions
We mapped China's GDP distribution using GRF based on remotely sensed and POI data in 2010, 2015, and 2020.Our new maps finely depict the spatial pattern of China's GDP distribution, which mirrors China's achievements in alleviating poverty and the widening gaps between the South and North.The proportion of areas with GDP density lower than 0.05 million CNY/km 2 decreased by 11.38% over the past decade, while that higher than 1 million CNY/km 2 increased by 11.11%.The areas with an increase of 0.01 million CNY/km 2 account for more than 2/3 of China, while the GDP density of nonurban areas in northeast China declined, especially between 2015 and 2020.The spatial barycenter analysis further showed that the barycenter of China's GDP had moved to the southwest.Meanwhile, the relationships between GDP density and auxiliary variables exhibit spatial heterogeneity.The number of counties with the highest importance score of POI density in GDP mapping accounts for 52.78% of the total number of counties, while population density and NTL accounted for 23.66% and 23.56%, respectively.Spatially, the counties with the highest importance score of population density to GDP mapping are primarily distributed in the southwest and central north China, and those for NTL in GDP mapping are mainly located in the coastal provinces.These indicate that spatial differences in the relationship between GDP and auxiliary variables should be taken into account in the current models of GDP mapping.

Figure 4 .
Figure 4.The mean value of %IncMSE (a) and spatial distribution of the highest importance score of POI density, NLT, and population density (b) in the GRF model.

Figure 4 .
Figure 4.The mean value of %IncMSE (a) and spatial distribution of the highest importance score of POI density, NLT, and population density (b) in the GRF model.

Figure 5 .
Figure 5. Spatial pattern of the %IncMSE of (a) elevation, (b) slope, (c) percentage of cultivated areas, (d) percentage of forest areas, (e) percentage of grassland areas, (f) percentage of impervious areas, (g) night-light intensity, (h) net primary productivity, (i) density of POIs, (j) distance of POIs, (k) density of population, (l) distance of road, respectively.

Figure 5 .
Figure 5. Spatial pattern of the %IncMSE of (a) elevation, (b) slope, (c) percentage of cultivated areas, (d) percentage of forest areas, (e) percentage of grassland areas, (f) percentage of impervious areas, (g) ight-light intensity, (h) net primary productivity, (i) density of POIs, (j) distance of POIs, (k) density of population, (l) distance of road, respectively.

Figure 6 .
Figure 6.Comparison of our new GDP and Xu_GDP gridded data in 2015: (a-e) Beijing, (f-j) Sha hai, (k-o) Guangzhou.(a,f,k) indicate our new GDP maps; (b,g,i) indicate Xu_GDP maps; (c,h indicate NTL; (d,i,n) represent the density of POIs; and (e,j,o) are the land cover types.

Figure 6 .
Figure 6.Comparison of our new GDP and Xu_GDP gridded data in 2015: (a-e) Beijing, (f-j) Shanghai, (k-o) Guangzhou.(a,f,k) indicate our new GDP maps; (b,g,i) indicate Xu_GDP maps; (c,h,m) indicate NTL; (d,i,n) represent the density of POIs; and (e,j,o) are the land cover types.

Figure 7 .
Figure 7.Comparison of spatial changes in our new GDP from 2010 to 2020 and Xu_GDP gridded data from 2010 to 2019: (a-e) Beijing, (f-j) Shanghai, (k-o) Guangzhou.(a,f,k) indicate our new GDP maps; (b,g,i) indicate Xu_GDP maps; (c,h,m) indicate NTL; (d,i,n) represent the density of POIs; and (e,j,o) represent the density of Pop.

Figure 7 .
Figure 7.Comparison of spatial changes in our new GDP from 2010 to 2020 and Xu_GDP gridded data from 2010 to 2019: (a-e) Beijing, (f-j) Shanghai, (k-o) Guangzhou.(a,f,k) indicate our new GDP maps; (b,g,i) indicate Xu_GDP maps; (c,h,m) indicate NTL; (d,i,n) represent the density of POIs; and (e,j,o) represent the density of Pop.

Figure A1 .
Figure A1.Spatial pattern of the twelve auxiliary variables in 2020: (a) elevation, (b) slope, (c) the percentage of cultivated areas, (d) the percentage of forest areas, (e) the percentage of grassland areas, (f) the percentage of impervious areas, (g) night-light intensity, (h) net primary productivity, (i) density of POIs, (j) distance of POIs, (k) density of population, and (l) distance of the road.

Figure A1 .
Figure A1.Spatial pattern of the twelve auxiliary variables in 2020: (a) elevation, (b) slope, (c) the percentage of cultivated areas, (d) the percentage of forest areas, (e) the percentage of grassland areas, (f) the percentage of impervious areas, (g) night-light intensity, (h) net primary productivity, (i) density of POIs, (j) distance of POIs, (k) density of population, and (l) distance of the road.

Figure A2 .
Figure A2.Maps of the GDP statistical data at the county level in (a) 2010, (b) 2015, and (c) 2020.

Table 1 .
Summary of the datasets used in this study.

Table 2 .
The optimal bandwidth and kernel density weight of each category of POI.

Table 3 .
The area proportion of GDP density classification.

Table 4 .
The area proportion of GDP density changes.

Table 4 .
The area proportion of GDP density changes.