E ﬀ ect of Urbanization on Ecosystem Service Values in the Beijing-Tianjin-Hebei Urban Agglomeration of China from 2000 to 2014

: Rapid urbanization has led to the continuous deterioration of the surrounding natural ecosystem. It is important to identify the key urbanization factors that a ﬀ ect ecosystem services and analyze the potential e ﬀ ects of these factors on the ecosystem. We selected the Beijing, Tianjin, and Hebei (BTH) urban agglomeration to investigate these e ﬀ ects, and designed three indicators to map the urbanization level: Population density, gross domestic product (GDP) density, and the construction land proportion. Four indicators were chosen to quantify ecosystem services: Food production, carbon sequestration and oxygen production, water conservation, and soil conservation. To handle the nonlinear interactions, we used a random forest (RF) method to assess the e ﬀ ect of urbanization on ecosystem services in the BTH area from 2000 to 2014. Our study demonstrated that population density and economic growth were the internal driving forces a ﬀ ecting ecosystem services. We observed changing trends in the e ﬀ ect of urbanization: The e ﬀ ect of population density on ecosystem services increased, the e ﬀ ect of the proportion of construction land was consistent with population density, and the e ﬀ ect of GDP density on ecosystem services decreased. Our results suggest that controlling the population and GDP would signiﬁcantly inﬂuence the sustainable development in large urban areas.


Introduction
Since the beginning of the last century, cities have gradually become the dominant habitat for mankind [1]. With high-density urban populations, industrialization, and the rapid expansion of urbanized land, cities have had an enormous impact on the surrounding ecosystems that maintain the normal functioning of the natural world and that foster human societies [2]. Ecosystem service values have increased with biodiversity protection, and increases in the areas of grassland, forest land, water bodies [3]. However, in recent decades, the deepening of the global urbanization process has led to the continuous deterioration of natural ecosystems and decreases in the values of ecosystem service [4][5][6]. The sustainable development of cities has become an important topic in governmental and academic circles. In China, the sustainable development of cities is the key to a harmonious society. The policymakers need to consider both the socioeconomic and ecological environment factors in order to make the right decisions for the sustainable development of cities. Hence, it is important to identify

Data Sources
We used both spatial and statistical data in this research. The spatial data comprised: (1) The digital elevation model (DEM) at a spatial resolution of 90 × 90 m, supplied by the Geo-spatial Data Cloud Station (http://www.giscloud.cn/) of the Computer Network Information Center of the Chinese Academy of Sciences. (2) Population and GDP data, rainfall data, and normalized vegetation index (NDVI) data at a spatial resolution of 1 × 1 km, supplied by the Resource and Environment Data Cloud Platform (http://www.resdc.cn/) of the Chinese Academy of Sciences. We used the inverse distance weighting method to interpolate the relevant missing data. (3) The net primary productivity (NPP) data at a spatial resolution of 1 × 1 km were provided by the MOD17A3 data set (https://lpdaac.usgs.gov/). (4) The land use and land cover change (LUCC) data at a spatial resolution of 300 × 300 m were provided by the European Space Agency (http://neo.ssa.esa.int/advanced-search) and were reclassified. The landscape was divided into eight categories: paddy fields, dry land, woodland, other woodland, shrub land, water areas, construction land, and barren land. (5) The soil data were provided by the World Soil Database

Data Sources
We used both spatial and statistical data in this research. The spatial data comprised: (1) The digital elevation model (DEM) at a spatial resolution of 90 × 90 m, supplied by the Geo-spatial Data Cloud Station (http://www.giscloud.cn/) of the Computer Network Information Center of the Chinese Academy of Sciences. (2) Population and GDP data, rainfall data, and normalized vegetation index (NDVI) data at a spatial resolution of 1 × 1 km, supplied by the Resource and Environment Data Cloud Platform (http://www.resdc.cn/) of the Chinese Academy of Sciences. We used the inverse distance weighting method to interpolate the relevant missing data. (3) The net primary productivity (NPP) data at a spatial resolution of 1 × 1 km were provided by the MOD17A3 data set (https://lpdaac.usgs.gov/). (4) The land use and land cover change (LUCC) data at a spatial resolution of 300 × 300 m were provided by the European Space Agency (http://neo.ssa.esa.int/advanced-search) and were reclassified. The landscape was divided into eight categories: paddy fields, dry land, woodland, other woodland, shrub land, water areas, construction land, and barren land. (5) The soil data were provided by the World Soil Database (http://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database). The statistical data included grain production and other production supplies obtained from the Beijing Statistical Yearbook [38][39][40][41], Tianjin Statistical Yearbook [42][43][44][45], and Hebei Statistical Yearbook [46][47][48][49]. Population, GDP, and LUCC data were used to quantify urbanization levels. Other data were used to calculate ecosystem service values by empirical models.

Selection of Urbanization Indicators
Urbanization is reflected in four aspects: Living standards improvement, population growth, economic development, and construction land expansion [50]. The improvements in the standard of living can be represented by population growth, economic development, and construction land expansion [27]. We described the urbanization levels of the BTH urban agglomeration in terms of population growth, economic development, and construction land expansion. The population density (person/km 2 ) was made use of characterizing population urbanization, GDP density (yuan/km 2 ) was made use of reflecting economic urbanization, and the construction land proportion in the every pixel (1 × 1 km) was made use of describing the situation of construction land expansion [22] (see Table 1).

RF Regression Model
RF is a machine learning method using a supervised learning algorithm that improves the stability and accuracy of regression and classification [30]. The RF method can reduce large bias and avoid over-fitting to a certain extent based on mean prediction of individual trees. RF is normally used to deal with regression and classification problems, and the methods of data treatment differ according to the details of the problem. In principle, an RF regression model can be designed by following the steps outlined below. When there is a training set D that contains N data points, we use RF to perform regression on the training set D. First, bootstrapping is used to select n subsets D i (the range of i is 1 to n) from the training set D as new training sets. Each of the D i contain N' data points (the amount of data N' in the subsets D i is the same as the amount N in the set D. However, the content is different, because elements in subsets D i can be repeated). Afterwards, using the regression algorithm on the n subsets D i , we have n results. Taking the average value of these results, we can obtain the final result of the RF regression model [51].
Based on the data for population, GDP, and LUCC, we generated raster maps of urbanization levels (population density, GDP density, and construction land proportion) at a spatial resolution of 1 × 1 km in 2000, 2005, 2010, and 2014. Then, we extracted relevant raster data as input to construct the RF regression model by Python. We randomly divided the data set into a training set (70% of the total) and a testing set (30% of the total). The training set was used for constructing the RF regression model under the optimal parameter combination, and the testing set was used for evaluating model performance. Finally, the RF regression model output importance scores for the three urbanization indicators (based on the Gini index) reflecting the extent of the effect of urbanization on ecosystem service in the BTH urban agglomeration in 2000-2014 ( Figure 2).

Quantifying Ecosystem Services Using Empirical Models
Our study selected four kinds of ecosystem services (see Table 1), the methods of quantifying them are summarized as follows:  Food production Food production is essential for sustainable development of the BTH region [52]. We calculated the annual amount of food production according to the method [22]. The calculation formula is as follows: where Px (t/km 2 ) is the annual amount of food production for grid x (1 km 2 ). The subscript x refers to the grid point in the study area; i represents three types of food (grain, oil crops, and vegetables). A represents the area of cultivated land for grid point x. P refers to the average yield of the selected three food types per unit area [23,53], and the data were obtained from the Beijing Statistical Yearbook [38][39][40][41], Tianjin Statistical Yearbook [42][43][44][45], and Hebei Statistical Yearbook [46][47][48][49].
 Carbon sequestration and oxygen production Feature Importance Scores represent the importance scores of three urbanization levels. The grid data (210,000) were extracted from raster maps of population density, GDP density, and the proportion of construction land as input to construct the RF regression model by Python. The model outputs feature importance scores (the Gini index can be used to measure the contribution of each feature in the RF regression model; the contributions of features represent importance scores). The scores reflect the effect of urbanization on ecosystem service values.

Quantifying Ecosystem Services Using Empirical Models
Our study selected four kinds of ecosystem services (see Table 1), the methods of quantifying them are summarized as follows: • Food production Food production is essential for sustainable development of the BTH region [52]. We calculated the annual amount of food production according to the method [22]. The calculation formula is as follows: where P x (t/km 2 ) is the annual amount of food production for grid x (1 km 2 ). The subscript x refers to the grid point in the study area; i represents three types of food (grain, oil crops, and vegetables).

Carbon sequestration and oxygen production
We used the following equation to calculate the annual amounts of carbon sequestration and oxygen production [22,54].
where C x (t/km 2 ) is the annual amount of carbon sequestration and oxygen production for grid x (1 km 2 ); x refers to the grid point in the study area; and NPP x (t/km 2 ) is the value of NPP for grid point x. •

Water conservation
We calculated the annual amount of water conservation according to the method [55]. The calculation formula was as follows: where W x (mm) represents the amount of water conservation for grid point x (1 km 2 ); x refers to the grid point in the study area; PPT x (mm) is the quantity of precipitation for grid point x; and ET x (mm) is the quantity of evapotranspiration for grid point x. ET x is calculated as where PET (mm) represents potential evapotranspiration and can be obtained by a simple calculation method [56]. LAI is leaf area index, this can be obtained by using the relationship between NDVI and LAI under different land use types [57]. The coefficients of Equation (4) are defined through the calibration for the equation with statistically modeling of the relationships between observed data and estimated data by Jia et al. [55]. •

Soil conservation
We used the InVEST model to quantify soil conservation. The calculation formula of annual soil conservation function is as follows [27]: where S x (t/km 2 ) represents the annual soil conservation function for grid point x (1 km 2 ); x refers to the grid point in the study area. RKLS x (t/km 2 ) is the annual amount of potential soil erosion for grid point x. USLE x (t/km 2 ) is the annual amount of actual soil erosion for grid point x. R x (MJ·mm/(km 2 ·hr)) is the rainfall erosivity for grid point x, which can be calculated by the Brinell algorithm. K x (t·km 2 ·hr/(MJ·km 2 ·mm)) is the soil erodibility factor for grid point x, which can be obtained by the estimation method for K in the EPIC model [58]. LS x is the slope length-gradient factor for grid point x. The InVEST model can automatically generate the result map according to the filled DEM. C x and P x are the crop management factor and the governance measure factor for grid point x, respectively. The two factors for each land type were provided by Li et al. [59]. Food Production (t/km 2 ) Provisioning service of ecosystem service function Carbon Sequestration and Oxygen Production (t/km 2 ) Regulating service of ecosystem service function Water Conservation (mm) Provisioning service of ecosystem service function Soil Conservation (t/km 2 ) Supporting service of ecosystem service function

The Spatial and Temporal Patterns and Dynamic Characteristics of Urbanization Levels
The spatial and temporal patterns and dynamic characteristics of urbanization indicators of the BTH region in 2000 and 2014 are shown in Figure 3. The spatial patterns of POP_D (population density), GDP_D (gross domestic product density), and CLP (proportion of construction land) were generally consistent from 2000 to 2014. The higher levels of urbanization were mainly concentrated in two mega-cities, #2 (Beijing) and #7 (Tianjin) (Figure 3a 11%, 808.97%, and 166.67%, respectively. The growth of GDP density was much higher than the others ( Table 2). As for dynamic characteristics of urbanization levels on the spatial and temporal scales, the population density increased in the wide range of the area (greater growth was mainly distributed in areas #1, 2, 7, and 11) and declined in a small part of the area. Similarly, the GDP density increased in the majority of the areas (greater growth was located in the area of #5) and declined in a few areas. The change in the proportion of construction land differed in that no changes occurred in most areas, and the increase was mainly distributed in areas #2, 4, 6, 7, 9, 11, and 12 ( Figure 3c,f,i).

The Effect of Urbanization on Ecosystem Services in the BTH Urban Agglomeration
The feature importance scores from the RF regression model are shown in Figure 4. We measured the impacts of urbanization on ecosystem services in the BTH region in 2000-2014. For food production (ecosystem service indicator #1), the population density had the greatest impact followed by GDP density, and the proportion of construction land had the least impact (Figure 4a). The impact of urbanization on carbon sequestration and oxygen production (ecosystem service indicator #2) and soil conservation (ecosystem service indicator #4) were basically consistent (Figure 4b,d). In contrast, in 2000-2010, the GDP density had the greatest impact on water conservation (ecosystem service indicator #3) followed by population density, and the proportion of construction land had the least impact (Figure 4c

The Comparisons of Simulations between the RF Regression Model and an Empirical Model
We used both empirical models and the RF regression model to estimate the values of the ecosystem services. The results of comparisons between the RF regression model and the empirical models using the testing set for 2000 were plotted with standard deviation bars ( Figure 5). The R 2 values for the four kinds of ecosystem services ranged from 0.714 to 0.986 (p < 0.05 for each). The highest R 2 values were found for carbon sequestration and oxygen production, followed by food production and water conservation. The R 2 values for those services were all greater than 0.9 ( Figure  5a-c). The lowest R 2 was found for soil conservation (R 2 = 0.714) (Figure 5d).

The Comparisons of Simulations between the RF Regression Model and an Empirical Model
We used both empirical models and the RF regression model to estimate the values of the ecosystem services (ecosystem service values estimated by empirical models could see Figure S1). The results of comparisons between the RF regression model and the empirical models using the testing set for 2000 were plotted with standard deviation bars ( Figure 5). The R 2 values for the four kinds of ecosystem services ranged from 0.714 to 0.986 (p < 0.05 for each). The highest R 2 values were found for carbon sequestration and oxygen production, followed by food production and water conservation. The R 2 values for those services were all greater than 0.9 (Figure 5a-c). The lowest R 2 was found for soil conservation (R 2 = 0.714) (Figure 5d). ecosystem services. The results of comparisons between the RF regression model and the empirical models using the testing set for 2000 were plotted with standard deviation bars ( Figure 5). The R 2 values for the four kinds of ecosystem services ranged from 0.714 to 0.986 (p < 0.05 for each). The highest R 2 values were found for carbon sequestration and oxygen production, followed by food production and water conservation. The R 2 values for those services were all greater than 0.9 ( Figure  5a-c). The lowest R 2 was found for soil conservation (R 2 = 0.714) (Figure 5d).  (5). The black dots represent the averages of simulations of the model, and the sizes of the dots represent the amount of grid data. Food production from the empirical model in (a) does not have a standard deviation, as the data for the average yield of the selected grain, oil crops, and vegetables per unit area were gathered according to administrative region.

The Spatio-Temporal Dynamic Characteristics of Urbanization
The average values of population density, GDP density, and the proportion of construction land in the BTH region from 2000 to 2014 are shown in Table 2. The increase rates for population density, GDP density, and construction land proportion were 29.11%, 808.97%, and 166.67%, respectively. The population density and GDP density increased in a wide range of areas and declined in a small part of the region in 2000-2014 (Figure 3c,f). The situation for construction land proportion was different. In the majority of areas, this was unchanged, the increase has been mainly distributed in small areas (Figure 3i). These results demonstrated that the urbanization of the BTH has surpassed the initial

The Spatio-Temporal Dynamic Characteristics of Urbanization
The average values of population density, GDP density, and the proportion of construction land in the BTH region from 2000 to 2014 are shown in Table 2. The increase rates for population density, GDP density, and construction land proportion were 29.11%, 808.97%, and 166.67%, respectively. The population density and GDP density increased in a wide range of areas and declined in a small part of the region in 2000-2014 (Figure 3c,f). The situation for construction land proportion was different. In the majority of areas, this was unchanged, the increase has been mainly distributed in small areas (Figure 3i). These results demonstrated that the urbanization of the BTH has surpassed the initial stage represented by population concentration and land expansion and has entered the middle stage represented by population concentration and economic growth [61].
The economy emerged as an explosive growth accompanied by the expansion of urban areas. Our study has highlighted two key factors causing this phenomenon: (1) National policy, (2) China's admission to the World Trade Organization (WTO) in 2001 under the background of economic globalization [62]. This has a profound impact on China's urbanization. At the country level, the rate of urbanization rose from 17.9% in 1978 to 54.8% in 2014 [63]. The transformations of urbanization are described as both high-speed and great using conventional indicators of growth (population, GDP, urban area) [64]. However, the population of the BTH has shown a relatively small growth trend, as China implemented a one-child family policy to relieve the pressure of a massive population increase since 1982 [65]. Facing these changes, some debates have emerged to assess the forces driving the urbanization of China. Several important forces have been deeply discussed [66], e.g., economic globalization [67], national policy [68], inter-regional competition [69], and spatial administrative reorganization [70]. Currently, the BTH urban agglomeration is still growing. Geographically predicting the policy influences on urban development remains a big challenge.

The Quantification of the Effect of Urbanization on Ecosystem Services in the BTH Urban Agglomeration Using the RF Regression Model
Differing from traditional methods, our study confirmed that RF regression models could directly reveal the effect of urbanization on ecosystem services based on the importance scores of three urbanization indicators. Based on our results, the importance scores for population density and GDP density were much higher than for construction land proportion regarding the four ecosystem services (Figure 4). This suggested that population density and GDP density were the key factors affecting ecosystem services, and the proportion of construction land had a lesser impact. In general, the expansion of the urban population and rapid economic development were the dominant factors affecting ecosystem services. In particular, the accumulation of industry and population growth had greatly increased the consumption of land resources, water resources, and other energy sources. These use patterns had caused significant levels of pollution that negatively affected the ecosystem [71]. We found changing trends in the effect of urbanization on ecosystem services: the effect of population density increased, while the effect of construction land proportion was consistent with population density. In contrast, the effect of GDP density decreased (Figure 4). These results indicated that controlling the population and GDP levels is important for the sustainable development of cities in large urban areas.

The Comparisons of Simulations between RF Regression and Empirical Models
The results of comparisons between the RF regression model and empirical models for assessing ecosystem services using the testing set in 2000 are shown in Figure 5. The R 2 values for four kinds of ecosystem service indicators were all more than 0.9 (Figure 5a-c). In terms of the simulation outputs, both RF regression and empirical models were able to effectively simulate ecosystem service indicators. There was a slight difference between the RF regression model and empirical models. The R 2 for soil conservation was only 0.714. The simulated result of the RF regression model was lower than for the empirical model in the high value for soil conservation (Figure 5d). This implied that the soil conservation was differently estimated by the RF regression and empirical models. This might result in a difficulty to assess the effect of the urbanization on soil conservation. Nevertheless, it may not be a significant issue in the BTH region, as this area primarily contains fertile farmland in the second largest flatland of China. There might not be a significant correlation between urbanization and soil conservation [72].

Uncertainties
In mapping urbanization levels using remote sensing (RS) data, there were differences in the urbanization levels under different sources of data. We used the LUCC data to calculate the proportion of construction land, and the data of LUCC were provided by the European Space Agency (http://neo.ssa.esa.int/advanced-search). It is possible that the spatio-temporal patterns of construction land proportion would not be completely consistent if the LUCC data were obtained from other RS data. To compare the results employing different methods, four empirical models were chosen for testing the performance. Although both RF regression and empirical models demonstrated consistently reasonable estimates in most of the simulation outputs, there was still some discrepancy in soil conservation.
Currently, it is still a challenge to obtain observed ecosystem service values. Future improvement in the observation of ecosystems services and model integration with these observations could certainly reduce the uncertainties in our study.

Conclusions
The rapid economic development at national scale involves many aspects, in which some major sections may have negative effects on ecosystem services. Especially, the population growth, construction land expansion, and traffic management present serious challenges to the sustainable development of the urban agglomeration. As the largest urbanization in China, BTH urban agglomeration has the most representative features. Through exploring the spatial and temporal dynamic characteristics of urbanization levels from 2000 to 2014 in the BTH using spatial data analysis, we found that the urbanization of the BTH was at a stage represented by population concentration and economic growth. The results suggested that population density and GDP density were the two dominant factors affecting ecosystem services, and that construction land proportion had a lesser impact on ecosystem services. Our study using the RF regression model demonstrated the changing trends of the effect of urbanization on ecosystem services. The effect of population density on ecosystem services increased, and the effect of the proportion of construction land on ecosystem services was consistent with population density. In contrast, the effect of GDP density on ecosystem services decreased. This study paves the way for quantitative evaluation of the effect of urbanization on ecosystem services at a regional scale.