Increasing the Accuracy of Mapping Urban Forest Carbon Density by Combining Spatial Modeling and Spectral Unmixing Analysis

Accurately mapping urban vegetation carbon density is challenging because of complex landscapes and mixed pixels. In this study, a novel methodology was proposed that combines a linear spectral unmixing analysis (LSUA) with a linear stepwise regression (LSR), a logistic model-based stepwise regression (LMSR) and k-Nearest Neighbors (kNN), to map the forest carbon density of Shenzhen City of China, using Landsat 8 imagery and sample plot data collected in 2014. The independent variables that contributed to statistically significantly improving the fit of a model to data and reducing the sum of squared errors were first selected from a total of 284 spectral variables derived from the image bands. The vegetation fraction from LSUA was then added as an independent variable. The results obtained using cross-validation showed that: (1) Compared to the methods without the vegetation information, adding the vegetation fraction increased the accuracy of mapping carbon density by 1%–9.3%; (2) As the observed values increased, the LSR and kNN OPEN ACCESS Remote Sens. 2015, 7 15115 residuals showed overestimates and underestimates for the smaller and larger observations, respectively, while LMSR improved the systematical over and underestimations; (3) LSR resulted in illogically negative and unreasonably large estimates, while KNN produced the greatest values of root mean square error (RMSE). The results indicate that combining the spatial modeling method LMSR and the spectral unmixing analysis LUSA, coupled with Landsat imagery, is most promising for increasing the accuracy of urban forest carbon density maps. In addition, this method has considerable potential for accurate, rapid and nondestructive prediction of urban and peri-urban forest carbon stocks with an acceptable level of error and low cost.


Introduction
During the last 35 years, rapid urbanization of Southern China has led to the development of a cluster of cities, including Shenzhen, Guangzhou, Foshan, and Dongguan.These cities have grown so rapidly that they have merged into an interconnected megalopolis with a population of 42 million.Rapid urban sprawl and increasing human activities have resulted in deforestation and loss of vegetated areas and agricultural lands, increased ratio of anthropogenic to natural carbon stocks, increased carbon emission and aggravated air pollution in this region, and thus contributed to carbon concentration in the atmosphere and to global warming [1].To mitigate these effects, an effective strategy is to protect forests and re-vegetate the cities, which can thus potentially enhance vegetation carbon sequestration in the region.However, urban vegetation carbon sequestration and its dynamics in this region are still unknown mainly because of the complexity of urban landscapes in which buildings, roads, parking lots, water bodies, forests, trees along city roads, shrubs, grasslands and other greened areas are mixed together.The large number of mixed pixels impedes accurate estimation of urban forest carbon.Currently, most of the methods used to map vegetation carbon are related to general or non-urban forests and are not appropriate for estimating urban vegetation carbon [1,2].There are also no reports of techniques for introducing the vegetation fraction extracted from the mixed pixels of urban landscapes into spatial interpolation methods to improve mapping urban vegetation carbon.Therefore, there is a strong need to develop a novel method for urban vegetation carbon modeling and dynamic monitoring.

Forest Carbon Modeling
Forest carbon sequestration plays an important role in controlling global climate change [3][4][5][6][7].During the last 20 years, substantial research on the estimation of forest carbon stock per unit or density has been conducted, and various methods have been developed [8,9].These methods can be divided into multiple categories: use of biomass or net primary productivity observations from field sample plots [10]; use of biomass and carbon conversion factors or coefficients based on the data from forest inventory [3,11]; eddy correlation and covariance based methods [12][13][14]; growth and especially process model-based methods [15] such as BIOME-BGC [16][17][18]; and remote sensing-based methods [9,[19][20][21][22][23][24].Compared to other methods, remote sensing-based methods have the following advantages: (1) remotely sensed imagery provides coverage of large areas, reveals the spatial variability of vegetation and facilitates generation of the spatial distribution and pattern of vegetation carbon density for large vegetated areas and complex urban vegetation landscapes; (2) various sensors (optical, Radar, Lidar) and spatial resolution images can be selected to tailor vegetation carbon modeling at various scales; and (3) multi-temporal imagery makes it possible to monitor the dynamics of vegetation carbon density.Thus, remote sensing-based methods are a good alternative for estimating urban vegetation carbon stocks and monitoring their dynamics.The success of the methods will depend on selection of the remotely sensed data, selection of the spectral variables that account for their relationships with vegetation carbon density, selection of sample plots and collection of field observations, and modeling the relationship between vegetation carbon data from sample plots and the spectral variables.
Substantial research has been conducted on estimation of forest carbon density estimation using a combination of sample plot and remotely sensed data.These studies used various remotely sensed data including Moderate Resolution Imaging Spectroradiometer (MODIS), Landsat Thematic Mapper™ and SPOT [8].Landsat TM images are the most widely used because they are free, downloadable, and have a long time history, large coverage, and medium spatial and temporal resolutions [8,9,20,25,26].Moreover, the 30 m × 30 m spatial resolution of Landsat TM multi-spectral images often approximates the size of sample plots used for field data collection by the national forest inventories of many countries.For instance, the size of sample plots used by China's national forest inventory is 25.82 m × 25.82 m.This size relationship potentially reduces the uncertainty due to geospatial mismatches between pixels and sample plots.The major disadvantage of Landsat TM imagery is the lack of sensitivity of brightness values to biomass of very dense and multi-layer canopy forests in tropical rain forests, which leads to underestimation of biomass and carbon [8,20].However, this saturation problem usually does not occur in the estimation of urban forest carbon.SPOT imagery has characteristics similar to those of Landsat TM, but its high cost limits its application.MODIS imagery has been widely used for global, national and regional vegetation carbon modeling, but is generally not applicable for estimating urban forest carbon density due to its coarse spatial resolution.An exception is the study conducted by Villa et al. [27] who found that remotely sensed images with spatial resolutions of 500 m × 500 m to 1000 m × 1000 m could be used to effectively estimate carbon stocks for peri-urban areas.This exception can be attributed to the low intensity of urbanization in the peri-urban areas.The fine spatial resolution images such as those from Ikonos and Worldview-3 have the potential to extract vegetation fraction from mixed pixels and, thus, are promising for mapping urban forest carbon density.However, the high cost required is a challenge associated with fine spatial resolution imagery.
Radar imagery overcomes the shortcoming of optical imagery due to clouds [28][29][30][31][32].However, the images are not very sensitive to the structures of dense and multi-layer canopy forests.Moreover, forest carbon estimation studies during the last decade indicate that both airborne LiDAR and Terrestrial Laser Scanning (TLS) data provide greater potential than other remotely sensed images because of their capacity to estimate tree heights and crown widths [33][34][35].In addition, the integration of LiDAR data with optical images can potentially increase the accuracy of forest carbon estimation [36][37][38][39][40][41][42][43].Like fine spatial resolution optical images, however, the high cost of both Radar and LiDAR data limit their applications.
With remote sensing-based methods, various spatial interpolation techniques, including linear and non-linear regressions, artificial neural networks (ANN), k-Nearest Neighbors (kNN), and geostatistical methods such as cokriging and spatial co-simulation, are used to model the relationships between forest biomass/carbon and spectral variables from images [8,10,44].The limitations of linear regressions are the linearity and normality assumptions for the data.In addition, linear regressions often lead to not only negative predictions of forest biomass/carbon, but also extremely large and meaningless predictions [45,46].ANN and kNN do not require the above assumptions and are more flexible for mapping vegetation biomass/carbon [25,47].However, ANN is still considered a "black box" approach due to unknown weights and difficulty in terminating the learning process [48].With kNN, the estimate of forest carbon for an unknown location can be calculated as an inverse, distance-weighted mean of observations from k sample plots that are nearest in a multi-dimension spectral space [49].This method has been widely used with the national forest inventory plot and remotely sensed data [49][50][51][52][53][54].In addition, Wang et al. [9,22] developed an image-based, spatial co-simulation algorithm based on spatial autocorrelation of variables to map forest carbon density.Fleming et al. [45] further compared this algorithm with regressions for mapping forest carbon density and found that the co-simulation was more promising than the regressions.However, the complexity of this co-simulation algorithm impedes its applications.

Urban Forest Carbon Modeling
An urban vegetation landscape is much more complex than general terrestrial vegetation ecosystems.Thus, accurately mapping urban forest carbon density is more difficult because mixed pixels and building-induced shadows often impede extraction of vegetation information.Moreover, urban forests often have less capacity for carbon sequestration compared to other forests [55].Many authors have studied the estimation of urban forest carbon [56][57][58][59][60][61][62][63] using methods that are similar to those mentioned above for non-urban forest carbon modeling.For example, Kordowski and Kuttler [64] used eddy correlation and covariance methods to estimate the carbon flux of Essen in Germany from 2006 to 2007.Churkina et al. [1] applied a simple average per unit for homogeneous polygons to estimate urban carbon stocks of the continental USA.Zhang et al. [65] modeled the dynamics of vegetation carbon from 1945 to 2007 for the cities in the southern USA using a process-based model.Strohbach et al. [66] proposed a new method in which the sequestration of urban forests due to tree growth was simulated and compared to the corresponding carbon emission.Nowak et al. [67] estimated the carbon density for a total of 28 cities across six states in the USA using ground sample plot data.
In addition, remote sensing-based methods have been widely used to estimate and monitor the dynamics of urban forest carbon density [68].McPherson et al. [69] developed spatial distributions of forest carbon density by integrating forest growth models, ground sample plot and remotely sensed data for Los Angeles and Sacramento.Zheng et al. [70] combined forest growth models, a land use and land cover (LULC) classification map and a tree canopy cover map from satellite images to estimate urban forest carbon density for county-level cities in northern New England, USA.Strohbach and Haase [66] combined a tree canopy cover map based on hyperspectral color infrared imagery and ground observations to estimate the urban forest carbon stock of Leipzig.Hutyra et al. [71] estimated the decrease in forested areas and the loss of forest carbon stocks due to urban sprawl for Seattle from 1986 to 2007 by combining a time series of remotely sensed images and ground sample plot data.Shrestha and Wynne [72] estimated the height and crown width of each tree using airborne LiDAR data and further estimated tree biomass for central Oklahoma, USA.Moskal and Zheng [73] explored methods used to estimate tree heights, diameters, and basal areas for an urban landscape using TLS.Davies et al. [74] and Pataki et al. [75] reviewed methods for urban vegetation carbon estimation and indicated that the complexity of urban landscapes and great variation in vegetation carbon density may lead to large estimation uncertainties.
In China, urban forest carbon modeling research has mainly been conducted for large cities such as Beijing [76], Shanghai [77], and Guangzhou [78][79][80][81].For example, He et al. [76] estimated Beijing's forest biomass by combining TLS data and SPOT imagery.Ying et al. [82] estimated the spatial distribution of Harbin's forest carbon density by combining data from field sample plots and a remote sensing-based LULC classification.After reviewing the widely used methods for modeling urban vegetation carbon, Zhou et al. [81] suggested that the integration of sample plot data, models and remote sensing technologies would have considerable potential increasing the accuracy of urban vegetation carbon modeling by decomposing the mixed pixels of complex urban vegetation landscapes and introducing vegetation information from spectral unmixing analyses into the models.

Spectral Mixture Analysis
Mixed pixels in complex urban vegetation landscapes can greatly reduce the estimation accuracy of vegetation carbon density.The key for development of methods for mapping urban vegetation carbon density is a method for decomposing mixed pixels and using the derived vegetation information.However, accurately extracting the vegetation information of mixed pixels in images is often a great challenge [83,84].As the spatial resolutions of images become coarser, the mixed pixels become more problematic.Various methods have been developed for decomposition of land cover components or endmembers in mixed pixels.These methods are divided into linear and nonlinear spectral unmixing algorithms [85][86][87].Lu et al. [88] used a linear spectral unmixing method and obtained a classification accuracy of 78.2% for Brazilian Amazon vegetation types based on Landsat TM images.Chen et al. [89] developed a method based on linear spectral unmixing to estimate leaf area index.Dennison and Roberts [83] discussed the selection of endmembers.Hu and Weng [90] provided a summary of spectral unmixing analysis methods and then compared the fraction image of impervious surface from ANN with that from a linear spectral unmixing analysis using a Hyperion image.Although there have been many reports, general rules on how to select endmembers and spectral unmixing methods are still lacking.In addition, there have been no reported studies on how to integrate spectral unmixing analyses and spatial interpolation methods to map urban vegetation carbon density.
This study aimed at developing an improved method for increasing the accuracy of mapping forest carbon density for complex urban landscapes by combining a linear spectral unmixing analysis (LSUA), Landsat 8 imagery, sample plot data and three spatial interpolation methods: a linear stepwise regression (LSR), a logistic model-based regression (LMSR) and kNN.The combinations were compared by introducing the vegetation fraction from LSUA into the spatial interpolation methods.Methods that use the vegetation fraction are expected to increase estimation accuracy for urban forest carbon density relative to methods that do not use the vegetation fraction.

Study Area and Datasets
This study was conducted in Shenzhen city, one of the four rapidly sprawling cities in Guangdong province of Southern China (Figure 1).During the past 30 years, Shenzhen has rapidly spread from a small village to become a big city with a population of 11 million.The rapid-city expansion and increasing human activities have converted large amounts of the vegetated areas and agricultural lands into built up areas and, thus, led to the great changes in LULC types characterized by deforestation.Currently, Shenzhen has an area of 81.4 km from east to west and 45 km from north to south.It is characterized by a sub-tropical monsoon climate with the annual average temperature and precipitation of 22.5 °C and 1924.3 mm.The vegetated area of the city consists of forested lands, shrub lands, grasslands and greened roofs (planted grass and flowers on roofs).Forested lands have an area of 79,700 ha with about 4,880,000 trees planted along roads and streets.Deciduous forests, deciduous and coniferous mixed forests, and shrub lands dominate the forested lands.Deciduous forests mainly consist of Eucalyptus urophylla S.T.Blake, Schefflera actinophylla (Endl.)Harms, Ficus benjamina Linn., Acacia confusa Merr., Evodia glabrifolia (Champ.ex Benth.)C.C. Huang, Elaeocarpus chinensis (Gardn.et Chanp.)Hook.f. ex Benth., Cratoxylum cochinchinense (Lour.)Bl., Schima superba Gardn.et Champ., Cinnamomum camphora (Linn.)Presl, etc. Pinus massoniana is the only coniferous tree species.In addition, in recent years, Shenzhen has been re-vegetated by planting trees along its roads and streets, expanding grass lands around the buildings and developing greened roofs.However, the city's vegetation and soil carbon stocks, the ratio of anthropogenic over natural carbon stocks and their dynamics are unknown.This study was limited to mapping the urban forest carbon density.
A Landsat 8 image dated 15 October 2014 was acquired.The image consists of eight bands, including coastal aerosol, blue, green, red, near infrared, two shortwave infrared, and cirrus band at a spatial resolution of 30 m × 30 m (Figure 1).The reasons for using the Landsat 8 image included free downloadable data and its moderate spatial resolution of 30 m × 30 m that approximated the size of sample plots used in this study.Moreover, although using fine spatial resolution images such as those from Worldview-2 can enhance the decomposition of mixed pixels in this urban landscape, the extremely high cost of acquiring the images limited their applications in this study.This is also true for mapping forest carbon density for large cities in developing countries.In addition, Landsat 8 is a recent satellite and use of a Landsat image in this study provided an opportunity to evaluate its utility for urban forest carbon modeling.Radiometric correction of the image was conducted using ATCOR2013, and pixel values were converted to reflectance values at the surface of the Earth.The image was also geo-referenced to the Universal Transverse Mercator (UTM) projection and coordinate system with root mean square error (RMSE) of less than one pixel.
The Landsat image was used to classify the entire city landscape into LULC types, including forests, grasslands, built-up areas, bare lands and water bodies.Based on the classification, a random sampling design was conducted to select sample plots for each LULC type.The number of sample plots for each LULC type was proportional to its area and the plots were then randomly located within the area of each type.The plot size was 25.82 m × 25.82 m, which was similar to the spatial resolution of the Landsat 8 image and national forest inventory sample plots.Most of the field work was carried out from 1 August to 20 November 2014, and a total of 161 plots were measured.Because about 23 sample plots located in the eastern part of the city were not measured, the eastern part of the city was not involved in this study.
The sample plots were located using a global positioning system (GPS) receiver that had a positional error of about 5 m.Tree and stand variables were measured based on national forest inventory protocols [91].For each plot, observations of tree diameters at breast height (1.3 m above ground) (DBH) and height (H) were first collected, and tree volume was predicted using allometric models based on DBH and H by tree species for Guangdong province [91].The uncertainties associated with the tree volume model predictions were assumed to be negligible.The volume values were then converted into biomass and carbon stock pools using biomass and carbon conversion coefficients by tree species [92].The resulting forest carbon stocks were further divided by the plot area, leading to plot forest carbon density (Mg/ha).

Methods
In addition to the eight original image bands, various image transformations were considered including their inversions, ratios between two or three bands, principal component analysis, grey-level co-occurrence matrix-based texture measures (mean, angular second moment, contrast, correlation, dissimilarity, entropy, homogeneity and variance), vegetation indices, and difference vegetation indices.A total of 284 spectral variables were derived (Table 1).Pearson product moment correlation coefficients between plot forest carbon density and the spectral variables were calculated.The statistical significance of the coefficients (whether they significantly differed from zero) was tested using 2 2 ( 2 ) r t n t (n is number of the plots) based on the student's distribution at the significance level of 0.05.Three spatial interpolation methods, LSR, LMSR [45,46,93] and kNN [47,[49][50][51][52][53][54], were used to model the relationship between plot forest carbon density and the spectral variables.LSR is a widely used method for remote sensing-based spatial interpolation of forest biomass and carbon.In this study, it was used to help select the spectral variables that contributed to statistically significantly improving the fit of a model to data and reducing the sum of squared errors.It was also used to test for nonlinearity in the relationship between urban forest carbon density and the spectral variables derived from the Landsat 8 image.The spectral variables selected as significant independent variables were then used in the backward-based linear stepwise regression.
Yan et al. [94] found that compared to LSR, LMSR led to more accurate estimates of carbon density for forests that were dominated by plantations.In this study, LMSR was selected for comparison with LSR and also for testing for nonlinearity in the relationship between forest carbon density for the complex urban landscape and the spectral variables derived from Landsat 8 image.For LMSR, the plot forest carbon density values were standardized before the regression so that they had a range of 0-1, and LMSR was then conducted to determine the significant spectral variables.During the stepwise regression for both LSR and LMSR, a variance inflation factor (VIF) that assesses the degree to which the variance of a regression coefficient estimate is inflated by the correlation among the independent variables selected in the model was used to analyze the multicollinearity between each pair of the selected independent variables and to eliminate spectral variables with VIF values greater than 10 [95].In this study, the correlation coefficients among the independent variables varied from 0.03 to 0.99, which indicated high correlations among at least some pairs of variables.This led to large variances and instability for the regression coefficient estimates and VIF values, and thus reduced the effectiveness of stepwise techniques.In this study, testing of VIF showed that a VIF value exceeding 10 was a good indicator of serious multicollinearity [95].The regressions were carried out using the R statistical software package.
The significant spectral variables from both LSR and LMSR were also used with kNN to measure the spectral distances of an unknown location from all the sample plots based on Euclidean distance and a neighbor weighting scheme.Results were compared for k = 3, 5, 7 and 10 nearest neighbors.kNN-based estimation was carried out using a computer program developed by the authors in IDL.Because kNN directly measures spectral distances and selects the given number of the nearest neighbors in a spectral space for each location whose value is unknown, it is a more intuitive method than LSR and LMSR and has been widely used in the Nordic countries to map forest types and estimate tree volumes and biomass/carbon [47,[49][50][51][52][53][54].In this study, we tested if kNN was appropriate for mapping urban forest carbon density.
LSUA, with the constraint that the sum of fractions for all endmembers must equal one, was conducted for the study area using the fully constrained least square linear spectral unmixing tool.Four endmembers were determined: vegetation, urbanized area, water and bare soil.First, the minimum noise fraction (MNF) transformation of Landsat 8 image bands was conducted [86][87][88], and the endmembers were then selected based on the scatter plots of MNF.
LSUA was combined with the aforementioned LSR, LMSR and kNN by introducing the vegetation fraction into the spatial interpolation methods.The significant spectral variables selected above together with the vegetation fraction were used as the independent variables in the models.The accuracies of the predicted forest carbon density values from the combinations were compared with those from the models that did not include the vegetation fraction.The combinations of the spatial interpolation methods LSR, LMSR and kNN with LSUA and the comparisons with the methods in which the vegetation fraction is not involved are expected to lead to improved estimation.Theoretically, this approach is appropriate for mapping the forest carbon density of complex urban vegetation landscapes having large numbers of mixed pixels, but no reports are known to have been published.
Accuracy assessments of the resulting forest carbon density estimates and maps were conducted using cross-validation.With this method, a sample plot was first randomly selected and removed from the dataset, the remaining plots were used to develop the models, the estimate of forest carbon density for the removed plot was calculated, and the deviation between the predicted and observed values was computed.The removed plot was then replaced and another plot was randomly selected and removed from the dataset.The corresponding modeling and estimation were conducted for this plot.The process was repeated until all the sample plots yielded their estimates.Using the observations and predictions for the sample plots, the coefficient of determination R 2 and RMSE were then computed.In addition, this study led to maps that served mainly as intermediate products that supported increasing the accuracy of estimates.However, the maps can be also used as stand-alone products that depict the spatial distribution of Shenzhen urban forest carbon density.Thus, the evaluation with respect to the final applications of the maps was conducted by calculating the mean density of estimated forest carbon and its variance for each of the maps using modelassisted regression estimators [51].

Results
A statistical summary of the sample plot data used for mapping urban forest carbon density of Shenzhen is shown in Table 2.The plot-level sample mean and standard deviation were estimated using simple random sampling estimators.The resulting urban forest carbon density estimate was characterized by a large coefficient of variation with a confidence interval of 14.92 Mg/ha-22.15Mg/ha determined at the significance level of 0.05.The Pearson product moment correlation coefficients between plot forest carbon density and the 284 spectral variables were calculated.Among them, 241 spectral variables had absolute values of the coefficients varying from 0.156 to 0.655 and significantly differed from zero at the significance level of 0.05.The coefficients of the 60 most correlated spectral variables ranged from 0.557 to 0.655.Atmospherically resistant vegetation index (ARVI) had the greatest correlation with forest carbon density.The widely used vegetation indices, including normalized difference vegetation index (NDVI), modified NDVI (MNDVI), reduced simple ratio (RSR), soil adjusted vegetation indices (SAVIi , i = 0.1, 0.25, 0.3 and 0.5), enhanced vegetation index (EVI), had correlation coefficients of 0.586-0.647and were included among the 60 most correlated spectral variables.All other spectral variables came from simple two-and three-band ratios (SRi,j and RSi,j,k).Compared to the band ratios and vegetation indices, the complicated image transformations PCA and grey-level co-occurrence matrix-based texture measures resulted in smaller coefficients of correlation.
LSUA for the study area was conducted using the spectral variables derived from Landsat 8 image bands (Figure 2).Vegetation was mainly distributed in the southeast, southwest central, northeast and northwest parts of Shenzhen city, in which urban fraction had smaller estimates (Figure 2a,b).Bare soil was almost evenly scattered over the city except for the west and south parts dominated by urbanized area (Figure 2d).LSUA accurately detected the water bodies for which the values of vegetation, urban and bare soil fractions were close to zero (Figure 2c).The resulting vegetation fraction had a correlation coefficient of 0.600 with the plot urban forest carbon density.The linear model LSR led to four spectral band ratios that made significant contributions to the estimation of Shenzhen urban forest carbon density (Equation ( 1 3), almost the same as the sample mean of 18.54 Mg/ha.However, the negative carbon density estimates were obviously illogical and there were also some unreasonable estimates that were much larger than the maximum observation of 100.65 Mg/ha.Based on cross-validation, the coefficient of determination and RMSE between the estimated and observed values for the sample plots were 0.482 and 16.67 Mg/ha.After the vegetation fraction from LSUA was introduced into LSR as an independent variable, the integration (Equation ( 2)) of LSR with LSUA resulted in urban forest carbon density estimates that ranged from −55 Mg/ha-225 Mg/ha with an average estimate of 18.76 Mg/ha for the sample plots, not significantly different from the sample mean (Table 3).Moreover, the integration increased the coefficient of determination to 0.514 and reduced the RMSE to 16.13 Mg/ha.Compared to the results for LSR without the vegetation fraction, the correlation increase and RMSE decrease were 6.6% and 3.3%, respectively.The estimates of the mean obtained from the carbon density maps based on both LSR and its integration with LSUA were slightly smaller than the sample mean, but were included in the confidence interval at the 0.05 significance level (Table 3).Model (2) had a statistical F-value of 32.90 with a p-value of zero.
Table 3.The accuracy assessments of Shenzhen urban forest carbon density estimates based on cross-validation and the evaluation of map applications by estimating the mean density for each forest carbon density map and its variance using model-assisted regression estimators from the linear stepwise regression (LSR) and the logistic model-based stepwise regression (LMSR) and their integration with linear spectral unmixing analysis (LSUA), that is, LSR&LSUA, LMSR&LSUA (Note: The mean is the average estimate of the observed locations, R 2 and RMSE are the coefficient of determination and root mean square error between the estimated and observed values of the sample plots,  The non-linear model LMSR also selected four spectral variables that made significant contributions to improving the fit of a model to the data and reducing the sum of squared errors for estimating Shenzhen urban forest carbon density (Equation ( 3)), including ARVI, 25 SR , 32 SR , and 34 SR .Most of the variables differed from those selected by LSR.However, three simple band ratios were still selected.
Both LMSR and its integration with LSUA resulted in a reasonable range of estimates from 0 Mg/ha-100.65Mg/ha.For LMSR without the vegetation fraction, the resulting coefficient of determination was 0.537, greater than those from LSR with and without the vegetation fraction (Table 3).
The RMSE was 17.72 Mg/ha and increased by 6.3% and 9.9%, respectively, compared to those from LSR with and without the vegetation fraction.For LMSR with the vegetation fraction (Equation ( 4)), the resulting coefficient of determination and RMSE were 0.578 and 17.08 Mg/ha, and increased by 7.6% and decreased by 3.6%, respectively, compared to those from LMSR without the vegetation fraction.Compared to those from the integration of LSR with LSUA, the combination of LMSR with LSUA increased both the coefficient of determination and RMSE by 12.5% and 5.9%, respectively.For both LMSR and its integration with LSUA, the average estimates of the forest carbon density for both the sample plots and the resulting maps were smaller than the sample mean 18.53 Mg/ha, but were in the confidence interval (Table 3). ) Shenzhen urban forest carbon density was also mapped using kNN with four different numbers of nearest neighbors, k = 3, 5, 7 and 10.When the significant variables from LSR were used to calculate the spectral distance between each of the sample plots and an unobserved location for kNN, the results from cross-validation are listed in Table 4.The individual estimates varied from 4.00 Mg/ha to 100.60 Mg/ha.For kNN with and without the vegetation fraction from LSUA, the average estimates of the forest carbon density for the sample plots and the resulting maps ranged from 18.10 Mg/ha to 19.03 Mg/ha and all were in the confidence interval at the 0.05 significance level (Table 4).For the kNN method without the vegetation fraction, as the number of the nearest neighbors increased, the coefficients of determination increased from 0.298 to 0.388 and the RMSE decreased from 20.51 Mg/ha to 18.37 Mg/ha.Compared to the results for kNN without the vegetation fraction, adding the vegetation fraction from LSUA into the set of the significant spectral variables led to increases in the estimation accuracy.The increase for the coefficient of determination varied from 1.7% to 9.3% and the decrease for the RMSE ranged from 1.0% to 3.3%, depending on the numbers of nearest neighbors used.Using k = 10 nearest neighbors resulted in the largest estimation accuracy for kNN, regardless of whether the vegetation fraction was added.However, compared to the results from LSR and LMSR with and without the vegetation fraction, kNN produced larger values of RMSE (Tables 3 and 4).
When the significant spectral variables from LMSR were used for kNN, both with and without the vegetation fraction from LSUA, all the average estimates of forest carbon density for the sample plots and the resulting maps were in the confidence interval at the 0.05 significance level (Table 4).The accuracies of the estimates based on cross-validation were slightly greater than those for kNN using the significant variables from LSR.However, they were smaller than those using LMSR, regardless of whether the vegetation fraction was or was not used.With increasing numbers of nearest neighbors, the coefficient of determination also increased and RMSE decreased, indicating that 10 nearest neighbors led to the greatest accuracy.Introducing the vegetation fraction into kNN increased the coefficients of determination by 5.4%-6.6%and decreased the RMSE by 2.1%-3.1%.
In Figure 3, the residuals for the predicted values for urban forest carbon density were graphed against the observed values for LSR, LMSR and KNN with and without the vegetation fraction from LSUA.For kNN, only the results with the significant variables from LMSR and k = 10 were presented.Obviously, the residuals showed a trend of linear decrease as the observed values increased for both LSR and kNN, which indicates that the smaller forest carbon density values were overestimated, while the greater ones were underestimated (Figure 3).The trend was improved for LMSR with and without the vegetation fraction.Moreover, all the methods led to similar maps of Shenzhen urban forest carbon density (Figure 4).The carbon density showed greater estimates in the southeast, west central, north and south central parts of the city than the other areas.The spatial distributions were similar to the spatial patterns of the vegetated areas demonstrated by the false color composite image of Landsat 8 in Figure 1.At the eastern border of the study area, very large forest carbon density estimates occurred in a linear pattern.This was mainly caused by noise due to the clouds existing in this portion of the image.Table 4. Accuracy assessments for the Shenzhen urban forest carbon density estimates based on cross-validation and the evaluation of map applications by calculating the mean density of each forest carbon density map and its variance using model-assisted regression estimators from k-Nearest Neighbors (kNN) and the significant spectral variables from both the linear stepwise regression (LSR) and the logistic model-based stepwise regression (LMSR), and their integrations with the linear spectral unmixing analysis (LSUA) (Note: k is the number of nearest neighbors, mean is the average estimate of observed locations, R 2 and RMSE are the coefficient of determination and root mean square error between the estimated and observed values of the sample plots,

Discussion
Urban forests play an important role in the mitigation of carbon concentration in the atmosphere, the reduction of human activity induced air pollution and noise and the control of global climate change [56][57][58][59][60][61][62][63][64][65][66][67][68][69][70][77][78][79][80][81].Accurately estimating and mapping urban forest carbon density has thus become very important.However, a great challenge for this purpose is selecting or developing methods for extracting vegetation fraction from the mixed pixels of complex urban landscapes and then introducing the information into spatial modeling [81].The results of this study showed that by comparing three spatial interpolation methods-LSR, LMSR and kNN-without the vegetation fraction, introduction of the vegetation fraction images derived using LSUA into the modeling methods increased the accuracy of mapping Shenzhen urban forest carbon density by 1.0%-9.3%,depending on the used methods.This indicates that extraction and use of the vegetation fraction has considerable potential for increasing the accuracy of mapping the urban forest carbon density.This finding was supported the study of Yan et al. [94] in which increases in accuracy of 31%-39% were achieved by adding the vegetation fraction information into a regression model for mapping the carbon density of general or non-urban forests dominated by pure plantations.Comparatively, the increase in estimation accuracy for this study was relatively smaller, mainly because Shenzhen had a more complex landscape and more mixed pixels than the non-urban forests.
For this study, a limited budget precluded acquisition and use of fine spatial resolution images such as from Worldview-2; instead, a new and free downloaded Landsat 8 image was used.This image had a spatial resolution of 30 m × 30 m, which was relatively too coarse for the decomposition of the mixed pixels in the complex Shenzhen city landscape.The use of fine spatial resolution images may increase the accuracy of unmixing the mixed pixels and, thus, the accuracy of estimating the urban forest carbon density.In this study, moreover, the lack of fine spatial resolution images or land use and land cover maps limited the accuracy assessment of the vegetation fraction from LSUA.In addition, the lack of fine spatial resolution images also made it impossible to separate the vegetated areas of Shenzhen city into forested lands, shrub lands and grasslands, mainly because, in this city, the trees and shrubs were often mixed together, and the grasslands were in small segments around buildings and bare lands.Readers should thus use the conclusions drawn in this study with caution.
On the other hand, mapping urban forest carbon density requires a higher cost compared to the generation of a carbon density map for non-urban forests for the same size with the same estimation accuracy.The cost increase results from the requirement for more field observations because of more complex urban landscapes and higher spatial and temporal variability of urban forest carbon density.For example, the coefficient of variation of 125.9% for the carbon density of the urban forests in Shenzhen was very large.For accuracy expressed as a relative error of 15% or less, a total of 161 sample plots were selected using a random sampling design at a 0.05 significance level.For the carbon density of non-urban forests, the coefficient of variation generally varies from 60% to 100% and a total 35-100 sample plots is required.Thus, the increase of cost and time to collect and process the additional sample plot data is about 60%-360%.Moreover, free Landsat images at a spatial resolution of 30 m × 30 m are often adequate for mapping the carbon density of non-urban forests.For urban forests, remotely sensed images at spatial resolutions finer than 5 m × 5 m may be needed to unmix the mixed pixels.The cost to acquire the images varies depending on the spatial resolution of the images.For instance, the image costs for Shenzhen city are $4688 for a RapidEye basic product at a 5 m × 5 m spatial resolution, $18,864 for SPOT 6 pan-sharpened data at a 1.5 m × 1.5 m spatial resolution, and $47,619 for Pleiades pan-sharpened images and $58,608 for Worldview-2 at a 0.5 m × 0.5 m spatial resolution.The extremely high cost often makes it impossible to use the fine spatial resolution images to map forest carbon density for large cities in developing countries.Thus, using a free Landsat 8 image could greatly reduce the image cost.Instead, the estimation accuracy may decrease, but the amount of decrease is unknown and a further study is necessary.
The results of RMSE from cross-validation showed that LSR with and without the vegetation fraction led to the smallest RMSE values, followed by LMSR with and without the vegetation fraction, and kNN with the significant spectral variables from LMSR and the vegetation fraction together.The largest RMSE came from kNN with the significant variables from LSR and without the vegetation fraction.However, LSR, regardless of whether the vegetation fraction was used, resulted in not only illogically negative estimates, but also unreasonably large estimates.Moreover, in this study, urban forest carbon density and all the selected spectral variables were characterized by skewed instead of normal distributions.This required transformation of the data to normality.Thus, a logarithm transformation, which is most appropriate for skewed distributions [96], was applied to the data before the linear regressions.However, the results showed that LSR with and without the vegetation fraction led to average estimates of the urban forest carbon density that were outside the confidence interval obtained using simple random sampling design at the significance level of 0.05 for both the sample plots and the resulting maps with extremely large estimates.This indicated that, in practice, it is not always necessary or desirable to transform a data set to achieve normality [96].This finding was consistent with both Fleming et al. [45] and Wang et al. [46].
In addition, both LSR and kNN and their integration with LSUA led to a linearly decreasing trend of the residuals over increasing observations of urban forest carbon density, i.e., overestimates for the smaller observations and underestimates for larger observations.The over-and underestimates were mainly caused by the non-linear relationships between urban forest carbon density and the selected spectral variables.The linearly decreasing trend of the residuals was not apparent for the non-linear model LMSR with and without the vegetation fraction.This indicated that the non-linear logistic model was a better choice than the linear LSR and kNN techniques for modeling the relationship between urban forest carbon density and the spectral variables and improving estimation of Shenzhen urban forest carbon density.
In this study, the resulting relative errors of estimating the urban forest carbon density were 87.0% and 89.9% for LSR, 92.1% and 95.6 for LMSR, and 96.2%-110.6% for kNN with and without the vegetation fraction, respectively.Compared to those from Yan et al. [94], these relative errors were much larger.This was mainly caused by the larger coefficient of variation of 125.9% for the urban forest carbon density in the complex Shenzhen landscape and the greater number of mixed pixels compared to the study of Yan et al. [94] in which the study area was dominated by the pure plantations.This was supported by the findings from Davies et al. [74] and Pataki et al. [75].This indicated greater potential for further increasing the accuracy of mapping urban forest carbon density by improving the decomposition of mixed pixels using finer spatial resolution images.
Studies in the Nordic countries and the USA have shown that kNN is a powerful tool for mapping forest types and estimating tree volumes and biomass by combining national forest inventory sample plot data and remotely sensed images [47,[49][50][51][52][53][54].However, its performance for mapping Shenzhen urban forest carbon density in this study produced a much smaller accuracy than that using the non-linear regression LMSR.The reason was mainly because kNN is based on the spectral distance between pure pixels [88][89][90], whereas the complex Shenzhen urban vegetation landscape has a large number of mixed pixels with a variety of structures and compositions of endmembers.The spectral reflectance values of the mixed pixels varied greatly depending on their inherent structures and compositions.Thus, the resulting spectral distances between the mixed pixels did not correspond well with those for pure pixels.On the other hand, shadows in the urban landscape may also affect the representativeness of the spectral distance to the similarity among pixels.
Urbanization causes LULC changes and, thus, changes in the total carbon stock and its pools of cities.Based on the study by Scalenghe et al. [97], in a populated area the proportion of anthropogenic carbon stock increased and was accompanied by a large increase in the carbon emission per capita per unit area, while the proportions of soil and vegetation carbon stock decreased.Moreover, Villa et al. [27] also quantified the ratio of anthropogenic to natural carbon stocks for a peri-urban area in Italy using Landsat multi-temporal data.However, this study concentrated only on the estimation and mapping of the forest carbon density in Shenzhen and did not deal with assessing the impacts of urbanization on the amounts of other carbon pool stocks because of a lack of data.In the future, anthropogenic (buildings, roads, impervious surface, etc.), soil and vegetation carbon stocks, carbon emission and their dynamics due to urbanization, sprawl patterns, surface imperviousness and soil sealing, should be studied.
This study demonstrated the potential of combining spatial interpolation methods and spectral unmixing analysis for improving mapping and estimation of urban forest carbon density.Further studies will focus on the use of fine spatial resolution images and spectral unmixing analysis to separate vegetated areas into vegetation types such as forests, individual trees along streets, shrub lands and grasslands.Moreover, selecting or developing methods for removing or reducing the effects of building-induced shadows on the estimation of urban forest carbon density is another great challenge.In addition, airborne LiDAR data or the data from terrestrial 3D laser scanning and their data fusion with optical images may also offer great potential for increasing the accuracy of forest carbon density estimation [72,73].

Conclusions
In this study, a novel approach was proposed to map the urban forest carbon density of Shenzhen, a rapidly sprawling city in southern China.In this approach, three spatial interpolation methods-LSR, LMSR and kNN-were integrated with LSUA by introducing the derived vegetation fraction image into the modeling methods to increase estimation accuracy.The results led to the following conclusions: (1) Adding the vegetation fraction from LSUA into the linear and non-linear regressions and kNN increased the accuracy of mapping Shenzhen urban forest carbon density by 1.0%-9.3%compared to the methods without the vegetation fraction; (2) All the average estimates of urban forest carbon density from these methods were in the confidence interval estimated using only the sample plot data at the significance level of 0.05.However, LSR resulted in illogically negative and extremely large estimates, and kNN produced greater RMSEs than LSR and LMSR; (3) Integrating LMSR and LSUA improved the systematical over and underestimations caused by LSR and KNN for the smaller and greater urban forest carbon density values, respectively.The integration also led to reasonable estimates with a RMSE value that was much smaller than those from kNN with and without the vegetation fraction.Thus, combining the spatial modeling method LMSR and the spectral unmixing analysis LUSA coupled with Landsat images is most promising for increasing the accuracy of mapping urban forest carbon density.In addition, this method is characterized by the great potential for accurate, rapid and nondestructive prediction of urban and peri-urban forest carbon stocks with an acceptable level of error and low cost.

Figure 1 .
Figure 1.Location of the study area-Shenzhen city and spatial distribution of sample plots within land use and land cover categories.
density of a forest carbon density map and its variance using model-assisted regression estimators, and the confidence interval at the significance level of 0.05 was 14.92 Mg/ha-22.15Mg/ha).
are the mean density of a forest carbon density map and its variance using model-assisted regression estimators, and the confidence interval of 14.92 Mg/ha-22.15Mg/ha determined at the significance level of 0.05.

Figure 3 .
Figure 3.The residuals for the predicted forest carbon density values graphed against the observed values using (a) the linear stepwise regression (LSR); (b) the integration of LSR and the linear spectral unmixing analysis (LSUA) by adding the vegetation fraction into LSR; (c) the logistic model-based stepwise regression (LMSR); (d) the integration of LMSR and LSUA; (e) the k-Nearest Neighbors (kNN) with k = 10 using the significant variables from LMSR; and (f) the integration of kNN and LSUA with k = 10.

Figure 4 .
Figure 4.The spatial distributions of the predicted forest carbon density values using (a) the linear stepwise regression (LSR); (b) the integration of LSR and the linear spectral unmixing analysis (LSUA); (c) the logistic model-based stepwise regression (LMSR); (d) the integration of LMSR and LSUA; (e) the k-Nearest Neighbors (kNN) with k = 10 using the significant variables from LMSR; and (f) the integration of kNN and LSUA with k = 10.

Table 1 .
Spectral variables (SV) derived from a total of eight bands for the Landsat 8 image.

Table 2 .
The statistical summary of sample plot data used for mapping urban forest carbon density of Shenzhen.