Estimating urban vegetation biomass from Sentinel-2A image data

Urban vegetation biomass is a key indicator of the carbon storage and sequestration capacity and the ecological effect of an urban ecosystem. Rapid and effective monitoring and measurement of urban vegetation biomass provide not only an understanding of urban carbon circulation and energy flow but also a basis for assessing the ecological function of urban forest and ecology. In this study, field observations and Sentinel-2A image data were used to construct models for estimating urban vegetation biomass, with the east Chinese city of Xuzhou as a case study. Results show that (1) Sentinel-2A data can be used for urban vegetation biomass estimation; (2) compared with the Boruta based multiple linear regression models, the stepwise regression models (which are also multiple linear regression models) achieve better estimations (RMSE = 7.99 t/hm² for low vegetation, 45.66 t/hm² for broadleaved forest, and 6.89 t/hm² for coniferous forest); (3) the models for specific vegetation types are superior to the models for all-type vegetation; and (4) vegetation biomass is generally highest in September and lowest in January and December. Our study demonstrates the potential of the free Sentinel-2A images for urban ecosystem studies and provides useful insights on urban vegetation biomass estimation with such satellite remote sensing data.


Introduction
According to the World Urbanization Prospects, urban residents are expected to compose 68% of the global population by 2050 [1], and this would bring increasingly intensive urban heat island (UHI) effects, environmental degradation, and ecological damage. As an important carrier of urban ecosystems, urban vegetation, which refers to all naturally growing and human-planted vegetation within an urban area [2,3], brings considerable ecological, economic, and social benefits [4]. These include improving urban microclimates, mitigating UHI effects, reducing surface runoff, maintaining the urban carbon-oxygen balance, and, equally importantly, enhancing the quality of urban life by providing spaces for relaxation and recreation [5][6][7][8]. As such, the focus of urban eco-environmental studies has long been on urban vegetation, particularly the biomass of urban vegetation [9]. Urban vegetation biomass is an effective indicator of the carbon storage and sequestration capacity and the ecological effect of an urban ecosystem [10,11]; it is, therefore, important to estimate urban vegetation biomass in urban eco-environmental management.
Traditional biomass measurement simply removes and weighs all the biomass occurring in quadrats, which is a labor-intensive and time-consuming practice [12,13]. This method does not allow quick monitoring and, more importantly, might be destructive to some extent to the vegetation being investigated. Remote sensing, however, provides an alternative for biomass measurement, largely because it makes objective and mostly non-destructive observations of vegetated areas at various spatial and temporal resolutions. Because vegetation biomass cannot be directly derived from remote sensing image data, remote sensing based estimation requires the use of sample plots to acquire field measurements for modeling based on allometric growth equations, together with image interpretation for estimation (e.g., [14]). Vegetation biomass estimation with remote sensing has been summarized and reviewed in previous studies [15][16][17]. While optical sensor, radar, and lidar data can be used for biomass estimation separately or jointly [18][19][20][21][22], multispectral data is the most frequently used data type [15]. Although it has been widely recognized for its advantages, remote sensing has been mostly used to measure the biomass of individual vegetation types in natural forest [23,24], grassland [25][26][27], wetlands [28,29], and deserts [30] but rarely the biomass of urban vegetation [14,31].
The Sentinel satellites form an Earth observation constellation developed by the European Space Agency (ESA) as part of the Copernicus Program. Sentinel-2 is a wide-swath, high-resolution, multispectral imaging mission with two twin satellites (Sentinel-2A and Sentinel-2B), supporting land and climate-change monitoring [32]. Sentinel-2A was launched in June 2015 and has offered free image data at the ESA's website as of December 2015. The Sentinel-2 MSI (multispectral imager) samples 13 spectral bands ranging from the visible to the shortwave infrared part of the electromagnetic spectrum: four bands at 10 m, six bands at 20 m, and three bands at 60 m spatial resolution [32]. It has now been used for a variety of forestry applications such as fire damage monitoring [33,34], forest storage estimation [35,36], and canopy cover calculation [37]. While some researchers have combined Sentinel-2A with radar data for biomass estimation [24], the use of such free optical sensor data alone has not been assessed. Testing the capability of Sentinel-2A data to estimate urban vegetation biomass would be interesting as Sentinel-2A data is becoming increasingly important for land monitoring, particularly for forestry.
In this study, we therefore focus on the modeling of urban vegetation biomass estimation from Sentinel-2A image data. Quadrat biomass was calculated using the allometric biomass equations with field measurements, and then vegetation biomass models were constructed with remote sensing derived variables. The specific objectives are to test the capability of Sentinel-2A data to estimate urban vegetation biomass and to examine whether vegetation type-specific modeling can improve estimation accuracy.

Study Area
Bordering the provinces of Shandong, Henan, and Anhui, Xuzhou (33°43'~34°58' N, 116°22'~118°40' E) (Figure 1) is a national key railway hub located in the northwestern part of Jiangsu province, east China [38]. It has a monsoon-influenced humid subtropical climate with an annual mean daily temperature of 14.5 °C and an annual total precipitation of 832 mm [39]. As a typical forested city, Xuzhou has received multiple titles and awards such as the National Forest City in 2012, the National Ecological Gardening City in 2015, and particularly the UN-Habitat Scroll of Honor Award in 2018 [40], which is attributed largely to the implementation of several greening and ecological restoration programs in recent decades. Although the importance of urban vegetation to cities is generally acknowledged here, no research has been conducted to estimate and assess the urban vegetation biomass of Xuzhou.
The area within the third ring road of Xuzhou (indicated by the red line in Figure 1a) was selected for this research, covering a geographical area of ~108.51 km². The area within the third ring road is traditionally considered the urbanized part of Xuzhou and is home to the majority of Xuzhou's urban residents. Its urban green areas have expanded remarkably in recent years, making it an ideal area for this research. The study area is flat in the central part with thick soil and hilly in the north, east, and south parts with thin humus-poor soil. The soil type is leached cinnamon soil, weakly alkaline with pH ranging from 7.63 to 8.07 [41]. According to our fieldwork, most of the trees in the study area are coniferous, consisting largely of arborvitae trees (Platycladus orientalis). These evergreen trees were mainly planted during the 1950s and 1960s with 700-3000 trees per hectare [41]. They are usually 5-12 m high (avg. 8.36 m) with diameters at breast height (DBH) ranging from 5 to 15 cm (avg. 12.47 cm) [41]. Broadleaved forest is dominated by poplar (Populus euramericana), black locust (Robinia pseudoacacia), and paper mulberry (Broussonetia papyrifera) trees. While the poplar trees are usually large (avg. DBH = 21.40 cm) and high (avg. height = 20 m) and concentrated along rivers and roads, the black locust and paper mulberry trees are scattered in parks and small hills. Shrubs are mostly found in parks, including colorful and decorative species such as Buxus megistophylla and Berberis thunbergii. Grassland is relatively small in urban Xuzhou and is usually found in parks and residential/institutional properties. Typical grass species include Setaria viridis, Ophiopogon bodinieri, Iris tectorum, and Allium macrostemon.

Remote Sensing Data
In this study, we used Sentinel-2A image data, freely obtained from ESA's website, for urban vegetation biomass estimation. These L1C-level data, which have already been radiometrically calibrated, were acquired in six different months of 2017 (Table 1). The image quality is generally good with a mean cloudiness of less than 10%. Although the January and May images were more cloud-contaminated, the study area remains cloud-free in them, so these images are still usable. For data preprocessing, the images were first atmospherically corrected and then re-sampled to 10 m, both using SNAP (Sentinel Application Platform), an image processing package developed by ESA for processing Sentinel data [42]. Lastly, the study area was extracted from the image data in ENVI 5.1 software for further processing.

Urban Vegetation Classification
Based on our preliminary field investigations, we decided to classify the vegetation of the study area into three coarse categories, namely low vegetation (mostly shrubs and grass), broadleaved forest (mostly poplar, black locust, and paper mulberry), and coniferous forest (mostly arborvitae trees). While many areas are characterized by a single vegetation type, there are some areas with mixed vegetation, which justifies the use of linear spectral mixture analysis (LSMA) [38,43], where the spectrum of a pixel is considered a linear combination of the spectra of the pure endmembers within the pixel weighted by their fractional abundances. To this end, a wide variety of features, such as spectral features (spectral reflectance and spectral indices), textural features (calculated by the gray level co-occurrence matrix), and vegetation abundances (the abundances of coniferous forest, broadleaved forest, and low vegetation, obtained by LSMA), were derived from the Sentinel-2A image data and combined with topographical features (the digital elevation model, DEM, and the slope and aspect derived from it) to classify urban vegetation using the support vector machine (SVM) method. SVM is a machine learning algorithm used for image classification [44,45] and can achieve high accuracy. We compared SVM with other classifiers, namely random forest (RF), artificial neural network (ANN), and quick unbiased efficient statistical tree (QUEST), and found that SVM produced the best result when vegetation abundances were added for classification. For a detailed description of the classification procedure, please refer to our previous research [2]. The produced classification map identifies the dominant vegetation type of each pixel so that the biomass of each vegetated pixel can be estimated with the models constructed later.
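The linear mixing assumption behind LSMA can be written compactly. The notation below is introduced here for illustration and is not taken from the original text: $\rho_b$ is the observed reflectance of a pixel in band $b$, $\rho_{k,b}$ is the reflectance of pure endmember $k$ in that band, $f_k$ is its fractional abundance, and $\varepsilon_b$ is the residual. The sum-to-one and non-negativity constraints are the usual ones for fully constrained unmixing and are assumed rather than stated in the source.

```latex
\rho_b = \sum_{k=1}^{K} f_k \, \rho_{k,b} + \varepsilon_b ,
\qquad \sum_{k=1}^{K} f_k = 1 , \qquad f_k \ge 0
```

With $K = 3$ endmembers (low vegetation, broadleaved forest, and coniferous forest), the fitted $f_k$ would correspond to the three abundance variables used later as candidate predictors.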

Candidate Variables for Modeling
A total of 116 variables (features) covering spectral reflectance, vegetation indices, textural features, topographical features, and vegetation abundances were selected as candidates for biomass estimation. They are given in Table 2 (see Table A1 for the description and calculation formulas of the vegetation indices).
Note: VRE1-VRE3 represent the spectral reflectance in the three red-edge bands of Sentinel-2A image data and N_NIR represents the narrow near-infrared band. Low, BLF, and CLF represent the abundances of low vegetation, broadleaved forest, and coniferous forest. Mean (*), Var (*), Homo (*), Cont (*), Diss (*), Entr (*), Sec_M (*), and Cor (*) refer to the eight textural features obtained by the gray level co-occurrence matrix using the 10 original image bands, namely mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, and correlation.
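To make the textural features more concrete, the following is a minimal sketch of a gray level co-occurrence matrix (GLCM) and four of the eight measures listed above. It is not the software routine used in the study: the window, the quantization to three gray levels, the horizontal one-pixel offset, and the function names are all assumptions chosen for illustration, and the formulas are the commonly used textbook definitions.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Normalized co-occurrence matrix as a dict {(i, j): probability}.

    `image` is a list of rows of integer gray levels; (dx, dy) is the
    pixel offset between the two members of each pair.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                if symmetric:
                    counts[(j, i)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def contrast(p):
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())


def dissimilarity(p):
    return sum(prob * abs(i - j) for (i, j), prob in p.items())


def homogeneity(p):
    return sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())


def second_moment(p):
    return sum(prob ** 2 for prob in p.values())


# Hypothetical 3 x 3 window quantized to three gray levels.
window = [[0, 0, 1],
          [0, 1, 2],
          [1, 2, 2]]
p = glcm(window)
print(contrast(p), dissimilarity(p), homogeneity(p), second_moment(p))

# A perfectly uniform window has zero contrast and dissimilarity,
# while homogeneity and second moment both equal one.
flat = glcm([[1, 1], [1, 1]])
print(contrast(flat), homogeneity(flat), second_moment(flat))
```

The uniform-window case is worth noting because a dense, spectrally even canopy would push the measures toward exactly these limiting values.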

Field Measurements
Biomass sampling is necessary for vegetation biomass modeling. Usually, quadrat biomass is the sum of the dry weight of every single plant in the quadrat [12,13]. Despite its high accuracy, this method requires the vegetation being investigated to be cut. As such, it is applicable to primeval forest or experimental plots but not desirable for urban green land. Allometric biomass equations, which establish quantitative relationships between the biomass and the growth variables of a plant [11], are a frequently used indirect biomass estimation method [46] and provide an alternative biomass sampling approach in an urban context. As they are reliable for determining tree biomass, a growing number of biomass equations have been proposed for various vegetation species across the world [47][48][49][50][51][52][53][54]. In this study, the allometric biomass equations were used for calculating the biomass of each quadrat.
From the extensive literature, the allometric biomass equations for various types of trees and shrubs in Xuzhou were summarized (Tables A2 and A3). For grass, a different estimation approach was adopted in this study: the average unit grassland biomass of Xuzhou was taken as the spatially weighted biomass of Jiangsu, Anhui, Henan, and Shandong provinces [55], since Xuzhou is located at the junction of these four provinces (Table 3). According to this calculation, the average unit biomass of Xuzhou's grassland is 61.89 g/m². The growth variables of plants required in the allometric biomass equations were measured in the field investigations conducted from October to December 2017. The general investigation procedure is as follows: (1) a total of 192 urban vegetation quadrats were randomly pre-selected over the false-color Sentinel-2A imagery of the study area and their central coordinates were retrieved; (2) 10 m × 10 m quadrats (matching the spatial resolution of the Sentinel-2A imagery) were located in the field by navigating to these coordinates with hand-held GPS (Global Positioning System) devices; (3) the growth variables of every single plant (shrubs and trees only) in each quadrat were recorded and the biomass of each plant was calculated using the plant-specific allometric biomass equations; and (4) the biomass of all the plants in a quadrat was summed to obtain the total biomass of that quadrat, and this was repeated for each quadrat.
Note that our records varied with vegetation type. Within each quadrat, we documented the name, tree height (from the base to the crown), and DBH (diameter at breast height, i.e., ~1.3 m) for trees; the name, basal diameter, height, and crown width for shrubs; and the name, height, and coverage area for grass. Different measuring tools were used in accordance with the plants to be investigated and the parameters to be recorded. The DBHs and basal diameters were measured with a 2-m tape measure with a minimum scale of 1 mm, while shrub heights were measured with a 5-m tape measure with a minimum scale of 1 mm. For tree heights, we used a telescopic height measuring rod with a maximum range of 20 m and a minimum scale of 1 mm. Photos illustrating the fieldwork are shown in Figure 2.
Although 192 vegetation quadrats were initially selected, only 140 of them (shown in Figure 1) were visited and investigated in practice because some of the pre-selected quadrats were not accessible for various reasons (e.g., physical barriers and refusal of access). Among the 140 quadrats, 35 were dominated by coniferous forest, 73 by broadleaved forest, and 32 by low vegetation. The results of quadrat biomass calculated mainly by using the allometric biomass equations are detailed in Table A4.
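Steps (3) and (4) of the procedure, together with the unit conversion to t/hm², can be sketched as follows. This is only an illustration under stated assumptions: the power-law form $W = a (D^2 H)^b$ is a common allometric shape but is not confirmed as the form used in Tables A2 and A3, and the coefficients `a` and `b`, the plant list, and all function names are hypothetical. Only the 10 m × 10 m quadrat size and the grassland value of 61.89 g/m² come from the text.

```python
def tree_biomass_kg(dbh_cm, height_m, a, b):
    """Hypothetical allometric form W = a * (D^2 * H)^b, result in kg.

    The coefficients a and b are placeholders; real values are species
    specific and would come from the cited allometric equations.
    """
    return a * (dbh_cm ** 2 * height_m) ** b


def quadrat_biomass_t_per_hm2(plant_biomass_kg, quadrat_area_m2=100.0):
    """Sum per-plant biomass (kg) and express it per hectare (t/hm^2)."""
    total_t = sum(plant_biomass_kg) / 1000.0      # kg -> t
    area_hm2 = quadrat_area_m2 / 10000.0          # m^2 -> hm^2
    return total_t / area_hm2


def g_per_m2_to_t_per_hm2(value):
    """Convert an areal density from g/m^2 to t/hm^2."""
    return value * 10000.0 / 1.0e6


# Hypothetical quadrat with three trees given as (DBH in cm, height in m).
trees = [(12.0, 8.0), (10.5, 7.5), (14.0, 9.0)]
per_tree = [tree_biomass_kg(d, h, a=0.05, b=0.9) for d, h in trees]
print(quadrat_biomass_t_per_hm2(per_tree))

# The grassland value reported in the text, expressed in t/hm^2.
print(g_per_m2_to_t_per_hm2(61.89))
```

Because a 10 m × 10 m quadrat is 0.01 hm², every 100 kg of plant biomass in a quadrat corresponds to 10 t/hm².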

Correlation Analysis
Prior to modeling, the relationship between the candidate variables (Table 2) and the vegetation biomass was examined through correlation analysis. The biomass of the quadrats dominated by low vegetation, broadleaved forest, and coniferous forest is hereinafter referred to as low vegetation biomass, broadleaved forest biomass, and coniferous forest biomass, respectively. The correlation coefficients were computed with and without vegetation types discriminated.
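As an illustration of this screening step, the sketch below computes a correlation coefficient between one candidate variable and quadrat biomass, plus the t statistic commonly used to test its significance. The text does not name the coefficient, so the use of Pearson's r is an assumption, and the sample values are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Invented example: reflectance rank of a band versus quadrat biomass rank.
band = [1, 2, 3, 4, 5]
biomass = [5, 3, 4, 2, 1]
r = pearson_r(band, biomass)
print(r, t_statistic(r, len(band)))
```

The t statistic would then be compared with the critical value for the chosen significance level to decide whether a variable counts as significantly correlated.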

Stepwise Regression Modeling
Stepwise regression (SR) is essentially a multiple linear regression method, but it differs from general multiple linear regression in how variables are selected. In a stepwise regression analysis, at each iteration the most significant variable is added to, or the least significant variable is removed from, the multiple linear regression model based on its statistical significance [56,57]. At each iteration of adding or removing a potential independent variable, the resultant models are assessed by means of the p-value of an F-statistic (p-value < 0.05 for statistical significance) [56,57]. Stepwise regression has proved effective in selecting variables for modeling and has been widely used in different fields [58,59], including forest biomass estimation [60]. As such, it was considered suitable for constructing the urban vegetation biomass estimation models in this study.
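The variable-adding half of this procedure can be sketched in a few lines. The code below is a simplified forward selection, not the routine used in the study: it omits the removal step, and instead of computing a p-value it compares the partial F statistic with a fixed F-to-enter threshold of 4.0, which is an assumed stand-in for the p < 0.05 criterion. All names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of an ordinary least squares fit with intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for i, row in enumerate(design))


def forward_select(candidates, y, f_to_enter=4.0):
    """Add, one at a time, the candidate with the largest partial F statistic."""
    n = len(y)
    selected, current = [], rss([], y)
    while len(selected) < len(candidates):
        best = None
        for idx in range(len(candidates)):
            if idx in selected:
                continue
            cols = [candidates[k] for k in selected + [idx]]
            df = n - (len(cols) + 1)
            if df <= 0:
                continue
            new = rss(cols, y)
            f = float("inf") if new == 0 else (current - new) / (new / df)
            if best is None or f > best[0]:
                best = (f, idx, new)
        if best is None or best[0] < f_to_enter:
            break
        selected.append(best[1])
        current = best[2]
    return selected


# Toy data: y depends on x1 only; x2 carries no extra information.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
y = [2 * a + e for a, e in zip(x1, noise)]
print(forward_select([x1, x2], y))
```

A full stepwise routine would also re-test the variables already in the model after each addition and drop any that are no longer significant.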
As collinearity is likely to exist among the predictive variables, the variance inflation factor (VIF) [57,61] is used to examine it in this study:
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
where $R_i$ is the multiple correlation coefficient between the $i$th predictive variable and the remaining predictive variables. Multicollinearity is considered negligible if VIF is below 10. If VIF ≥ 10, high multicollinearity exists between variables and some of them should be removed from the model [62].
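The usual definition of the variance inflation factor is $\mathrm{VIF}_i = 1/(1 - R_i^2)$, where $R_i^2$ comes from regressing the $i$th predictor on the others. The sketch below shows what the threshold of 10 implies; for simplicity it covers only the two-predictor case, in which $R_i^2$ reduces to the squared Pearson correlation between the two variables. The sample values and variable names are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif(r_squared):
    """Variance inflation factor for a predictor with the given R_i^2."""
    return 1.0 / (1.0 - r_squared)


# Invented values for two related vegetation indices.
index_a = [0.20, 0.40, 0.50, 0.70, 0.80]
index_b = [1.5, 2.3, 3.0, 5.7, 9.0]
r = pearson_r(index_a, index_b)
print(vif(r * r))

# The VIF = 10 threshold corresponds to R_i^2 = 0.9.
print(vif(0.9))
```

In other words, a predictor is flagged once the other predictors explain at least 90% of its variance.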

Boruta Based Multiple Linear Regression Modeling
In addition to the SR modeling, general multiple linear regression (MLR) is also considered in this study for comparative analysis. Including all 116 candidate variables (Table 2) in the MLR modeling is impractical, as it would decrease accuracy, cause overfitting, and slow computation. It is advisable to reduce the dimensionality of data when there are a large number of variables [63]. To this end, a group of important variables is selected, which is done in this study by using the Boruta algorithm. Boruta is a feature selection wrapper built around the random forest classification algorithm and helps to determine important variables [64,65]. A detailed description of this feature selection technique can be found in [65,66]. The Boruta algorithm can be performed in the statistical software R, where artificial variables called shadow variables are generated from the original variables, important variables are confirmed for modeling, and unimportant ones are rejected [65].
Despite its capability to locate important variables, the Boruta algorithm does not consider the collinearity among these variables. As in the SR modeling, closely correlated variables are removed if VIF ≥ 10. The final MLR biomass estimation models are determined once the VIF of each remaining variable is less than 10.
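The role of the shadow variables can be illustrated with a deliberately simplified sketch. Boruta itself ranks variables by random forest importance and repeats the comparison over many runs with a statistical test; the code below performs a single round and swaps in the absolute Pearson correlation as the importance measure, purely to keep the example short and dependency-free. The function names and data are hypothetical.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def make_shadows(features, rng):
    """Shuffle a copy of every feature column to break its link to the target."""
    shadows = {}
    for name, values in features.items():
        shuffled = list(values)
        rng.shuffle(shuffled)
        shadows["shadow_" + name] = shuffled
    return shadows


def shadow_round(features, target, rng):
    """Mark a feature as a hit if it beats the best-scoring shadow variable."""
    shadows = make_shadows(features, rng)
    best_shadow = max(abs_correlation(col, target) for col in shadows.values())
    return {name: abs_correlation(col, target) > best_shadow
            for name, col in features.items()}


# Invented data: one informative variable and one unrelated variable.
rng = random.Random(0)
features = {
    "ndvi": [0.21, 0.35, 0.42, 0.50, 0.58, 0.63, 0.71, 0.80],
    "noise": [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0],
}
biomass = [12.0, 20.0, 25.0, 31.0, 36.0, 40.0, 47.0, 55.0]
hits = shadow_round(features, biomass, rng)
print(hits)
```

Repeating such rounds and counting hits is what allows variables to be labeled as confirmed or rejected rather than judged on a single random draw.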

Accuracy Assessment
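While 70% of the calculated quadrat biomass values were used for modeling, the remaining 30% were reserved for assessing the models using two measures, namely the coefficient of determination ($R_{yz}^2$) and the root-mean-square error ($\mathrm{RMSE}_{yz}$):
$$R_{yz}^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSE}_{yz} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
where $y_i$ is the calculated quadrat biomass, $\hat{y}_i$ is the modeled quadrat biomass, $\bar{y}$ is the average of the calculated biomass of all quadrats, and $n$ is the number of quadrats.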
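For reference, both measures can be computed in a few lines. The sketch uses the standard definitions, $R^2 = 1 - \mathrm{SS}_{res}/\mathrm{SS}_{tot}$ and the square root of the mean squared residual; the observed and predicted values are invented and the function names are not from the source.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, in the same unit as the inputs."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Invented validation quadrats (t/hm^2): calculated versus modeled biomass.
calculated = [10.0, 20.0, 30.0, 40.0]
modeled = [12.0, 18.0, 33.0, 37.0]
print(r_squared(calculated, modeled), rmse(calculated, modeled))
```

Because RMSE keeps the unit of the biomass values, it is directly comparable across models fitted to the same vegetation type but less so across types with very different biomass ranges.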

Seasonal Variation of Urban Vegetation Biomass
After the accuracy assessment, the superior models can be determined and used for exploring the seasonal vegetation biomass variation of the study area. With the variables required by the determined models derived from the Sentinel-2A image data (Table 1), the biomass of low vegetation, broadleaved forest, and coniferous forest can be estimated for January, March, May, July, September, and December of 2017. The total urban vegetation biomass of the study area is then calculated by summing the estimated type-specific biomass. The change rate (CR) is defined by the following equation:
$$\mathrm{CR} = \frac{B_{\max} - B_{\min}}{B_{\min}} \times 100\%$$
where $B_{\max}$ and $B_{\min}$ are the maximum and minimum biomass of the year 2017.
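A minimal sketch of the change rate calculation is given below. It assumes the change is expressed relative to the annual minimum, which is consistent with the figures reported for low vegetation in the results (a peak of 28,423 t against a minimum of about 15,000 t and a rate of 87.60%). The monthly values in the example are invented.

```python
def change_rate(monthly_biomass):
    """Seasonal change rate in percent: (max - min) / min * 100."""
    b_max = max(monthly_biomass.values())
    b_min = min(monthly_biomass.values())
    return (b_max - b_min) / b_min * 100.0


# Invented monthly totals (t) for one vegetation type.
example = {"Jan": 100.0, "Mar": 110.0, "May": 130.0,
           "Jul": 145.0, "Sep": 150.0, "Dec": 105.0}
print(change_rate(example))
```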

Urban Vegetation Classification
Using the SVM classifier, the urban vegetation of the study area in the 24 July 2017 image was classified into three types, namely low vegetation, broadleaved forest, and coniferous forest (Figure 3); the overall accuracy of this classification was 89.86% with a Kappa coefficient of 0.83. While the central part of the study area had limited vegetation, vegetated areas were mostly covered by low vegetation, followed by coniferous forest.
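Overall accuracy and the Kappa coefficient are both derived from a confusion matrix of reference versus classified samples. The sketch below uses the standard definitions; the matrix is invented and does not reproduce the study's validation data.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_observed = overall_accuracy(matrix)
    p_expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / (total ** 2)
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented two-class matrix: rows are reference classes, columns are predictions.
confusion = [[45, 5],
             [5, 45]]
print(overall_accuracy(confusion), cohen_kappa(confusion))
```

Kappa is lower than overall accuracy whenever some agreement would be expected by chance alone, which is why the two values are usually reported together.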

For Low Vegetation
There were 14 candidate variables significantly correlated with low vegetation biomass (Table 4). Eight spectral reflectance variables had negative correlations with low vegetation biomass, with coefficients ranging from −0.364 to −0.553. Low vegetation biomass was negatively associated with low vegetation abundance and positively with coniferous forest abundance. Low vegetation biomass is generally lower than the biomass of broadleaved and coniferous forests, so more low vegetation in a quadrat means lower quadrat biomass. The correlation of low vegetation biomass with topographic features was not significant because low vegetation is usually scattered in the study area. Low vegetation biomass was negatively correlated with two vegetation indices and two textural features.

For Broadleaved Forest
A total of 54 variables were significantly correlated with broadleaved forest biomass (Table 5). Four spectral reflectance variables were negatively correlated with broadleaved forest biomass. Regarding vegetation abundance variables, only low vegetation abundance was negatively correlated with broadleaved forest biomass, but the coefficient was low. As for topographic features, broadleaved forest grows in relatively flat areas (e.g., parks and residential land) and low-elevated hills in the study area and, therefore, no significant correlation exists between topography and broadleaved forest biomass. The biomass was also correlated with seven vegetation indices, with the higher correlation coefficients found for DVI and SR4. Textural features had close, mostly positive, correlations with broadleaved forest biomass, although the highest correlation (−0.72), with Cor (VRE2), was negative.

For Coniferous Forest
The variables significantly correlated with coniferous forest biomass are listed in Table 6. Seven spectral reflectance variables were all negatively correlated with coniferous forest biomass, with the absolute values of the correlation coefficients mostly higher than 0.5. Not surprisingly, only coniferous forest abundance (CLF) was highly positively correlated with coniferous forest biomass. DEM was the only topographic feature significantly correlated with coniferous forest biomass, and the negative correlation is probably linked to the fact that coniferous forest grows in hills and its biomass decreases with elevation. Coniferous forest biomass was highly significantly correlated with several vegetation indices but, interestingly, no correlation was found with textural features. The Var (variance), Cont (contrast), Diss (dissimilarity), and Entr (entropy) values were all zero while the Mean (mean), Homo (homogeneity), Sec_M (second moment), and Cor (correlation) values were all one: coniferous forest is densely distributed in the study area and thus shows no clear textural characteristics.

For All-Type Vegetation
Results show that 39 variables were significantly correlated with all-type vegetation biomass (Table 7). In total, ten spectral reflectance variables had negative correlations with all-type vegetation biomass, with coefficients ranging from −0.308 (Red) to −0.496 (Green). All-type vegetation biomass was negatively associated with low vegetation abundance but positively with broadleaved and coniferous forest abundances. Low vegetation has lower biomass than coniferous and broadleaved forest and, in a given area (e.g., a pixel), the all-type vegetation biomass would be lower if low vegetation abundance is larger than the other two vegetation abundances. While it had no significant correlation with topographic features, all-type vegetation biomass was correlated with half of the vegetation indices. The highest positive correlation coefficient was found with SR4 (0.390) and the highest negative one with DVI (−0.396) (Table A1). In addition, only 14 (17.50% of the total) textural features were significantly correlated with all-type vegetation biomass, and the coefficients were generally low.

Stepwise Regression Models
The results of performing SR for constructing vegetation biomass estimation models are presented in Table A5. All the (adjusted) coefficients of determination ($R_{nh}^2$ and adj-$R_{nh}^2$) were higher than 0.70, and the fitting was generally good. The variables in the models were fewer than those (highly) significantly correlated with vegetation biomass (Tables 4-7). The type-specific and all-vegetation biomass estimation models are given below.
The SR biomass estimation model for low vegetation:
The SR biomass estimation model for all-type vegetation:
The results of performing the Boruta algorithm in the statistical software R are shown in Figure 4. Important variables were labeled as Confirmed in blue, unimportant ones as Rejected in red, and shadow ones as Shadow in grey. Using the same biomass data as the SR modeling, the MLR biomass estimation models for low vegetation, broadleaved forest, coniferous forest, and all-type vegetation were built with the important variables identified through the Boruta algorithm and the use of VIF.
The MLR biomass estimation model for low vegetation biomass:
Figure 5 illustrates the results of assessing the SR biomass estimation models for low vegetation, broadleaved forest, coniferous forest, and all-type vegetation. It shows that the $R_{yz}^2$ values of the models for specific vegetation types (viz. the models for low vegetation, broadleaved forest, and coniferous forest) were all higher than 0.7. The coniferous model had the highest $R_{yz}^2$ (0.786) and the lowest $\mathrm{RMSE}_{yz}$ (6.89 t/hm²). The all-type model had a larger RMSE than the type-specific models. Similarly, the remaining 30% of the field observation data were used to assess the accuracy of the MLR biomass estimation models. The two types of models were then compared in terms of accuracy measured by the coefficient of determination ($R_{yz}^2$) and root-mean-square error ($\mathrm{RMSE}_{yz}$) (Table 8).

Seasonal Variation
As the SR models produced better estimates, they were used to calculate the biomass of each urban vegetation type in January, March, May, July, September, and December of 2017. The type-specific vegetation biomass and total vegetation biomass are shown in Figure 6.
Overall, vegetation biomass increased from winter onward, peaked in autumn (September), and decreased afterward. The biomass of low vegetation was highest in September (28,423 t) and lowest in January and December (~15,000 t), with a maximal change rate of 87.60%. Despite an increase of 27,150 t in biomass from January to September, the change rate of broadleaved forest was 58.93%, much lower than that of low vegetation (Figure 7). The biomass change rate of coniferous forest (25.58%) was the lowest among the three vegetation types. The total vegetation biomass change was 67,524 t with a change rate of 40.39%.
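The reported figures can be cross-checked against one another. Assuming the change rate is computed relative to the annual minimum, $\mathrm{CR} = (B_{\max} - B_{\min}) / B_{\min}$, and that the reported increases are the differences $B_{\max} - B_{\min}$, the implied minimum biomass values follow directly. These are back-of-envelope values derived here, not numbers stated in the source.

```latex
\begin{aligned}
\text{Low vegetation:} \quad & B_{\min} = \frac{B_{\max}}{1 + \mathrm{CR}} = \frac{28{,}423}{1.8760} \approx 15{,}151\ \text{t} \\
\text{Broadleaved forest:} \quad & B_{\min} = \frac{B_{\max} - B_{\min}}{\mathrm{CR}} = \frac{27{,}150}{0.5893} \approx 46{,}072\ \text{t} \\
\text{Total vegetation:} \quad & B_{\min} = \frac{67{,}524}{0.4039} \approx 167{,}180\ \text{t}
\end{aligned}
```

The first value agrees with the minimum of about 15,000 t quoted for low vegetation, which supports reading the change rate as a percentage of the annual minimum.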

Discussion
Correlation analysis is useful to identify which variables are related to the dependent variable [59]. While the biomass of low vegetation and coniferous forest is correlated mostly with spectral reflectance, broadleaved forest biomass is correlated mostly with textural features. Although there might be close correlations among some of the candidate variables (e.g., NDVI and RVI in the category of vegetation indices), we did not provide a full correlation matrix because the number of variables is large and the matrix would take up substantial space. In addition, the use of stepwise regression and the variance inflation factor helps to keep correlated variables out of the models [57].
Our modeling results show that for both individual vegetation types and all-type vegetation, the SR models have higher coefficients of determination and lower root-mean-square errors than the MLR models. This suggests that the SR modeling outperforms the MLR modeling in the estimation of urban vegetation biomass. The superiority of SR modeling is also noted in the study of Xu et al., where degraded grassland biomass was estimated using machine learning methods from terrestrial laser scanning data [27]. By comparing SR, random forest, and artificial neural network, they reported that SR produced the highest accuracy (R² = 0.84, RMSE = 48.89 g/m²). However, it might be controversial to conclude that SR is best for vegetation biomass modeling, as some researchers favor machine learning algorithms. For example, Lu et al. report that RF (R² = 0.78, RMSE = 1.34 t/ha) performs better than SR (R² = 0.75, RMSE = 1.46 t/ha) in wheat biomass estimation with unmanned aerial vehicle data [67]. We do not attempt to compare the results of our models with those of others because the data for modeling and the contexts (various vegetation types in an urban area vs. a single type of vegetation in (semi-)natural environments) were different.
Although some researchers estimated vegetation biomass from remote sensing without discriminating types [29], our study revealed that vegetation biomass should be modeled for specific vegetation types for higher modeling accuracy. This is often done in different contexts by other researchers, e.g., Gao et al., who discriminated broadleaved, coniferous, mixed, and bamboo forest in China's Zhejiang province [68], and González-Jaramillo et al., who divided the vegetation of the San Francisco watershed (south Ecuador) into tropical mountain forest, subpáramo, and pastures [23]. In fact, the finding of the correlation analysis that the variables significantly correlated with vegetation biomass vary largely with vegetation type implies that type-specific biomass estimation models should be constructed. Similarly, non-species-specific allometric growth models yielded larger errors than species-specific ones [69]. Urban vegetation cannot be regarded as a single vegetation type as it varies largely in biophysical characteristics and thus in biomass. Such variations, which might be minimized in plantations, should be considered for urban green areas. As such, it is important to discriminate urban vegetation types through image classification before modeling urban vegetation biomass from remote sensing image data.
Regarding the seasonal variation of vegetation biomass, coniferous forest has much lower biomass loss than low vegetation and broadleaved forest because it consists mainly of evergreen arborvitae trees that do not lose their leaves through the year. This suggests that more coniferous trees should be planted if the biomass loss of low vegetation and broadleaved forest needs to be compensated. In this multi-season analysis, the same type-specific estimation models were used for estimating vegetation biomass from remote sensing data acquired in different months. For a plant species in an area, there is only one allometric growth equation, which is often built with measurements acquired, e.g., when plants are luxuriant with maximal biomass in a year. The biomass estimation models constructed with quadrat biomass calculated using these equations should therefore best reflect that time. If these models are used for other dates, the estimated biomass would be less accurate (e.g., due to fewer leaves in winter). The variables derived from remote sensing images can, however, characterize the vegetative status of the plants at each date and partly compensate for this effect.
In addition, there are some other limitations that might undermine the results. Firstly, the allometric biomass equations for a variety of plant species with high reported accuracies were borrowed from previous studies, but we were not able to individually verify these equations as this work is beyond the scope of the present study. Secondly, quadrat biomass could be, to some extent, underestimated from remote sensing image data. It is likely that some low vegetation like grass and bushes grows under large coniferous and broadleaved trees, but this cannot be recognized in pixels, notwithstanding the application of linear spectral mixture analysis. Despite these limitations, our study demonstrates the capability of free optical sensor data like Sentinel-2A to estimate urban vegetation biomass. It would be interesting if urban vegetation biomass could be regularly monitored; however, this currently seems challenging as Sentinel-2A data remains scarce and does not allow a retrospective assessment.

Conclusions
This study demonstrates how Sentinel-2A image data can be used for vegetation biomass estimation in an urban context. The main findings and conclusions of this study are as follows:


Freely available multispectral Sentinel-2A satellite data has proven its capability in urban vegetation biomass estimation.The measured biomass of each vegetation type is closely correlated with different remote sensing derived variables, mostly spectral reflectance for low vegetation and coniferous forest and mostly textural features for broadleaved forest.


The vegetation biomass estimation models built by stepwise regression (SR) outperform those built by the Boruta based multiple linear regression. It is necessary to discriminate vegetation types in biomass modeling, and the highest accuracy is obtained by the SR model for coniferous forest.


Vegetation biomass is highest in autumn (September) and lowest in winter (January and December). Low vegetation and broadleaved forest have larger seasonal change rates than coniferous forest, which consists mostly of evergreen trees.
Urban green areas are a key component of the urban eco-environment and make a vital contribution to improving the quality of life and moderating climate. In general, trees have a stronger carbon sequestration capability and produce more biomass than low vegetation. Planting more coniferous trees can reduce biomass loss in winter. However, tree species should be diversified to reduce ecological vulnerability and guarantee a more robust urban ecosystem and more sustainable urban development.
Table A2. Allometric biomass equations for trees, used for calculating quadrat biomass.

Figure 1. The location of the study area: (a) the border of the study area (i.e., the third ring road of Xuzhou) and the sites for field investigations (yellow for low vegetation, green for broadleaved forest, and purple for coniferous forest); (b) Xuzhou in east China.

Figure 2. Photos taken in the field illustrating the measurements.

Figure 3. Urban vegetation classification by support vector machine.

Figure 4. Importance of candidate variables: (a) low vegetation; (b) broadleaved forest; (c) coniferous forest; and (d) all-type vegetation. Important variables are labeled as Confirmed in blue, unimportant ones as Rejected in red, and shadow ones as Shadow in grey.

Figure 6. Type-specific biomass and the total vegetation biomass in the selected months of 2017: (a) low vegetation; (b) broadleaved forest; (c) coniferous forest; and (d) all vegetation.

Figure 7. The seasonal change rate of vegetation biomass of the study area.

Table 1. Remote sensing image data used for urban vegetation estimation.

Table 2. Candidate variables for biomass estimation.

Table 4. Variables significantly correlated with low vegetation biomass.

Table 5. Variables significantly correlated with broadleaved forest biomass.

Table 6. Variables significantly correlated with coniferous forest biomass.

Table 7. Variables significantly correlated with all-type vegetation biomass.

Table 8. Comparing the accuracies of the SR and MLR biomass estimation models (unit for RMSE: t/hm²).