Constructing a Finer-Resolution Forest Height in China Using ICESat/GLAS, Landsat and ALOS PALSAR Data and Height Patterns of Natural Forests and Plantations

Monitoring forest height is crucial to determine the structure and biodiversity of forest ecosystems. However, detailed spatial patterns of forest height from 30 m resolution remotely sensed data are currently unavailable. In this study, we present a new method for mapping forest height by combining spaceborne Light Detection and Ranging (LiDAR) with imagery from multiple remote sensing sources, including the Landsat 5 Thematic Mapper (TM), the Phased Array L-band Synthetic Aperture Radars (PALSAR), and topographic data. The nationwide forest heights agree well with results obtained from 525 independent forest height field measurements, yielding correlation coefficient, root mean square error (RMSE), and mean absolute error (MAE) values of 0.92, 4.31 m, and 3.87 m, respectively. Forest heights derived from remotely sensed data range from 1.41 m to 38.94 m, with an average forest height of 16.08 ± 3.34 m. Mean forest heights differ only slightly among different forest types. In natural forests, conifer forests have the greatest mean forest heights, whereas in plantations, bamboo forests have the greatest mean forest heights. Important predictors for modeling forest height using the random forest regression tree method include slope, surface reflectance of red bands and HV backscatter. The uncertainty caused by the uneven distribution of Geoscience Laser Altimeter System (GLAS) footprints is estimated to be 0.64 m. After integrating PALSAR data into the model, the uncertainty associated with forest height estimation was reduced by 4.58%. Our finer-resolution forest height could serve as a benchmark to estimate forest carbon storage and would greatly contribute to better understanding the roles of ecological engineering projects in China.


Introduction
Height is a crucial component of forest ecological structure due to its ability to govern elemental processing, competitive dynamics and habitat suitability for many plant and animal species [1,2].Moreover, height is main feature for land cover classification, especially for separate forests and shrubs [3].However, measuring forest height on the ground is laborious and difficult to implement on a national scale.China has the largest area of plantations in the world via some ecological engineering projects [4].An accurate account of China's forest height spatial distribution is critical for assessing the effect of ecological projects, such as the Natural Forest Conservation Program and the Grain for Green Program [5].However, a Chinese forest height map with finer resolution (30 m) is not available to the scientific community.Height variation within forest types and their differences, especially for natural forests and plantations, also remain unknown.Therefore, it is important to quantify forest height and its spatial pattern to reduce uncertainty in forest carbon and ecological value assessments, especially for plantations in China.
Light Detection and Ranging (LiDAR) is commonly used to acquire accurate forest heights by measuring the distance between the top of a tree canopy and the ground [6,7].The Geoscience Laser Altimeter System (GLAS), located aboard the Ice, Cloud and land Elevation Satellite (ICESat), can measure global forest height from space.Although the height detection accuracy of GLAS has been reported to be 10 cm in flat areas [8], it cannot map wall-to-wall forest height directly because it is operated in a point sampling mode and there is a large distance between its tracks [9].Specifically, spots are spaced approximately 170 m apart along a given track, with the distance between two tracks ranging from 2.5 km at a latitude of 80 • to 15 km at the Equator [10,11].Therefore, a significant challenge in mapping wall-to-wall forest height is how to extrapolate the height of a given forest area with no GLAS footprints coverage.Many studies use optical images, such as spectral data from the Moderate Resolution Imaging Spectrometer (MODIS) and/or topographic data, to extrapolate heights measured at GLAS footprints into global forest heights with low resolution (500 m or 1000 m) [12][13][14][15].However, few similar studies have been conducted to map Chinese forest height, especially at 30 m scale [16,17].Additionally, some of these studies also incorporate climate data, such as temperature and precipitation, in their modeling of forest height [13,15].However, introducing multi-year climate data into forest height modeling caused autocorrelation to analysis climatic determinants of global patterns of forest canopy height [18].
To date, relatively few studies have combined Synthetic Aperture Radar (SAR) imagery with GLAS data in estimating forest heights in China because high resolution (~30 m) SAR imagery has not been available in the past at national scales [19].The SAR backscatter is sensitive to biophysical parameters; in forests, double bounces from the ground and tree trunks contribute to backscatter energy, as does the volumetric scattering from the heterogeneous forest canopy and plant stems [20].The total contribution of each component to this backscatter depends on the frequency used by the SAR.For example, L-band (1-2 GHz) radar backscatter has been shown to be useful in identifying forest areas and can be highly correlated to the presence of woody biomass [21,22].However, it is still unclear whether this L-band radar backscatter can improve Chinese forest height estimation accuracy due to the variations in forest structure on this spatial scale [23].
This study builds on the previous study conducted by Huang et al. (2017).They employed MODIS Nadir BRDF (Bidirectional Reflectance Distribution Function) Adjusted Reflectance (NBAR) in the growing season, leaf area index (LAI), canopy cover and GLAS data to map Chinese vegetation height at 500 m resolution.Here, we improve upon the model inputs, moving from a coarse resolution MODIS growing season data set to a set of finer resolution Landsat growing season composite and L-band SAR.We will answer the following three questions: (1) What is the accuracy of the finer-resolution forest height map of China generated by the combination of GLAS, Landsat imagery, L-band SAR, topographical data and tree cover?(2) What is the difference between forest heights in natural forests and those in plantations?(3) To what extent can the addition of L-band SAR data improve the accuracy of extrapolated forest heights on a large scale?

Height Extraction from Geoscience Laser Altimeter System (GLAS) Footprints
The ICESat/GLAS was launched on 12 January 2003.GLAS acquires data by transmitting a laser pulse with a duration of 10 nanoseconds and subsequently recording the pulse that is returned after being reflected from the footprint on the ground that is approximately 60 m in diameter [24].GLAS data was acquired from the National Snow and Ice Data Center (NSIDC, http://nsidc.org/data/icesat),Release 33, focusing on the transmitted and received waveform data (GLA01), waveform parameterization data (GLA05) and Land surface altimetry data (GLA14).All of them were obtained from May to June 2006.
A three-step procedure was used to identify high-quality footprints and to extract their heights [25].First, the footprints with low signal-to-noise (SNR < 18) and saturation (i_satCorrFlg > 2) were removed to reduce atmospheric attenuation, including could.Second, the GLAS footprints were classified to three types (open space, single tree or single layer building, and complex tree or multi-layer building) in terms of waveform indices derived from the GLAS original waveform [26].Thirdly, for the first type (open space), which might include bare land, water, road, and sparse grassland, heights were derived from the GLA14 product.Due to the number restrictions in the returned waveform and the lack of slope correction in GLA14 product, the other two types (single tree or single layer building, and complex tree or multi-layer building) were processed with a Gaussian decomposition algorithm to extract returned waveforms from GLA01 and with a topographic correction factor (Equation ( 1)) to correct for the impact of topography on height extraction [27][28][29].
where VH slope represents the vegetation height in the rugged area; in this study, a 'rugged area' is defined as a region with a slope greater than 5 • , as determined by previous studies [13,27].gpCntRng(1) is the center position of the last peaks corresponding to the ground, SigBeg represents the start of the earliest return peaks, corresponding to the presence of the forest canopy, σ gp represents the standard deviation of Gaussian decomposition from the ground peak, and σ transmit is defined as the standard deviation of Gaussian decomposition from a transmitted waveform.GLAS spots overlaying with areas without vegetation (including bare land, impervious terrains, and water-rich regions) were identified and removed from the dataset with published land cover maps [30,31].The final GLAS dataset contains 651,265 measurements of heights obtained from across China.The GLAS footprints were assigned to 50 ecozones of China for further modelling given that the similar climatic, geographic, and environmental condition in given ecozone [32].The corresponding information of 50 ecozones is listed in the supporting Table S1.

The Greenest Compositions from Landsat 5 Thematic Mapper (TM)
Landsat 5 Thematic Mapper (TM) images were obtained from the Google Earth Engine (GEE), and original images were obtained from the United States Geological Survey (USGS).We processed all available Landsat 5 TM standard surface reflectance data from 2006, including six reflectance bands B1-B5 and B7, as well as the cloud mask band.These surface reflectance data were processed using the LEDAPS method (http://ledaps.nascom.nasa.gov/,accessed on 6 August 2016), and data with cloud were processed using the Fmask algorithm [33].We masked pixels from shadows and clouds in the mask band.Then, we calculated the Normalized Difference Vegetation Index (NDVI) for all available Landsat 5 TM data obtained in 2006.Pixels with maximum NDVI in 2006 were selected to represent the greenest compositions recorded in 2006, which relate to data from the growing season.The final 'greenest composition' thus includes six reflectance bands from Landsat 5 TM and NDVI.

Landsat Vegetation Continuous Fields (VCF) Tree Cover and Topographic Data
Vegetation continuous fields (VCF) data represent the percentage of a pixel's area covered by woody vegetation greater than 5 m in height at a resolution of 30 m.These data are derived from all seven bands of Landsat 5 TM and Landsat 7 ETM+; these datasets have been produced in 2000, 2005 and 2010 [34].In this study, because our Landsat 5 TM images and GLAS data were obtained close to 2005, we use VCF data obtained in 2005.
Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) data, which also have a spatial resolution of 30 m, were released in 2015.This dataset underwent a void-filling process [35,36] in which its elevations, slopes, and other aspects [37] were extracted as input variables for subsequent modeling.

Phased Array L-Band Synthetic Aperture Radar (PALSAR-2/PALSAR) Mosaic Dataset
The Phased Array L-band Synthetic Aperture Radar (PALSAR) yearly mosaic dataset was collected globally from 2007 to 2010.It has a resolution of 25 m and was aggregated from PALSAR located on the Advanced Land Observing Satellite (ALOS) [38].Original SAR data were obtained using the fine beam dual polarization mode (FBD), which utilizes HH (horizontal-transmit, horizontal receive) and HV (horizontal-transmit, vertical receive).Geometric distortion of these images and topographic effects on the images' intensities were reduced with an estimated geometric accuracy of 12 m [39].Original images recording a minimum response to surface moisture were selected and composited to the mosaic dataset, which was then radiometrically calibrated and stored as digital numbers (DN).These DN values can be converted to gamma naught values (representing the backscattering coefficient), which are expressed in decibel units (dB), using the following equation [40]: where CF represents a calibration factor equal to −83.0 dB, and < > represents the ensemble averaging.We downloaded all PALSAR HH and HV data acquired in 2007 because it is the closest year to the times our optical and GLAS data were collected.Then we converted these data into backscattering coefficients (in decibels).We also selected PALSAR data from the growing season and from closer years in cases of limited availability in 2007 (Figure 1).To achieve consistency between datasets, HH and HV data were resampled to acquire a resolution of 30 m with a nearest neighbor interpolation.

Phased Array L-band Synthetic Aperture Radar (PALSAR-2/PALSAR) Mosaic Dataset
The Phased Array L-band Synthetic Aperture Radar (PALSAR) yearly mosaic dataset was collected globally from 2007 to 2010.It has a resolution of 25 m and was aggregated from PALSAR located on the Advanced Land Observing Satellite (ALOS) [38].Original SAR data were obtained using the fine beam dual polarization mode (FBD), which utilizes HH (horizontal-transmit, horizontal receive) and HV (horizontal-transmit, vertical receive).Geometric distortion of these images and topographic effects on the images' intensities were reduced with an estimated geometric accuracy of 12 m [39].Original images recording a minimum response to surface moisture were selected and composited to the mosaic dataset, which was then radiometrically calibrated and stored as digital numbers (DN).These DN values can be converted to gamma naught values (representing the backscattering coefficient), which are expressed in decibel units (dB), using the following equation [40]: where CF represents a calibration factor equal to −83.0 dB, and < > represents the ensemble averaging.We downloaded all PALSAR HH and HV data acquired in 2007 because it is the closest year to the times our optical and GLAS data were collected.Then we converted these data into backscattering coefficients (in decibels).We also selected PALSAR data from the growing season and from closer years in cases of limited availability in 2007 (Figure 1).To achieve consistency between datasets, HH and HV data were resampled to acquire a resolution of 30 m with a nearest neighbor interpolation.

Sampling Selection for Natural Forests and Plantations
The State Forestry Bureau of China generates and examines the National Forestry Inventories dataset (NFI), which compiles data concerning the areas, types, origins and distributions of forest resources [4].Survey samples are systematically allocated according to statistical principles, and investigators periodically revisit survey sample sites.For example, the NFI has a 5-year cycle of survey site visits, and approximately 20% of provinces conduct inventories each year.Therefore, national statistics were thus obtained by summing all statistical data from all provinces acquired in a 5-year cycle.Through this sampling, the overall accuracy of these results was assessed to be approximately 95% [41].We interpreted the forest origins (natural forest or plantations) of these samples at the province scale by using the current forest map of the eighth NFI (2009-2013).In total, 9772 samples were interpreted from the Atlas of Forest Resources of China to have originated from natural forests, and 5010 samples were deemed likely to have originated from plantations (Figure 2).Moreover, forest types for natural forest and plantations were interpreted as natural conifer forests

Sampling Selection for Natural Forests and Plantations
The State Forestry Bureau of China generates and examines the National Forestry Inventories dataset (NFI), which compiles data concerning the areas, types, origins and distributions of forest resources [4].Survey samples are systematically allocated according to statistical principles, and investigators periodically revisit survey sample sites.For example, the NFI has a 5-year cycle of survey site visits, and approximately 20% of provinces conduct inventories each year.Therefore, national statistics were thus obtained by summing all statistical data from all provinces acquired in a 5-year cycle.Through this sampling, the overall accuracy of these results was assessed to be approximately 95% [41].We interpreted the forest origins (natural forest or plantations) of these samples at the province scale by using the current forest map of the eighth NFI (2009-2013).In total, 9772 samples were interpreted from the Atlas of Forest Resources of China to have originated from natural forests, and 5010 samples were deemed likely to have originated from plantations (Figure 2).Moreover, forest types for natural forest and plantations were interpreted as natural conifer forests (NCF), natural broadleaf forests (NBF), natural conifer and broadleaf mixed forests (NMF), natural bamboo forests (NB), conifer plantations (CP), broadleaf plantations (BLP), conifer and broadleaf mixed plantations (MP) and bamboo plantations (BP).

Wall-to-Wall Forest Height Estimation
In order to model spatially continuous forest heights, GLAS height data were used as training data and related to the data from seven Landsat 5 TM variables (B1-B5, B7 and NDVI), two PALSAR mosaic data (HH and HV), three SRTM-derived metrics (elevation, slope and aspect) and VCF with the random forest (RF) regression tree (Figure 3), given its ability to handle a great deal of input predictors that are not independent and non-linearly separable [42][43][44][45].
Using an RF regression tree to produce spatial modeling involves two main steps: first, determining the optimal parameters for the training of the RF regression tree, and second, producing gridded forest heights using RF regression tree.These training and modeling processes were implemented one ecozone at a time.Both the number of trees and the number of variables to split on each node were determined iteratively [46]; the initial value for the number of trees was set at 100 and the initial value for the number of nodes was set at 3. The iterative test was programmed to stop if the predicted height showed a strong correlation with the height measured from its corresponding GLAS footprint in the cross validation dataset, or if the values of trees and nodes reached 500 (using an interval of 100) and 7 (using an interval of 1), respectively.The importance of the 13 input predictors was estimated based on an out-of-bag error estimate [46].After an ensemble decision tree was built into the training procedure, the rules of each decision tree, along with two optimal parameters, were used to predict height in areas without GLAS footprints.Finally, heights from nonforested areas were masked with previous forest classification data [47].

Wall-to-Wall Forest Height Estimation
In order to model spatially continuous forest heights, GLAS height data were used as training data and related to the data from seven Landsat 5 TM variables (B1-B5, B7 and NDVI), two PALSAR mosaic data (HH and HV), three SRTM-derived metrics (elevation, slope and aspect) and VCF with the random forest (RF) regression tree (Figure 3), given its ability to handle a great deal of input predictors that are not independent and non-linearly separable [42][43][44][45].
Using an RF regression tree to produce spatial modeling involves two main steps: first, determining the optimal parameters for the training of the RF regression tree, and second, producing gridded forest heights using RF regression tree.These training and modeling processes were implemented one ecozone at a time.Both the number of trees and the number of variables to split on each node were determined iteratively [46]; the initial value for the number of trees was set at 100 and the initial value for the number of nodes was set at 3. The iterative test was programmed to stop if the predicted height showed a strong correlation with the height measured from its corresponding GLAS footprint in the cross validation dataset, or if the values of trees and nodes reached 500 (using an interval of 100) and 7 (using an interval of 1), respectively.The importance of the 13 input predictors was estimated based on an out-of-bag error estimate [46].After an ensemble decision tree was built into the training procedure, the rules of each decision tree, along with two optimal parameters, were used to predict height in areas without GLAS footprints.Finally, heights from non-forested areas were masked with previous forest classification data [47].

Assessment of Accuracy and Prediction of Uncertainty
Since the NFI only provides general information, such as forest type, forest area and origins at provincial level and does not provide forest height at plot scale [48], we collected 525 forest height measurements to estimate the accuracy of the modeled wall-to-wall forest heights.The distribution of these measurements is illustrated in Figure 4.This dataset includes the main tree species across China, including locust, eucalyptus, polar, Korean pine, larch, birch, mason pine, Chinese fir, and spruce.435 measurements were obtained from peer-reviewed Chinese journals and 90 measurements were collected from national forest inventories [49].Each plot was at least 0.05 ha in size, and the mean height of dominant trees from the emergent layer and the canopy layer in each plot was recorded.To reduce the difference between the mean tree height of our collected data and the modeled forest canopy height from remote sensing data, the plots belongs to mature forest or relatively uniform growing forests were selected in terms of the screening criteria in a previous study [27].

Assessment of Accuracy and Prediction of Uncertainty
Since the NFI only provides general information, such as forest type, forest area and origins at provincial level and does not provide forest height at plot scale [48], we collected 525 forest height measurements to estimate the accuracy of the modeled wall-to-wall forest heights.The distribution of these measurements is illustrated in Figure 4.This dataset includes the main tree species across China, including locust, eucalyptus, polar, Korean pine, larch, birch, mason pine, Chinese fir, and spruce.435 measurements were obtained from peer-reviewed Chinese journals and 90 measurements were collected from national forest inventories [49].Each plot was at least 0.05 ha in size, and the mean height of dominant trees from the emergent layer and the canopy layer in each plot was recorded.To reduce the difference between the mean tree height of our collected data and the modeled forest canopy height from remote sensing data, the plots belongs to mature forest or relatively uniform growing forests were selected in terms of the screening criteria in a previous study [27].
A regression through the origin (RTO) analysis was used to calculate the degree of correlation between the height data measured in the field and the predicted forest heights [50].The coefficient of determination (R 2 ) and root mean square error (RMSE) are used to quantify the accuracy of this correlation.MAE also calculated to check the bias between the field measured heights and the modeled forest heights, considering the regression analysis is sensitive to data region.S1.
A regression through the origin (RTO) analysis was used to calculate the degree of correlation between the height data measured in the field and the predicted forest heights [50].The coefficient of determination (R 2 ) and root mean square error (RMSE) are used to quantify the accuracy of this correlation.MAE also calculated to check the bias between the field measured heights and the modeled forest heights, considering the regression analysis is sensitive to data region.
The uneven spatial distribution and gaps among GLAS footprints would affect wall-to-wall forest height mapping accuracy in the course of extrapolating forest heights in areas without corresponding GLAS footprints.To quantify this uncertainty or variance (ε ), we randomly picked up 10 subsets of GLAS footprints which comprising from 55% to 100% of the whole GLAS footprints (with a 5% interval, e.g., 55%, 60%, 65%, …, 95%, 100%).Ten forest heights were then modeled with the 10 GLAS footprints subsets by running the RF regression tree.The deviation (within ten forest heights) was then calculated to indicate the uncertainty associated with the uneven spatial distribution of GLAS footprints: where  is the predicted forest height at the ith iteration and  is the average of all predicted forest heights.
To quantify the performance of PALSAR data on large scale forest heights estimation, we calculate the accuracy (RMSE and R 2 ) and uncertainty ( ) after and before integrating PALSAR data.

Accuracy and Spatial Patterns of Forest Height in China
The modeled wall-to-wall forest heights, as shown in Figure 5, have a linear correlation with the 525 in situ measured forest heights, yielding an R 2 value of 0.92 and an RMSE value of 4.31 m with regression through the origin, and with an overall MAE for height estimation of 3.87 m.Compared  S1.
The uneven spatial distribution and gaps among GLAS footprints would affect wall-to-wall forest height mapping accuracy in the course of extrapolating forest heights in areas without corresponding GLAS footprints.To quantify this uncertainty or variance (ε GLAS ), we randomly picked up 10 subsets of GLAS footprints which comprising from 55% to 100% of the whole GLAS footprints (with a 5% interval, e.g., 55%, 60%, 65%, . . ., 95%, 100%).Ten forest heights were then modeled with the 10 GLAS footprints subsets by running the RF regression tree.The deviation (within ten forest heights) was then calculated to indicate the uncertainty associated with the uneven spatial distribution of GLAS footprints: where Ĥi is the predicted forest height at the ith iteration and H is the average of all predicted forest heights.
To quantify the performance of PALSAR data on large scale forest heights estimation, we calculate the accuracy (RMSE and R 2 ) and uncertainty (ε GLAS ) after and before integrating PALSAR data.

Accuracy and Spatial Patterns of Forest Height in China
The modeled wall-to-wall forest heights, as shown in Figure 5, have a linear correlation with the 525 in situ measured forest heights, yielding an R 2 value of 0.92 and an RMSE value of 4.31 m with regression through the origin, and with an overall MAE for height estimation of 3.87 m.Compared with field measurement, low (<5 m) and high (>20 m) forest heights show over and under estimations, respectively (Figure 6).The modeled forest heights in China range from 1.41-38.94m, with an average forest height of 16.08 m, a median forest height of 15.78 m and a standard deviation of 3.34 m.In general, regions of maximum forest height in China (mean heights > 20 m) include the regions of southeast Tibet which are occupied by tropical rainforests and monsoon forests, the conifer forests of the Sichuan, and the tropical forests, rainforests, and evergreen broadleaf forests of Taiwan.The region with the minimum forest height (mean height < 3 m) is located in western Inner Mongolia.Forest height in northeast China, considered to be a major forest resource base, is 14.42 ± 2.37 m.Additionally, forest heights vary among 50 different ecozones (Figure 7).For example, the mean forest height in forest-dominated ecozones is 16.25 ± 2.77 m, which is larger than the mean forest height of 13.05 ± 2.91 m recorded in other ecozones.The statistical information about minimum, maximum, 10th and 90th percentiles, median mean, standard deviation (SD) of forest height in 50 ecozones, is listed in supplement Table S2.
Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 17 with field measurement, low (<5 m) and high (>20 m) forest heights show over and under estimations, respectively (Figure 6).The modeled forest heights in China range from 1.41-38.94m, with an average forest height of 16.08 m, a median forest height of 15.78 m and a standard deviation of 3.34 m.In general, regions of maximum forest height in China (mean heights > 20 m) include the regions of southeast Tibet which are occupied by tropical rainforests and monsoon forests, the conifer forests of the Sichuan, and the tropical forests, rainforests, and evergreen broadleaf forests of Taiwan.The region with the minimum forest height (mean height < 3 m) is located in western Inner Mongolia.Forest height in northeast China, considered to be a major forest resource base, is 14.42 ± 2.37 m.Additionally, forest heights vary among 50 different ecozones (Figure 7).For example, the mean forest height in forest-dominated ecozones is 16.25 ± 2.77 m, which is larger than the mean forest height of 13.05 ± 2.91 m recorded in other ecozones.The statistical information about minimum, maximum, 10th and 90th percentiles, median mean, standard deviation (SD) of forest height in 50 ecozones, is listed in supplement Table S2.S2.

Height Differences between Natural Forests and Plantations
Analysis of these samples indicates that natural forests have a mean height of 16.17 ± 2.94 m (N = 9772), whereas plantations have a mean height of 15.60 ± 2.82 m (N = 5010).Therefore, these samples suggest that natural forests and plantations have similar mean heights.These mean heights vary only slightly between different forest types (Figure 8): the greatest mean heights are seen in natural conifer forests (NCF: 16.72 ± 3.10 m), followed by natural bamboo forests (NB: 16.37 ± 2.40 m), natural broadleaf forests (NBF: 15.83 ± 2.89 m), and natural conifer and broadleaf mixed forests (NMF: 15.75   S2.

Height Differences between Natural Forests and Plantations
Analysis of these samples indicates that natural forests have a mean height of 16.17 ± 2.94 m (N = 9772), whereas plantations have a mean height of 15.60 ± 2.82 m (N = 5010).Therefore, these samples suggest that natural forests and plantations have similar mean heights.These mean heights vary only slightly between different forest types (Figure 8): the greatest mean heights are seen in natural conifer forests (NCF: 16.72 ± 3.10 m), followed by natural bamboo forests (NB: 16.37 ± 2.40 m), natural broadleaf forests (NBF: 15.83 ± 2.89 m), and natural conifer and broadleaf mixed forests (NMF: 15.75  S2.

Height Differences between Natural Forests and Plantations
Analysis of these samples indicates that natural forests have a mean height of 16.17 ± 2.94 m (N = 9772), whereas plantations have a mean height of 15.60 ± 2.82 m (N = 5010).Therefore, these samples suggest that natural forests and plantations have similar mean heights.These mean heights vary only slightly between different forest types (Figure 8): the greatest mean heights are seen in natural conifer forests (NCF: 16.72 ± 3.10 m), followed by natural bamboo forests (NB: 16.37 ± 2.40 m), natural broadleaf forests (NBF: 15.83 ± 2.89 m), and natural conifer and broadleaf mixed forests (NMF: 15.75 ± 2.51 m).Within plantations, the greatest mean heights are seen in bamboo plantations (17.13 ± 3.43 m), followed by conifer and broadleaf mixed plantations (MP: 16.54 ± 2.13 m), conifer plantations (CP: 15.64 ± 2.70 m), and broadleaf plantations (BP: 15.53 ± 2.89 m).

Improvement after Integrating PALSAR Data
The integration of PALSAR data caused the RMSE to decrease from 4.57 m to 4.31 m and the R 2 to increase from 0.90 to 0.92 when compared with forest heights without PALSAR data, and the uncertainty (  ) decreased by an average of 4.58% at the ecozone level (Figure 9).Desertdominated ecozones recorded the greatest decrease in uncertainty (−45.53%).Meanwhile, uncertainties decreased in all forest-dominated ecozones.Contrastingly, combining PALSAR data slightly increased the uncertainty in six ecozones, with the greatest increase in uncertainty (12.58%) recorded in grassland-dominated ecozones.Previous studies have demonstrated that SAR is useful for estimating forest biomass [51], but few have suggested that PALSAR can be used to estimate forest height.The results of this study demonstrate that PALSAR can improve the accuracy of forest height estimations.García et al., have also reported that including PALSAR for estimating forest height improved the estimation accuracy, and this improvement depends on the vegetation type and structure [52].Although the improvement of R 2 and RMSE in our study is not obvious when considering all ecozones together, we also found that the importance of PALSAR varies with forest types and structure among the sub zones.

Improvement after Integrating PALSAR Data
The integration of PALSAR data caused the RMSE to decrease from 4.57 m to 4.31 m and the R 2 to increase from 0.90 to 0.92 when compared with forest heights without PALSAR data, and the uncertainty (ε GLAS ) decreased by an average of 4.58% at the ecozone level (Figure 9).Desert-dominated ecozones recorded the greatest decrease in uncertainty (−45.53%).Meanwhile, uncertainties decreased in all forest-dominated ecozones.Contrastingly, combining PALSAR data slightly increased the uncertainty in six ecozones, with the greatest increase in uncertainty (12.58%) recorded in grassland-dominated ecozones.Previous studies have demonstrated that SAR is useful for estimating forest biomass [51], but few have suggested that PALSAR can be used to estimate forest height.The results of this study demonstrate that PALSAR can improve the accuracy of forest height estimations.García et al., have also reported that including PALSAR for estimating forest height improved the estimation accuracy, and this improvement depends on the vegetation type and structure [52].Although the improvement of R 2 and RMSE in our study is not obvious when considering all ecozones together, we also found that the importance of PALSAR varies with forest types and structure among the sub zones.

Importance of Predictors to Model Wall-to-Wall Forest Height
The importance of these 13 predictors varies between different sub zones, as is shown in Figure 10, which emphasizes the importance of a given predictor by the darkness of the variable and the size of its value.In general, the most important metric for modeling gridded forest height is the slope derived from the SRTM, except in ecozones dominated by agricultural ecosystems.Aspect and HH are the least important predictors, and the HV metric is more important in non-forest-dominated ecozones than it is in forest-dominated ecozones.Predictors such as surface reflectance of the red band, HV backscatter, and slope are more important than other Landsat-and PALSAR-related metrics and SRTM-derived metrics.In particular, red band outperformed than other bands from Landsat, which conformed to previous study that employed time series Landsat images to map forest height in Sub-Saharan Africa [53].Differences in variables' importance between the sub zones could be explained by forest types and their structural characters in different biomes, such as crown diameter, distance between trees, tree positioning, stand age and LAI.Moreover, determining the relative importance of these predictors can help improve future computational efficiency by allowing researchers to acquire important data for nationwide or worldwide forest-height modeling.

Importance of Predictors to Model Wall-to-Wall Forest Height
The importance of these 13 predictors varies between different sub zones, as is shown in Figure 10, which emphasizes the importance of a given predictor by the darkness of the variable and the size of its value.In general, the most important metric for modeling gridded forest height is the slope derived from the SRTM, except in ecozones dominated by agricultural ecosystems.Aspect and HH are the least important predictors, and the HV metric is more important in non-forest-dominated ecozones than it is in forest-dominated ecozones.Predictors such as surface reflectance of the red band, HV backscatter, and slope are more important than other Landsat-and PALSAR-related metrics and SRTM-derived metrics.In particular, red band outperformed than other bands from Landsat, which conformed to previous study that employed time series Landsat images to map forest height in Sub-Saharan Africa [53].Differences in variables' importance between the sub zones could be explained by forest types and their structural characters in different biomes, such as crown diameter, distance between trees, tree positioning, stand age and LAI.Moreover, determining the relative importance of these predictors can help improve future computational efficiency by allowing researchers to acquire important data for nationwide or worldwide forest-height modeling.

Quantifying Uncertainty Caused by Footprints Space in Modeled Forest Heights
The uncertainty ( ) induced by the spatial gaps among GLAS footprints in predicting wallto-wall forest height varies from 0.026-4.43m, with a mean uncertainty of 0.64 m (Figure 11).This uncertainty can result in differences between in situ measured forest heights and their corresponding modeled heights, and consideration of this uncertainty reduced the RMSE from 4.40 m to 4.31 m.Forests in southeast Tibet and Taiwan, which have the greatest mean forest heights, contribute the most to this uncertainty (>1 m), whereas forests in Northeast China contribute the least (<0.5 m).When comparing the spatial patterns of forest heights and their uncertainties, it becomes apparent that uncertainty increases with increasing mean forest height in these ecozones.At the pixel scale, the relative error caused by this uncertainty (defined as the percentage of predicted average forest heights that are encompassed by a given uncertainty) has been estimated to be 3.97 ± 1.44% (Figure 11).Additionally, forest-dominated ecozones yield lower relative errors than non-forest-dominated ecozones (Figure 12).According to the standards of China's NFI, the uncertainty of a tree height measurement should not be greater than 5% [4].More than 80% (82.59%) of analyzed pixels yield a relative error of less than 5% in this finer-resolution forest height map.To further improve the accuracy of forest height measurements in China, future research should focus on regions with large uncertainties or relative errors, such as southeast Tibet.

Quantifying Uncertainty Caused by Footprints Space in Modeled Forest Heights
The uncertainty (ε GLAS ) induced by the spatial gaps among GLAS footprints in predicting wall-to-wall forest height varies from 0.026-4.43m, with a mean uncertainty of 0.64 m (Figure 11).This uncertainty can result in differences between in situ measured forest heights and their corresponding modeled heights, and consideration of this uncertainty reduced the RMSE from 4.40 m to 4.31 m.Forests in southeast Tibet and Taiwan, which have the greatest mean forest heights, contribute the most to this uncertainty (>1 m), whereas forests in Northeast China contribute the least (<0.5 m).When comparing the spatial patterns of forest heights and their uncertainties, it becomes apparent that uncertainty increases with increasing mean forest height in these ecozones.At the pixel scale, the relative error caused by this uncertainty (defined as the percentage of predicted average forest heights that are encompassed by a given uncertainty) has been estimated to be 3.97 ± 1.44% (Figure 11).Additionally, forest-dominated ecozones yield lower relative errors than non-forest-dominated ecozones (Figure 12).According to the standards of China's NFI, the uncertainty of a tree height measurement should not be greater than 5% [4].More than 80% (82.59%) of analyzed pixels yield a relative error of less than 5% in this finer-resolution forest height map.To further improve the accuracy of forest height measurements in China, future research should focus on regions with large uncertainties or relative errors, such as southeast Tibet.

Comparisons with Published Forest Heights
We compared the forest heights presented here with previous forest heights obtained by remote sensing mapping and field measurements.Nationwide field datasets have a mean tree height of ~10 m [54], which is less than the mean forest height reported in our study (16.08 m).This difference might be caused by the limited number of forest plots used in the field datasets.For example, the above nationwide field datasets have included field measurement datasets of approximately 1500 plots, from which mean forest heights are generated by applying statistics through these random samples.However, the modeled mean forest height is dominant canopy height at the pixel scale.This inconsistency causes differences between field data and the forest height generated in this study.Another study used 1 km remote sending data to calculate a mean height of 36.44 m [16], which is much larger than both our results and the nationwide field datasets.The difference between these two forest heights can be explained by the fact that this study was undertaken in the mountainous regions of southeast and northeast Tibet, whereas we used a different slope-correction for GLAS height extraction.This difference could also be caused by differences in the source and scale of the remote-sensing data.Although the differences imply that forest height mapping is still a challenge, our modeled forest heights show good agreement with independent field measurements.Future work should reduce the uncertainty of forest height mapping with more field plots and new LiDAR data.

Comparisons with Published Forest Heights
We compared the forest heights presented here with previous forest heights obtained by remote sensing mapping and field measurements.Nationwide field datasets have a mean tree height of ~10 m [54], which is less than the mean forest height reported in our study (16.08 m).This difference might be caused by the limited number of forest plots used in the field datasets.For example, the above nationwide field datasets have included field measurement datasets of approximately 1500 plots, from which mean forest heights are generated by applying statistics through these random samples.However, the modeled mean forest height is dominant canopy height at the pixel scale.This inconsistency causes differences between field data and the forest height generated in this study.Another study used 1 km remote sending data to calculate a mean height of 36.44 m [16], which is much larger than both our results and the nationwide field datasets.The difference between these two forest heights can be explained by the fact that this study was undertaken in the mountainous regions of southeast and northeast Tibet, whereas we used a different slope-correction for GLAS height extraction.This difference could also be caused by differences in the source and scale of the remote-sensing data.Although the differences imply that forest height mapping is still a challenge, our modeled forest heights show good agreement with independent field measurements.Future work should reduce the uncertainty of forest height mapping with more field plots and new LiDAR data.

Comparisons with Published Forest Heights
We compared the forest heights presented here with previous forest heights obtained by remote sensing mapping and field measurements.Nationwide field datasets have a mean tree height of ~10 m [54], which is less than the mean forest height reported in our study (16.08 m).This difference might be caused by the limited number of forest plots used in the field datasets.For example, the above nationwide field datasets have included field measurement datasets of approximately 1500 plots, from which mean forest heights are generated by applying statistics through these random samples.However, the modeled mean forest height is dominant canopy height at the pixel scale.This inconsistency causes differences between field data and the forest height generated in this study.Another study used 1 km remote sending data to calculate a mean height of 36.44 m [16], which is much larger than both our results and the nationwide field datasets.The difference between these two forest heights can be explained by the fact that this study was undertaken in the mountainous regions of southeast and northeast Tibet, whereas we used a different slope-correction for GLAS height extraction.This difference could also be caused by differences in the source and scale of the remote-sensing data.Although the differences imply that forest height mapping is still a challenge, our modeled forest heights show good agreement with independent field measurements.Future work should reduce the uncertainty of forest height mapping with more field plots and new LiDAR data.

Figure 1 .
Figure 1.Date acquisition of Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset in China in 2007, including years (left) and months (right).

Figure 1 .
Figure 1.Date acquisition of Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset in China in 2007, including years (left) and months (right).

Figure 2 .
Figure 2. Sample distribution for natural forests and plantations.Differences in slopes are indicated from light gray to dark gray.

Figure 2 .
Figure 2. Sample distribution for natural forests and plantations.Differences in slopes are indicated from light gray to dark gray.

Figure 3 .
Figure 3. Overall workflow of finer resolution Chinese forest height mapping.

Figure 4 .
Figure 4. Geographic distribution of 525 forest plots.Red indicates the code of each ecozone; their corresponding information are listed in the supporting TableS1.

Figure 4 .
Figure 4. Geographic distribution of 525 forest plots.Red indicates the code of each ecozone; their corresponding information are listed in the supporting TableS1.

Figure 5 .
Figure 5. Spatial patterns of forest heights in China (a).(b) is a subset in Greater Xiang'an Mountains that centered at 53.09°N, 123.19°E to show detailed height distribution and (c) corresponds to Landsat image (near infrared (NIR)-Red-Green).

Figure 5 .
Figure 5. Spatial patterns of forest heights in China (a).(b) is a subset in Greater Xiang'an Mountains that centered at 53.09 • N, 123.19 • E to show detailed height distribution and (c) corresponds to Landsat image (near infrared (NIR)-Red-Green).

Figure 6 .
Figure 6.Comparison of heights modeled using multiple sources of remotely sensed data and heights measured in the field.Down triangles and up triangles indicate field measured heights less than 5 m and greater than 20 m, respectively, the dotted line is 1:1 line.

Figure 7 .
Figure 7. Distribution of forest heights from remote-sensing data in 50 ecozones.Colors indicate different ecozones: dark yellow areas are desert, red are agricultural areas, light green areas are grassland, olive areas are forests, and blue areas represent a special ecozone in the Three Gorges Reservoir.Numbers in circle represent different ecozones, and the bar represents the standard deviation.The statistical information of forest height in 50 ecozones is listed in supplement TableS2.

Figure 6 . 17 Figure 6 .
Figure 6.Comparison of heights modeled using multiple sources of remotely sensed data and heights measured in the field.Down triangles and up triangles indicate field measured heights less than 5 m and greater than 20 m, respectively, the dotted line is 1:1 line.

Figure 7 .
Figure 7. Distribution of forest heights from remote-sensing data in 50 ecozones.Colors indicate different ecozones: dark yellow areas are desert, red are agricultural areas, light green areas are grassland, olive areas are forests, and blue areas represent a special ecozone in the Three Gorges Reservoir.Numbers in circle represent different ecozones, and the bar represents the standard deviation.The statistical information of forest height in 50 ecozones is listed in supplement TableS2.

Figure 7 .
Figure 7. Distribution of forest heights from remote-sensing data in 50 ecozones.Colors indicate different ecozones: dark yellow areas are desert, red are agricultural areas, light green areas are grassland, olive areas are forests, and blue areas represent a special ecozone in the Three Gorges Reservoir.Numbers in circle represent different ecozones, and the bar represents the standard deviation.The statistical information of forest height in 50 ecozones is listed in supplement TableS2.

Figure 8 .
Figure 8. Distribution of forest heights from remote sensing data between different forest types in natural forests and plantations.Box represents data within the range of median ± standard deviation (SD), the square represents the mean, the dash is the whisker within 5%~95%, and the cross represents minimum and maximum values.

Figure 8 .
Figure 8. Distribution of forest heights from remote sensing data between different forest types in natural forests and plantations.Box represents data within the range of median ± standard deviation (SD), the square represents the mean, the dash is the whisker within 5%~95%, and the cross represents minimum and maximum values.

Figure 9 .
Figure 9. Changes after combined PALSAR of uncertainty in 50 ecozones.Red represents the value of the uncertainty of change in each ecozone.

Figure 9 .
Figure 9. Changes after combined PALSAR of uncertainty in 50 ecozones.Red represents the value of the uncertainty of change in each ecozone.

Figure 10 .
Figure 10.The importance of 13 variables for forest height modeling in each ecozone.The value is the out-of-bag decreased error; the larger the number is or the darker the color is, the more important the variable is.Colors of numbers indicate different ecozones, with dark yellow signifying desert, red representing agriculture, light green representing grassland, olive indicating forest, and blue representing a special ecozone in the Three Gorges Reservoir.

Figure 10 .
Figure 10.The importance of 13 variables for forest height modeling in each ecozone.The value is the out-of-bag decreased error; the larger the number is or the darker the color is, the more important the variable is.Colors of numbers indicate different ecozones, with dark yellow signifying desert, red representing agriculture, light green representing grassland, olive indicating forest, and blue representing a special ecozone in the Three Gorges Reservoir.

Figure 11 .
Figure 11.Uncertainty of forest heights in China (left) and the relative error pattern (right).

Figure 12 .
Figure 12.Distribution of uncertainty (left) and relative error (right) values in 50 ecozones.Colors indicate different ecozones: dark yellow represents deserts, red is agricultural areas, light green is grasslands, olive indicates forests, and blue is special ecozone in the Three Gorges Reservoir.Numbers in circle represent the different kinds of ecozones, and the bar represents the standard deviation.

Figure 11 . 17 Figure 11 .
Figure 11.Uncertainty of forest heights in China (left) and the relative error pattern (right).

Figure 12 .
Figure 12.Distribution of uncertainty (left) and relative error (right) values in 50 ecozones.Colors indicate different ecozones: dark yellow represents deserts, red is agricultural areas, light green is grasslands, olive indicates forests, and blue is special ecozone in the Three Gorges Reservoir.Numbers in circle represent the different kinds of ecozones, and the bar represents the standard deviation.

Figure 12 .
Figure 12.Distribution of uncertainty (left) and relative error (right) values in 50 ecozones.Colors indicate different ecozones: dark yellow represents deserts, red is agricultural areas, light green is grasslands, olive indicates forests, and blue is special ecozone in the Three Gorges Reservoir.Numbers in circle represent the different kinds of ecozones, and the bar represents the standard deviation.