SRTM DEM Correction in Vegetated Mountain Areas through the Integration of Spaceborne LiDAR , Airborne LiDAR , and Optical Imagery

The Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) is one of the most complete and frequently used global-scale DEM products in various applications. However, previous studies have shown that the SRTM DEM is systematically higher than the actual land surface in vegetated mountain areas. The objective of this study is to propose a procedure to calibrate the SRTM DEM over large vegetated mountain areas. Firstly, we developed methods to estimate canopy cover from aerial imagery and tree height from multi-source datasets (i.e., field observations, airborne light detection and ranging (LiDAR) data, Geoscience Laser Altimeter System (GLAS) data, Landsat TM imagery, climate surfaces, and topographic data). Then, the airborne LiDAR derived DEM, covering ~5% of the study area, was used to evaluate the accuracy of the SRTM DEM. Finally, a regression model of the SRTM DEM error depending on tree height, canopy cover, and terrain slope was developed to calibrate the SRTM DEM. Our results show that the proposed procedure can significantly improve the accuracy of the SRTM DEM over vegetated mountain areas. The mean difference between the SRTM DEM and the LiDAR DEM decreased from 12.15 m to −0.82 m, and the standard deviation dropped by 2 m. OPEN ACCESS Remote Sens. 2015, 7 11203


Introduction
A Digital Elevation Model (DEM) is the three-dimensional representation of a terrain surface [1].It is an essential input for diverse applications in geographical analyses [2][3][4], hydrological modellings [5][6][7], ecological studies [8][9][10], and forest management [11,12], etc.The Shuttle Radar Topography Mission (SRTM) took a major step forward in the generation of the unprecedented near-global DEM product in fine spatial resolution [13].Originally, it provided 3 arc second resolution DEMs for the Earth's land surface from 56°S to 60°N and 1 arc second resolution DEMs for the United States (U.S.).In late 2014 and early 2015, the U.S. National Aeronautics and Space Administration (NASA) released an enhanced version of the SRTM DEM product, which provides 1 arc second resolution DEMs globally.Its remarkable fine spatial resolution made it become one of the most frequently used DEM products.
The accuracy of the SRTM DEM product is widely studied.Before the release of the dataset, the Jet Propulsion Laboratory (JPL) has carefully assessed and calibrated the products using continental-scale kinematic Global Position System (GPS) transects, corner reflector arrays, ocean data takes, and the National Geospatial-Intelligence Agency (NGA) and JPL ground control points [13,14].Moreover, there have been many published works focusing on assessing the accuracy of the released SRTM DEM products either globally or locally using independent ground elevation datasets, such as kinematic GPS transects, NGA ground control points, self-collected GPS data, SPOT5 DEM, and hellebore DEM [15][16][17][18][19].However, due to the challenges of conducting geodetic survey in vegetated mountain areas (e.g., the inaccessibility and the limitation of acquiring GPS signal), these studies concentrated mainly on areas with little vegetation cover and low terrain relief.The accuracy of the SRTM DEM over vegetated mountain areas may be more complex than that over non-vegetated and low-relief terrains, because the absorption and reflection effects of leaves, branches and trunks may prevent the SRTM radar phase center from reaching the land surface [20].
Light detection and ranging (LiDAR), an active remote sensing technique, provides the opportunity to evaluate the accuracy of the SRTM DEM over vegetated mountain areas.It can penetrate the forest canopy effectively due to the use of a focused short wavelength laser pulse.There have been numerous studies proving that LiDAR data can be used to generate fine-resolution DEMs (from 50 cm to tens of meters) of a vegetated mountain terrain with high accuracy [21][22][23][24][25].These highly accurate LiDAR-derived DEMs provide great potential for assessing the accuracy of the SRTM DEM in vegetated mountain environments.Carabajal & Harding (2005, 2006) [26,27] and Bhang et al. (2007) [28] used the Geoscience Laser Altimeter System (GLAS) onboard the Ice, Cloud and land Elevation Satellite (ICESat) to evaluate the SRTM DEM over vegetated areas, and found that the SRTM elevation was located somewhere between the actual land surface and canopy top.Hofton et al. (2006) [29] found a similar result by comparing the SRTM DEM with the Laser Vegetation Imaging Sensor data.Su & Guo (2014) [30] used nine airborne LiDAR flights across the U.S. to evaluate the performance of the SRTM DEM, and found that the SRTM DEM was around 10-20 m systematically higher than the actual ground in vegetated mountain areas.They moved a further step forward to correct the SRTM DEM by using accurate airborne LiDAR-derived vegetation products, and found that a regression model between the SRTM DEM error and vegetation products (e.g., tree height and leaf area index) can be used to significantly improve the SRTM DEM accuracy.However, currently, the availability of airborne LiDAR data is limited due to the high LiDAR flight mission cost.How to correct the SRTM DEM in areas without airborne LiDAR coverage is still a question that needs to be addressed.
Recently, the fusion of multisource datasets has shown the capability of mapping forest structure parameters in large spatial scale.The ICESat/GLAS, the only available spaceborne LiDAR system, provides full-waveform LiDAR data coverage in global scale.However, its footprints are spaced at 170 m along tracks and tens of kilometers across tracks, which are too sparse to generate fine-resolution DEMs and vegetation products.Conventional optical remote sensing data provide continuous land surface observations, e.g., the 30 m Landsat TM imagery and the 250-1000 m MODIS imagery.However, due to their limited penetration capabilities in forested areas, the vegetation structure parameters retrieved from them are usually fraught with uncertainty issues [31,32].By taking advantages of GLAS data and optical imagery, there have been studies that mapped the forest vegetation structures (e.g., tree height and aboveground biomass) in large spatial scale [33][34][35][36].Unfortunately, most of these current products are too coarse to be used to correct the SRTM DEM directly.
In this study, we proposed a method to correct the SRTM DEM over a vegetated mountain area in the Sierra Nevada, California, U.S. through the combination of multi-source remotely sensed datasets at different spatial resolutions, including spaceborne LiDAR data, airborne LiDAR data, Landsat TM imagery, airborne imagery, and climate surfaces (i.e., annual mean temperature, annual temperature seasonality, annual total precipitation, and annual precipitation seasonality).Firstly, algorithms were developed to estimate the forest structure parameters (i.e., tree height and canopy cover) from multi-source datasets.Then, a regression model was built to correct the SRTM DEM using the airborne LiDAR-derived DEM and obtained forest structure parameters.The corrected SRTM DEM was evaluated by independent airborne LiDAR-derived elevations and GLAS land surface elevations.

Study Area and Datasets
In this study, seven datasets, including forest inventory data, SRTM DEM, ICESat/GLAS data, airborne LiDAR data, Landsat TM imagery, and climate surfaces, were collected.Table 1 ( [37]) shows the general characteristics and data source of each dataset.Detailed descriptions of each dataset will be presented in the following sections.

Study Area
The study site is within the Plumas and Lassen National Forest, located in the northern Sierra Nevada, California, U.S. (Figure 1).It covers an area of 5022 km 2 .Its elevation ranges from 479 m to over 2500 m, and the average elevation is 1640 m.The slope of the study site can reach to 73°, and the average slope is 13°.The study site is dominated by mixed conifer forest, including incense cedar (libocedrus decurrens), lodgepole pine (pinus contorta), sugar pine (pinus lambertiana), ponderosa pine (pinus ponderosa), jeffrey pine (pinus jeffreyi), grey pine (pinus sabiniana), western juniper (juniperus occidentalis), Douglas-fir (pseudotsuga menziesii), California red fir (abies magnifica), and white fir (abies concolor).The major hardwoods within the mixed conifer stands are black oak (quercus kelloggii) and canyon live oak (quercus chrysolepis).

In-Situ Data
Within the study area, 239 plot measurements were obtained by the U.S. Forest Service based on the national forest inventory and analysis protocol [38].These plots were originally set to study the behavior of California spotted owl (Strix occidentalis occidentalis), an endangered wildlife species in the Sierra Nevada.Each plot location was randomly selected and had to be potentially suitable for the owl to nest or forage.Within the airborne LiDAR footprint, plots were densified to generate and validate vegetation products from airborne LiDAR data.Each plot was composed of four sub-plots (956 in total), which were 14.62 m in diameter.In each sub-plot, tree height and diameter at the breast height (DBH) for each individual tree with a DBH lager than 12.7 cm were measured in the growing season between 2004 and 2009 using a laser range finder and diameter tape, respectively.Note that all computations and assessments in this study were performed at the sub-plot level.The elevation occupied by these subplots ranges from 940 m to 1950 m, and the mean elevation of all subplots is 1476 m; the slope ranges from near 0° to over 44°, and mean slope is 15°.

SRTM Data
The 11-day SRTM was a joint mission conducted by the NASA and NGA, which was flown in February 2000.It utilized a 225 km wide swath from an altitude of 233 km above the Earth.A radar interferometer operating in C-band (with a wavelength of 5.6 cm) was used to generate the globally consistent DEM products, covering approximately 99.97% of the Earth's land surface from 56°S to 60°N.The 1 arc second (often quoted as 30 m) SRTM global DEM has been released in late 2014 and early 2015, and is available at the website of U.S. Geological Survey (USGS) [39].The horizontal datum of the SRTM data is WGS84, and the vertical datum is the mean sea level determined by the WGS84 Earth gravitational model geoid.In this study, to keep in accordance with all other datasets, the horizontal coordinates of the collected SRTM DEM was projected to UTM Zone 10 projection based on the ellipsoid of NAD1983, and the vertical coordinates were projected to NAVD88.The terrain slope was calculated from the SRTM DEM using the ESRI TM ArcGIS software.It should be noted that all slope data mentioned hereafter refer to the SRTM DEM-derived slope product.

Spaceborne LiDAR Data
GLAS onboard the NASA ICESat was a full-waveform LiDAR sensor, which was first launched in January 2003, and operated for seven years before being retired in February 2010.It was designed for measuring ice sheet elevations and changes in elevation through time, and obtaining global-scale cloud and aerosol height profiles, land elevation and vegetation cover, and sea ice thickness.However, these measurements were not spatially consistent.Its footprints had a nominal diameter of 65 m, and were spaced at 170 m along tracks and tens of kilometers across tracks.
In this study, we collected all available GLA01, GLA06 and GLA14 products from the ICESat/GLAS data pool [40].The GLA01 product provided the full-waveform measurements, the GLA06 product provided the geolocation and data quality for each full-waveform record, and the GLA14 product provided the land surface elevation information of each footprint.These three products were linked together using the unique ID and shot time of each laser pulse.For GLAS footprints within the study area, four filtering criteria were used to ensure the quality of GLAS measurements: (1) should be obtained under cloud-free condition; (2) should have no saturation effect; (3) should have high signal to noise ratio; and (4) should not be significantly higher (i.e., 100 m) than the land surface elevation described by the SRTM DEM.There were finally 13,716 GLAS full-waveform records left within the study site (Figure 1).For each GLAS record, four parameters were extracted from the GLAS products, including land surface elevation, waveform extent, leading edge extent, and trailing edge extent.The land surface elevation was directly obtained from the GLA14 product.The waveform extent, leading edge extent and trailing edge extent were computed from the full-waveform information.These three parameters have been proved to be highly correlated to canopy height, canopy height variability and terrain slope [33,[41][42][43][44]. Definitions of these three parameters can be found in Lefsky et al. (2007) [42], and will not be discussed in detail here.Note that the vertical datum of GLAS data is TOPEX/Poseidon, which is normally 71 cm higher than the commonly used WGS84 ellipsoid.In this study, all GLAS elevations were first subtracted by 71 cm, and then projected to NAVD88 using GEOID12B from National Geodetic Survey.

Airborne LiDAR Data
The airborne LiDAR data was collected at the bottom left of the study site (Figure 1); it covered about 5% (256.57km 2 ) of the total study area.The data was acquired between July and August 2009, using a Leica ALS50 II laser system mounted on a Cessna Caravan airplane.The LiDAR sensor allowed up to four range measurements per laser pulse, and the sensor scan angle was ±14° from nadir.To reduce the laser shadowing and increase LiDAR point density, each flight line was surveyed with an opposing flight line with a side-lap of ≥50%.The obtained LiDAR point cloud density ranged from 4.0-10.2pt/m 2 with a mean of 5.7 pt/m 2 .Ground control points, which were surveyed using a real-time kinematic GPS, were set to evaluate the positioning accuracy of the LiDAR data.The obtained horizontal and vertical accuracy of the LiDAR data were 31 cm and 3.5 cm, respectively.
Canopy quantile metrics, DEM, and canopy cover products for the airborne LiDAR footprint were directly computed from the airborne LiDAR data.Canopy quantile metrics, representing the height below X% of the LiDAR point cloud, are frequently used LiDAR products to estimate forest parameters that cannot be obtained directly from LiDAR point cloud [45,46].In this study, 11 quantile metrics, i.e. 0%, 1%, 5%, 10%, 25%, 50%, 75%, 90%, 95%, 99%, and 100%, were computed from the LiDAR point cloud at a resolution of 30 m.The DEM (at 30 m resolution) within the airborne LiDAR footprint was interpolated from the LiDAR ground returns using ordinary kriging, which has been proved to be more accurate than other interpolation schemes (e.g., inverse distance weighted and spline) for interpolating DEM from LiDAR-derived elevation points [21,23,47].The canopy cover (at 30 m resolution) was computed using a canopy height model based method, which has been proven reliable and consistent for estimating canopy cover from LiDAR data [48].

Aerial Imagery
The National Agriculture Imagery Program (NAIP) aerial imagery taken in 2009 was used to generate the canopy cover product in the study area.The NAIP imagery is at 1 m resolution, and composed by blue band, green band, red band and near-infrared (NIR) band.The NAIP program is run by the Farm Service of U.S. Department of Agriculture for the purpose of making high-resolution digital orthographies available to maintain common land units.All NAIP images were taken under permitted weather conditions, and followed the specification of no more than 10% cloud cover per quarter quad tile.The Aerial Photography Field Office (APFO) has adjusted and balanced the dynamic range of each image tile to full range of Digital Number (DN) value (0-255), and orthorectified each image file using National Elevation Dataset before releasing the data [49].

Landsat TM Imagery
The Landsat 5 TM data were collected from the USGS website for the study site during the growing season of 2009 [50].In total, nine scenes of surface reflectance data, generated using 6S (Second Simulation of a Satellite Signal in the Solar Spectrum) radiative transfer models [51], were downloaded for the tree height mapping procedure.Visual examination was performed on each scene to ensure that there was less than 5% cloud or snow coverage in the image.The cumulative normalized difference vegetation index (NDVI) and maximum value composite (MVC) were calculated from the time-series TM imagery.The use of cumulative NDVI can increase the estimation accuracy of forest parameters compared with the use of NDVI data from a single time period [52].MVC imagery has been demonstrated to be highly related to green-vegetation dynamics [53], and is capable of minimizing the problems common to single-date optical imagery, such as cloud contamination, atmospheric attenuation, surface directional reflectance, and view and illumination geometry [53].

Climate Surfaces
Four climate surfaces, namely annual mean temperature, annual temperature seasonality, annual total precipitation, and annual precipitation seasonality between 1950 and 2000 were calculated within the study area at 30 m resolution.The monthly mean temperature and monthly total precipitation across the western U.S. created by Alvarez et al. (2014) [37] were used to generate these four climate surfaces.For each year, the mean temperature and total precipitation were directly calculated from the monthly layers, and the temperature seasonality and precipitation seasonality were computed from the following equation: where Seasonality represents the temperature seasonality or precipitation seasonality of the corresponding year, and SDmonthly and Meanmonthly are the standard deviation and mean of the monthly mean temperature (in Kelvin) or monthly total precipitation (in millimeter) for the corresponding year.The annual mean temperature, annual temperature seasonality, annual total precipitation, and annual precipitation seasonality were derived from the averages of corresponding yearly products.

Methodology
Generally, the procedure of the SRTM DEM correction method can be divided into three modules (Figure 2): (1) develop methods to estimate vegetation products (i.e., canopy cover and tree height) from multi-source datasets; (2) build the SRTM DEM correction model within the airborne LiDAR footprint; (3) estimate the SRTM DEM error and therefore correct the SRTM DEM across the study site.The detailed procedures and methods to estimate vegetation parameters and correct the SRTM DEM will be described in the following sections.

Canopy Cover Estimation Procedure
The canopy cover for the whole study site was calculated from aerial imagery using a classification method.The NAIP imagery was classified into two land cover types, tree and non-tree, using the Support Vector Machine algorithm.The training samples were randomly selected, and the attribute (tree/non-tree) of each sample was determined by visual examination.Considering the similarities in the spectral reflectance characteristics of forest canopies and grass [54], we further extracted seven texture layers (including mean, variance, homogeneity, contrast, dissimilarity, entropy and second moment) from each spectral band using the gray-level co-occurrence matrix (GLCM) filtering method, and included them in the classification procedure.Because grasslands usually have a smoother texture than trees, the use of these texture parameters was expected to reduce the chance of misclassification between trees and grass [55,56].Procedures on how to calculate the GLCM matrix and extract texture parameters from the GLCM matrix can be found in Haralick et al. (1973) [57], and will not be discussed in detail here.The accuracy of the classification result was evaluated by independent and randomly-selected samples that were different from training samples.Finally, to calculate the canopy cover, the classification result in 1 m resolution was overlaid with a 30 m grid.The canopy cover for each 30 m × 30 m cell was calculated as the percentage of tree pixels within each corresponding cell.Since there was no field observed canopy cover, the estimated canopy cover product was evaluated by the airborne LiDAR-derived canopy cover, considering the strong capability of airborne LiDAR in estimating canopy forest structures [48,58].

Tree Height Estimation Method
In this study, the tree height distribution across the whole study site was calculated through the combination of multi-source datasets.It should be noted that the tree height mentioned hereafter refers to the Lorey's height.For the field measurements, the Lorey's height at the sub-plot scale (HL) was calculated from individual tree height (h) and basal area (b) using the following equation, where i is the i th tree within a sub-plot, and n is the total number of trees in a sub-plot.
Figure 3 shows the procedure to estimate tree height from multi-source datasets.Within the airborne LiDAR footprint, we firstly estimated the tree height distribution from canopy quantile metrics using the step-wise regression method.Then, this airborne LiDAR derived tree height product was used to spatially link with GLAS parameters in the airborne LiDAR footprint, and therefore build a regression model between the airborne LiDAR-derived tree height and GLAS parameters.Lefsky et al. (2007) [42] found that GLAS waveform extent and leading edge extent were linearly correlated with the airborne LiDAR-derived tree height.As can be seen in Figure 4, the R 2 for correlations between tree height and waveform extent and between tree height and leading edge extent were both higher than 0.5.Moreover, GLAS parameters, especially waveform extent and trailing edge extent, were influenced by terrain relief [42].Therefore, the multiple linear regression method was used to build the correlation among airborne LiDAR-derived tree height, GLAS parameters and slope [41,42].This regression model was applied to estimate tree heights for all GLAS footprints within the whole study site.The obtained GLAS tree heights were used as a training dataset to extrapolate the tree height estimations to the whole study site.In this study, the regression tree method Random Forest (RF) [59] was used to estimate the tree height distribution from 13 predictors, i.e., six Landsat TM MVC bands (excluding the thermal band), cumulative NDVI, four climate surfaces, elevation, and slope.The RF algorithm was selected because there have been studies showing that it outperformed other parametric regression methods in estimating forest structures from GLAS parameters [31,36,43].The RF extrapolation method was implemented using the randomForest R package [60], which included both classification and regression functions.We used manual iterative examination method to determine the optimized parameters to construct the "forest", i.e., number of trees and number of variables tried at each split [60].The number of trees and number of variables tried at each split were increased by one iteratively, and the values that produced the minimum mean-squared error were selected to extrapolate the GLAS tree height measurements.Here, based on manual iterative examination, 500 "RF trees" were included and four variables were tried at each split.The significance of variables used in the extrapolation process was evaluated by the percentage increase in the mean-squared error (%IncMSE) and the increase in node purity (IncNodePurity) [60].Finally, the canopy cover data obtained from Section 3.1 was used to mask the initial tree height estimation.Tree heights at areas without tree coverage (canopy cover equaled to zero) were assigned as zero meter in the final tree height product.The estimated tree height product across the whole study area was assessed by field plot measurements.

SRTM DEM Correction Method
Su & Guo et al. (2014) [30] found that a multiple linear regression model of the SRTM DEM error depending on vegetation parameters and terrain slope can be used to correct the SRTM DEM efficiently.However, their regression model was developed based on airborne LiDAR-derived vegetation parameters, and could not be applied to areas without airborne LiDAR coverage.In this study, we developed a similar regression method to correct the SRTM DEM based on above-mentioned vegetation products obtained from multi-source datasets.
Firstly, we evaluated the SRTM DEM accuracy using the LiDAR DEM.For each SRTM DEM pixel within the airborne LiDAR coverage, we calculated differences between the SRTM DEM and the LiDAR DEM.Then, two-thirds of these pixels were randomly selected to build the SRTM DEM calibration model.A multiple linear regression model of the SRTM DEM error depending on the estimated canopy cover and tree height products and terrain slope derived from the SRTM DEM was developed based on these randomly selected training pixels.Finally, this regression model was applied to the whole study site to estimate the SRTM DEM error from the estimated canopy cover, tree height, and terrain slope and therefore calibrate the SRTM DEM.The accuracy of the corrected SRTM DEM was assessed by the remaining one-third LiDAR DEM pixels and GLAS elevations.

Accuracy Assessment
The accuracy of the estimated canopy cover and tree height products and the corrected SRTM DEM was evaluated using coefficient of determination (R 2 ) and root-mean-squared error (RMSE), which can be computed by the following equations, 2 ) ) where n is the number of samples, xi is the i th original value, i is the i th modeled value, and ̅ is the mean of all original values.Besides, the test statistic was taken for all coefficients and constants obtained from multiple linear regression analysis at the significance level (α) of 0.05 or 0.01.The null hypothesis for the test statistic was that the obtained coefficients and constants should be zero.If the probability of obtaining this null hypothesis (known as p-value, denoted as p) was smaller than the significance level, the null hypothesis was rejected and the obtained coefficients were recognized as statistically significant.

Canopy Cover Estimation
Figure 5a shows the estimated canopy cover distribution in the study area.The average canopy cover in the study area is 53% and the standard deviation is 26%.The canopy cover is generally higher in the eastern part of the study site than in the western part.Moreover, as can be seen in Figure 5a, there is a "banding" effect associated with the aerial imagery flight lines.The airborne LiDAR-derived canopy cover was used to evaluate the accuracy of the estimated canopy cover product.The R 2 between the estimated canopy cover and airborne LiDAR-derived canopy cover is as high as 0.71, and the mean absolute difference is about 12%.The mean and standard deviation of the canopy cover from airborne LiDAR data is 60.9% and 27.0%, which are 2% lower and 4% higher than those from the estimated canopy cover product, respectively.As can be seen in Figure 5b, the estimated canopy cover is slightly higher than the LiDAR-derived canopy cover in all canopy cover groups.Both the mean and median of differences between the estimated canopy cover and the airborne LiDAR-derived canopy cover are closer to zero with the increase of canopy cover, and the range of differences decreases with the increase of the canopy cover.(b) Boxplot of differences between the airborne LiDAR-derived canopy cover and the estimated canopy cover within the airborne LiDAR footprint (airborne LiDAR-derived canopy cover minus estimated canopy cover).The blue "+" indicates the mean difference of the corresponding group.Numbers 0.1-1.0 on x axis represent the airborne LiDAR-derived canopy cover group of 0-0.1, 0.1-0.2,0.2-0.3,0.3-0.4,0.4-0.5, 0.5-0.6,0.6-0.7,0.7-0.8,0.8-0.9, and 0.9-1.0,respectively.

Tree Height Estimation
In this study, four LiDAR canopy quantile metrics, including 25%, 50%, 75% and 100%, were selected by the step-wise regression model to calculate tree height from airborne LiDAR data.The R 2 between these four canopy quantile metrics and field observations is 0.70.This airborne LiDAR-derived tree height was linked with GLAS parameters, and a multiple linear regression model was developed to calculate tree height from GLAS parameters, 2 0.17 0.41 0.01 15.43 tan( ) 15.49 ( 0.69, 0.05) where THGLAS represents the GLAS-derived tree height, WE represents the waveform extent, LE represents the leading edge extent, TE represents the trailing edge extent, and S represents the terrain slope.This equation was applied to all GLAS footprints within the study area to calculate tree height at the GLAS footprints.Finally, a RF regression model was built to extrapolate the tree height estimations from GLAS footprints to the whole study area.The RF regression model built from the selected multi-source datasets can explain 72% of the variances in tree height.Figure 6 shows the importance of variables in the built RF regression model.As can be seen, regardless of whether the evaluation was performed using %IncMSE or IncNodePuri, the absence of annual mean temperature, elevation, annual temperature seasonality, slope and cumulative NDVI significantly increased the mean-squared error and node purity, indicating a decrease in the prediction accuracy of the built RF regression tree model.The estimated tree height distribution is shown in Figure 7a.As can be seen, the tree height is generally higher in the southern part of the study site than in the northern part.It ranges from 0-55.5 m, with a mean of 18.3 m and a standard deviation of 11.8 m.The field plot measurements were used to evaluate the accuracy of the tree height estimation.As shown in Figure 7b, the R 2 between the estimated and field-measured tree height is 0.63, and the RMSE is 5.8 m.The fitted line between the predicted and field measured tree height is close to the 1:1 line.However, the proposed tree height estimation method still overestimated tree height with relatively low values (<15 m) and underestimated tree height with relatively high values (>40 m).

SRTM DEM Correction
The accuracy of the SRTM DEM was evaluated by comparing with the LiDAR DEM.As can be seen in Figure 8, the SRTM DEM is over 12 m higher than the LiDAR DEM on average, and the highest difference can reach to over 60 m.The standard deviation of differences between the SRTM DEM and the LiDAR DEM is around 11 m.Over 88% of the SRTM DEM pixels are higher than the LiDAR DEM.The systematic positive difference between the SRTM DEM and the LiDAR DEM further proves that the SRTM radar phase center can hardly reach the ground in vegetated mountain areas.The distribution of the SRTM errors is highly correlated with vegetation and terrain conditions (Table 2).Specifically, the SRTM error increases rapidly with tree height.The mean difference between the SRTM DEM and the LiDAR DEM (SRTM DEM minus LiDAR DEM) for the group with trees higher than 40 m is almost eight times as large as that for the group with trees lower than 5 m.The standard deviation of differences also increases with the tree height, but not as significantly as the mean difference.Canopy cover has a similar effect on the SRTM error.The mean difference between the SRTM DEM and the LiDAR DEM for the group with canopy cover higher than 90% is over eight times larger than that for the group with canopy cover lower than 10%, and the standard deviation is about 3 m higher.Compared to the influence of tree height and canopy cover, slope has no significant influence on the mean difference between the SRTM DEM and the LiDAR DEM.However, it has more pronounced influence on the standard deviation of differences between the SRTM DEM and the LiDAR DEM.As can be seen in Table 2, the standard deviation has an over 15 m increase from the group with slope lower than 5° to the group with slope higher than 40°.a "Mean" represents the mean difference between the SRTM DEM and the LiDAR DEM; b "STD" represents the standard deviation of differences between theSRTM DEM and the LiDAR DEM; c "N" represents the the number of pixels in each group.
As mentioned in Section 3.3, two thirds of the LiDAR DEM were randomly selected to build the regression model of the SRTM DEM error dependent on tree height, canopy cover and slope.The built regression equation is described as follows, 2 0.367 9.43 2.40 tan( ) 1.039 ( 0.52, 0.01) where ε represents the SRTM DEM error, TH represents the tree height, CC represents the canopy cover, and S represents the terrain slope.The R 2 for the built regression model is 0.52, and all coefficients in the model are statistically significant (p = 0.000 at α = 0.01).This model was then applied to the whole study area to compute the SRTM DEM errors, and therefore calibrate the SRTM DEM.The corrected SRTM DEM was evaluated by the remaining one third of the LiDAR DEM pixels and GLAS elevations, respectively.As can be seen in Figure 9a, the corrected SRTM DEM shows a very good correspondence with the LiDAR DEM.The mean difference between the corrected SRTM DEM and the LiDAR DEM has decreased significantly, which is very close to 0 m.The standard deviation of differences between the corrected SRTM DEM and the LiDAR DEM also has a ~2 m drop, compared to the original SRTM DEM.The corrected SRTM DEM also shows a good correspondence with GLAS elevations (Figure 9b).However, the corrected SRTM DEM is about 2.5 m lower than the GLAS elevation on average.The standard deviation of differences between the corrected SRTM DEM and GLAS elevations is almost identical with that between the corrected SRTM DEM and the LiDAR DEM.

Accuracy of Estimated Vegetation Parameters
In this study, canopy cover and tree height products were estimated to correct the SRTM DEM.The estimated canopy cover and tree height products showed good correspondences with either corresponding airborne LiDAR products or field measurements.However, systematic biases remain in the products.As can be seen in Figure 5b, the estimated canopy cover is systematically higher than the airborne LiDAR-derived canopy cover product, and the bias decreases with the increase of canopy cover.This result is consistent with previous studies [61,62], and may be caused by the misclassification of grass and low shrubs as trees in the aerial imagery classification procedure.At an individual pixel level, forest canopies, grass and low shrubs have very similar spectral reflectance characteristics [54].
Although the use of texture information can help to reduce the misclassification rate [55,56], it may not be used to totally remove grass and low shrubs from the tree group.The increased proportion of "tree" given by the misclassification of grass and low shrubs in areas with few trees tends to elevate canopy cover relative to that calculated from airborne LiDAR.With the increase of canopy cover, the possibility of misclassification may become smaller because grass and low shrubs are blocked by the forest canopy in aerial imagery, and therefore the estimated canopy cover is closer to the airborne LiDAR-derived canopy cover.Moreover, there is a noticeable "banding" effect in the estimated canopy cover product associated with flight lines (Figure 5a), which is possibly caused by the non-linear color-balancing problem of the NAIP imagery.The NAIP images were acquired at different time and under different weather conditions, which led the images from different flight lines to have dynamic ranges of DN values [49].Although the APFO has adjusted and balanced the images into the same range (0-255) before releasing the data, the priority goal of the process was to make the images look "pleasant" instead of radiometrically correcting the images, because the NAIP imagery was primarily for public use.Using other high resolution imagery (e.g., QuickBird and WorldView) or performing a systematic radiometric correction on the NAIP imagery may be helpful to solve this "banding" effect.
As for tree height, the estimated product inclined to slightly overestimate the tree heights in areas with small trees and underestimate the tree heights in areas with big trees (Figure 7b).The overestimation of tree heights in areas with low vegetation may also be related to the existence of grass.In this study, we used spectral reflectance characteristics from Landsat TM as a data source to extrapolate tree height estimations from GLAS footprints to spatially continuous layers.Similar to aerial imagery, Landsat TM imagery also has similar spectral reflectance characteristics between grass and trees [63].Although GLAS parameters can be used to estimate forest structures accurately within GLAS footprints, the similarity in the spectral reflectance between grass and trees may overestimate the height of grass during the extrapolation process, and therefore increase the overall tree height estimation in areas with low tree coverage.Moreover, because of the limited penetration capability in vegetated areas, estimating forest structure parameters from Landsat TM imagery usually suffered from saturation effect [64,65].The use of other ancillary datasets, e.g., climate surfaces and topographic data, can help to reduce the saturation effect in the extrapolation process, but may not be used to totally eliminate its influence on the result.Additionally, the in-situ measurements are densified in the airborne LiDAR footprint (Figure 1), and this may lead to a biased accuracy assessment result because they are less representative of the whole region.To address this issue, we further evaluated the estimated tree height product only using the plot measurements outside of the airborne LiDAR footprint.The result shows that the obtained R 2 and RMSE are 0.61 and 6.2 m, respectively, which are very close to those obtained from the accuracy assessment result using all plot measurements.

Comparisons among SRTM DEM, GLAS Elevations, LiDAR DEM and Corrected SRTM DEM
The SRTM DEM was found to be systematically higher than the actual land surface in vegetated mountain areas, which agrees with previous studies [26][27][28]30].Different vegetation conditions and topographic conditions have a significant influence on the SRTM DEM error (Table 2).Specifically, the SRTM DEM error tends to be higher in areas with tall and dense forest.This may be caused by the absorption and reflection effects of forest canopy to the SRTM radar signal.The topographic conditions have more influence on the uncertainty of the SRTM DEM bias.As can be seen Table 2, the standard deviation of differences between the SRTM DEM and the LiDAR DEM increases rapidly with the slope.Su & Guo (2014) [30] argued that the influence of slope on the SRTM DEM error depends on other factors, i.e., terrain aspect, SRTM radar signal azimuth, and SRTM radar signal incidence angle.A rough terrain can either increase or decrease the distance needed for the SRTM radar signal to reach the ground based on different combinations of terrain aspect, SRTM radar signal azimuth and SRTM radar signal incidence angle.A larger slope can increase the variation of this distance and therefore increase the uncertainty of the SRTM DEM bias.Moreover, in this study, the slope product used to correct the SRTM DEM was derived from the SRTM DEM itself, and the SRTM DEM error might propagate to the slope product.However, the SRTM DEM error is a systematic bias in vegetated mountain areas, and may not significantly change the relative terrain relief.Therefore, the derived slope product may still reflect the terrain changes of adjacent areas.As mentioned, the terrain slope can influence the variability of the SRTM DEM bias.By including the SRTM DEM-derived slope into the correction procedure, the standard deviation of differences between the corrected SRTM DEM and the LiDAR DEM dropped by two meters approximately, which further indicates that the SRTM DEM-derived slope may be a valid parameter to correct the SRTM DEM.
Outside the airborne LiDAR footprint, the corrected SRTM DEM was evaluated based on the GLAS elevations.As can be seen in Figure 9b, the corrected SRTM DEM was found to be around 2.5 m lower than the GLAS elevations.This may be caused by the fact that the GLAS elevations overestimated ground elevations in vegetated areas [66].In this study, we further compared the GLAS elevations with the LiDAR-derived DEM and the original SRTM DEM within the airborne LiDAR footprint.As shown in Figure 10, GLAS elevations are about 6 m higher than the LiDAR DEM and about 8 m lower than the SRTM DEM.Although the accuracy of the GLAS elevations is higher than the SRTM DEM, they are still systematically higher than the actual ground.The GLAS elevation error is also influenced by the vegetation conditions.Unlike the SRTM DEM error, GLAS elevation errors are not linearly influenced by the vegetation conditions.As can be seen in Table 3, in groups with tree height lower than 20 m or canopy cover lower than 50%, the mean biases of GLAS elevations are almost identical, which are around 3.5 m higher than the LiDAR DEM.However, in groups with tree height higher than 20 m or canopy cover higher than 50%, the mean biases of GLAS elevations increase significantly.This indicates that the GLAS laser pulse may still reach the land surface unless the forest density or forest height reach to a certain level.Moreover, the terrain slope has no significant influence on the mean difference of GLAS elevation errors.As shown in Table 3, the mean difference between GLAS elevations and the LiDAR DEM stays stable among different slope groups, except the group with slope higher than 30°.The large GLAS elevation bias for the group with slope higher than 30° may be caused by a relatively smaller number of samples.Besides, neither vegetation condition nor terrain slope has a significant influence on the standard deviation of GLAS elevation errors, which stays around 4-5 m for all groups (Table 3).a "Mean" represents the mean difference between GLAS elevations and the LiDAR DEM; b "STD" represents the standard deviation of differences between GLAS elevations and the LiDAR DEM; c "N" represents the the number of pixels in each group.
The SRTM DEM correction procedure shows great potential to correct the SRTM DEM in large spatial coverage.The mean difference between the corrected SRTM DEM and the LiDAR DEM is very close to zero (Figure 9a), indicating that the systematic bias of the SRTM DEM can be removed by the proposed method.By collecting more field observations and airborne LiDAR measurements covering different forest types, the proposed method in this study may even have the possibility to provide an improved version of the SRTM DEM in global scale.Hansen et al. (2013) [67] provided a global scale high-resolution forest cover product using the Earth observation satellite data.Simard et al. (2011) [36] and Lefsky (2010) [33] showed the possibility to estimate tree height distribution in global scale through the combination of MODIS data and GLAS data.Considering the availability of GLAS data, Landsat TM imagery and other ancillary datasets in global scale, it is possible to produce a global-scale tree height map in fine resolution.Moreover, Su & Guo et al. (2014) [30] demonstrated that a SRTM correction model based on one forest type is also applicable to areas with similar forest type coverage.Therefore, if we can gather airborne LiDAR data to cover more forest types, individual SRTM correction models can be developed for each forest type and therefore an improved global-scale SRTM DEM product may be achieved.

Conclusions
The SRTM DEM is systematically higher than actual land surface in vegetated mountain areas.The error of the SRTM DEM is significantly influenced by different vegetation and terrain conditions.In general, the SRTM DEM bias tends to be bigger in areas with high canopy cover and tall trees, and the uncertainty of SRTM DEM bias tends to be bigger in areas with steep slope.To correct the SRTM DEM in areas without airborne LiDAR coverage, this study first developed methods to estimate canopy cover and tree height from aerial imagery and multi-source datasets, respectively.The estimated canopy cover and tree height products show good correspondences with airborne LiDAR-derived canopy cover and field measurements, respectively.The R 2 between the estimated canopy cover and the airborne LiDAR-derived canopy cover is 0.71, and that between the estimated tree height and field observations is 0.63.These obtained vegetation products were used along with the LiDAR DEM and slope derived from the SRTM DEM to build a SRTM DEM correction model.This model significantly improves the accuracy of the SRTM DEM.The mean difference between the corrected SRTM DEM and the LiDAR DEM decreases from over 12 m to −0.8 m, and the standard deviation drops from 10.86 m to 8.86 m.Besides, this study also examined the accuracy of GLAS elevations within the airborne LiDAR footprint.The GLAS elevations are more than 5 m higher than the LiDAR DEM on average.The corrected SRTM DEM is about 2.5 m lower than GLAS elevations across the whole study site, which further demonstrates that the proposed correction model can reduce the systematic bias of the SRTM DEM.

Figure 1 .
Figure 1.The location of the study area, and the distribution of Geoscience Laser Altimeter System (GLAS) footprints, airborne light detection and ranging (LiDAR) data and in-situ measurements.Note that the elevation distribution in the figure is represented by the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM).

Figure 2 .
Figure 2. Flow chart of the method to correct the SRTM DEM over vegetated mountain areas in this study.

Figure 3 .
Figure 3.The procedure of estimating forest tree height in the study area through the combination of multi-source datasets.

Figure 4 .
Figure 4. (a) Correlation between airborne LiDAR-derived tree height and GLAS waveform extent; (b) Correlation between airborne LiDAR-derived tree height and GLAS leading edge extent.R 2 represents the coefficient of determination.The red dashed lines are fitted lines between airborne LiDAR-derived tree height and GLAS parameters.

Figure 6 .
Figure 6.Variable importance for the tree height estimation Radom Forest model, denoted by (a) the percentage increase of mean-squared error (%IncMSE) and (b) the increase in node purity (IncNodePurity).PTotal, PSeason, TMean, TSeason, AccNDVI, and TM 1-5 and 7 represent the annual total precipitation, annual precipitation seasonality, annual mean temperature, annual temperature seasonality, cumulative NDVI, and six TM spectral bands, respectively.

Figure 7 .
Figure 7. (a) Estimated tree height distribution in the study area; (b) Scatter plot between the estimated tree height and field observations.

Figure 8 .
Figure 8. Histogram of differences between the SRTM DEM and the LiDAR DEM (SRTM DEM minus LiDAR DEM).µ and σ represent the mean and standard deviation of differences between the SRTM DEM and the LiDAR DEM, respectively.

Figure 9 .
Figure 9. (a) Scatter plot between the corrected SRTM DEM and the LiDAR DEM; (b) Scatter plot between the corrected SRTM DEM and GLAS elevations within GLAS footprints.R 2 represents the coefficient of determination, and µ and σ represent the mean and standard deviation of differences between the corrected SRTM DEM and the LiDAR DEM (or GLAS elevations) (SRTM DEM minus LiDAR DEM or GLAS elevations).The red dashed line represents the 1:1 line.

Figure10.Table 3 .
Figure10.(a) Histogram of differences between GLAS elevations and the LiDAR DEM (LiDAR DEM minus GLAS elevations); (b) Histograms of differences between GLAS elevations and the SRTM DEM (SRTM DEM minus GLAS elevations).µ and σ represent the mean and standard deviation of differences between GLAS elevations and the LiDAR DEM (or the SRTM DEM).

Table 1 .
Characteristics and data sources of the seven collected datasets in this study.

Table 2 .
Statistics for differences between the SRTM DEM and the LiDAR DEM (SRTM DEM minus LiDAR DEM), classified by tree height, canopy cover and slope.