Synthesis of Leaf-on and Leaf-off Unmanned Aerial Vehicle (UAV) Stereo Imagery for the Inventory of Aboveground Biomass of Deciduous Forests

Applications of stereo imagery acquired by cameras onboard unmanned aerial vehicles (UAVs) as practical forest inventory tools are hindered by the unavailability of ground surface elevation. It is still a challenging issue to remove the elevation of ground surface in leaf-on stereo imagery to extract forest canopy height without the help of lidar data. This study proposed a method for the extraction of forest canopy height through the synthesis of UAV stereo imagery of leaf-on and leaf-off, and further demonstrated that the extracted forest canopy height could be used for the inventory of deciduous forest aboveground biomass (AGB). The points cloud of the leaf-on and leaf-off stereo imagery was firstly extracted by an algorithm of structure from motion (SFM) using the same ground control points (GCP). The digital surface model (DSM) was produced by rasterizing the point cloud of UAV leaf-on. The point cloud of UAV leaf-off was processed by iterative median filtering to remove vegetation points, and the digital terrain model (DTM) was generated by the rasterization of the filtered point cloud. The mean canopy height model (MCHM) was derived from the DSM subtracted by the DTM (i.e., DSM-DTM). Forest AGB maps were generated using models developed based on the MCHM and sampling plots of forest AGB and were evaluated by those of lidar. Results showed that forest AGB maps from UAV stereo imagery were highly correlated with those from lidar data with R2 higher than 0.94 and RMSE lower than 10.0 Mg/ha (i.e., relative RMSE 18.8%). These results demonstrated that UAV stereo imagery could be used as a practical inventory tool for deciduous forest AGB.


Introduction
Forests cover approximately 30% of the total land area and account for 80% of Earth's total plant biomass [1,2]. Forests' ecosystems contribute 75% of the terrestrial gross primary production (GPP) and contains more carbon in plants and soils than those contained in the atmosphere [2,3]. Therefore, knowledge of the distribution and biomass density of forests is critical for understanding the global carbon balance and, further, for research on climate change.
Several satellite missions aiming at the monitoring of forest aboveground biomass (AGB) have been launched or scheduled, such as the L-band synthetic aperture radar (SAR) onboard the advanced land observing satellite (ALOS) launched by the Japanese Space Agency (JAXA) in 2006 [4], the advanced land observing satellite 2 (ALOS-2) equipped with an enhanced L-band SAR (PALSAR-2) [5], the BIOMASS mission of European Space Agency specifically designed for the measurements of forest AGB and its changes [6], and so on. In addition to these satellite missions, several regional maps of forest AGB or related forest structure parameters have been released [7][8][9]. For example, Blackard et al. [10] mapped the forest AGB of the United States of America using the ground inventory, the Moderate Resolution Imaging Spectroradiometer (MODIS) data, and other ancillary data; Saatchi et al [11] mapped the total carbon stock in the live biomass of all tropical forests by extrapolating the in situ inventory and the Geoscience Laser Altimeter System (GLAS) footprints using optical and microwave imagery. However, Rodríguez-Veiga et al. [12] reported obvious discrepancies in existing forest biomass stock maps with in-situ observations in Mexico. As pointed out by Hall, F.G., et al. [13], these existing global or regional datasets were only approximations based on combining land cover types and representative values instead of measurements of actual forest aboveground biomass.
The importance of uncertainty analysis for remote sensing-derived forest AGB estimates has been recognized in last decade [14,15]. The high-quality ground reference data is the prerequisite of the mapping of forest AGB and the uncertainty analysis [16,17]. As pointed out by Lu et al. [16], the identification of sensitive variables should be based on correlation analysis between ground reference data and potential variables; the estimation model was developed by relating ground reference data with selected variables; the model evaluation and uncertainty analysis also relies on the ground reference data.
The collection of ground reference data is always carried out through field measurements over forest plots. The structure parameters of each tree within the plots are recorded, including diameters at breast height, tree species, and the heights of selected representative trees. This collection of ground reference data is time-consuming, labor-intensive, and the number of sampling plots is always limited due to the cost. Moreover,uncertainties associated with this kind of data collection also exist. For example, the uncertainties caused by the inaccurate positioning of sample plot centers or corners due to the low signal quality of global positioning systems (GPS) [18][19][20]; the measurement of tree heights is always interrupted by the complex forests vertical structures preventing surveyors targeting tree canopy tops correctly from the ground [21]; another uncertainty that appears when relating with the remote sensing dataset is that the boundary of the sampling plot cannot exactly coincide with the boundary of remote sensing pixels. Although field measurements are associated with these uncertainties, they are considered to be the most accurate source of data we can obtain. Therefore, we cannot completely eliminate field measurements, but we can reduce the number of sample plots and make the data collection and biomass estimation more efficient and effective with the help of other tools.
The stereo imagery acquired by optical sensors onboard an Unmanned Aerial Vehicle (UAV) could be one choice for the collection of reference data of forest AGB. Studies on the measurements of forest spatial structures using stereo imagery have been booming in recent years due to the rapid development of UAV platforms and the automatic processing algorithms of stereo imagery [22][23][24][25][26][27]. The stereo images acquired by consumer-grade cameras are good enough for stereoscopic processing using algorithms from computer vision [26]. This facilitated the application of low-cost UAV with low payload capability in the acquisition of stereo imagery.
The application of UAV stereo imagery as a practical sampling tool of forest AGB is hindered by the unavailability of ground surface elevation under forest canopies. The results derived from stereo imagery are the digital surface model (DSM) of the forest canopy top. A digital terrain model (DTM) of the ground surface under forest from other data sources is needed to extract the forest's vertical structures. Many studies on the extraction of forest spatial structures using airborne stereo imagery Remote Sens. 2019, 11, 889 3 of 16 used the DTM from lidar data [28][29][30]. In this way, the application of UAV stereo imagery is limited within the area where the lidar DTM is available.
Some studies reported that point cloud of stereo imagery acquired in leaf-off conditions could see the ground surface through the deciduous forest canopy [26,31]. Our previous studies evaluated the impact of forward overlaps and image resolutions for the mapping of forest three-dimensional structures using UAV stereo imagery [32], and also reported the estimation of the forest leaf area index using height and canopy cover extracted from UAV stereo imagery [33]. This study reported our new results on the inventory of forest aboveground biomass over deciduous forest through the synthesis of leaf-on and leaf-off UAV stereo imagery by taking lidar data as references.

Test Sites
The study area is located at the Genhe forest bureau in the Inner Mongolia Autonomous Region of China (50 • 56 N, 121 • 29 E) on the northwest slope of Daxing'an Mountains. This area is on the southern border of boreal forest and belongs to the frigid-temperate zone of the coniferous forest vegetation. The elevation of this area ranges between 650 m-1269 m. The annual mean temperature is −5.4 • C, while the minimum temperature is −52.6 • C. This area is referred to as the cold polar of China. The annual mean precipitation is 450~550 mm, 60% of which occurs during July and August. The soil type is brown coniferous forest soil with a thickness of 30 cm~40 cm. 83.7% of the area is covered by forest. The dominant tree species is larch (Larix gmelinii). The main broadleaf deciduous species include white birch (Betula platyphlla) and aspen (Populous davidiana). In the study area, the maximum height of the Dhurian larch is about 35 m and the maximum diameter at breast height (DBH) is about 40 cm.

Inventory of Ground Reference
The field sampling was carried out between 10 August and 15 August 2013 by a team under previous project using a plot size of 45 m × 45 m [34]. Each plot was divided into nine subplots with a size of 15 m × 15 m and subplots were numbered serially as shown in Figure 1a. The diameter at the breast height (DBH) of each tree was measured by DBH tape. The heights of the canopy tops and canopy bottoms of each tree were measured by laser range finder. The crown width of each tree at the directions of south to north and east to west were measured by taps. The positions of plot corners were measured by GPS with differential correction, while the relative position of each tree (with DBH > 5.0 cm) within the plots was measured by total stations. In total, seven plots were measured and there were two species of trees located within the plots, i.e. white birch and larch. The biomass of each tree was calculated using the measured DBH and tree heights and the published allometric equations, as shown in Table 1 [35]. The forest AGB of each subplot was the summation of aboveground biomass of all trees divided by the corresponding spatial area. The maximum, minimum, standard deviation, and average forest AGB of all subplots were 184.44 Mg/ha, 21.84 Mg/ha, 39.44 Mg/ha, 74.97 Mg/ha, respectively. estimation model of forest AGB. Figure 2a shows that the determination coefficient (R ) between mean forest heights and field-measured forest AGB was 0.80, with a root mean square error (RMSE) of 24.4 Mg/ha, while Figure 2b shows that the R 2 between field measurement and model predicted forest AGB of the validation plots was 0.74 with a RMSE of 16.9 Mg/ha. The MCHM map with a resolution of 15 m was produced in the same way as the calculation of the mean forest height of each plot. Then the forest AGB map produced using the developed model and MCHM, with a resolution of 15 m, is shown in Figure 2c. Figure 1. The shape and spatial distribution of field sampling plots within the coverage of lidar data. The background true color image was from GoogleEarth TM . (a) the shape and size of field sampling plots; (b) the spatial distribution of field sampling plots, the white polygon was the area covered by Unmanned Aerial Vehicle (UAV) stereo imagery.

Forest AGB Maps from Lidar Data
The lidar system onboard Yun-5 aircraft was the Leica ALS60 (Leica Geosystems, Heerbrugg, Switzerland) system working at 1064 nm and a 166 kHz pulse rate at 1800 m above ground level (AGL). The Yun-5 aircraft carried both the GPS receivers and Inertial Navigation System (INS) simultaneously for recording accurate positions and attitudes of the platform. Three returns were recorded, i.e. first, second and last. The positioning accuracy was estimated as vertical error <15 cm and horizontal error <50 cm by comparing with the ground control points (GCPs) from Real Time Kinematic (RTK) measurements. The Lidar data used in this study was collected from 30 August to 14 September 2012 with a point density of about 2~4 points/m 2 . Please refer to [36] for more details on Lidar data acquisition.
The lidar point clouds were classified as ground, vegetation, and others using TerraScan software (TerraSolid Ltd, Helsinki, Finland, http://www.terrasolid.com/home.php) [37]. The DTM of the ground surface with a pixel size of 0.5 m was produced through the rasterization of the ALS ground points. The first return of the ALS point cloud was rasterized to produce a digital surface model (DSM) with a pixel size of 0.5m. If only one point was located within a pixel, the elevation of the point was used as the pixel value. If there were more than one point within a pixel, the mean height of vegetation points was used as the pixel value. The difference between DSM and DTM was the mean canopy height model (MCHM). The pixels with heights of <2.0 were set as 0.0 m to exclude shrubs and non-woody vegetation. The mean forest height of each plot was calculated by averaging all MCHM pixels, including both bare ground and vegetation pixels. The mean forest height of each subplot was correlated with the corresponding biomass density to build an estimation model of forest AGB. There were, in total, 63 subplots from 7 sampling plots. All subplots were numbered from 1 to 63 according to their serial number within each plot and the serial number of each plot. The 32 subplots with odd numbers were used to develop an estimation model of forest AGB and the rest of the 31 subplots were used for model validation. Considering that the forest AGB is linearly correlated with forest stocking volume, which is correlated with forest height in the power form as indicated in [38], the power function was chosen to develop an estimation model of forest AGB. Figure 2a shows that the determination coefficient (R 2 ) between mean forest heights and field-measured forest AGB was 0.80, with a root mean square error (RMSE) of 24.4 Mg/ha, while Figure 2b shows that the R 2 between field measurement and model predicted forest AGB of the validation plots was 0.74 with a RMSE of 16.9 Mg/ha. The MCHM map with a resolution of 15 m was produced in the same way as the calculation of the mean forest height of each plot. Then the forest AGB map produced using the developed model and MCHM, with a resolution of 15 m, is shown in Figure 2c.
Remote Sens. 2019, 11, x FOR PEER REVIEW 5 of 15 The acquisition system of stereo imagery was built on a six-rotor helicopter [27]. The camera mounted on the helicopter pointing at nadir was a Sony NEX-5T, which could collect images with 16.10 megapixels. The ground station associated with the UAV system enabled to automatically program the flying routes and the positions to take photos according to system settings, including focal lengths of camera, flying heights, forward overlaps, side overlaps, and flying speeds. The duration of each flight was determined by the battery capacity. Taking the battery of 16000 mAh used in this study for example, it was about 18-20 minutes flying at about 300 m AGL with a flying speed of 10 m/s. The forward overlap used in this study was about 90% while the side overlap was about 60%. The stereo imagery used in this study was collected by flying along the same routes on 25 August 2014 and 5 May 2015, corresponding to leaf-on and leaf-off seasons, respectively. The focal length and shutter speed used in image collection was 16 mm and 1/600 s, respectively. 776 images stored in the RGB format were used in this study. The spatial resolution or the ground sample distance (GSD) of the stereo imagery was about 8.6 cm.

Stereoscopic Processing
The stereo images were processed using the Agisoft Photoscan software package (version 1.2.2, build 2294, 64 bit). The processing algorithm was procedures of structure from motion (SfM). The three-dimensional modeling equations used in SfM only need relative positions and orientations of the camera, which can be calculated using identified matching points. The point cloud in local relative coordinate systems can be determined based on the numerous overlapping stereo images [39]. The real three-dimensional model was produced by transforming the relative point cloud into geo-referenced coordinates using GCP. Correspondingly, there were three basic steps involved in the stereo processing: (a) Image alignments, i.e. the relative positions and orientations of images were estimated using the key points extracted from images through analyses of image textures; (b) dense point cloud generation, i.e. based on the estimated relative positions and orientations of images-the depth information for each camera was calculated and then combined into a single dense point cloud. Five levels of quality of point densification are provided in the software, including lowest, low, medium, high, or ultra-high. The ultra-high level was used in this study to get the densest point cloud. (c) Coordinates transformation, i.e. the geo-referenced coordinates of point cloud was calculated from the relative coordinates using GCP. Three options were provided in step (b) to determine the filtering degree, including "mild," "aggressive," and "moderate." For forested areas, the "mild" was recommended because the sparse single trees were always filtered out as noise by the other two settings. The default values of other parameters were used in this study.
The field measurement of GCP using GPS with differential correction was time-consuming in field work. The longitude, latitude, and elevation of GCPs used in this study were directly collected on the very height resolution (VHR) images in Google Earth™. The same GCPs were used for the stereo imagery of both leaf-on and leaf-off to minimize their divergence. The spatial distributions

Collections of UAV Stereo Imagery
The acquisition system of stereo imagery was built on a six-rotor helicopter [27]. The camera mounted on the helicopter pointing at nadir was a Sony NEX-5T, which could collect images with 16.10 megapixels. The ground station associated with the UAV system enabled to automatically program the flying routes and the positions to take photos according to system settings, including focal lengths of camera, flying heights, forward overlaps, side overlaps, and flying speeds. The duration of each flight was determined by the battery capacity. Taking the battery of 16000 mAh used in this study for example, it was about 18-20 minutes flying at about 300 m AGL with a flying speed of 10 m/s. The forward overlap used in this study was about 90% while the side overlap was about 60%. The stereo imagery used in this study was collected by flying along the same routes on 25 August 2014 and 5 May 2015, corresponding to leaf-on and leaf-off seasons, respectively. The focal length and shutter speed used in image collection was 16 mm and 1/600 s, respectively. 776 images stored in the RGB format were used in this study. The spatial resolution or the ground sample distance (GSD) of the stereo imagery was about 8.6 cm.

Stereoscopic Processing
The stereo images were processed using the Agisoft Photoscan software package (version 1.2.2, build 2294, 64 bit). The processing algorithm was procedures of structure from motion (SfM). The three-dimensional modeling equations used in SfM only need relative positions and orientations of Remote Sens. 2019, 11, 889 6 of 16 the camera, which can be calculated using identified matching points. The point cloud in local relative coordinate systems can be determined based on the numerous overlapping stereo images [39]. The real three-dimensional model was produced by transforming the relative point cloud into geo-referenced coordinates using GCP. Correspondingly, there were three basic steps involved in the stereo processing: (a) Image alignments, i.e., the relative positions and orientations of images were estimated using the key points extracted from images through analyses of image textures; (b) dense point cloud generation, i.e. based on the estimated relative positions and orientations of images-the depth information for each camera was calculated and then combined into a single dense point cloud. Five levels of quality of point densification are provided in the software, including lowest, low, medium, high, or ultra-high. The ultra-high level was used in this study to get the densest point cloud. (c) Coordinates transformation, i.e., the geo-referenced coordinates of point cloud was calculated from the relative coordinates using GCP. Three options were provided in step (b) to determine the filtering degree, including "mild," "aggressive," and "moderate." For forested areas, the "mild" was recommended because the sparse single trees were always filtered out as noise by the other two settings. The default values of other parameters were used in this study.
The field measurement of GCP using GPS with differential correction was time-consuming in field work. The longitude, latitude, and elevation of GCPs used in this study were directly collected on the very height resolution (VHR) images in Google Earth™. The same GCPs were used for the stereo imagery of both leaf-on and leaf-off to minimize their divergence. The spatial distributions and the positioning errors of GCPs were shown in the results section.

Extraction of Canopy Height Model
The direct results of the stereoscopic processing of images acquired in the leaf-off season are point clouds located on the vegetation and ground surfaces. The prerequisite of DTM extraction is to remove vegetation points. A simple automatic algorithm based on iterative median filtering (IMF) is used to identify ground points from the point cloud of the stereo imagery acquired on leaf-off season. The algorithm is described as follows: (1) Point cloud is rasterized to produce an elevation image with a given pixel size, such as 5.0 m; (2) The elevation image is filtered by median filter with a given window size, such as 13 × 13; (3) The filtered elevation image serves as a template to screen out point cloud; points above the template are discarded if their vertical distance to the template is larger than a given threshold, such as 1.5 m; (4) the remained point cloud is processed by repeating steps (1)-(3) until the number of discard points is smaller than a given threshold, such as 1%.
The DSM of the forest canopy top could be generated by rasterizing the point cloud of leaf-on with a resolution of 0.5 m in the same way as the rasterization of lidar point cloud. The DTM could be produced by the rasterization and interpolation of the filtered point cloud of leaf-off. Therefore, the MCHM could be extracted by subtracting DTM from DSM.

Mapping of Forest AGB Using MCHM of UAV Stereo Imagery
Considering the fact that the UAV stereo imagery should be available before the field work of tree measurements, this study proposed to determine positions of field sampling plots on the MCHM of UAV stereo imagery and ortho-rectified mosaic images, because they can provide information about the spatial distribution of forest canopy heights, forest densities, tree species and so on. For the coverage of as full a dynamic range of forest AGB as we can, four types of sampling plots were selected, including low and sparse forests (referred to as type A), low and dense forests (referred to as type B), two-layer forests with sparse higher trees (referred to as type C) and high and dense forests (referred to as type D). Six field sampling plots were selected for each type. The time-cost of field measurements for each plot is directly determined by the size of sampling plots. More time is needed for a larger sampling plot because more trees located within plots should be measured. What is the proper size of sampling plots of the field measurements for the mapping of forest AGB using UAV stereo imagery? In order to answer this question, considering the plot size of field measurement used in this study as in Section 2.2, three sizes of sampling plots were used in this study, i.e. concentric square plots with sizes of 15 m × 15 m, 30 m × 30 m and 45 m × 45 m.
The field sampling plots described in Section 2.2 were not located within the coverage of UAV stereo imagery, as show in Figure 1b The three forest AGB maps of UAV stereo imagery were further generated using the UAV MCHM maps and the corresponding prediction models. The three forest AGB maps of UAV stereo imagery were evaluated by taking that of lidar data as reference. The evaluation was carried out using the pixel by pixel comparison between the forest AGB maps of UAV and those of lidar.  Figure 3a showed as yellow lines. The coverage area was about 1.52 km 2 with a 1.6 km length and 0.95 km width. Both the leaf-on and leaf-off stereo imagery were collected along the same trajectory within one flight with an effective duration of flight of about 12 minutes. One obvious difference between leaf-on and leaf-off was that more roads were visible in Figure 3b than in Figure 3a. There were no harvesting activities that occurred between the two flights, but the snow covered the ground, and the leaves were off during the collection of early spring images.

UAV Stereo Imagery
The accuracy of GCP was shown in Table 2. The standard deviation of GCP in leaf-off season along the direction from west to east was 0.59 m, while that along the direction from south to north was 0.48 m. The vertical accuracy was 0.8 m. For GCP in leaf-on season, the vertical accuracy was 0.83 m, while the horizontal accuracy was 0.34 m and 0.32 m along the direction from west to east and the direction from south to north, respectively.
The accuracy of GCP reported in Table 1 indicated the relative co-registration errors between the UAV imagery of Leaf-on and Leaf-off. The absolution positioning error is evaluated by making an accurate co-registration between the DSM of Leaf-on and that of lidar using the method proposed in [40]. The results show that the horizontal offset between the two DSM is about 3.63 pixel (3.63 pixel * 0.5 m/pixel = 1.815 m) and 0.67 pixel (0.67 pixel * 0.5 m/pixel = 0.335 m) along the X and Y directions, respectively. For the field plot with a size of 15 m × 15m or even larger, the positioning error was only about 1/10 of plot size, which is acceptable. 1.52 km 2 with a 1.6 km length and 0.95 km width. Both the leaf-on and leaf-off stereo imagery were collected along the same trajectory within one flight with an effective duration of flight of about 12 minutes. One obvious difference between leaf-on and leaf-off was that more roads were visible in Figure 3b than in Figure 3a. There were no harvesting activities that occurred between the two flights, but the snow covered the ground, and the leaves were off during the collection of early spring images. The accuracy of GCP was shown in Table 2. The standard deviation of GCP in leaf-off season along the direction from west to east was 0.59 m, while that along the direction from south to north was 0.48 m. The vertical accuracy was 0.8 m. For GCP in leaf-on season, the vertical accuracy was 0.83 m, while the horizontal accuracy was 0.34 m and 0.32 m along the direction from west to east and the direction from south to north, respectively.
The accuracy of GCP reported in Table 1 indicated the relative co-registration errors between the UAV imagery of Leaf-on and Leaf-off. The absolution positioning error is evaluated by making an accurate co-registration between the DSM of Leaf-on and that of lidar using the method  Figure 4 shows the results of the extraction of ground surfaces using leaf-off stereo imagery by the IMF at different aspects. The number and percentage of points removed in each iteration is shown in Figure 4a. It is clear that the vegetation points are mostly removed in the first two iterations. The number of removed points decreased gradually after each iteration and was lower than 1% after the fifth iteration. Figure 4b shows the identification of ground points along a profile over flat terrain conditions. The profile is a subset of the point cloud along the X direction (from west to east) with a length of 115 m. It could be seen that the identified points gradually approached the ground surface. The difference between iterations becomes very weak after the third iteration. Figure 4c depicts the vertical distribution of points over a forest stand of 30 m × 30 m. The vertical distribution of the remaining points concentrates gradually on the ground surface. Figure 4 showed that the elevation of ground surface can be extracted well by the IMF algorithm in both flat and mountainous areas. Table 2. Accuracy of the GCP in the stereoscopic processing of UAV stereo imagery (in meter). over flat terrain conditions. The profile is a subset of the point cloud along the X direction (from west to east) with a length of 115 m. It could be seen that the identified points gradually approached the ground surface. The difference between iterations becomes very weak after the third iteration. Figure 4c depicts the vertical distribution of points over a forest stand of 30 m × 30 m. The vertical distribution of the remaining points concentrates gradually on the ground surface. Figure 4 showed that the elevation of ground surface can be extracted well by the IMF algorithm in both flat and mountainous areas.   The MCHM extracted by the synergy of point cloud of leaf-on stereo imagery and the filtered point cloud of leaf-off stereo imagery is shown in Figure 5. Figure 5a,b are the leaf-on UAV MCHM and lidar MCHM, respectively. The two images look nearly identical at a first glance. These results demonstrated that the elevation of ground surface extracted from the point cloud of leaf-off stereo imagery by IMF was enough. In addition, the point cloud of leaf-on stereo imagery and leaf-off stereo imagery has been co-registered accurately enough by the GCPs; otherwise, there would be terrain-related errors remained in Figure 5a caused by their mis-registration.

GCP# X Error Y Error Z Error GCP# X Error Y Error Z Error
The comparison between the leaf-on UAV MCHM and lidar MCHM is shown in Figure 5c,d along two profiles which are identical to Figure 4d,e respectively. The changing trends of forest heights along profiles of leaf-on UAV MCHM and lidar MCHM were consistent. The pixels where the leaf-on UAV MCHM was higher than the lidar MCHM could be attributed to the bad penetration abilities of stereo imagery over higher tree neighboring those pixels. The pixels where the leaf-on UAV MCHM was lower than lidar MCHM were caused by a failure to catch the canopy tops of trees neighboring those pixels.  Figure 4d and Figure 4e respectively. The changing trends of forest heights along profiles of leaf-on UAV MCHM and lidar MCHM were consistent. The pixels where the leaf-on UAV MCHM was higher than the lidar MCHM could be attributed to the bad penetration abilities of stereo imagery over higher tree neighboring those pixels. The pixels where the leaf-on UAV MCHM was lower than lidar MCHM were caused by a failure to catch the canopy tops of trees neighboring those pixels.

Model Development and Validation of Forest AGB Maps of UAV
The positions of 24 sampling plots are shown in Figure 6a. The symbols of yellow dots, blue dots, yellow stars, and blue stars represent the sampling plots of type A, type B, type C, and type D, respectively. Three concentric square plots with different sizes as shown in Figure 6b are setup at each position to evaluate the effects of plot size on the accuracy of prediction models based on UAV stereo imagery. The MCHM images of four 45 m × 45 m sampling plots representing four stand categories are shown in Figure 7.  The prediction models of forest AGB using mean forest heights from UAV stereo imagery take the form of power functions as that of lidar. The model coefficients and accuracy are shown in Table  3. The scattering plot of the forest AGB from lidar against that predicted by the prediction models is given in Figure 8. Both Table 3 and Figure 8 show that the differences of prediction accuracy from the models developed at different plot sizes are not obvious in terms of R 2 . The accuracy differences in terms of RMSE are also weak, which is about 1.  The prediction models of forest AGB using mean forest heights from UAV stereo imagery take the form of power functions as that of lidar. The model coefficients and accuracy are shown in Table  3. The scattering plot of the forest AGB from lidar against that predicted by the prediction models is given in Figure 8. Both Table 3 and Figure 8 show that the differences of prediction accuracy from the models developed at different plot sizes are not obvious in terms of R 2 . The accuracy differences in terms of RMSE are also weak, which is about 1. The prediction models of forest AGB using mean forest heights from UAV stereo imagery take the form of power functions as that of lidar. The model coefficients and accuracy are shown in Table 3. The scattering plot of the forest AGB from lidar against that predicted by the prediction models is given in Figure 8. Both Table 3 and Figure 8 show that the differences of prediction accuracy from the models developed at different plot sizes are not obvious in terms of R 2 . The accuracy differences in terms of RMSE are also weak, which is about 1.  Figure 9 shows that the forest AGB of sampling plots scattered evenly among the dynamic range from 0 to 140 Mg/ha. The results show that the dynamic range of forest AGB can be covered by sampling plots selected on MCHM of UAV and ortho-rectified mosaic image.
Three maps of forest AGB from UAV are produced by the three prediction models developed at sampling plot sizes of 15 m × 15 m, 30 m × 30 m and 45 m × 45 m. Figure 9 shows the validation of forest AGB maps by taking those of lidar as a reference. The three forest AGB maps of the UAV stereo images are highly correlated with those of the lidar data with correlation coefficients (R 2 ) larger than 0.94 and the root mean square errors (RMSEs) are smaller than 10 Mg/ha. No obvious differences or trends of the estimation accuracy of forest AGB are observed along with the changes of sampling plot sizes. The results indicate that the field sampling plots with a size of 15 m × 15 m are large enough, at least over the research area of this study, for the mapping of forest AGB using UAV stereo imagery. Table 3. Prediction models of forest AGB using a mean forest height of UAV stereo imagery at different plot sizes, taking the form of y = ax b , where x is mean forest height and y is forest AGB.  Figure 9 shows the validation of forest AGB maps by taking those of lidar as a reference. The three forest AGB maps of the UAV stereo images are highly correlated with those of the lidar data with correlation coefficients (R 2 ) larger than 0.94 and the root mean square errors (RMSEs) are smaller than 10 Mg/ha. No obvious differences or trends of the estimation accuracy of forest AGB are observed along with the changes of sampling plot sizes. The results indicate that the field sampling plots with a size of 15 m × 15 m are large enough, at least over the research area of this study, for the mapping of forest AGB using UAV stereo imagery. Table 3. Prediction models of forest AGB using a mean forest height of UAV stereo imagery at different plot sizes, taking the form of y = ax , where x is mean forest height and y is forest AGB.  Figure 9 shows the validation of forest AGB maps by taking those of lidar as a reference. The three forest AGB maps of the UAV stereo images are highly correlated with those of the lidar data with correlation coefficients (R 2 ) larger than 0.94 and the root mean square errors (RMSEs) are smaller than 10 Mg/ha. No obvious differences or trends of the estimation accuracy of forest AGB are observed along with the changes of sampling plot sizes. The results indicate that the field sampling plots with a size of 15 m × 15 m are large enough, at least over the research area of this study, for the mapping of forest AGB using UAV stereo imagery. Table 3. Prediction models of forest AGB using a mean forest height of UAV stereo imagery at different plot sizes, taking the form of y = ax , where x is mean forest height and y is forest AGB.

Extraction of Ground Surface
Four parameters are needed in the proposed IMF algorithm, i.e., the pixel size used in the rasterization of point cloud, the window size used in median filtering, the threshold of vertical distance to the template, and the number of iterations. The requirement for pixel size is not rigorous. The ranges from 1.0 m to 10.0 m are acceptable for stereo imagery with a resolution smaller than 0.1 m, as used in this study. There will be no sufficient number of points for the interpolation if the pixel size is too small, while the small terrain features will be missed if the pixel size is too large.
The IMF algorithm is also not sensitive to the filtering window size. The ranges from 3 × 3 to 15 × 15 are all acceptable. The difference between different window sizes is the work efficiency. More iterations are needed for smaller window sizes because most features of forests' vertical structures are retained. A filtering widow size that is too large is not suggested because detailed terrain features will be missed, as in the case of the large pixel size used in the rasterization.
The threshold of vertical distance to the template is a critical parameter of IMF because points having a vertical distance larger than the threshold are directly discarded. The larger threshold will slow down the filtering process while a smaller threshold will remove the point of the ground surface on the slope. The minimum of the threshold should be the integrated consideration of terrain slopes, pixel size, and filtering window size to guarantee that the threshold is larger than the elevation fluctuation of ground surfaces. In this study, the fixed threshold was used because the elevation dynamics over the coverage area are narrow. In future studies, it would be worthwhile to adjust the threshold adaptively for application over larger areas with broader dynamics of ground surface elevations.
The last parameter of IMF is the number of iterations. The results in this study showed that the filtering should be stopped when the number of points removed is lower than 1% of the original point number. This parameter, in fact, is not particularly important because the intermediate results can be saved in the filtering process. Users can determine which one is the best result by comparing these intermediate results.

The applicable Areas of UAV Stereo Imagery
The applicable area of this study is limited to deciduous forests. This limitation is mainly due to the availability of ground surface elevations under forest canopies because the leaf-off conditions increase the probability to see through the forest canopies over the deciduous forest. In fact, the description of forest spatial structures mainly comes from the UAV stereo imagery of leaf-on. The UAV stereo imagery is anticipated to also have good performance over evergreen forest areas if a high quality elevation dataset of the ground surface under the forest canopy is available or over sparse forest areas like savanna or forest shelterbelts.

Uncertainties
It must be noted that some kinds of uncertainties existed in this study. The most obvious uncertainty is the mis-match of acquisition dates of data used in this study. The lidar data was collected in 2012, the field measurements were carried out in 2013, and the UAV stereo imagery was collected in 2014 and 2015. Considering that the forest structure is, in fact, derived from leaf-on stereo imagery, the time interval between lidar and UAV stereo imagery is about two years. No obvious or severe human disturbances occurred in these two year. The forest growth is slow because the growing season of this area is only about four months from late May to early September. It could be anticipated that the correlation between forest biomass from lidar and that from UAV stereo imagery might be improved or at least might not be decreased if these two datasets were acquired in the same period.
Another uncertainty is the different point density. The point density of lidar data is about 24 points/m 2 while that of the UAV stereo imagery is higher than 30 points/m 2 . The holes within forest canopies can be observed on the DSM of the lidar data, with a resolution of 0.5 m. The point cloud of UAV stereo imagery was rasterized into the DSM with a resolution of 0.5 m to reduce the uncertainty of unmatched point densities.
In this study, the forest AGB measurements for the developments of prediction models of UAV stereo imagery was collected on the forest AGB maps of lidar data, due to the unavailability of field measurements within the coverage of UAV stereo imagery. Although the analysis was carried out on the forest AGB, the comparison was in fact equivalently made on the vertical forest structures detected by UAV stereo imagery and those detected by lidar data. The uncertainties, exhibited on fine resolutions such as 15 m × 15 m, as shown in Figure 9, could be attributed to their different performances of the detection of forest vertical structures. For example, the small gaps between crowns could be seen by lidar data but missed by UAV stereo imagery, as shown in Figure 5c,d. Inversely, the lidar data was easily affected by the gaps within crowns, while the UAV stereo imagery could describe crowns with good geometrical shapes by overpassing those gaps within crowns. In future studies, the UAV stereo imagery should be further evaluated against the field measurements of trees.

Conclusions
The application of UAV stereo imagery as a practical sampling tool of forests is limited by the ground surface elevation under the forest canopy. Most studies on the extraction of forest spatial structures using UAV stereo imagery used the DTM from lidar data. Therefore, they are mostly limited within the area where the lidar DTM is available. In this study, we proposed a new way to extract the deciduous forest canopy height without the help of the lidar DTM. The central idea is that the synthesis of UAV stereo imagery acquired under leaf-on and leaf-off condition. The point cloud derived from the leaf-off UAV stereo imagery was firstly processed by the proposed algorithm of iterative median filtering to remove points from forests. The digital terrain model (DTM) was produced by the rasterization of filtered point cloud. The digital surface model (DSM) of the forest canopy top was generated by rasterizing the point cloud derived from the leaf-on UAV stereo imagery. The mean forest canopy height model (MCHM) was derived from the DSM subtracted by the DTM. The results showed that the terrain information in the DSM could be removed in the MCHM.
The extracted MCHM was further used for the mapping of forest AGB. Results showed that forest AGB maps from UAV stereo imagery were highly correlated with those from the lidar data with R 2 higher than 0.94 and RMSE lower than 10.0 Mg/ha. The underestimation of AGB by UAV data was observed. The reason for this underestimation should be explored over more test sites in the future.