Tree Crowns Cause Border Effects in Area-Based Biomass Estimations from Remote Sensing

: The estimation of forest biomass by remote sensing is constrained by different uncertainties. An important source of uncertainty is the border effect, as tree crowns are not constrained by plot borders. Lidar remote sensing systems record the canopy height within a certain area, while the ground-truth is commonly the aboveground biomass of inventory trees geolocated at their stem positions. Hence, tree crowns reaching out of or into the observed area are contributing to the uncertainty in canopy-height–based biomass estimation. In this study, forest inventory data and simulations of a tropical rainforest’s canopy were used to quantify the amount of incoming and outgoing canopy volume and surface at different plot sizes (10, 20, 50, and 100 m). This was performed with a bottom-up approach entirely based on forest inventory data and allometric relationships, from which idealized lidar canopy heights were simulated by representing the forest canopy as a 3D voxel space. In this voxel space, the position of each voxel is known, and it is also known to which tree each voxel belongs and where the stem of this tree is located. This knowledge was used to analyze the role of incoming and outgoing crowns. The contribution of the border effects to the biomass estimation uncertainty was quantiﬁed for the case of small-footprint lidar (a simulated canopy height model, CHM) and large-footprint lidar (simulated waveforms with footprint sizes of 23 and 65 m, corresponding to the GEDI and ICESat GLAS sensors). A strong effect of spatial scale was found: e.g., for 20-m plots, on average, 16% of the CHM surface belonged to trees located outside of the plots, while for 100-m plots this incoming CHM fraction was only 3%. The border effects accounted for 40% of the biomass estimation uncertainty at the 20-m scale, but had no contribution at the 100-m scale. For GEDI- and GLAS-based biomass estimates, the contributions of border effects were 23% and 6%, respectively. This study presents a novel approach for disentangling the sources of uncertainty in the remote sensing of forest structures using virtual canopy modeling.


Introduction
Biomass maps derived from remote sensing methods are an effective way to conduct detailed, spatially explicit carbon accounting of vegetation over large areas. They are crucial for the understanding of the global carbon budget and for quantifying the CO 2 sequestration by forests [1]. In recent years, considerable effort has been made to improve forest biomass mapping approaches and to generate maps with increasing spatial resolutions. A frequently applied method for aboveground biomass (AGB) estimation uses remote sensing data from active sensor systems, such as light detection and ranging (lidar) or synthetic aperture radar [2]. Such data are used to generate metrics of canopy height, which are linked to AGB via regression models [3]. Canopy height metrics and AGB are calculated in an area-based way, i.e., representing the whole forest over a certain area, and not at the individual tree level [4]. The area sizes used depend on the size of the available ground-truth plots for the calculation of the reference AGB, the footprint size of the lidar system, or the desired resolution of the produced AGB map [5].
The AGB of single trees in the ground-truth plots is commonly geolocated at their stem positions. This is done mostly for practical reasons, but it can also be justified structurally because most of a tree's AGB is contained in the stem. The canopy height metrics, on the other hand, are calculated from the remote sensing signal, which results from reflections from the tree crowns. The tree crowns cover a certain spatial extent around the stem positions. Hence, there is a spatial mismatch between the remote sensing metrics and the ground-truth [6]. As long as the entire tree crown is contained within the considered plot area, this mismatch does not pose a problem. However, at the borders of plots, tree crowns can reach out of the plot (outgoing crowns) and crowns from outside trees can reach into the plot (incoming crowns). These outgoing and incoming crowns affect the values of the canopy height metrics calculated for the plot, while for the ground-truth AGB calculation the plot borders are strict borders, and only tree stems inside the plot are considered, independent of their crown extension [6,7]. The uncertainty in the heightbiomass relationship caused by incoming and outgoing tree crowns is referred to as a 'border effect'. Other studies have also called it an 'edge effect', but because this term in forest ecology is often understood as the effects leading to altered forest dynamics near forest edges [8], we chose here to use the term 'border effect' to avoid misunderstandings.
The proportion of incoming and outgoing tree crowns is expected to be larger the smaller the plot sizes are. In general, it has been found that uncertainties in biomass estimation at the single-plot level increase with decreasing plot size, which is usually observed as an increase in the root mean square error and a decrease in the R 2 for biomass estimates [5,6,[9][10][11]. This means that area-based biomass maps cannot be produced at arbitrarily fine resolutions without generating large uncertainties at the level of the single estimates. Border effects are thought to be one major contribution to this uncertainty. In order to better understand their contribution, it is important to have an idea about the magnitude of the incoming and outgoing crown volume and surface. Quantifying these proportions with empirical data is difficult because it would require a large number of crown measurements. Alternatively, the proportions can be analyzed using allometric assumptions and simulations in combination with forest inventory data [7,11,12]. Simulations have the additional advantage that the border effects can be switched off in the biomass estimation in order to quantify the residual uncertainty in the height-biomass relationship in an ideal situation where border effects do not exist. This can be achieved by enforcing periodic boundary conditions on the simulated tree crowns. Periodic boundaries are a common concept in ecological modeling, which are used to avoid edge effects in simulations [13]. They imply that the modelled area corresponds to a torus, which cannot be left. If something leaves the modelled area on one side it reappears on the opposite side. For the simulated tree crowns in this study, this means that the crowns which leave the plot on one end reappear on the opposite side, while no crowns from trees which are outside the plot are allowed to enter. With this approach, idealized analyses can be conducted, wherein the simulated remote sensing signal is generated exclusively (no incoming crowns) and entirely (if outgoing, then reappearing) by the crowns of trees which have their stem position inside the plot. This idealized approach helps to investigate the pure influence of the border effect on lidar-derived biomass estimates.
In this study, such simulations were conducted for a tropical rainforest in Panama using different plot sizes and shapes, including the circular footprints of the spaceborne lidars ICESat GLAS (65 m diameter) [14,15] and GEDI (23 m diameter) [16], in order to answer the following two questions: (1) How large are the proportions of the incoming and outgoing crown volume and canopy surface at different scales? (2) What is the contribution of border effects to the uncertainty in AGB estimations at different scales?
The crown diameter CD was calculated from DBH using a power law Equation (2).
The parameters for Equations (1) and (2) were derived by fitting regression models to a dataset of 12,348 neotropical trees with measured attributes [25] (subset: functional type "Angio", biogeographic zone "Neotropic", biome "Tropical forests"). In order to analyze how the variability in the crown diameter allometry might affect the results, we repeated parts of the analysis with randomly altered crown diameters according to Equation (3).
The random variability ε was modelled as a normal distribution, for which the standard deviation (σ = 0.39) was derived from the residuals of Equation (2) with respect to the 12,348 trees in the dataset [25]. The normal distribution was truncated at the 95% prediction intervals (±1.96·σ) in order to avoid negative and unrealistically large crown diameter values. The crown length was calculated as 0.4 times tree height [11]. For each tree, a leaf area index of 2 was assumed, which-in combination with the crown volume-allowed the calculation of the leaf area density. These assumptions about crown length and leaf area density have proven suitable for forest and lidar simulations for BCI before [11]. The aboveground biomass of each single tree (AGB tree ) was calculated using a generic tropical tree biomass allometry (Equation (4) corresponding to "Model 5" from [26]) with the DBH, estimated tree height H, and wood density WD of the species [27] as inputs.

Voxel Forest Generation
The geometrical information of every tree was used to construct a three-dimensional representation of the forest stand assuming spheroid tree crowns (equally elongated in the X and Y directions but different in the Z direction) based on the inventory dataset. The spatial resolution of the voxels in 3D space was 1 m × 1 m × 1 m. The voxel forest was stored as a data table with rows representing tree crown voxels. The columns contained the X, Y and Z coordinates of the voxels, the leaf area density, and additional information about the tree to which the voxel belongs (e.g., tree ID, X and Y position of the stem). Knowing, for each voxel, the coordinates and also the position of the associated stem was a requirement for the later analysis of incoming and outgoing crowns.
A subset of the data table was stored representing only the upper surfaces of the tree crowns which are visible from above (i.e., the voxel with the maximal Z value for each unique combination of X and Y). Raster representations of the upper canopy surface are commonly called canopy height models (CHM). They can be derived from lidar point clouds, photogrammetry, or radar. CHMs from airborne lidar are commonly derived with a simple point-to-raster method for dense point clouds, or via interpolation methods for less dense point clouds [28]. Hence, rasterizing the maximal Z value for each unique X and Y combination in our 3D voxel space leads to a simulated CHM of 1-m resolution, which is entirely based on the inventory data and allometric relationships. Because, for the simulated lidar, there is no geolocation error with respect to the inventory, the pure border effect can be studied.

Crown Space Quantification
For different plot sizes, it was analyzed which proportion of the tree crowns over a plot is incoming (i.e., belongs to trees that stand outside the plot area) and which proportion of tree crowns of the trees on the plot is outgoing (i.e., falls outside the plot area). For this purpose, the 50-ha plot was divided into subplots with side lengths of 10, 20, 50 and 100 m, respectively. For each plot size, each voxel was labeled with (a) a plot ID of the voxel's X and Y position, and (b) a plot ID of the corresponding tree stem's X-and Y-position. This allowed us to easily distinguish between voxels which are located over the same plot as their tree stems and voxels which are located over a plot other than the plot of their tree stems.
For each plot at each scale, the following quantities were computed by counting the respective numbers of 1-m 3 voxels: (1) the total crown volume of all of the trees standing inside a plot (CV t ); (2) the total crown volume that is exactly above the plot (CV p ); (3) the crown volume that fulfills both of the previous conditions, i.e., the voxel is above the plot and the tree stem is inside the plot (CV t&p ); (4) the incoming crown volume (CV in = CV p − CV t&p ); (5) the outgoing crown volume (CV out = CV t − CV t&p ); (6) the total canopy surface of all trees standing on a plot (CS t ); (7) the total canopy surface that is exactly above the plot (CS p ); (8) the canopy surface that fulfills both of the previous conditions, i.e., the surface is above the plot and the tree stem is inside the plot (CS t&p ); (9) the incoming canopy surface (CS in = CS p − CS t&p ); (10) the outgoing canopy surface (CS out = CS t − CS t&p ). The means and ranges of these quantities were calculated for each spatial scale. Figure 1 illustrates some of these quantities. commonly called canopy height models (CHM). They can be derived from lidar point clouds, photogrammetry, or radar. CHMs from airborne lidar are commonly derived with a simple point-to-raster method for dense point clouds, or via interpolation methods for less dense point clouds [28]. Hence, rasterizing the maximal Z value for each unique X and Y combination in our 3D voxel space leads to a simulated CHM of 1-m resolution, which is entirely based on the inventory data and allometric relationships. Because, for the simulated lidar, there is no geolocation error with respect to the inventory, the pure border effect can be studied.

Crown Space Quantification
For different plot sizes, it was analyzed which proportion of the tree crowns over a plot is incoming (i.e., belongs to trees that stand outside the plot area) and which proportion of tree crowns of the trees on the plot is outgoing (i.e., falls outside the plot area). For this purpose, the 50-ha plot was divided into subplots with side lengths of 10, 20, 50 and 100 m, respectively. For each plot size, each voxel was labeled with a) a plot ID of the voxel's X and Y position, and b) a plot ID of the corresponding tree stem's X-and Y-position. This allowed us to easily distinguish between voxels which are located over the same plot as their tree stems and voxels which are located over a plot other than the plot of their tree stems.
For each plot at each scale, the following quantities were computed by counting the respective numbers of 1-m³ voxels: (1) the total crown volume of all of the trees standing inside a plot (CVt); (2) the total crown volume that is exactly above the plot (CVp); (3) the crown volume that fulfills both of the previous conditions, i.e., the voxel is above the plot and the tree stem is inside the plot (CVt&p); (4) the incoming crown volume (CVin = CVp − CVt&p); (5) the outgoing crown volume (CVout = CVt − CVt&p); (6) the total canopy surface of all trees standing on a plot (CSt); (7) the total canopy surface that is exactly above the plot (CSp); (8) the canopy surface that fulfills both of the previous conditions, i.e., the surface is above the plot and the tree stem is inside the plot (CSt&p); (9) the incoming canopy surface (CSin = CSp − CSt&p); (10) the outgoing canopy surface (CSout = CSt − CSt&p). The means and ranges of these quantities were calculated for each spatial scale. Figure 1 illustrates some of these quantities.

Biomass Estimation Using Square Plots and CHM
The area-based biomass estimation from simulated lidar CHMs was tested at the different plot sizes. The AGB per plot was calculated as the AGB sum of all of the trees standing inside each square-shaped plot of 10, 20, 50 and 100 m side length, and was rescaled to t ha −1 units. The subset of voxels which represent the canopy surface was interpreted as a 1-m resolution CHM (as typically derived from small footprint discrete return lidar). The mean top-of-canopy height (TCH) was calculated by averaging the heights of all of the CHM pixels over each plot. Power law regression models describing AGB as a function of TCH (Equation (5)) were fitted using maximum likelihood estimation in R [29].

Biomass Estimation Using Large-Footprint Lidar
Large-footprint lidar waveforms were simulated to derive the mean canopy profile height (MCH) and link it to the AGB in the footprint. Unlike in the case of the CHM, in which only the voxels of the canopy surface were considered, for the waveform simulations all of the canopy voxels and also the ground voxels falling into the circular footprint areas were considered. The contribution of a voxel to the simulated waveform depended on three factors: (1) the number of crown voxels above, to account for light extinction in the canopy; (2) the distance to the footprint center, to account for the Gaussian energy distribution within a laser beam; and (3) the fact of whether the voxel was located in the canopy or on the ground. Figure 2 illustrates the waveform simulation for an example transect across a 65-m diameter footprint. Equation (6) describes the calculation of the lidar intensity I of each voxel (i.e., its contribution to the lidar waveform) mathematically. A voxel at the canopy surface and in the center of the footprint is assumed to contribute the highest possible intensity. If a voxel is further down in the canopy or more distant from the footprint center, its contribution to the reflected signal is reduced.
Light extinction was modelled using Beer-Lambert's law, with e being the base of the natural logarithm. The light extinction coefficient k was set to 0.2 [11]. LAI is the leaf area index above the voxel, calculated as the sum of the leaf area densities of all of the voxels lying directly above (same X and Y coordinate). The Gaussian energy distribution was modelled by the quotient term in the exponent of Equation (6), where d is the distance of the voxel to the footprint center in the XY-plane, and SD is the standard deviation of the Gaussian distribution, which was set to 0.5 times the footprint radius [30]. The reflectance of the forest ground voxels was down-weighted by dividing by 2.5 in order to account for the lower reflectivity of the ground vs. vegetation [31]. The reflected intensities of all of the voxels in each 1-m height layer were summed up and the waveforms were standardized to have a total sum of one ( Figure 2).
Waveforms were simulated for footprints of 23 m (GEDI) and 65 m (ICESat GLAS) diameters. A total of 171 footprints of each type were placed at 50 m regular spacing inside the 50-ha plot ( Figure A1). The AGB per footprint was calculated as the AGB sum of all of the trees standing inside the footprint circle. The mean canopy profile height (MCH) was calculated as the intensity-weighted mean height of the waveform. Power law regression models describing AGB as a function of MCH (Equation (7)) were fitted using maximum likelihood estimation in R. AGB = a · MCH b (7)  Waveforms were simulated for footprints of 23 m (GEDI) and 65 m (ICESat GLAS) diameters. A total of 171 footprints of each type were placed at 50 m regular spacing inside the 50-ha plot ( Figure A1). The AGB per footprint was calculated as the AGB sum of all of the trees standing inside the footprint circle. The mean canopy profile height (MCH) was calculated as the intensity-weighted mean height of the waveform. Power law regression models describing AGB as a function of MCH (Equation (7)) were fitted using maximum likelihood estimation in R.

Uncertainty Quantification
The uncertainty in the AGB estimation was quantified by comparing predictions (AGB pred ) derived from the power law regression models against observations (AGB obs ). AGB obs were the plot-level sums over all of the AGB tree (Equation (4)) values of the trees inside a plot. The AGB obs were rescaled to t ha −1 units for all of the plot sizes. The calculated statistics were the coefficient of determination (R 2 ), the root mean square error (RMSE, Equation (8)) and the normalized root mean square error (nRMSE, Equation (9)), with i representing each plot and n representing the total number of plots.

Periodic Boundary Conditions
In order to quantify the contribution of border effects to the uncertainty in AGB estimation, periodic boundary conditions were applied to the plots ( Figure 3). Under these conditions, all of the original incoming voxels, i.e., the voxels belonging to trees which have their stem outside of the focal plot, were removed. On the other hand, outgoing voxels-i.e., voxels belonging to trees which have their stem inside the focal plot, but are themselves outside the plot-were shifted to the opposite side of the plot. Hence, the new incoming crown voxels are the ones which are outgoing on the other side. As a visual example, consider the large orange colored tree located in the middle of the 1-ha forest area displayed in Figure 3c. Its stem (and crown center) is located in the central 20-m plot, but parts of its crown are outgoing, i.e., they extend into three other plots (the left, top and top-left neighbor). After enforcing periodic boundaries, these crown parts reappear in the focal plot (from the right and bottom side and bottom-right corner, respectively; Figure 3d). For the straight borders of the square-shaped plots, the establishment of the periodic boundaries was realized by adding (left or bottom border) or subtracting (right or top border) one plot width from the X or Y coordinates of the outgoing crown voxels, respectively. For the curved borders of circular footprints there is no exact correspondence. The following solution was applied in order to generate quasi-periodic boundaries for the circular footprints: for all of the trees with stems inside the circle, virtual twins were created by copying them to exist a second time outside of the circle at the position opposite to the circle center and with the same distance to the boundary, but outwards. Then, the crown voxels falling into the circle area were selected to simulate the waveforms, as described in Section 2.5 and Figure 2.
The uncertainty in the biomass estimation was quantified for the different plot sizes and shapes. Firstly, the uncertainty was quantified under normal conditions, where incoming and outgoing crowns disturb the height-biomass relationship. Secondly, the uncertainty was quantified under periodic boundary conditions, where border effects are non-existent. The difference in the nRMSE between the normal and periodic boundary conditions was used to quantify the contribution of the border effect. When this difference is divided by the overall nRMSE, the relative contribution of the border effect can be obtained. Please note, both quantities-nRMSE and the relative contribution of the border effect to the nRMSE-are given in %-units.

Proportions of Incoming and Outgoing Crowns
The analysis showed that, in small forest plots, a considerable proportion of the crown space was either incoming or outgoing. As expected, this proportion decreased with an increasing plot size. Table 1 summarizes the observed mean and maximum values for different proportions at the investigated spatial scales (standard deviations are given in Table A1). The outgoing crown volume is presented as a proportion of the crown volume of all of the trees with stems on the plot (CV out /CV t ). The incoming crown volume is likewise presented as proportion of the crown volume of all of the trees with stems on the plot (CV in /CV t ). This ratio can be larger than one, and can take on quite high values, because (unlike CV out ) CV in is not a subset of CV t . Thus, it is more meaningful to quantify the incoming volume as a proportion of the crown volume which is exactly located above the plot (CV in /CV p ). The canopy surface areas CS in and CS out are both presented as proportions of the plot area.
There was a close similarity between the proportions of the incoming/outgoing crown volume and the incoming/outgoing canopy surface. At the 10-m scale,~30% of CV and CS were incoming and outgoing, respectively. The proportions decreased to half the size (~15%) when doubling the plot side length (20 m) and decreased to one tenth (~3%) for the ten-fold plot side length (100 m, 1-ha plots).
The maxima of the incoming/outgoing crown volume were quite large. At the 10-m scale, in the most extreme cases, CV in contributed up to 95% of CV p and exceeded CV t by a factor 16. Such cases occur if the focal plot contains only very small trees with a small total crown volume and large trees from a neighboring plot dominate the canopy over the focal plot.
In the case of the canopy surface, at a 10-m scale, the CHM of the plot can be entirely formed by the incoming surface, and the outgoing canopy surface can exceed the plot size by a factor 6 due to the fact that single-tree crown projection areas can be far larger than the plot size (the maximum crown diameter 28.8 m corresponding to a 652 m 2 crown projection area). At the 100-m scale, the maxima of all of the investigated proportions were ≤7%. When the variability in the crown diameters was included in the analysis (using Equation (3) instead of (2)), most of the proportions listed in Table 1 slightly increased ( Table 2). This was expected, because introducing variability to crown diameters will lead to more crown volume overall, as the three-dimensional volumes are calculated based on the one-dimensional diameters. There are two concerns about including the variability: Firstly, it represents the variability of neotropical broadleaf rainforest trees in the database by [25], which is expected to be higher than the variability on BCI alone. Secondly, the variability in the crown shape is largely driven by competition for space among tree individuals, which was not accounted for in the simulations. For all of the further analyses, the crown diameters from the average allometry (Equation (2)) were used.

Biomass Estimation Results Using Square Plots and CHM
The biomass estimation uncertainty decreased strongly with the increasing plot size. The contribution of border effects also decreased with increasing plot size (Figure 4). At the 10-m scale, the overall nRMSE was 121%, with a contribution of 53% caused by border effects, and a residual nRMSE of 68% when border effects were excluded by periodic boundary conditions. At the 20-m scale, the overall nRMSE was 48%, of which 19% could be attributed to border effects and 29% was residual uncertainty. At the 50-m scale, the overall nRMSE was 17%, with 5% being contributed by border effects and 12% being residual uncertainty. At the 100-m scale, border effects became negligible, with an overall nRMSE of 7%, which did not decrease under periodic boundary conditions. Hence, the relative contributions of the border effect to the overall uncertainty were 44%, 40%, 29% and 0% at the 10-, 20-, 50-and 100-m scales, respectively ( Table 3). The R 2 values increased from 0.49 to 0.86 with the increasing scale, while under periodic boundary conditions they were ≥0.85 at all of the scales. Table 3. Summary of all of the normalized root mean square errors (nRMSE) and R 2 -values for the different plot sizes and footprints with and without border effects. The difference between the nRMSE with and without borders is called ∆nRMSE. If the ∆nRMSE is divided by nRMSE with border effects, the relative contribution is obtained.

Biomass Estimation Results Using Large-Footprint Lidar
The effect of spatial scale could also be observed for the waveform-based biomass estimations in the circular footprints. A larger footprint size resulted in a smaller overall uncertainty in the biomass estimation, as well as a smaller contribution of the border effect ( Figure 5). At the scale of a GEDI footprint (23 m), the overall nRMSE was 52%, with 12% being attributable to border effects and 40% being residual uncertainty. At the scale of an ICESat GLAS footprint (65 m), the overall nRMSE was 15%, with only 1% being attributable to border effects and 14% being residual uncertainty. Hence, the relative contributions of the border effect to the overall uncertainty were 23% at the GEDI scale and 6% at the GLAS scale ( Table 3). The R 2 values were 0.54 and 0.7 at the GEDI and GLAS scales, respectively, while under periodic boundary conditions they were ≥0.72 in both cases.

Biomass Estimation Results Using Large-Footprint Lidar
The effect of spatial scale could also be observed for the waveform-based biomas estimations in the circular footprints. A larger footprint size resulted in a smaller overal uncertainty in the biomass estimation, as well as a smaller contribution of the border effec ( Figure 5). At the scale of a GEDI footprint (23 m), the overall nRMSE was 52%, with 12% being attributable to border effects and 40% being residual uncertainty. At the scale of an ICESat GLAS footprint (65 m), the overall nRMSE was 15%, with only 1% being attribut able to border effects and 14% being residual uncertainty. Hence, the relative contribu tions of the border effect to the overall uncertainty were 23% at the GEDI scale and 6% a the GLAS scale ( Table 3). The R² values were 0.54 and 0.7 at the GEDI and GLAS scales respectively, while under periodic boundary conditions they were ≥0.72 in both cases. The scaling behavior of the nRMSE for the different approaches is summarized in Figure 6. The distances between the black filled and grey open symbols show the contri butions of the border effect to the AGB estimation at the different spatial scales, and wher the waveform-based estimates lie in comparison to the CHM-based estimates at simila scales. The scaling behavior of the nRMSE for the different approaches is summarized in Figure 6. The distances between the black filled and grey open symbols show the contributions of the border effect to the AGB estimation at the different spatial scales, and where the waveform-based estimates lie in comparison to the CHM-based estimates at similar scales.  Figure 6. Summary of the ways in which the normalized root mean square error (nRMSE) scales for biomass estimations from CHM (squares) and large-footprint waveforms (circles) with border effects (black filled) and without border effects (grey open). The curves represent the theoretical scaling behavior according to which the errors decline by (plot area) −1/2 or (plot side length) −1 [32]. The curves were drawn to match the nRMSE at the 20-m scale exactly, and they approximate the observed nRMSE at the other scales well.

Effects of the Gaussian Energy Distribution
The Gaussian horizontal energy distribution within the laser beam has the effect the tree crowns in the margins of the footprints contribute less to the signal than the crowns in the center. In order to test the magnitude of this effect, the simulations w also conducted without Gaussian weighting ( Figure A2). Without Gaussian weight the relative contributions of the border effect to the overall uncertainty were 28% (ins of 23%) for GEDI and 17% (instead of 6%) for GLAS (Table 3).
Please note that, under periodic boundary conditions, the R 2 values were alm equal across all of the scales. Hence, despite the observed scale dependence of the un tainties (nRMSE), the strengths of the relationships between the predicted and obser biomass were scale independent in the absence of border effects. These R 2 values w larger for the CHM-based estimates on square plots (~0.85, Figure 4b,d,f,h) than for large-footprint waveform-based estimates (~0.72, Figure 5b,d). The weaker R 2 for waveform-based estimates can be explained by the fact that, for CHM-based estim the positions of the trees per se do not play a role. The CHM is a continuous and hom neous sample of the canopy across the whole plot area. On the contrary, in the cas Gaussian energy distribution within a large lidar footprint, the signal contribution of e tree strongly depends on its position. Trees in the center of the footprint contribute m to the signal than trees near the border of the footprint. These differences are not con ered in the calculation of the ground-truth AGB. This interpretation is in line with the that the R 2 became higher when Gaussian weighting was switched off (0.75 and 0.87, ure A2b,d). Hence, in biomass estimation from large-footprint lidar, the Gaus weighting has two effects: it reduces the uncertainty due to border effects, but incre the uncertainty due to the unequal signal contribution from the central and marginal p of the footprint. This explains why, at the small scale of GEDI footprints, where bo effects are strong, the nRMSE increased slightly when Gaussian weighting was switc off (from 52 to 53%, Table 3). At the larger scale of the GLAS footprint, at which bo effects are weak, the nRMSE decreased slightly when Gaussian weighting was switc off (from 15 to 12%, Table 3) because of the equal signal contribution from the entire f print area. Figure 6. Summary of the ways in which the normalized root mean square error (nRMSE) scales for biomass estimations from CHM (squares) and large-footprint waveforms (circles) with border effects (black filled) and without border effects (grey open). The curves represent the theoretical scaling behavior according to which the errors decline by (plot area) −1/2 or (plot side length) −1 [32]. The curves were drawn to match the nRMSE at the 20-m scale exactly, and they approximate the observed nRMSE at the other scales well.

Effects of the Gaussian Energy Distribution
The Gaussian horizontal energy distribution within the laser beam has the effect that the tree crowns in the margins of the footprints contribute less to the signal than the tree crowns in the center. In order to test the magnitude of this effect, the simulations were also conducted without Gaussian weighting ( Figure A2). Without Gaussian weighting, the relative contributions of the border effect to the overall uncertainty were 28% (instead of 23%) for GEDI and 17% (instead of 6%) for GLAS (Table 3).
Please note that, under periodic boundary conditions, the R 2 values were almost equal across all of the scales. Hence, despite the observed scale dependence of the uncertainties (nRMSE), the strengths of the relationships between the predicted and observed biomass were scale independent in the absence of border effects. These R 2 values were larger for the CHM-based estimates on square plots (~0.85, Figure 4b,d,f,h) than for the large-footprint waveform-based estimates (~0.72, Figure 5b,d). The weaker R 2 for the waveform-based estimates can be explained by the fact that, for CHM-based estimates, the positions of the trees per se do not play a role. The CHM is a continuous and homogeneous sample of the canopy across the whole plot area. On the contrary, in the case of Gaussian energy distribution within a large lidar footprint, the signal contribution of each tree strongly depends on its position. Trees in the center of the footprint contribute more to the signal than trees near the border of the footprint. These differences are not considered in the calculation of the ground-truth AGB. This interpretation is in line with the fact that the R 2 became higher when Gaussian weighting was switched off (0.75 and 0.87, Figure A2b,d). Hence, in biomass estimation from large-footprint lidar, the Gaussian weighting has two effects: it reduces the uncertainty due to border effects, but increases the uncertainty due to the unequal signal contribution from the central and marginal parts of the footprint. This explains why, at the small scale of GEDI footprints, where border effects are strong, the nRMSE increased slightly when Gaussian weighting was switched off (from 52 to 53%, Table 3). At the larger scale of the GLAS footprint, at which border effects are weak, the nRMSE decreased slightly when Gaussian weighting was switched off (from 15 to 12%, Table 3) because of the equal signal contribution from the entire footprint area.

Discussion
The role of border effects in forest remote sensing was investigated for a tropical forest in Panama. We quantified what proportion of the tree crown volume and observed canopy surface does not belong to trees standing inside plots, and how this proportion depends on the plot size. While at small scales, e.g., 10 m, on average, large proportions of the canopy volume and surface were incoming and outgoing (~30%), these average proportions became small at the 100-m scale (~3%). This is explained by the smaller perimeter-to-area ratio of the larger plots.
We further analyzed the ways in which border effects affect biomass estimation based on a CHM or large-footprint lidar waveforms. The incoming and outgoing tree crowns influence the quality of the derived relationship between the canopy height and the biomass. Hence, the smaller the proportion of incoming and outgoing crowns, i.e., the larger the plot area, the smaller the uncertainty in the biomass estimations should be. The observed uncertainties in the biomass estimations decreased with increasing plot sizes. While at 10and 20-m scale, the border effect was responsible for ≥40% of the overall uncertainty, it had a contribution of 29% at the 50-m scale, and it had zero contribution at the 100-m scale. For the spaceborne large-footprint lidar systems, the border effect was responsible for 23% of the uncertainty in GEDI-based biomass estimates, and for 6% of the uncertainty in the ICESat GLAS-based biomass estimates.
The trend of increasing uncertainty with decreasing plot size was reported in previous studies [5,6,[9][10][11]. It has partly been attributed to geolocation errors, which have smaller impacts if the plots are large [7,33], but this source of error can be ruled out in controlled simulation experiments. It is partly an effect of spatial averaging, i.e., the variability of individual trees becomes less important when aggregating over larger areas. It was found that the perimeter-to-area ratio of the plots/footprints, rather than their area alone, is a major factor in this trend [34]. The explanation behind this is the border effect, which is supported by our results.
The strengths of the border effects are influenced by several factors, including the size and shape of plots/footprints and the energy distribution within them. Under the assumption of an equal square plot and footprint size, it is expected that the border effects are smaller for the large-footprint lidar for two reasons: (1) the smaller perimeter-to-area ratio of circles compared to squares, and (2) the Gaussian horizontal energy distribution within the laser beam, which lowers the contribution of the canopy space in the footprint margins. In this study, CHM-based and waveform-based estimation were not compared directly using equal plot sizes, but the described effect becomes obvious when comparing the results of the 50-m plots (2500 m 2 area) to the ones of the much smaller GEDI footprints (415 m 2 area) and the only slightly larger GLAS footprints (3318 m 2 area). The overall biomass uncertainty between 50-m square plots and GLAS was similar (18% and 15% nRMSE) but the relative contribution of border effects was much stronger for the 50-m square plots (29% vs. 6%). Even for the much smaller GEDI footprints, for which the overall uncertainty was three times higher (52% nRMSE), the relative contribution of the border effect was smaller (23%) than for the 50-m square plots (29%). Switching off the Gaussian energy distribution increased the contributions of border effects (from 23% to 28% for GEDI; from 6% to 17% for GLAS). Thus, the Gaussian energy distribution can be considered beneficial in the sense that it reduces the magnitude of border effects, but at the same time the unequal signal contribution of trees inside the footprint (central vs. marginal position) leads to a weaker height-biomass relationship, as was shown by the R 2 values. These findings can provide guidance about choosing the proper scales for minimizing border effects in biomass mapping. However, choices are constrained, because wall-to-wall maps by definition have square-shaped pixels (or in rare cases hexagonal [35], but never circular pixels) and laser beams, by design, have circular cross-sectional areas with Gaussian energy distribution.
It should be mentioned that the results of this study were derived using general allometries for neotropical trees [25]. Because these trees can grow to very large crown sizes, and because the site is an old growth forest, the reported incoming and outgoing crown fractions and the border effect contributions are probably at the high end of values when compared with other forest types worldwide. In temperate and boreal forestsparticularly those with coniferous trees, which often have more slender crowns-these fractions and contributions may be smaller. On the other hand, the experiment regarding the introduction of variability into the crown allometry has shown that border effects might even be slightly stronger than those derived by using the average allometric relation.
The finding that border effects no longer play a role at scales ≥1 ha is in accordance with previous work. Mascaro et al. [6] drew similar conclusions using a different approach, wherein they introduced the concept of crown-distributed-instead of stem-localizedbiomass for the reference ground truth. For high-resolution discrete lidar data, border effects could be avoided by using biomass estimation methods based on individual tree crown delineation. Such approaches allow us to produce very high (e.g., 1-m) resolution biomass maps from lidar that closely resemble crown-distributed biomass maps from ground inventory [36]. However, individual-based approaches are not applicable over large areas with current technology, due to limitations in data acquisition and distribution and computational capacities [37]; in particular, the waveforms produced by spaceborne lidar systems require area-based interpretations.
Simulation studies, like the one presented here, can help to disentangle the contributions of various sources of uncertainty in remote sensing interpretation, including the border effect, but also others. Previous studies using similar approaches have already analyzed geolocation-induced uncertainty [7] and allometry-induced and structure-induced uncertainty [38]. Future studies could include all of these aspects simultaneously in order to analyze the combined uncertainty and then partition it into its components. More generally, such simulations may serve as a method for the fusion of remote sensing data with dynamic forest models [39], which enables new interpretations of remote sensing products for forest structure [40] and new possibilities for model and allometry calibration [12,41].

Conclusions
Quantifying the fractions of incoming and outgoing canopy space over forest plots is difficult, and hence their contributions to biomass estimation uncertainty are hard to separate from other sources of uncertainty. In this study, these fractions were determined for a tropical rainforest in BCI, Panama, using a bottom-up simulation approach, in which remote sensing data were simulated from field inventory and allometric assumptions. The largest fractions of incoming and outgoing crown surfaces (30%) were found at the smallest investigated scale (10-m), and the smallest fraction (3%) was found at the largest investigated scale (100-m). The contributions of border effects to biomass estimation uncertainty were quantified by applying periodic boundary conditions to the individual forest plots. At the 10-m scale, the border effects accounted for 44% of the total RMSE, while they had no influence (0% of RMSE) at the 100-m scale. For ICESat GLAS (65 m) and GEDI (23 m), the contributions of border effects to the footprint-level biomass estimates were 6% and 23%, respectively. The differences in the strengths of border effects between CHM-based biomass estimates for square-shaped plots and waveform-based biomass estimates for circular footprints are caused by two factors: (1) the different perimeter-to-area ratios of squares and circles, and (2) the low signal contribution of the canopy in the margins of a footprint due to the Gaussian energy distribution in the laser beam. The values obtained for plots of different sizes and for data of different sensor types (CHMs and large-footprint waveforms) provide guidance for the choice of biomass mapping resolutions and for the understanding of the compositions of uncertainties. The simulation approach bears the potential to analyze other sources of uncertainties in the future.  Data Availability Statement: A publicly available forest inventory dataset was analyzed in this study. It can be found on Dryad: https://datadryad.org/stash/dataset/doi:10.15146/5xcp-0d46. private individuals, and through the hard work of over 100 people from 10 countries over the past two decades. The plot project is part of the Forest Global Earth Observatory (ForestGEO), a global network of large-scale demographic tree plots. We thank the three anonymous reviewers and the editors for their constructive comments on the manuscript.    in = incoming; out = outgoing; p = exactly above the plot; t = belonging to trees standing inside the plot.  . The right hand side shows the relationships derived from the data with the border effects switched off, using periodic boundaries (b,d).