Deriving Tree Size Distributions of Tropical Forests from lidar

Remote sensing is an important tool for monitoring forests and rapidly detecting changes caused by global change and other threats. Here, we present a novel methodology to infer the tree size distribution from light detection and ranging (lidar) measurements. Our approach is based on a theoretical leaf–tree matrix derived from allometric relations of trees. Using the leaf–tree matrix, we compute the tree size distribution that fits the leaf area density profile observed via lidar. To validate our approach, we analyzed the stem diameter distribution of a tropical forest in Panama and compared lidar-derived data with data from forest inventories at different spatial scales (0.04 ha to 50 ha). Our estimates had a high accuracy at scales of 1 ha and above (1 ha: root mean square error (RMSE) 67.6 trees ha⁻¹ / normalized RMSE 18.8% / R² 0.76; 50 ha: 22.8 trees ha⁻¹ / 6.2% / 0.89). Estimates for smaller scales (1 ha to 0.04 ha) were reliable for forests with low height, dense canopy or low tree height heterogeneity. Estimates for the basal area were accurate at the 1-ha scale (RMSE 4.7 m² ha⁻¹, bias 0.8 m² ha⁻¹) but less accurate at smaller scales. Our methodology, once further tested at additional sites, provides a useful approach to determine the tree size distribution of forests by integrating information on tree allometries.

To conserve forests, their current status and future development must be monitored. In the past, the state of a forest has been determined using field measurements of sample plots (e.g., 1 ha). Recent studies, in turn, gathered large-scale plot data (e.g., 50 ha), regularly collecting detailed information such as the stem diameter or the species and location of each tree [13]. Over recent decades, remote sensing has become an increasingly used tool to monitor the state of forests [14,15].
Often, remote sensing observations have been analyzed by computing correlations with forest attributes (e.g., lidar metrics correlated with biomass or basal area) [23][24][25][26][27][28]. Such approaches have a high predictive power, but typically focus on predicting single forest attributes (like biomass) and require field-data-intensive calibration of site-specific parameters in the statistical relationships. Other studies combined statistical methods with information on tree geometry [29][30][31]. A good example of an approach that integrates tree geometry is Spriggs et al. [30], who combined allometric relations with a Bayesian optimization approach.
Recent studies have pursued tree segmentation [32][33][34][35][36], detecting single tree crowns from airborne lidar to estimate tree size attributes and frequencies.
Those techniques interpret lidar point clouds and allow the detection of tree geometry relations (in particular for large trees). Similar approaches also use information from terrestrial lidar [37], whose strength lies in resolving leaves and branches in detail, particularly in the lower canopy (often constrained to small forest plots). Both approaches require high-resolution lidar point clouds to have sufficient information on the trees.
Here, we present a novel approach for inferring forest structure from lidar measurements that is complementary to the recent approaches. Our approach integrates the available information on tree geometry in a theoretical model. Through its combination with the measured vertical lidar profile (e.g., derived from waveforms or lidar point clouds), we calculated how many trees of a certain size could occur in a forest to match the measured profile (Figure 1). This allowed us to estimate the tree size distribution of a forest (i.e., number of trees in different stem diameter classes) based on lidar measurements.
Figure 1. (a) Lidar point cloud. Points are colored according to their height (with a gradient from the top with blue colors to the bottom with green colors). (b) The vertical leaf area profile can be derived from the lidar point cloud by accounting for decreasing lidar returns with decreasing height due to the tree leaf density (ground returns of height < 3 m are not shown). The vertical leaf area profile is then combined with the tree geometry model illustrated in (c). By calculating how many trees of specific size (illustrated by different shadings) can contribute tree crown leaves to the lidar-derived leaf area profile, we can estimate (d) the number of trees (per ha) in specific stem diameter classes (here, logarithmic y-axis, 50 ha plot of BCI, 20 cm diameter class width). See methods for details.
We tested our approach on a 50-ha forest plot in Panama. After determining the leaf-tree matrix, we derived the stem diameter distribution from lidar by applying classical methods of mathematics (matrix inversion). We validated the estimated stem diameter distribution by using field inventory data and demonstrated the accuracy of our approach at different spatial scales (50, 25, 5, 1, 0.25, and 0.04 ha). We analyzed how reliably we can estimate tree numbers per stem diameter class. In addition, we derived estimates for the total stem number and basal area from the lidar measurements.

Study Site
We applied our approach to a tropical forest located on Barro Colorado Island in Panama (BCI, 9°9′ N, 79°51′W) [38][39][40][41], where a 50-ha plot of old-growth tropical moist forest has been monitored every five years since 1980. The study site is located in a natural reserve area in the center of the island. In each census, each tree was tagged and mapped, its species was identified and its stem diameter was measured at breast height. In our study, we used the census of the year 2010, considering all trees of stem diameter ≥ 1 cm. The analyzed forest census included 244,269 trees and 301 tree species (1.5% of the originally recorded trees were disregarded because they were dead or lost, see Supplementary S1). The forest inventory data can be requested from the ForestGeo Global Earth Observatory Network [42] or downloaded for free from Dryad [43].
To relate the height and crown dimension of a tree to its stem diameter, we used three allometric equations: (a) tree height h (m) is related to the stem diameter d (m) by an asymptotic function [24,44]; (b) tree crown radius cr (m) is related to the stem diameter d (m) by a power law [23,45], $cr = 9.08 \times d^{0.68}$ (2); and (c) vertical tree crown length cl (m) is linearly related to the tree height [23]. The tree crown radius, length and the assumed shape were used to calculate the tree crown volume, which was multiplied by a density factor of ρ = 0.44 m²/m³ to estimate the tree leaf area [23]. The tree height and crown length were used to estimate the leaf area of a tree at different heights within the forest canopy (we assumed that a tree's leaves are homogeneously distributed within its crown). Tree allometries have been derived from independent field studies [23,24,44,45]. Here, we averaged tree allometries across occurring tree species, which has been shown to be a valid approach when studying the forest structure [46]. In the standard case, we assumed that tree crowns have an ellipsoidal shape, but we tested different crown shapes (spheres and cylinders) as well. We further tested the sensitivity to a different allometry for tree height (non-asymptotic power law of tree height [45]) and to a different leaf density within crowns (factor of ρ = 1 m²/m³).
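The crown-related part of these allometries can be illustrated with a minimal Python sketch. It uses only the values stated above (the power law for the crown radius and ρ = 0.44 m²/m³) and the ellipsoidal standard crown shape. The crown length is passed in as an input because its linear relation to tree height is not reproduced here; the function names and the example crown length of 4 m are hypothetical.

```python
import math

RHO_LEAF = 0.44  # leaf area per crown volume (m^2/m^3), value stated in the text


def crown_radius(d_m: float) -> float:
    """Crown radius cr (m) from stem diameter d (m), power law of Equation (2)."""
    return 9.08 * d_m ** 0.68


def ellipsoid_crown_volume(cr_m: float, cl_m: float) -> float:
    """Volume (m^3) of an ellipsoid with horizontal semi-axes cr and
    vertical semi-axis cl / 2 (standard crown shape in the text)."""
    return 4.0 / 3.0 * math.pi * cr_m ** 2 * (cl_m / 2.0)


def tree_leaf_area(d_m: float, cl_m: float) -> float:
    """Leaf area (m^2) of one tree: crown volume times leaf density factor."""
    return RHO_LEAF * ellipsoid_crown_volume(crown_radius(d_m), cl_m)


# Hypothetical example: a tree of 10 cm stem diameter with an assumed
# crown length of 4 m (the crown length allometry is not given here).
example_leaf_area = tree_leaf_area(d_m=0.10, cl_m=4.0)
```

Swapping `ellipsoid_crown_volume` for a sphere or cylinder volume would correspond to the alternative crown shapes mentioned in the sensitivity analysis.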
Airborne lidar data were available for the 50-ha forest plot for the year 2009 [47]. The point densities ranged from 0 to 103 m⁻² (in flight swath overlaps) with a median of 19 m⁻². The point cloud was terrain-normalized and thinned using random subsampling to obtain a nearly homogeneous point density of four returns per m². The latter was achieved by iterative subsampling to different point densities and inspection of the resulting density rasters until no further density differences from the flight pattern were visible. The thinned point cloud contained 19% of the returns from the original point cloud. For details on the instrumentation and processing of the lidar data, see Lobo and Dalling [47] (Materials and Methods). The lidar dataset is publicly available and can be requested from J.W. Dalling (see statement on data accessibility in [47]).
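One simple way to thin a point cloud toward a homogeneous density is to cap the number of randomly retained returns per raster cell. The sketch below is an assumption for illustration only and not the processing chain of [47]: the function name, the 1 m cell size, and the per-cell cap are hypothetical choices.

```python
import numpy as np


def thin_point_cloud(xyz, target_density=4.0, cell_size=1.0, seed=0):
    """Randomly keep at most target_density * cell_size**2 returns per grid cell.

    xyz: array of shape (n_points, 3) with terrain-normalized coordinates (m).
    """
    xyz = np.asarray(xyz, dtype=float)
    n = len(xyz)
    if n == 0:
        return xyz
    rng = np.random.default_rng(seed)
    max_per_cell = int(round(target_density * cell_size ** 2))

    # Integer grid index of every return, shifted to start at zero
    ix = np.floor(xyz[:, 0] / cell_size).astype(np.int64)
    iy = np.floor(xyz[:, 1] / cell_size).astype(np.int64)
    ix -= ix.min()
    iy -= iy.min()
    cell_id = ix * (iy.max() + 1) + iy

    # Shuffle once, then group the shuffled points by cell
    order = rng.permutation(n)
    shuffled_cells = cell_id[order]
    sort_idx = np.argsort(shuffled_cells, kind="stable")
    sorted_cells = shuffled_cells[sort_idx]

    # Rank of each point within its cell (0 = first point of the cell)
    first = np.r_[0, np.flatnonzero(np.diff(sorted_cells)) + 1]
    counts = np.diff(np.r_[first, n])
    rank = np.arange(n) - np.repeat(first, counts)

    # Keep only the first max_per_cell points of every cell
    keep = order[sort_idx[rank < max_per_cell]]
    return xyz[np.sort(keep)]
```

Cells that already contain fewer returns than the cap are left unchanged, so the result is only nearly homogeneous, which is why a visual check of the density raster remains useful.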

Derivation of Leaf Area Profiles from lidar
The lidar data (i.e., the lidar profile) include information on the site-specific forest structure, as lidar signals are reflected by the leaves and branches of trees. We assumed that the point density PD (1/m²) of the lidar profile in a specific height layer i, spanning the interval (h_{i-1}, h_i) of constant width Δh = 1 m, is a result of (a) the tree crown density (LAD, the leaf area density) at which lidar signals could be reflected and (b) the probability (W) that the lidar signal could penetrate into the respective height layer i (i.e., has not already been reflected in the layers above):
$$PD_i = l \times LAD_i \times W_i \qquad (5)$$
The parameter l represents a density factor (here, l = 1 m²/m³) that combines different aspects: the density of lidar shots sent to the forest canopy, the fraction of tree leaves (relative to tree branches) at which signals can be reflected and the density of successful lidar signals returned at the top of the forest (due to gaps in the canopy). The observed lidar point density PD_i (in m⁻²) is then defined as
$$PD_i = \frac{1}{A} \sum_{k=1}^{K} \delta_{ik} \qquad (6)$$
with the plot area A (m²), the total number of lidar points K and
$$\delta_{ik} = \begin{cases} 1 & \text{if } h_{i-1} \le z_k < h_i \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$
The sum of the function δ_ik yields the number of lidar points with the return height z_k (m) in the respective height layer i. To compute the probability W, we used the Beer-Lambert law of light transmission and extinction [48]:
$$W_i = \exp\left(-k \sum_{j=i+1}^{n} LAD_j\right) \qquad (8)$$
where ΣLAD_j is the cumulative leaf area density (m²/m³) from the top of the forest (j = n) down to the height layer j = i+1. The parameter k in Equation (8) represents the average light extinction coefficient (k = 0.2 for near-infrared or infrared signals with wavelengths of 600-1400 nm) [48]. Based on Equation (5), we can then recursively estimate the forest's leaf area profile (LAD^lidar) via
$$LAD_i^{lidar} = \frac{PD_i}{l \times W_i} \qquad (9)$$
starting with a height layer above the forest canopy (i.e., LAD_n = 0 results in W_n = 1). We did not include ground returns in our approach and considered only height layers ≥ 3 m. The values of the parameters k and l are relevant for the derived leaf area profile, but had only minor influence on the estimation of the stem diameter distribution in our study (Figure A1).
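The top-down recursion can be sketched in a few lines of Python. The sketch assumes that the point density model reads PD_i = l × LAD_i × W_i and that W_i = exp(−k Σ_{j>i} LAD_j), which is how the relations described above are interpreted here. The function names, the half-open layer convention, and the handling of the 3 m cut-off are assumptions of this illustration.

```python
import numpy as np


def point_density_profile(z, plot_area, n_layers, dh=1.0):
    """Observed point density PD_i (m^-2) per height layer.

    z: return heights (m); layers are the half-open intervals [i*dh, (i+1)*dh).
    """
    edges = np.arange(n_layers + 1) * dh
    counts, _ = np.histogram(z, bins=edges)
    return counts / plot_area


def lad_from_point_density(pd, k=0.2, l=1.0, dh=1.0, min_height=3.0):
    """Recursively estimate the leaf area density profile from the top down.

    k: light extinction coefficient, l: density factor (values from the text).
    Layers whose lower edge lies below min_height are skipped (set to zero).
    """
    pd = np.asarray(pd, dtype=float)
    lad = np.zeros(len(pd))
    cumulative = 0.0  # sum of LAD over all layers above the current one
    for i in range(len(pd) - 1, -1, -1):
        if i * dh < min_height:
            break  # ground and near-ground layers are not analyzed
        w = np.exp(-k * cumulative)  # probability of reaching layer i
        lad[i] = pd[i] / (l * w)
        cumulative += lad[i]
    return lad
```

Because each layer only depends on the layers above it, a single pass from the canopy top to the 3 m threshold is sufficient.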

'Leaf-tree Matrix' of the Tree Geometry Model
To estimate the number of trees for different stem diameter classes (here, assumed to be unknown) from the lidar-derived leaf area profile, we developed a theoretical model that integrates the available information on tree geometry and allometry.
For this, we introduced the 'leaf-tree matrix'. The matrix summarizes the consequences of the assumed tree allometries (see Section 2.1) in a virtual forest in which exactly one tree is present for each stem diameter class. Figure 2 visualizes this matrix, which contains for each stem diameter class (columns) and height layer (colored rows) the corresponding leaf area per tree. White cells indicate that trees in the corresponding stem diameter class do not have leaves in the respective height layer. Height layers have a width of 1 m, and stem diameter classes (of increasing width) correspond to the respective height layers (based on the assumed stem diameter-height relation, Section 2.1). Each tree has a maximum height (upper boundary of the height layer, based on the tree height allometry and diagonal entries of the matrix). Crowns reach only into the lower height layers (according to the tree crown allometry). Leaf area was assumed to be proportional to the tree crown volume (based on the tree crown allometry) and distributed uniformly among the crown-covered height layers (Section 2.1). See Figure A2 for the leaf-tree matrix with different assumed crown shapes and model assumptions.
As a consequence, the vertical leaf area distribution for a forest with only one tree per stem diameter class can be derived by summing up all column entries (i.e., stem diameter classes) per row (i.e., height layers) in the leaf-tree matrix.
Figure 2. Visualization of the leaf-tree matrix (left) and an enlarged cut-out (right). Each column in the matrix reflects the leaf area contribution of one tree to different height layers. Columns represent different stem diameter classes. Colors in the matrix show leaf area values with a gradient from light green (low values) to dark green (high values) (white indicating no leaf area). On the right, an exemplary tree of 10 cm stem diameter is highlighted in the cut-out with its crown leaf area distributed to four height layers. Note the logarithmic x-axis of the stem diameter (cm).
We denote the leaf-tree matrix by F = (f_ij) with i, j = 1, …, n, where f_ij is the leaf area that one tree of stem diameter class j contributes to height layer i, and the number of trees per stem diameter class by the vector N = (N_j), j = 1, …, n (with n being the number of stem diameter classes, here n = 55). Multiplying the leaf-tree matrix F with the vector N results in the leaf area profile of a forest plot (LAD in m²/m³), which we normalize by the plot area A (m²) and height layer width Δh (m):
$$LAD = \frac{1}{A \times \Delta h} \times F \times N \qquad (10)$$
We aimed to infer the stem diameter distribution from lidar measurements. Therefore, we assumed that sufficient information about forest structure was included in the lidar-derived leaf area profile (LAD^lidar of Equation (9)), which was used for the LAD (of Equation (10)). Rearranging Equation (10) yields the stem diameter distribution (unknown vector N).
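The construction of a leaf-tree matrix and the forward calculation of a leaf area profile can be sketched as follows. This is a toy illustration under stated assumptions: the four classes, their top layers, crown depths and leaf areas are invented numbers, and the function names are hypothetical. Rows are ordered from the lowest to the highest height layer, and each tree spreads its leaf area uniformly over the layers its crown covers.

```python
import numpy as np


def leaf_tree_matrix(top_layer, crown_layers, leaf_area):
    """Build F, where F[i, j] is the leaf area (m^2) that one tree of stem
    diameter class j places in height layer i (row 0 = lowest layer)."""
    n = len(top_layer)
    F = np.zeros((n, n))
    for j in range(n):
        lowest = max(top_layer[j] - crown_layers[j] + 1, 0)
        n_covered = top_layer[j] - lowest + 1
        # Uniform distribution of leaf area over the crown-covered layers
        F[lowest:top_layer[j] + 1, j] = leaf_area[j] / n_covered
    return F


def leaf_area_profile(F, N, plot_area, dh=1.0):
    """Leaf area density per layer (m^2/m^3): F times N, normalized by
    plot area (m^2) and layer width (m)."""
    return F @ np.asarray(N, dtype=float) / (plot_area * dh)


# Toy forest with four classes (all numbers are hypothetical)
F_toy = leaf_tree_matrix(top_layer=[0, 1, 2, 3],
                         crown_layers=[1, 2, 2, 3],
                         leaf_area=[2.0, 6.0, 10.0, 30.0])
lad_toy = leaf_area_profile(F_toy, N=[100, 20, 5, 1], plot_area=10_000.0)
```

Each column of `F_toy` sums to the leaf area of one tree, and summing a row gives the leaf area in that layer for a forest with exactly one tree per class.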

Linear Equation Solving to Derive Forest Structure from lidar Profiles
Equation (10) represents a system of n linear equations with n unknowns. To rearrange this equation for inferring the unknown stem diameter distribution (vector N), the inverse of the leaf-tree matrix F is required. Since, by construction, the height of trees in the n-th stem diameter class never exceeds the n-th height layer, all above-diagonal matrix entries are zero. This makes the leaf-tree matrix F a lower triangular (n × n) matrix, which is invertible because its diagonal entries are non-zero [49]. Rearranging Equation (10) for the vector N then results in
$$N = A \times \Delta h \times F^{-1} \times LAD \qquad (11)$$
Although Equation (11) yields the number of trees (N) for each stem diameter class, some of the computed values can be negative. Therefore, we developed an iterative numerical backward approach for solving the linear equation system of Equation (10) (using LAD^lidar to represent LAD). See Appendix A Table A1 for a comparison of results between the direct and numerical calculation and Appendix B for details on the algorithm of numerical calculation.
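A triangular system of this kind can be solved class by class, starting with the tallest trees. The sketch below illustrates one possible backward substitution in which negative tree numbers are set to zero before continuing. It is an assumption made for illustration and not the algorithm of Appendix B; the function name and the clipping rule are hypothetical. The array layout assumes rows ordered from the lowest to the highest layer, so the array is upper triangular (reversing the order of both layers and classes gives a lower-triangular array).

```python
import numpy as np


def solve_stem_numbers(F, lad, plot_area, dh=1.0, clip_negative=True):
    """Backward substitution for F @ N = lad * plot_area * dh.

    F[i, j] must be zero for i > j and non-zero on the diagonal.
    Works from the tallest class down; each class explains the leaf area
    of its top layer that taller classes have not already explained.
    """
    F = np.asarray(F, dtype=float)
    target = np.asarray(lad, dtype=float) * plot_area * dh  # leaf area per layer (m^2)
    n = len(target)
    N = np.zeros(n)
    for j in range(n - 1, -1, -1):
        explained = F[j, j + 1:] @ N[j + 1:]  # contribution of taller classes
        N[j] = (target[j] - explained) / F[j, j]
        if clip_negative and N[j] < 0:
            N[j] = 0.0  # a class cannot contain a negative number of trees
    return N
```

Without clipping, the result equals the direct solution via the matrix inverse; with clipping, the solution stays non-negative but may no longer reproduce the profile exactly in the affected layers.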

Analysis and Statistics of Results
We evaluated the accuracy of our approach by comparing the estimated stem diameter distribution with forest inventory data (i.e., stem numbers per ha and stem diameter class). To compare the stem diameter distribution observed in the field (N^field) with that estimated from lidar data and our tree geometry model (N^lidar), we performed a logarithmic regression analysis, i.e., a linear regression between ln(N^field) and ln(N^lidar) with the intercept I and the regression slope s. We analyzed the regression slope s and the coefficient of determination R² for stem diameter classes with stem numbers larger than zero only (in both inventory and estimates). Optimal values of R² = 1 and s = 1 would reflect the highest achievable accuracy of the lidar-derived tree numbers. We further calculated the root mean square error RMSE (in trees per ha) and its normalized counterpart nRMSE (in %, using the range of censused stem numbers across all stem diameter classes to normalize)
$$nRMSE = \frac{RMSE \times A}{\max(N^{field}) - \min(N^{field})} \qquad (13)$$
with A being the plot area in ha. We calculated both RMSE and nRMSE for (non-logarithmic) stem numbers aggregated to 10 cm stem diameter classes and for trees with stem diameter d ≥ 10 cm.
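These accuracy measures can be sketched with NumPy. Several details are assumptions of this illustration: the lidar-derived numbers are regressed on the field numbers (the direction of the regression is a choice made here), stem numbers are given as absolute counts per plot and class, and the nRMSE is read as RMSE times plot area divided by the range of censused stem numbers. The function names are hypothetical.

```python
import numpy as np


def log_regression(n_field, n_lidar):
    """Linear regression of ln(n_lidar) on ln(n_field) for classes in which
    both stem numbers are larger than zero. Returns (intercept, slope, R^2)."""
    n_field = np.asarray(n_field, dtype=float)
    n_lidar = np.asarray(n_lidar, dtype=float)
    mask = (n_field > 0) & (n_lidar > 0)
    x, y = np.log(n_field[mask]), np.log(n_lidar[mask])
    slope, intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return intercept, slope, r2


def rmse_per_ha(n_field, n_lidar, plot_area_ha):
    """RMSE of stem numbers per class, expressed in trees per ha."""
    diff = (np.asarray(n_lidar, dtype=float) - np.asarray(n_field, dtype=float)) / plot_area_ha
    return float(np.sqrt(np.mean(diff ** 2)))


def nrmse(n_field, n_lidar, plot_area_ha):
    """RMSE normalized by the range of censused stem numbers (fraction, not %)."""
    n_field = np.asarray(n_field, dtype=float)
    value_range = n_field.max() - n_field.min()
    return rmse_per_ha(n_field, n_lidar, plot_area_ha) * plot_area_ha / value_range
```

Multiplying the returned nRMSE by 100 gives the percentage used in the results.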
We further estimated the total tree density (N) and basal area (BA) of the forest based on the lidar-derived stem diameter distribution (using stem diameter classes of the leaf-tree matrix and class mid values for basal area estimates) and compared them with the field inventory. Again, we focused on classes with stem diameter above 10 cm. In addition, we also compared the tree density and basal area estimates for each 10-cm aggregated stem diameter class (j) with the inventory. To assess the accuracy of our approach on different spatial scales, we varied the size of the analyzed plot. First, we applied our approach at the scale of 50 ha (entire forest plot). In a second step, we divided the forest plot into subplots of equal sizes of, respectively, 25, 5, 1, 0.25, and 0.04 ha. Subplot sizes of 25 × 25 m or smaller are especially interesting as they correspond to the size of the footprint of satellite-based lidar observations (GEDI [17]). For a specific plot size, we calculated, for each subplot, the nine statistical measures (regression slope, R², RMSE, nRMSE, tree density, and basal area in total and absolute differences per 10-cm stem diameter class). For each spatial scale, we calculated the arithmetic mean, standard deviation, and range of the statistical measures across the respective subplots. For the mean values of R², regression slope, RMSE, and nRMSE, we used linear regression to identify scaling relations expressing these statistical quantities as functions of the plot size. To compare the estimated tree density and basal area with inventory data, we used linear regression and computed the bias (average difference between the field-based and the lidar-derived attribute, field minus lidar), RMSE and nRMSE (normalized by the mean of the observed attribute).
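Aggregated attributes follow directly from a stem diameter distribution. The sketch below computes tree density and basal area from class mid values and the bias as field minus lidar. Selecting classes by their mid value (≥ 10 cm) and the function names are assumptions of this illustration.

```python
import numpy as np


def tree_density_and_basal_area(n_per_class, class_mid_d_m, plot_area_ha, d_min_m=0.10):
    """Tree density (trees/ha) and basal area (m^2/ha) for classes whose
    mid stem diameter is at least d_min_m.

    Basal area of one tree is the cross-sectional stem area pi * (d / 2)^2.
    """
    n = np.asarray(n_per_class, dtype=float)
    d = np.asarray(class_mid_d_m, dtype=float)
    sel = d >= d_min_m
    density = n[sel].sum() / plot_area_ha
    basal_area = (n[sel] * np.pi * (d[sel] / 2.0) ** 2).sum() / plot_area_ha
    return density, basal_area


def bias(field, lidar):
    """Average difference of field-based minus lidar-derived attribute."""
    return float(np.mean(np.asarray(field, dtype=float) - np.asarray(lidar, dtype=float)))
```

With this sign convention, a negative bias means that lidar overestimates the attribute; for example, `bias(447.3, 516.7)` gives about -69.4 trees per ha, the tree density bias reported for the 50-ha scale.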
We further correlated the R² values we obtained for the estimated stem diameter distribution (per subplot) (a) with the other obtained statistical measures (RMSE, nRMSE, and regression slope for the stem diameter distribution), (b) with characteristics of the lidar-derived leaf area profile (e.g., median profile height), and (c) with forest attributes (e.g., tree density and basal area). For a full list of attributes used for the correlations, see Table 1. We evaluated each correlation by calculating Spearman's rank correlation coefficient r.
Notes to Table 1: (1) Median height of the vertical leaf profile derived from lidar (weighted by the derived leaf area density LAD^lidar per height layer). (2) Variance of height of the vertical leaf profile derived from lidar (weighted by the derived leaf area density LAD^lidar per height layer). (3) Including trees of stem diameter d ≥ 1 cm. (4) Including trees of stem diameter d ≥ 10 cm.
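Spearman's rank correlation coefficient is the Pearson correlation of the ranks of two variables. A minimal NumPy sketch, with ties receiving their average rank (the function names are hypothetical):

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    # Average the ranks within each group of identical values
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman_r(x, y):
    """Spearman's rank correlation coefficient of x and y."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Because only ranks enter the calculation, the coefficient captures monotonic but non-linear relations, for instance between per-subplot R² values and a plot attribute.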

Stem Diameter Distribution Derived at the 50-ha Scale
The stem diameter distribution of the 50-ha forest plot shows a typical heavy-tailed pattern, as often observed in old-growth forests, with few large trees and numerous small ones (Figure 3a, linear appearance on log-log axes). To test our approach at the 50-ha scale, we derived the stem diameter distribution from lidar and compared the estimated tree numbers per stem diameter class with the observed numbers from the inventory (Figure 3). The stem diameter distribution was estimated from lidar with a high accuracy (RMSE = 22.8 trees per ha, nRMSE = 6.2%, R² = 0.89, Figure 3, Table A1).
The number of mid-sized and large trees was estimated accurately (10 cm < stem diameter ≤ 50 cm with RMSE = 39.4 trees per ha, stem diameter > 50 cm with RMSE = 1.2 trees per ha), while the number of trees smaller than 10 cm in stem diameter was overestimated (RMSE = 590.5 trees per ha, Figure A3a). Based on the lidar-derived stem diameter distribution, the estimated tree density of 516.7 trees per ha (stem diameter d ≥ 10 cm) slightly overestimated the censused density of 447.3 trees per ha (bias of -69.4 trees per ha). The basal area, in turn, was slightly underestimated (inventory: 30.1 m²/ha, lidar: 23.6 m²/ha, bias: 6.5 m²/ha, Figure A3a). A sensitivity analysis revealed only minor influences of differently assumed crown shapes, tree height allometry or the leaf density within crowns (Table 2, Figure A2).
Table 2. Sensitivity analysis of different model assumptions (see also Figure A2). The changed model assumptions for each scenario are written in bold.

Small-Scale Derivations of Stem Diameter Distributions
We also tested our approach for smaller plot sizes (25, 5, 1, 0.25, and 0.04 ha, Table 3, Table A2). For most plot sizes (e.g., at the 1-ha and 0.25-ha scales), the stem diameter distributions were estimated well from the lidar data (Figure 4a, Table 3, Table A2). Similar results can also be observed for the regression slope (Figure A4). Interestingly, the R² values scaled with plot size (Figure 5; similarly also RMSE, nRMSE, and regression slope) and correlated non-linearly with the other statistical quantities (Table A3; however, the correlations of R² with RMSE or nRMSE were weaker, r = 0.09 and 0.05 at the 1-ha scale).
Estimates of the tree density and basal area showed good results for the 1-ha scale with nRMSEs of 51.1% and 15.7%, respectively (bias of -168.2 trees/ha and 0.8 m²/ha, Figure 6, Table 3). The estimation capability for the tree density and basal area per stem diameter class did not change with decreasing plot size (mostly for small-sized classes) but showed larger variations for smaller plots (Figure A3c,d). Interestingly, the forest basal area was best estimated at the 1-ha scale (Figure 6, Figure A5, Table 3, underestimated for larger plot sizes, Table A2), while the tree density was best estimated at the 50-ha scale (Figure 6, Table 3). Both attributes were overestimated by lidar at smaller spatial scales (Table 3, Table A2).
Table 3. Comparison of forest attributes derived from lidar and from the forest inventory for different plot sizes. Results are shown for the regression analysis (slope and R²), RMSE and nRMSE (overall and for three different categories of small, mid-sized and large trees grouped according to stem diameter d) and aggregated forest attributes (tree density and basal area, for stem diameter d ≥ 10 cm). For each plot size, the mean ± standard deviation (and in brackets, minimum and maximum) of the respective attribute is given. The plot size is given in ha with the number of plots and plot dimension in m × m. Forest attributes are compared between lidar and inventory also in terms of the bias, RMSE, and nRMSE (%) (see methods for details). See also Table A2.
To understand how well stem numbers can be derived from lidar using our approach (e.g., in case no inventories are available), we correlated the R² values obtained by comparing inventory- and lidar-derived stem diameter distributions with other measures characterizing each plot for the spatial scales of 1, 0.25, and 0.04 ha. Those measures describe either the lidar profile or the forest structure (Table 1). We found correlations for the median height of lidar returns WMPH (weighted median profile height, weighted by the derived leaf area density LAD^lidar, r = -0.19 to -0.31, Table A3) and total tree density (stem diameter d ≥ 1 cm, r = 0.12 to 0.27, Table A3, Figure 7).
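The weighted profile measures can be computed from a leaf area density profile as sketched below. The convention used for the weighted median (the lowest layer height at which the cumulative weight reaches half of the total) and the use of layer mid heights are assumptions of this illustration, as are the function names.

```python
import numpy as np


def weighted_median_profile_height(layer_heights, lad):
    """Height at which the cumulative leaf area density first reaches
    half of its total (weighted median of the layer heights)."""
    h = np.asarray(layer_heights, dtype=float)
    w = np.asarray(lad, dtype=float)
    order = np.argsort(h)
    h, w = h[order], w[order]
    cumulative = np.cumsum(w)
    idx = np.searchsorted(cumulative, 0.5 * cumulative[-1])
    return float(h[idx])


def weighted_profile_variance(layer_heights, lad):
    """Variance of the layer heights, weighted by leaf area density."""
    h = np.asarray(layer_heights, dtype=float)
    w = np.asarray(lad, dtype=float)
    mean = np.sum(w * h) / np.sum(w)
    return float(np.sum(w * (h - mean) ** 2) / np.sum(w))
```

A profile whose leaf area is concentrated in low layers yields a low weighted median height, which corresponds to the low and dense canopies for which the estimates were more reliable.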
The WMPH became more important with increasing spatial scale (as well as variance of leaf area density), while the total tree density showed less strong correlation at larger scales. Interestingly, the tree density, only for stem diameter d ≥ 10 cm, revealed an opposite trend with stronger correlation at the 1-ha scale than at smaller scales (Table A3). This suggests that forest plots characterized by a low and dense canopy could enable a more reliable estimation of the stem diameter distribution from lidar data. In addition, the forest heterogeneity (standard deviation of tree height) also plays a relevant role (r = -0.16 to -0.28, Figure 7c, Table A3). Surprisingly, a reliable estimation of the stem diameter distributions (in terms of high R² values) showed weak correlations with the number of lidar returns, median tree height, and forest basal area (Table A3).

Discussion
We presented here a novel approach for estimating tree size distributions from lidar measurements. Our approach is solely based on the vertical profile of lidar data and infers the number of trees per stem diameter class by integrating the available tree size allometries. Using the example of a tropical forest in Panama, we demonstrated that the presented method was able to estimate the stem diameter distribution with high accuracy not only at the 50-ha scale (RMSE of 22.8 trees per ha, nRMSE of 6.2%, R² of 0.89), but also at the 1-ha scale (average RMSE of 67.6 trees per ha, mean R² of 0.76). The basal area was estimated well at the 1-ha scale (bias 0.8 m² ha⁻¹), but the bias increased at smaller scales (-8.0 m² ha⁻¹ at the 0.25-ha scale). We identified the most decisive factors for a good estimation of stem diameter distributions at smaller spatial scales as (a) the weighted median profile height (WMPH), (b) tree density, and (c) forest heterogeneity (in terms of tree height).
The change from a still good estimation of stem diameter distributions at larger spatial scales (50 ha to 0.25 ha) toward slightly biased estimates at the 0.04-ha scale is probably related to the crown overlapping of large trees and edge effects (crowns from trees of neighboring plots). This might also affect the identified correlations between the goodness of fit and forest attributes, which are not strong (maximum values of 0.3 or -0.3, respectively). Forest inventory data including explicit tree locations allowed us to calculate the overlapping crown parts from neighboring plots and, thus, enabled us to quantify and to account for such effects. Furthermore, we demonstrated our approach using the example of an old-growth tropical forest site (at BCI, Panama). The results of small-scale estimations and the identified correlations could differ for forests of other biomes (like temperate or boreal forests) or for disturbed or managed sites (e.g., logged forests or forest plantations).

Strengths and Limitations of the Presented Approach
Our approach complements recent methods [23][24][25][26][27][28][29][30][31][32][33][34][35][36] as it (i) integrates already available information on tree geometry (captured in the leaf-tree matrix) and (ii) estimates the full range of size classes in tree size distributions. Tree size distributions can be used to derive different forest attributes (e.g., tree density or forest basal area) and allow us to estimate a forest's successional state or disturbances [50]. In this way, our approach extends statistical methods that often focus on single forest attributes. In comparison to approaches that also integrate tree geometry with statistics, our approach obtained, for almost 85% of the analyzed plots at the 0.25-ha scale, an R² > 0.5 (e.g., Spriggs et al. [30] reached 73% of 0.25-ha plots with an R² > 0.5 in temperate forests). The presented approach also complements tree segmentation methods. Such methods show their strength in predicting the size and geometry of large trees, while our approach effectively predicts small- and mid-sized trees in the forest, also at larger spatial scales.
Our approach is based on information about tree allometries, which are available for a wide range of different forest types and biomes (e.g., [44,51], information can also be found in national forest inventories). Tree allometries are assumed a priori in our approach and can differ across regions and forest types. This approach may not be suitable especially for large trees, leaning trees or trees with a complex geometry. We identified only minor influences of these model assumptions on the estimated stem diameter distribution for our study site ( Figure A2, Table 2). We demonstrated a good estimation of the forest basal area and total tree density based on lidar-derived tree numbers for stem diameters larger than 10 cm.
To improve the allometries, crown size relations can also be derived by tree crown detection algorithms based on lidar point clouds [32,33,52]. For example, Ferraz et al. [33] reported tree and crown allometries based on detected single trees and their respective uncertainties. This approach requires high-resolution lidar point clouds, particularly for correctly detecting small- and mid-sized trees (with small crowns) in the understory where the lidar point density is normally lower. While our approach shows its strength in estimating tree numbers of small- and mid-sized trees (based on the vertical lidar profile), tree crown detection approaches are especially interesting for applications at smaller spatial scales. Methods based on crown delineation and classification approaches or using terrestrial lidar measurements will, in the future, provide further knowledge on the crown shape of single trees (e.g., National Ecological Observatory Network (NEON)) [53,54].
Our approach includes only average tree size allometries across occurring tree species. We already demonstrated, in a previous study [46], that forest size structure can be explained by using average tree allometry. However, forest sites where crown shapes, in particular, (and the amount of leaf area per crown volume) differ more between tree species will be more difficult to analyze and will require a closer look. In this case, our method could be extended toward a linear combination of multiple leaf-tree matrices (one for each dominant tree species) if information on the species composition is available (e.g., by optical sensors) [55,56]. A recent study [31] demonstrated how stem diameter distributions of different plant functional types can be derived based on site-specific calibration.
Our method is based on vertical lidar profiles only (instead of the full lidar point cloud) and, thus, appears promising for applications to other large-footprint lidar data. Previous studies demonstrated the comparability of profiles derived from lidar point clouds and airborne large-footprint waveforms (Land, Vegetation, and Ice Sensor (LVIS)) [57,58]. Two parameters of our methodology for deriving the vertical leaf profile from lidar profiles (the extinction parameter k and the density factor l) could depend on the remote sensing technique used.
While the extinction coefficient k reflects the light extinction in the forest canopy, which depends on the wavelength (e.g., optical vs. near infrared) [48], the density factor l is more phenomenological and might correlate with the used tree allometries. Terrestrial lidar could contribute to the estimation of l for forest plots by reconstructing the detailed tree crown architecture of branches and leaves. We identified only minor influences of both parameters (k and l) on the estimated stem diameter distribution for our study site ( Figure A1).
Other methods to derive leaf density profiles from lidar data include similar parameters in their approaches [59][60][61][62]. For example, the method used by Stark et al. [60] and Harding et al. [61] is based on the (logarithmic) ratio of point densities of two adjacent height layers [63] and includes an extinction coefficient. Tang et al. [59] combined the MacArthur-Horn-method with a gap probability approach [64]. The approach of Detto et al. [62], in contrast, is based on a stochastic radiative transfer model and accounts for multiple lidar returns. Methods, like small-footprint full-waveform lidar analyses, could improve estimates on leaf density profiles in the future [65].

Future Applications and Challenges
Airborne campaigns of lidar measurements have the advantage of providing forest measurements for hundreds of hectares, but are limited to the regional scale and are not able to cover continents. Satellite missions can provide full-waveform lidar data but only sample space discontinuously (e.g., GEDI has a footprint of 25 m diameter and spacings of 60 m along track and 600 m across track) [17]. Assuming that such spaceborne-generated profiles are comparable with profiles derived from point clouds or airborne-generated waveforms [57], the presented approach could be tested for different remote sensing data and for forest sites at which inventory and spaceborne lidar data are available.
This could be particularly interesting for exploring the site dependency of tree allometries (and of the parameters k and l) in our approach and for testing its applicability across a wide range of forest types (especially in regions characterized by a heterogeneous forest structure). As our results were less accurate at smaller spatial scales (e.g., 0.04 ha), questions about sampling effort also arise. Do the results improve if we use several plots distant from each other (equivalent to spaceborne lidar footprints)? More specifically, it would be interesting to explore whether discontinuously sampled lidar shots can be aggregated over an area to estimate the stem diameter distribution. How many lidar shots must be sampled to obtain good estimates for a specific forest site?
Recent studies based on individual-based forest growth models [31,66,67] demonstrated how remote sensing information can be combined with forest models to provide additional information about forests. Forest models simulate the long-term dynamics of forests (based on single trees) and can estimate forest successional states by matching simulated and remotely sensed vertical leaf profiles [66]. Our approach, which is based on related methodological assumptions, thus also appears promising for estimating the successional state of a forest (e.g., as caused by disturbances).
As we demonstrated that stem diameter distributions in forest plots with a low, dense canopy and small tree height differences can be estimated reliably, we expect that previous disturbances could be identified well. Challenges specific to spaceborne lidar could make such an analysis difficult, as cloud cover can limit observations and repeated shots will not be located at the same position as before. Repeated shots at the same position, however, would be of interest for tracking previously logged or disturbed forests. Combining lidar measurements with other remote sensing techniques (e.g., radar data from the TanDEM-X satellite) therefore seems promising for interpolating between lidar shots in large-scale analyses [68]. Lidar measurements covering continents, combined with maps of forest cover (e.g., [3]), could enable the calculation of forest structure and stem diameter distributions at large scales using the suggested approach, provided the tree crown allometry is known. Various forest attributes (e.g., forest height, biomass, or basal area) can then also be derived from the lidar-based stem diameter distributions and compared not only with local inventories but also with existing information, for example, on the basal area or forest biomass of the Amazon [69]. The approach proposed here can thereby contribute to the in-depth mapping and monitoring of forests and support their sustainable management, conservation, and protection.

Conclusion
We presented a novel approach for predicting the stem diameter distribution of forests based on lidar measurements. Our approach is profile-based and structure-based: it requires only a vertical lidar profile (instead of point clouds) and integrates the available tree size allometries to infer the number of trees from the profile. In this way, our approach complements previous methods based on statistical correlations between lidar and forest attributes, as well as individual tree crown detection methods based on high-resolution lidar point clouds.
In this study, we tested our approach and obtained good accuracy for an old-growth tropical forest. Further tests of our method at other forest sites (for which tree allometries, lidar data and, for comparison, inventories are available) could help to comprehensively understand the impact of forest stand characteristics and plot size on the accuracy of the stem diameter distributions estimated from lidar.

Figure A2. Visualization of the leaf–tree matrix and the resulting comparison of the measured tree numbers per stem diameter class (from inventory) with the estimated tree numbers (from lidar) for different sensitivity analyses. Upper graphics: each column in the matrix reflects the leaf area contribution of one tree to different height layers. Different columns represent different stem diameter classes. The leaf area values are shown using a color gradient from light green (low leaf area) to dark green (high leaf area) (white represents no leaf area). Lower graphics: green points show the estimated tree numbers from lidar for each stem diameter class and the grey line shows the observed stem numbers in the inventory. Values of R² and RMSE are given (see Table 3 for further results and a description of changed model assumptions in each scenario).

Figure A3. Evaluation of uncertainties in the lidar-derived tree density (in trees per ha) and lidar-derived basal area (in m²/ha) for three spatial scales: (a) 50 ha, (b) 1 ha, and (c) 0.04 ha. Uncertainties are calculated as the absolute difference of stem numbers or basal area (logarithmic y-axes) per stem diameter class (x-axes) between the results from lidar and field data (points show the mean and the solid line shows the range from minimum to maximum). Note that zero differences are not displayed.

Numerical backward calculation to solve the linear equation system (Equation (10))
We start from the leaf area density profile $\mathrm{LAD}_{\mathrm{lidar}}$ derived from the lidar observations (see Methods). We then define $L = A \times \mathrm{LAD}_{\mathrm{lidar}}$ as the leaf area across all height layers, with $A$ as the plot area in m².
Starting from the forest top and proceeding down to the forest floor, we calculate, for each height layer $i$ in the leaf profile $L$:
1. If the leaf area $L_i$ in height layer $i$ is larger than the corresponding entry $f_{ii}$ in the leaf–tree matrix $F$, we calculate the number of stems in the corresponding stem diameter class as $N_i = \lfloor L_i / f_{ii} \rfloor$.
1b. The leaf area corresponding to the calculated number of trees $N_i$ in stem diameter class $i$ is then subtracted from all height layers $j$ below layer $i$ into which the crowns of those trees extend: $L_j = L_j - N_i \times f_{ij}$.
2. If the leaf area $L_i$ in height layer $i$ is lower than the corresponding entry $f_{ii}$ in the leaf–tree matrix $F$, we set the number of stems in stem diameter class $i$ to zero (i.e., $N_i = 0$).
Steps 1 and 2 are repeated iteratively for the lower height layers. See also Supplementary S1 for the R code.
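The steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions and not the R code of Supplementary S1: the function name `backward_stem_counts` is hypothetical, layers and stem diameter classes are assumed to be ordered from the forest top to the forest floor, `F[i, j]` is assumed to hold the leaf area that one tree of class i contributes to layer j, and the case of L_i exactly equal to f_ii (not covered by the description) is treated as step 1.

```python
import numpy as np

def backward_stem_counts(leaf_area, F):
    """Backward calculation of stem numbers per stem diameter class.

    leaf_area: leaf area L per height layer (m²), index 0 = top layer.
    F:         square leaf-tree matrix; F[i, j] is the leaf area of one tree
               of class i in layer j (assumed index convention).
    Returns an integer array N with the number of stems per class.
    """
    L = np.asarray(leaf_area, dtype=float).copy()  # do not modify the input
    F = np.asarray(F, dtype=float)
    n = L.size
    if F.shape != (n, n):
        raise ValueError("F must be a square matrix matching the profile length")

    N = np.zeros(n, dtype=int)
    for i in range(n):  # from the forest top down to the forest floor
        f_ii = F[i, i]
        if f_ii > 0 and L[i] >= f_ii:
            # Step 1: number of whole trees that fit into the leaf area of layer i.
            N[i] = int(np.floor(L[i] / f_ii))
            # Step 1b: remove the leaf area of these trees from the layers below.
            L[i + 1:] -= N[i] * F[i, i + 1:]
        # Step 2: otherwise N[i] stays zero.
    return N

# Hypothetical three-layer example (values chosen for illustration only).
F_example = [[2.0, 1.0, 0.5],
             [0.0, 4.0, 2.0],
             [0.0, 0.0, 1.0]]
print(backward_stem_counts([6.0, 11.0, 10.5], F_example))
```

In this example, the top layer yields 3 trees (6.0 / 2.0). Their leaf area is removed from the two lower layers before those layers are evaluated in turn. Because each tree is counted as a whole number, any leaf area left over after flooring stays unassigned.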