Crown Structure Metrics to Generalize Aboveground Biomass Estimation Model Using Airborne Laser Scanning Data in National Park of Hainan Tropical Rainforest, China

Abstract: Forest aboveground biomass (AGB) is an important indicator for characterizing forest ecosystem structures and functions, so investigating it effectively is a vital task. Airborne laser scanning (ALS) has been demonstrated to be an effective way to support forest inventory, both in research and in operational applications, and it captures three-dimensional structural information related to AGB. Many studies have estimated AGB from variables extracted from point cloud data, but few have taken full advantage of variables related to tree crowns. The main objective of this study was to evaluate and compare the capabilities of different metrics derived from ALS point clouds. In particular, individual-tree metrics based on the alpha-shape were used alongside traditional, commonly used plot-level height and intensity metrics. Random forest and multiple stepwise linear regression were used to estimate AGB. Comparison of the AGB estimates with field measurements showed that the mixed metrics were the best predictor set and random forest the best estimation model (R² = 0.713, RMSE = 21.064 t/ha, MAE = 15.445 t/ha), which indicates that alpha-shape metrics may be a good way to improve AGB estimation accuracy. This method provides an effective solution for estimating aboveground biomass from airborne laser scanning.


Introduction
Forest biomass is a fundamental parameter for characterizing forest ecosystem structures and functions and provides basic data for studying forest ecosystems [1]. As the energy base and material source on which forest ecosystems operate, forest aboveground biomass (AGB) plays a vital role in the carbon cycle and in mitigating the greenhouse effect [2]. The rainforest in particular is the most resistant and stable ecosystem on earth, with a perennially hot climate, abundant rainfall, rapid succession of biological communities, and rich biodiversity [3,4]. Biomass surveys of rainforests are therefore important for understanding their water cycle, climate regulation, and organic matter conversion.
Although traditional ground surveys are highly precise, they are destructive to forest environments, the selection of sample plots introduces uncertainty, the calculations are subject to residual variability and parameter estimation errors, and a large labor force is required [5][6][7]. Remote sensing offers large spatial coverage, fast operation, and low cost, and has become an important technique for investigating and monitoring forest resources [8].
... construct parameters for each tree in this paper. We took the National Park of Hainan Tropical Rainforest in southern China as the research area. In this forest, we established AGB estimation models with field measurements and variables extracted from the point cloud, and compared the advantages of crown metrics and other metrics for AGB estimation. This study provides insights into the state of the rainforest on Hainan Island and improves estimation accuracy, supporting scientific management and better development of forestry resources.

Study Area
The study area is located in the National Park of Hainan Tropical Rainforest on Hainan Island, China (108°36′~109°57′ E, 18°23′~19°11′ N, Figure 1), in the northwestern part of the South China Sea. The island has dense tropical rainforests and favorable hydrothermal conditions. Forest coverage exceeds 50%, the vegetation is rich and diverse, and the climate is a tropical monsoon maritime climate. Hainan Island is rich in rainfall, rivers, and hydrological resources. Therefore, monitoring and inventorying the aboveground biomass of the island benefits biodiversity conservation, forest management, forest resource development, and the tourism industry [19,20]. The study area covers about 4900 km², which is one-seventh of the area of Hainan Island. The dominant tree species of the plantation forests in the study area are mixed broad-leaved forest, Hevea spp., Eucalyptus spp., Acacia spp., Cunninghamia lanceolata, and foreign pines [21]. The study area lies on hilly terrain at an altitude of 100-1876 m. The annual rainfall is about 1759 mm, the average temperature is 21.6 °C, and the relative humidity is 80%. May to November are the rainy months, and December to April are usually dry.


ALS Data Acquisition
The airborne laser scanning data were collected by the National Forestry and Grassland Administration from March 2020 to February 2021, using a RIEGL VQ-1560i-DW laser scanning system carried on a Cessna 208B aircraft on two sorties. The data format was LAS, and the average point cloud density of the plantation was 10 points/m². ALS data were collected along 217 routes with a total flight duration of 71 h. The detailed scanning parameters are shown in Table 1. The coordinates of the raw ALS data were calculated using a position and orientation system (POS) and continuously operating reference stations (CORS). The POS was integrated with the LiDAR sensor. The POS model was the Applanix AP60 (produced by Trimble Inc., Sunnyvale, CA, USA), which combines a global navigation satellite system (GNSS) receiver (which georeferences the data) with an inertial measurement unit (IMU) (which measures the aircraft's multi-directional movements and orientation and helps increase the accuracy of the georeferencing). The CORS data were provided by the Hainan Administration of Surveying Mapping and Geoinformation. LiDAR360 (version 4.1, developed by GreenValley Co., Ltd., Beijing, China) was then used for within-strip mosaicking, point cloud classification, noise reduction, height normalization, individual tree detection, and individual tree segmentation. Individual tree segmentation used the algorithm developed by Li et al. [22], which is integrated into LiDAR360.

Inventory Data
The field data comprise sample-plot survey results collected during the ALS data acquisition period. Because of the topographic relief, the accessibility of plot locations, and the practicality of establishing plots, some plots were set outside the study area, but their tree species were the same as those within the study area. These plots were also covered by airborne laser scanning and had corresponding point cloud data. In total, we obtained 166 circular plots (Figure 1), each with a radius of 15 m; the position of each plot center was measured with real-time kinematic (RTK) positioning. The height of each tree was measured with a laser altimeter (Haglof Vertex Laser, Haglöf Sweden), and the diameter at breast height (DBH) of each tree was measured with a diameter tape. The crown width of each tree was measured with tapes in the north-south and west-east directions. The distribution of sample plots is shown in Figure 1. The specific sample plot information is shown in Table 2.
Note: m ± n, where m is the median of the tree parameter for each tree species and n is the maximum value by which this parameter fluctuates up or down.
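Because results are reported in t/ha while each plot is a 15 m radius circle, plot totals have to be scaled by the plot area. The sketch below shows that conversion under two assumptions that are not stated in the text: per-tree AGB is available in kilograms, and the horizontal (not slope-corrected) plot area is used. The function names are illustrative only.

```python
import math

PLOT_RADIUS_M = 15.0   # plot radius reported in the text
M2_PER_HA = 10_000.0   # square metres per hectare
KG_PER_T = 1_000.0     # kilograms per tonne


def plot_area_ha(radius_m=PLOT_RADIUS_M):
    """Horizontal area of a circular plot in hectares."""
    return math.pi * radius_m ** 2 / M2_PER_HA


def plot_agb_t_per_ha(tree_agb_kg, radius_m=PLOT_RADIUS_M):
    """Sum per-tree AGB (kg, assumed unit) and scale it to t/ha."""
    total_t = sum(tree_agb_kg) / KG_PER_T
    return total_t / plot_area_ha(radius_m)
```

A 15 m radius plot covers about 0.0707 ha, so each tonne of biomass measured on a plot corresponds to roughly 14 t/ha.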

AGB Calculation
The AGB of all tree species was calculated using the AGB formulas in [23]. The AGB model for each species is shown in Table 3.

Crown Features Extraction
The alpha-shape is a classical algorithm for extracting outlines from point clouds, in which the precision of the generated polyhedron is controlled by a parameter α [24,25]. With an appropriately chosen α, the alpha-shape constructed for a given point cloud approximately recovers the original shape [26]. An alpha-shape polyhedron of a plot is shown in Figure 2.


Feature Variables Extraction
Considering the structural features of the point cloud data together with forest stand characteristics, canopy indices, and ecological indices, four categories of feature parameters were extracted from the LiDAR data for each field plot.

(a) Elevation metrics
... measurements. In addition, 10 density variables were extracted at different elevations. Within each plot, the point cloud is divided into 10 layers of equal height interval from the lowest to the highest elevation, and the proportion of echoes in each height slice is the corresponding density metric. In total, we extracted 56 statistical parameters related to elevation. The elevation metrics are described in Table 4.
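The density metrics described above (ten equal-height slices, each scored by its share of echoes) can be sketched as follows. This is a minimal illustration, not the LiDAR360 implementation: it assumes the input is a list of height-normalized echo heights for one plot, and the treatment of echoes exactly on a slice boundary is a choice made here.

```python
def density_metrics(heights, n_layers=10):
    """Proportion of echoes in each of n_layers equal-height slices.

    heights: normalized echo heights of one plot (assumed input format).
    Slices run from the lowest to the highest echo in the plot.
    """
    if not heights:
        raise ValueError("heights must not be empty")
    lo, hi = min(heights), max(heights)
    span = hi - lo
    counts = [0] * n_layers
    for h in heights:
        if span == 0:
            idx = 0  # all echoes at the same height: put them in slice 0
        else:
            # The highest echo would map to n_layers; clamp it into the top slice.
            idx = min(int((h - lo) / span * n_layers), n_layers - 1)
        counts[idx] += 1
    total = len(heights)
    return [c / total for c in counts]
```

The ten returned proportions sum to one, so they describe how the echoes are distributed vertically rather than how many there are.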

(b) Intensity metrics
The intensity metrics are computed in the same way as the elevation metrics, but from the intensity values of the point cloud rather than from its elevation information. A total of 42 intensity variables were calculated; their descriptions are given in Table 4.

(c) Alpha-shape metrics
The alpha-shape algorithm was implemented in MATLAB, and we extracted 72 variables describing the tree crowns with it, using statistical metrics analogous to those in Table 4. These variables are listed in Table 5.
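Each tree-level crown quantity (for example crown volume) is aggregated to the plot level with the ten statistics named in the note to Table 5: max, min, mean, std, var, med, cv, kurt, skew, and iq. The sketch below illustrates that aggregation in Python rather than the MATLAB used in the study. Several details are assumptions, since the text does not specify them: population (not sample) variance, non-excess kurtosis, and inclusive quartiles.

```python
import statistics


def crown_summary(values):
    """Plot-level summary of one per-tree crown quantity."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two trees per plot")
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)  # population variance (assumption)
    std = var ** 0.5
    # Third and fourth central moments for skewness and kurtosis.
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "max": max(values),
        "min": min(values),
        "mean": mean,
        "std": std,
        "var": var,
        "med": statistics.median(values),
        "cv": std / mean if mean else float("nan"),
        "kurt": m4 / var ** 2 if var else float("nan"),  # non-excess kurtosis
        "skew": m3 / std ** 3 if var else float("nan"),
        "iq": q3 - q1,
    }
```

Applying the same function to every per-tree quantity yields ten plot-level variables per quantity.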

(d) Stand metrics
For the forest stand features, four parameters were calculated: canopy cover, leaf area index, gap fraction, and density of trees [27]. Canopy cover is the percentage of forest land area covered by the vertical projection of the forest canopy; it plays an important role in forest ecology and resource management [28] and is an essential factor for estimating forest aboveground biomass. The leaf area index (LAI) is one of the most fundamental parameters of forest canopy structure and a composite indicator of light energy utilization and tree crown structure [29]. Because the LAI also reflects the physiological and physical processes of the vegetation, it is closely related to forest biomass. The gap fraction reflects the illumination within the forest stand and the growth of the understory layer [1]; it also indirectly reflects the competition among plants for soil, water, and nutrients. The density of trees indicates how fully the space occupied by trees is utilized. The features in Tables 4-6 were calculated in LiDAR360. The four stand variables are shown in Table 6.

Table 4. ALS-derived tree height metrics, density metrics, and intensity metrics.

| Variable Abbreviation | Description | Reference |
|---|---|---|
| H_max (I_max) | Maximum tree height (intensity) | [30] |
| H_min (I_min) | Minimum tree height (intensity) | |
| H_mean (I_mean) | Mean tree height (intensity) | |
| H_med (I_med) | Median tree height (intensity) | |
| H_var (I_var) | Variance of tree heights (intensity) | |
| H_std (I_std) | Standard deviation of tree heights (intensity) | |
| H_aad (I_aad) | Average absolute deviation of tree heights (intensity) | |
| | Kurtosis of tree heights (intensity) | |
| | Skewness of tree heights (intensity) | |
| H_mm (I_mm) | Median of median absolute deviation of tree heights (intensity) | |

Table 5. Tree crown metrics derived from alpha-shape complexity.

| Abbreviation | Description |
|---|---|
| CV_max, CV_min, CV_mean, CV_std, CV_var, CV_med, CV_cv, CV_kurt, CV_skew, CV_iq | The volume as the alpha-shape complexity, unit: m³ |
| CSA_max, CSA_min, CSA_mean, CSA_std, CSA_var, CSA_med, CSA_cv, CSA_kurt, CSA_skew, CSA_iq | The surface area as the alpha-shape complexity, unit: m² |
| | The width of the X-axis as the alpha-shape complexity, unit: m |
| | The width of the Y-axis as the alpha-shape complexity, unit: m |
| | The length of the Z-axis as the alpha-shape complexity, unit: m |
| | The volume of the 3-D alpha-shape |
| Surf | The surface area of the 3-D alpha-shape |

Note: The alpha-shape metrics are summarized for each plot with the same statistics as the height and intensity metrics of the point cloud data: maximum (max), minimum (min), average (mean), standard deviation (std), variance (var), median (med), coefficient of variation (cv), kurtosis (kurt), skewness (skew), and interquartile range (iq).

Table 6. Forest stand metrics.

| Variable Abbreviation | Description | References |
|---|---|---|
| LAI | Leaf area index, half of the surface area of all leaves projected on the surface area of a plot | [28,29] |
| CC | Canopy cover, the ratio of the first vegetation echoes to the total number of first echoes | |
| GF | Gap fraction, the ratio of ground points to total points in a plot | |
| TD | Density of trees, the ratio of the number of individual trees segmented from the point cloud to the plot area | |
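The ratio definitions of CC, GF, and TD in Table 6 translate directly into code. The sketch below is illustrative and is not how LiDAR360 computes them internally: the `Point` record with a return number and two class flags is a hypothetical stand-in for classified LAS points, and LAI is omitted because the table does not give a computable formula for it.

```python
from collections import namedtuple

# Hypothetical minimal point record (not a real LAS reader type).
Point = namedtuple("Point", ["return_number", "is_ground", "is_vegetation"])


def canopy_cover(points):
    """CC: first vegetation echoes divided by all first echoes."""
    first = [p for p in points if p.return_number == 1]
    if not first:
        raise ValueError("no first echoes in plot")
    return sum(1 for p in first if p.is_vegetation) / len(first)


def gap_fraction(points):
    """GF: ground points divided by all points in the plot."""
    if not points:
        raise ValueError("no points in plot")
    return sum(1 for p in points if p.is_ground) / len(points)


def tree_density(n_trees, plot_area_m2):
    """TD: number of segmented trees per unit plot area (trees per m^2)."""
    return n_trees / plot_area_m2
```

Note that CC uses only first echoes, whereas GF uses every echo, so the two are not simply complements of each other.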

Regression Modeling of AGB
In this paper, random forest and stepwise linear regression were used to establish the AGB models. Both were implemented in R.
Random forest (RF) is a robust machine learning algorithm that combines the results of many decision trees, each trained on a bootstrap sample of the original dataset (bagging). Random forest is also a common feature selection method; here, feature importance was ranked by the increase in mean squared error (IncMSE) [31,32], given by Equation (1):

$$\mathrm{IncMSE} = \frac{1}{ntree}\sum_{t=1}^{ntree}\left(\mathrm{OOB}_{error2,t} - \mathrm{OOB}_{error1,t}\right) \quad (1)$$

where $ntree$ is the number of trees in the random forest; OOB (out-of-bag) denotes the samples left out of the bootstrap sample of a tree; $\mathrm{OOB}_{error1}$ is the OOB error when the sample set is not changed; and $\mathrm{OOB}_{error2}$ is the OOB error when the sample set is changed, that is, when the values of the variable being assessed are randomly permuted.
Multiple stepwise linear regression (MSLR) assesses the contribution of each independent variable to the dependent variable and builds the regression model iteratively, step by step, retaining only the selected independent variables in the final model [33]. We used the independent variables selected by random forest as candidates. The MSLR formula is given by Equation (2):

$$y = a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + \varepsilon \quad (2)$$

where $a_1, a_2, a_3, \ldots, a_n$ are constant regression coefficients, $x_1, x_2, x_3, \ldots, x_n$ are the independent variables, and $\varepsilon$ is the error term.
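The idea behind IncMSE, measuring how much the error grows when one variable is scrambled, can be illustrated with a simplified sketch. It is not the per-tree out-of-bag procedure of the R implementation used in the study: here `predict` is any fitted model passed in as a hypothetical callable, the error is computed on one fixed dataset, and the increase is averaged over a few random permutations instead of over trees.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_inc_mse(predict, X, y, col, repeats=10, seed=0):
    """Average increase in MSE after permuting column `col` of X.

    predict: callable mapping one feature row (a list) to a prediction;
             a stand-in for a fitted random forest (assumption).
    X:       list of feature rows (lists); y: observed responses.
    """
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    increases = []
    for _ in range(repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        # Rebuild each row with only the assessed column replaced.
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(y, [predict(r) for r in X_perm]) - base)
    return sum(increases) / repeats
```

A variable the model does not use yields an increase of exactly zero, while a variable that drives the predictions yields a positive increase; ranking variables by this value gives an importance ordering of the kind shown in Figure 3.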

Precision Assessment
The agreement between the estimates and the reference data was evaluated with the coefficient of determination (R²), the root mean squared error (RMSE), and the mean absolute error (MAE). Because R² always increases when independent variables are added to an MSLR model, we also used the adjusted R² (R²_adj) to assess model precision. The four assessment indicators were calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

$$R^2_{adj} = 1 - \frac{\left(1 - R^2\right)\left(n - 1\right)}{n - k - 1}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the observed value, $\bar{y}$ is the mean of the observed values, $n$ is the number of samples, and $k$ is the number of independent variables.
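The four indicators named in this section can be computed in a few lines. The sketch below assumes the standard textbook definitions (R² relative to the mean of the observed values); the function name and the dictionary layout are illustrative.

```python
import math


def regression_metrics(y_obs, y_pred, k):
    """Return R2, adjusted R2, RMSE and MAE for observed vs. predicted values.

    k is the number of independent variables in the model.
    """
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)           # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / n
    return {"r2": r2, "r2_adj": r2_adj, "rmse": rmse, "mae": mae}
```

With this definition R² is not bounded below by zero: a model that predicts worse than the mean of the observations gives a negative value, which is how values such as R² < −1 can arise for poorly fitted models.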

Feature Selection
The samples were split into training and testing sets at a ratio of 60%:40%. Of the 174 feature variables extracted from the point cloud data, the 15 most important were ranked separately for the height, intensity, alpha-shape, and mixed metrics, and all four stand metrics were ranked. The resulting importance rankings, based on the increase in mean squared error (IncMSE), are shown in Figure 3. The variable with the largest IncMSE is tree density (TD), a stand metric, followed by the surface area of the 3-D alpha-shape (Surf), I_mm, and AIH_60th, which belong to the alpha-shape, intensity, and height metrics, respectively. Figure 3e shows the top 15 features among all variables. Among them, height metrics are the most numerous, followed by alpha-shape, stand, and intensity metrics. This indicates that the tree height and alpha-shape metrics are dominant and more important to AGB.
Within the height and intensity metrics, the percentiles and accumulative interpercentiles are more important than the descriptive statistical variables. In the stand metrics, the IncMSE of all four variables is large (IncMSE > 1000), so tree density, canopy cover, gap fraction, and LAI are all important to AGB. The range of IncMSE values of the alpha-shape metrics differs widely from that of the other metrics, and descriptive statistical variables make up most of their top 15 important features.
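The two steps of this section, a 60%:40% split of the plots and a top-15 ranking by importance, can be sketched as below. The random seed, the rounding of the training-set size, and the importance values in the usage note are assumptions made for illustration; the text does not report how the split was drawn.

```python
import random


def split_plots(plot_ids, train_fraction=0.6, seed=42):
    """Randomly split plot identifiers into training and testing sets."""
    ids = list(plot_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))  # rounding rule is an assumption
    return ids[:n_train], ids[n_train:]


def top_k(importance, k=15):
    """Names of the k variables with the largest importance score."""
    return sorted(importance, key=importance.get, reverse=True)[:k]
```

For example, `top_k({"TD": 3.0, "Surf": 2.0, "I_mm": 1.0}, k=2)` keeps `TD` and `Surf`; the scores here are made up and are not values from Figure 3.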

Correlation Analysis
A correlation analysis was carried out between AGB and the important features. The bar charts in Figure 4 show the correlation between the variables of each group and AGB.
All 15 height features of the point cloud data were significantly and positively correlated with AGB (r > 0.70, p < 0.01); overall, the elevation percentile features were the most strongly correlated with AGB. In contrast, the intensity, alpha-shape, and stand metrics include both negatively and positively correlated variables (p < 0.01), as shown in Figure 4b-d. Among all variables, the surface area is the most strongly correlated with AGB. The bar charts also show that the range of the absolute correlation coefficients is large. In addition, the alpha-shape variables associated with the Z-axis are positively correlated with AGB, as the height metrics are, whereas the descriptive statistical variables of CSR and crown width are negatively correlated with AGB.
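The correlation coefficients plotted in Figure 4 relate one feature at a time to plot-level AGB. The text does not name the coefficient; the sketch below assumes the Pearson product-moment correlation, which is the usual choice for values reported as r.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 corresponds to the behaviour reported for the height features, and a negative value to that reported for crown width.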

AGB Estimation Models
Figure 5 and Table 7 show the random forest models of aboveground biomass and their parameters. First, the data were normalized to remove the effect of dimension; then the selected variables were used to establish the AGB models.
The results of the random forest models are shown in Figure 5: the mean squared error (MSE) decreases and R² increases gradually as the number of trees (ntree in the random forest algorithm) increases, and the trend stabilizes when ntree reaches approximately 200. Taken together, the MSE and R² show that, after stabilization, the intensity metrics have the largest error and the smallest R². Notably, R² < −1 when only a few trees are used, indicating a poor fit. The alpha-shape metrics have the smallest MSE and the largest R². This indicates that random forest regression based on the alpha-shape variables is the most accurate, followed by the height, stand, and intensity metrics.
In addition, when training started with a small ntree, the MSE and R² were unstable; performance became stable as the number of trees increased, once the range with R² < −1 for the intensity metrics model was excluded. Moreover, an excessively large ntree may lead to overfitting. Therefore, the optimal parameters were determined from the variation in MSE and R² between 8 and 200 trees. Table 7 lists the optimal random forest regression parameters for each group of variables. For the height variables, the ntree giving the best MSE and the ntree giving the best R² were the same, whereas they differed greatly for the other three groups of variables. Overall, the most stable model is the one based on height metrics, while the most accurate is the one based on alpha-shape metrics. Comparing the ntree at minimum MSE with that at maximum R², the optimal number of trees for the alpha-shape variables is 26.
Table 8 presents the MSLR models of aboveground biomass. Each linear regression AGB model was established from the important variables selected within each of the four categories of feature parameters. The table lists the regression coefficients and the important independent variables. The larger the absolute value of a regression coefficient, the more strongly that variable explains AGB; and the more independent variables a model retains, the better the relationship between the AGB response and the LiDAR signal. Table 8 shows that the mixed metrics model is the same as the alpha-shape metrics model. We used the Durbin-Watson test, the F statistic, and R²_adj to assess the MSLR models, which showed that the crown surface area was the best and most important variable for estimating AGB, followed by the height, stand, and intensity metrics.
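The search for the optimal ntree between 8 and 200 trees, described for the random forest models above, amounts to locating the minimum of the MSE curve and the maximum of the R² curve within that range. The sketch below assumes the two curves are available as dictionaries keyed by ntree; this layout and the numbers in the usage note are hypothetical.

```python
def best_ntree(mse_by_ntree, r2_by_ntree, lo=8, hi=200):
    """Return (ntree with minimum MSE, ntree with maximum R2) within [lo, hi]."""
    candidates = [n for n in mse_by_ntree if lo <= n <= hi and n in r2_by_ntree]
    if not candidates:
        raise ValueError("no ntree values inside the search range")
    n_min_mse = min(candidates, key=lambda n: mse_by_ntree[n])
    n_max_r2 = max(candidates, key=lambda n: r2_by_ntree[n])
    return n_min_mse, n_max_r2
```

For made-up curves `{5: 1.0, 8: 0.9, 50: 0.5, 100: 0.6, 300: 0.4}` and `{5: -2.0, 8: 0.1, 50: 0.7, 100: 0.65, 300: 0.8}` the function returns `(50, 50)`, ignoring the values at 5 and 300 trees because they lie outside the search range. When the two returned values coincide, as reported for the height variables, the choice of ntree is unambiguous.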

Regression Model Precision Assessment
Figures 6 and 7 summarize the accuracy of the AGB estimation models. Figure 6i,j shows that the random forest results based on mixed metrics lie closest to the red 1:1 line (R² = 0.713, RMSE = 21.064 t/ha, MAE = 15.445 t/ha), followed by those based on alpha-shape, height, stand, and intensity metrics. For the model based on alpha-shape metrics, the training R² exceeds 0.7 and the testing R² is close to 0.7, which shows that its performance is more stable than that of the height, stand, and intensity metrics.
As can be seen from Figure 7, in stepwise linear regression, as in random forest, the alpha-shape metrics had the best relationship with measured AGB. The R²_adj of the models derived from alpha-shape variables exceeded 0.6 for both the training model (R²_adj = 0.607) and the testing model (R²_adj = 0.711), indicating that variables more closely related to individual trees give the best estimates, followed by height, stand, and intensity metrics; however, RMSE and MAE differed conspicuously between the training and testing models. In the testing datasets, the intensity metrics performed extremely poorly.

The Important Roles of Variables from ALS for AGB Estimation at Stand Level
Figures 3 and 4 show the relationship between AGB and the variables extracted from ALS data. In the stand metrics, the IncMSE of every variable is greater than 1000, indicating that tree density (TD), leaf area index (LAI), canopy cover (CC), and gap fraction (GF) are important to AGB. Tree density describes not only the number of trees in the stand but is also an important indicator of ecosystem structure and biological cycling [34,35]. Canopy cover, like tree density, is an indicator of stand density; furthermore, it is an important parameter in forest management and in estimating forest volume [36][37][38].
Since the study area is a tropical rainforest with dense trees, pronounced vertical stratification, and complex structure, competition among tree crowns for light and living space is extremely strong, and the growth of the understory and forest floor is affected; the biomass is therefore affected as well. In addition, the leaf area index is closely related to the canopy and is an indicator of forest health; it affects many physiological and physical processes in vegetation, such as photosynthesis, respiration, transpiration, carbon cycling, and precipitation retention [39,40]. The gap fraction reflects the characteristics of canopy structure and the spatial distribution of biomass [41,42].
Tree height is one of the most important quantitative forest observation parameters, especially in tropical rainforests, and has a strong influence on tree growth, the understory, and photosynthesis [11,43]. While it is important to summarize observations in terms of the maximum, minimum, median, etc., the summary is more comprehensive when combined with percentiles [44]. A quantile locates a position within the data, and the percentiles describe how the data are distributed between the minimum and maximum, so the step between percentiles can also serve as an indicator of the dispersion of the data. Percentile values are fairly stable when there are many observations. It is therefore helpful to analyze stand height characteristics and the aboveground biomass of the stand by combining the elevation metrics of the point cloud data with quantiles. In this study, the accumulative interpercentile heights and height percentiles performed better in AGB estimation.
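A height percentile and the step between two percentiles, as discussed above, can be computed as follows. This is a sketch with linear interpolation between sorted observations, which is an assumption; LiDAR software may use a different interpolation rule, and the accumulative interpercentile height is not reproduced here because its definition is not given in the text.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0   # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] * (1 - frac) + data[upper] * frac


def percentile_step(values, q_low, q_high):
    """Difference between two percentiles, a simple measure of dispersion."""
    return percentile(values, q_high) - percentile(values, q_low)
```

A large step between, say, the 25th and 75th height percentiles indicates a vertically spread canopy, while a small step indicates echoes concentrated in a narrow height band.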
The intensity metric reflects the strength of the laser pulse echoes, which is affected by the length of the laser path, the scanning angle, the distribution of branches and leaves, and the terrain in the forest [45][46][47]. Moreover, intensity values are closely related to tree species and to whether standing trees are living or dead [48]. The median absolute deviation (MAD) is a robust measure of variability in univariate datasets: it is less sensitive to outliers than the standard deviation and can greatly reduce their impact [49,50]. Hence, robust statistical metrics of intensity are beneficial for characterizing the vertical structure and AGB of forest stands.
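The robustness of the MAD described above is easy to see in a small example. The sketch below uses the plain definition (the median of absolute deviations from the median) without any scaling constant, which is an assumption about the variant meant in the text.

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled MAD)."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)
```

For the sample `[1, 2, 3, 4, 100]` the MAD is 1, whereas the population standard deviation is about 39, because the single outlier dominates the squared deviations but barely moves the medians.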

Effects on Structure of Individual Trees Based on Alpha-Shape Analysis
The height, intensity, and forest stand metrics describe stand features at the plot level. However, the aboveground biomass of a plot consists mainly of its individual trees, so we also extracted tree-level characteristics with the alpha-shape algorithm. These help describe the relationship between AGB and the variables extracted from the point cloud data, making fuller use of the point cloud and of the advantages of airborne laser scanning. Using the alpha-shape, we extracted the surface area and volume of each crown, which are important parameters of crown structure: they directly reflect the size and shape of the crown and indirectly reflect photosynthetic capacity and biomass accumulation [1].
The crown fullness ratio (CFR) is a quantitative index that reflects not only crown size but also the growth of trees and their ability to occupy growing space [51,52]. Therefore, the crown fullness ratio has a great influence on the growth of trees [53]. Because of the way the laser scanning data were acquired and the high crown density in tropical rainforest, the segmented single-tree point clouds cannot fully express the details of each individual tree, but the alpha-shape polyhedron still contains the tree height information. We used the ratio of average crown width to tree height as the crown fullness ratio. Furthermore, the length of the polyhedron along the Z axis (C_Z) characterizes tree height in the same way as the height metrics. Therefore, extracting individual tree structure metrics with alpha-shape after single-tree point cloud segmentation is beneficial for estimating forest aboveground biomass.
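The ratio of average crown width to tree height can be sketched as follows. This is a simplified illustration and not the procedure used in this study: it assumes a segmented single-tree point cloud with ground-normalized heights and approximates the crown widths by the axis-aligned X and Y extents of the points rather than by the alpha-shape polyhedron. The function name and sample points are hypothetical.

```python
def crown_fullness_ratio(points):
    """Average crown width divided by tree height for one segmented tree.

    points: iterable of (x, y, z) tuples, with z normalized to ground level.
    Crown widths are approximated by the X and Y extents (an assumption).
    """
    xs, ys, zs = zip(*points)
    width_x = max(xs) - min(xs)
    width_y = max(ys) - min(ys)
    mean_width = (width_x + width_y) / 2.0
    tree_height = max(zs)
    if tree_height <= 0:
        raise ValueError("tree height must be positive")
    return mean_width / tree_height


# Hypothetical tree: 4 m by 6 m crown extent, 20 m tall.
tree_points = [(0.0, 0.0, 10.0), (4.0, 0.0, 12.0), (0.0, 6.0, 20.0), (4.0, 6.0, 15.0)]
cfr = crown_fullness_ratio(tree_points)
```

A larger value indicates a crown that is wide relative to the height of the tree.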

Regression Models of AGB Estimation
We used two regression methods to estimate AGB. Of the two, multiple stepwise linear regression (MSLR) is a parametric technique, which makes the relationship between AGB and LiDAR-derived metrics explicit. However, it places assumptions on the data, such as sample independence, normal distribution, and equal variance [54,55]. In addition, the forest environment is a complex, nonlinear ecosystem that is difficult to depict with linear regression. Individual trees are also affected by site quality, stand competition, and interactions among other factors, and it is difficult to express such highly complex systems with linear regression [56]. Moreover, treetops may have been missed during laser scanning in this study area, so the crown of each tree was depicted less accurately by the variables extracted from the ALS data.
In contrast, random forest (RF) is a common non-parametric regression method that makes no assumptions about the distribution of the input data and still analyzes the relationship between AGB and LiDAR metrics well, mining valuable information on forest parameters and establishing a better regression model [57,58]. The RF model estimated AGB well with all kinds of metrics. Moreover, the random forest model handles nonlinear relationships and multicollinearity well, both of which are typical of forest environment parameters [56]. It is also relatively easy to improve its performance and reduce overfitting. However, it has some drawbacks: the hyperparameters could not be tuned well within the RF model [59], so it may not perform as well for other similar forests as it did in this study.
The regression method itself is not the focus of this paper. In terms of regression strategies, more machine learning methods could be used and compared, for instance, support vector machine (SVM), artificial neural network (ANN), and k-nearest neighbor (k-NN), but sample size and model generalization are also important problems to consider. Figures 6 and 7 show that the RF and MSLR models produced biased estimates when AGB > 200 t/ha. These biases reflect the random effects of the forest environment on AGB. In addition, AGB is spatially autocorrelated and heterogeneous in forests [36,60,61]. Ignoring spatial autocorrelation and heterogeneity introduces error into AGB estimation, and spatial regression models such as geographically weighted regression (GWR) are a good way to address this. By considering the weight of spatial distance in addition to the analyzed attributes, they can better characterize spatial variance [62].

Other Factors for Characterizing AGB Estimation
Another advantage of ALS is the ability to generate a high spatial resolution digital elevation model (DEM). Stand characteristics are also affected by terrain, such as slope, aspect, relief, and curvature [50]. Moreover, topography has a significant impact on species richness, hydrothermal conditions, and soil nutrient supply, especially in tropical areas [9,63,64]. In this paper, we did not consider topographic factors and instead focused on the relationship between the structural characteristics of stands or individual trees and aboveground biomass. The quality of a DEM generated from ALS is affected by forest structure, off-nadir angle, and the interpolation algorithm [64,65]. Hence, in future research, a high-resolution DEM could be used to extract topographic factors as independent variables for modeling.
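As an illustration of how topographic factors might be derived from a gridded DEM, the following minimal sketch computes slope and aspect at an interior cell with central differences. It was not part of this study. It assumes a square-cell DEM stored as a list of rows with row 0 at the northern edge; the function name, the grid layout, and the convention that aspect is the downslope direction measured clockwise from north are all assumptions.

```python
import math


def slope_aspect(dem, row, col, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) at an interior cell.

    dem: list of rows of elevations, row 0 being the northern edge (assumed layout).
    Aspect is the downslope direction; it is None where the surface is flat.
    """
    # Elevation gradient toward the east and toward the north (central differences).
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2.0 * cell_size)
    dz_dy = (dem[row - 1][col] - dem[row + 1][col]) / (2.0 * cell_size)
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None
    # Bearing of the downslope vector (-dz_dx, -dz_dy) in (east, north) components.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# Hypothetical 3 x 3 DEM (m) rising toward the east, with 1 m cells.
dem_east = [[0.0, 1.0, 2.0],
            [0.0, 1.0, 2.0],
            [0.0, 1.0, 2.0]]
slope_deg, aspect_deg = slope_aspect(dem_east, 1, 1, 1.0)
```

In this example the surface rises toward the east, so the cell faces west. Operational GIS tools typically use larger neighborhoods, such as a 3 x 3 weighted kernel, so their values can differ slightly from this simple scheme.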
In terms of tree species, the shapes of coniferous and broad-leaved trees are quite different, which also produces different responses to laser scanning signals. The accuracy of the AGB estimation model can be improved by distinguishing tree species [66]. However, owing to the limitations of field conditions, the number of sample plots differed among the dominant tree species, with only a few sample plots for some individual species.

Conclusions
In this research, the aim was to evaluate the potential of different kinds of features extracted from airborne laser scanning data to explain aboveground biomass, in particular the individual tree crown features, which can play an important role in AGB estimation. The random forest model performed better than multiple stepwise linear regression. Random forest not only avoided the problems of multicollinearity and nonlinearity but also selected other variables related to AGB. The findings of this study suggest that the shape of the point cloud clusters representing tree crowns can be geometrically reconstructed by the alpha-shape algorithm, and that features extracted from this reconstruction are able to estimate AGB well. There was a good relationship between AGB and the surface area and volume of crowns, along with the crown shape ratio and tree height extracted from the alpha-shape polyhedron. In future work, the method may be extended to distinguish tree species for mapping at the stand level, which could improve estimates of regional aboveground biomass.