LIDAR-Based Forest Biomass Remote Sensing: A Review of Metrics, Methods, and Assessment Criteria for the Selection of Allometric Equations

The increasing level of atmospheric carbon dioxide and its effects on our climate system have become a global environmental issue. The forest ecosystem is essential for stabilizing atmospheric carbon, as it operates as a carbon sink and provides a habitat for numerous species. Therefore, understanding the structural elements of the forest ecosystem is vital for estimating forest biomass and terrestrial carbon stocks. Over the last two decades, light detection and ranging (LIDAR) technology has revolutionized our understanding of forest structure and enhanced our ability to monitor forest biomass. This paper presents a review of metrics for forest biomass estimation, outlines metrics selection methods for biomass modeling


Introduction
Forests cover about 30% of the Earth's land surface [1]. They are a source of biodiversity and hold about 80% of the plant biomass on Earth [2]. Biomass is considered an important climate parameter [3]; it quantifies the mass of live or dead organic matter and is usually expressed as dry weight per unit surface area. Quantifying forest carbon and forest biomass is necessary for a detailed understanding of the relative impact of land use on climate change [4]. Biomass information is also vital to programs such as the United Nations Agenda on Biodiversity, the Sustainable Development Goals, the United Nations Forum on Forests, and the United Nations Convention to Combat Desertification. Information on the spatio-temporal dynamics of biomass change is essential for effective planning of mitigation and adaptation interventions and for implementing sustainable policies. In this context, the typical stakeholders of the forest conservation agenda are government officials, decision makers, and forest managers who oversee forest protection activities. Information on forest biomass is also critical to researchers' efforts to describe changes in the Earth's climate [5]. Forest biomass over large areas has been estimated with different methods, including interpolation of measurements from a few plots established for ecological research and biogeochemical modeling [6]. Among these techniques, remote sensing is desirable because it provides a spatially explicit estimate of plant biomass at each pixel, rather than a total for a particular inventory unit [7].
Forest biomass estimation from remotely sensed data has relied heavily on key variables such as vegetation indices, spectral reflectance, leaf area index, and vegetation cover, or combinations of them [8]. Optical imagery and radio detection and ranging (radar) data have conventionally been used for this purpose [9]. The major setback of both is that their signals saturate, or become insensitive, at medium to high biomass levels. In addition, plant spectral signatures are affected by complex vegetation characteristics and biophysical conditions, including species composition, phenology, health, and growth stage. Hence, biomass prediction models derived from optical spectral data cannot readily be transferred across diverse zones for biomass inventory tracking [10]. Cloud cover in the acquired images, especially in tropical areas, is another challenge that limits the application of optical data there [11].
LIDAR technology is an innovative approach for estimating forest biomass because it can detect biomass at very high levels (>1000 t/ha) without significant saturation [9]. LIDAR data are acquired with active sensors that emit near-infrared laser pulses at high repetition rates [12]. Active LIDAR remote sensing has proven capable of accurately measuring forest structural attributes such as height, basal area, vertical profile, crown size, and stem volume [13], all of which are related to biomass. In addition, LIDAR technology enables remote measurement of forest structure from spaceborne, airborne, and terrestrial platforms.
To help prospective users of LIDAR technology for forest biomass assessment and monitoring, this study provides an overview of LIDAR remote sensing and its application to forest biomass mapping. The main objectives of this study are to (1) review metrics for forest biomass estimation and the methods for selecting metrics for biomass modeling with LIDAR data, and (2) examine the various criteria for selecting appropriate allometric equations for plot-level biomass estimation. The study aims to contribute to the following topics: LIDAR technology for biomass studies, LIDAR measurement accuracies, height metrics for biomass models, biomass estimation methods, biomass model assessments, and uncertainty analysis.

Criteria for Literature Review
To conduct this review, relevant literature was gathered from Scopus, Google Scholar, and Web of Science using various search terms related to forest biomass remote sensing. English-language articles published between January 1999 and April 2023 were considered, as shown in Table 1. Search items (1) and (2) were tailored to the two objectives of this paper. Important book chapters and internet reports were also included. The title and abstract of each downloaded paper were then screened. One hundred publications were selected from the initial 280 according to criteria aligned with the objectives of this paper. These criteria required that the publication (1) employ LIDAR technology to measure forest structure, (2) include height metrics for biomass models, and (3) include assessment criteria for selecting allometric equations. After irrelevant publications were removed, the remaining papers were divided into three categories. Table 2 shows that the first group, comprising 31% of the selected publications, focuses on LIDAR technology for estimating forest biomass. The second group, comprising 47% of the selected papers, discusses methods and techniques for selecting LIDAR metrics to estimate biomass. Lastly, about 22% of the papers relate to the assessment criteria for selecting allometric equations.

LIDAR Technology for Biomass Studies
LIDAR technology employs active sensors to capture details of the terrain and of physical features such as forests [14]. LIDAR systems can be categorized by signal type (waveform or discrete return), footprint size (small or large), sampling pattern (profiling or scanning), and platform (airborne, terrestrial, or spaceborne).

Waveform or Discrete Return Signal
Waveform LIDAR devices record the energy of the returning signal as a function of transit time for each laser pulse, which measures the height distribution of the illuminated surfaces [15]. These devices typically sample at a relatively coarse horizontal resolution, with footprints of 10 m to 100 m, but digitize the complete return, yielding full vertical profiles at sub-meter vertical resolution. Full-waveform LIDAR devices are primarily used for research purposes [12].
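To make the link between transit time and height distribution concrete, the following Python sketch converts the two-way travel times of a digitized waveform into heights above the ground return (range = c·t/2) and derives an energy-percentile height. The function names, the nanosecond time base, and the assumption that the ground-return time is already known are illustrative choices, not part of any particular sensor's processing chain.

```python
import numpy as np

C = 299_792_458.0  # speed of light (m/s)


def waveform_heights(t_ns, ground_t_ns):
    """Heights (m) above the ground return for waveform samples at two-way times t_ns.

    A sample recorded dt seconds before the ground return came from a surface
    c * dt / 2 metres higher (the pulse travels down and back).
    """
    t = np.asarray(t_ns, dtype=float)
    return C * (ground_t_ns - t) * 1e-9 / 2.0


def energy_percentile_height(t_ns, energy, ground_t_ns, q):
    """Height below which q percent of the returned energy lies (hypothetical helper)."""
    h = waveform_heights(t_ns, ground_t_ns)
    order = np.argsort(h)  # from the ground upwards
    cum = np.cumsum(np.asarray(energy, dtype=float)[order])
    cum /= cum[-1]  # normalised cumulative energy, 0..1
    idx = np.searchsorted(cum, q / 100.0)
    return float(h[order][idx])


# Three samples: canopy top (0 ns), mid-canopy (100 ns), ground (200 ns)
t = [0.0, 100.0, 200.0]
e = [1.0, 1.0, 2.0]
print(waveform_heights(t, 200.0))                  # approx. [29.98, 14.99, 0.0]
print(energy_percentile_height(t, e, 200.0, 75))   # approx. 14.99
```

A 1 ns difference in two-way travel time corresponds to roughly 0.15 m of height, which is why digitizing the full return gives sub-meter vertical detail even when the footprint is tens of meters wide.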
Discrete-return LIDAR devices record either a single return or a few returns per laser pulse, capturing the signal only where significant peaks indicate distinct features along the signal path [15]. Multiple-return systems can capture between one and five returns from a single laser pulse [16]. When the beam is intercepted by a feature, part of the signal is reflected back to the receiver and recorded as the first return. The remainder of the pulse may be reflected by objects closer to the ground, producing second or third returns, and the last return is usually reflected from the ground surface. This phenomenon is frequently observed in forested areas where the tree canopies are close together [17].
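The first, intermediate, and last-return logic described above can be expressed directly in terms of each point's return number and the number of returns of its pulse, the two per-point fields that discrete-return data in the LAS format carry. The Python sketch below assumes these values are already loaded as integer arrays; the category labels are our own.

```python
import numpy as np


def return_category(return_number, number_of_returns):
    """Label each discrete return as single, first, intermediate, or last."""
    rn = np.asarray(return_number)
    nr = np.asarray(number_of_returns)
    cat = np.full(rn.shape, "intermediate", dtype=object)
    cat[rn == nr] = "last"    # last return of the pulse (often the ground)
    cat[rn == 1] = "first"    # first return (often the top of the canopy)
    cat[nr == 1] = "single"   # the pulse produced only one return
    return cat


# One pulse with three returns, followed by a single-return pulse
labels = return_category([1, 2, 3, 1], [3, 3, 3, 1])
print(list(labels))  # ['first', 'intermediate', 'last', 'single']
```

The assignments are ordered so that later rules override earlier ones: a single return is both first and last, and is labeled separately because it carries no information about layers below the surface it hit.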

Profiling or Scanning Pattern
A profiling system records data only along a narrow transect, whereas a scanning system captures data across a broad swath [15]. The sensor systematically records footprints at a regular spacing along its trajectory over the ground [18]. In operation, laser pulses are emitted at a high rate as the aircraft moves forward, striking the ground along the flight direction [19]. This approach ensures extensive sampling of the designated areas. However, distinct procedures are required to measure biophysical variables in plots with missing data [18].
Profiling LIDAR systems obtain data along a swath equal to the footprint diameter of a single pulse, whereas scanning LIDAR systems distribute laser pulses across a swath whose width depends on parameters such as scan angle, pulse density, and flying altitude. Swath widths of 500 to 1000 m are frequently used in routine forestry operations. Compared with profiling systems, a single scanning LIDAR flight line produces vertical profiles across the entire swath, giving a richer set of attributes for stand-level characterization [20].
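The dependence of swath width on flying altitude and scan angle, and of pulse density on pulse rate and aircraft speed, can be illustrated with simple flat-terrain geometry. This is a rough planning sketch that ignores terrain relief, aircraft attitude, and multiple returns; the numbers in the example are illustrative, not taken from any reviewed study.

```python
import math


def swath_width(altitude_m, full_scan_angle_deg):
    """Across-track swath width (m) over flat terrain: 2 * H * tan(theta / 2)."""
    half_angle = math.radians(full_scan_angle_deg / 2.0)
    return 2.0 * altitude_m * math.tan(half_angle)


def mean_pulse_density(pulse_rate_hz, ground_speed_mps, swath_m):
    """Average pulses per square metre: pulses emitted per second / area covered per second."""
    return pulse_rate_hz / (ground_speed_mps * swath_m)


w = swath_width(1000.0, 40.0)             # about 728 m at 1000 m altitude with a 40 degree scan
d = mean_pulse_density(100_000, 60.0, w)  # about 2.3 pulses per m^2
print(round(w, 1), round(d, 2))
```

The example swath falls inside the 500 to 1000 m range quoted above, and it shows the trade-off directly: flying higher widens the swath but spreads the same number of pulses over more ground.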

Small or Large Footprint Size
The area on the target surface illuminated by a laser pulse is referred to as the LIDAR footprint. Footprint size is determined by the distance from the instrument to the target and by the divergence of the laser beam. Whether the footprint is several meters (satellite platforms), several centimeters (airborne sensors), or several millimeters (terrestrial laser systems) across, the underlying principle is the same for all laser systems [18]. In forestry applications, it is the footprint size (not the divergence as such) that matters. Two opposing goals frequently emerge when choosing a footprint size: (1) striking tree apices as often as possible, and (2) achieving high spatial resolution and a high penetration rate. The first goal calls for a larger footprint, whereas the second demands a smaller one. Small footprints are less likely to strike tree apices, which is usually compensated for by very high pulse energies or return densities [21].
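The relation between range, beam divergence, and footprint size can be sketched with simple cone geometry, using the full-angle divergence. The divergence values in the example are hypothetical, chosen only to show how the same formula yields millimeter, decimeter, or multi-meter footprints at terrestrial, airborne, and spaceborne ranges.

```python
import math


def footprint_diameter(range_m, divergence_mrad):
    """Footprint diameter (m) for a beam with full-angle divergence (mrad) at a given range."""
    gamma = divergence_mrad * 1e-3  # mrad -> rad
    return 2.0 * range_m * math.tan(gamma / 2.0)  # approx. range * gamma for small angles


# Hypothetical examples
for label, rng, div in [("terrestrial", 10.0, 0.3),
                        ("airborne", 1_000.0, 0.5),
                        ("spaceborne", 500_000.0, 0.05)]:
    print(f"{label:12s} {footprint_diameter(rng, div):8.3f} m")
```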
Dubayah and Drake [22] noted that a small footprint may not be optimal for forest structure mapping, because small-diameter laser beams often sample the shoulders of crowns and miss the tops of trees. Unless many shots are taken, the true canopy structure cannot be reliably reconstructed. Moreover, mapping large regions with small-diameter beams may require extensive flying. Accordingly, Dubayah and Drake [22] described several advantages of large-footprint systems over small-footprint systems. First, large-footprint sensors avoid the biases often encountered by small-footprint sensors. Second, mapping large forest areas with large-footprint sensors is less costly than with small-footprint systems, owing to the wider swaths they cover. Finally, large-footprint systems digitize the full return signal, providing information on vertical structure from the crown to the ground.

Spaceborne, Airborne, or Terrestrial Platform
There are now unprecedented opportunities to measure forest structure from space-based technologies, as shown in Figure 1. For example, current and forthcoming Earth observation missions such as NASA-ISRO SAR, ICESat-2, NASA's GEDI, and ESA's BIOMASS [23] will provide information essential for mapping 3D vegetation structure and biomass [24]. Spaceborne LIDAR operates with large footprints, ranging from 25 m to 80 m in diameter, and samples forest structure systematically along its orbital tracks as it passes over varied topography [25]. As a result, the density of sampled points increases as the satellite continues to orbit the Earth [26].
Airborne LIDAR measurements with sub-meter accuracy have had a significant impact on 3D imaging of forest structure. Many projects have used small-footprint airborne LIDAR to map forest and vegetation cover for purposes such as biomass studies, forest inventory, and habitat modeling. Small-footprint (<1 m) airborne systems typically operate at a near-infrared wavelength of 1064 nm and are flown at low altitudes chosen according to the measurement requirements and cloud conditions; their measurements can be recorded as discrete returns or as waveforms. These systems record many accurately positioned points within each sampled area, allowing detailed sampling of forest structural components. Airborne LIDAR is commonly used in tropical areas to obtain information over large areas, either as inventory samples for national and regional carbon quantification [26] or as wall-to-wall coverage [27].
Terrestrial LIDAR systems, or terrestrial laser scanning (TLS) systems, have proven effective for assessing the structure of canopy trees [6]. TLS data capture the 3D structure of trees and forests in high detail, which supports prediction and generalization to national and regional scales with remote sensing techniques [28]. TLS has been applied in settings ranging from traditional surveying to tropical forest mapping [29] and has been used to extract tree attributes such as diameter, height, and crown width [30]. However, terrestrial laser systems have a shorter working range than other laser technologies and are mostly designed to capture objects within a 50 m range. Their ability to map canopy trees is further limited by occlusion, which usually occurs in the uppermost canopy, where leaves, twigs, needles, and branches block the laser from reaching more distant elements. Another drawback is the high cost per unit area of data acquisition and processing [31].
We noted in this review that most of the forest structure measurements used for biomass estimation were obtained from airborne LIDAR. About 84% of the reviewed papers employed airborne LIDAR measurements, only 12% used spaceborne LIDAR data, and about 4% used terrestrial LIDAR data. Furthermore, studies were conducted primarily at a local scale, with few at a regional or global scale. Plot-level analysis dominated forest biomass studies, accounting for approximately 76% of them, compared with approximately 24% for tree-level analysis.

Errors and Accuracy of LIDAR Measurements
LIDAR measurement errors can be divided into laser-induced and filtering-induced errors. Laser-induced errors are commonly caused by grain noise and by height changes at the measured points on the terrain surface (e.g., ridges and ditches) at narrow angles. Errors from the global positioning system/inertial navigation unit/inertial measurement unit (GPS/INU/IMU) arise from measurement variance and initialization errors. Filtering errors result from the excessive or incomplete removal of laser points. Moreover, false readings from features such as water bodies can cause LIDAR measurement errors [32].
The accuracy of LIDAR data is usually estimated by statistically comparing the measured LIDAR points with surveyed (known) points, typically using the standard deviation (σ) and the root mean square error (RMSE) [33]. According to Evans et al. [33], horizontal and vertical accuracies of discrete-return LIDAR data should be estimated and reported consistently, following the standards of the National Geodetic Survey (NGS), the National Oceanic and Atmospheric Administration (NOAA), and the Federal Geographic Data Committee (FGDC, 1998) [33]. These standards stipulate that LIDAR data should achieve an RMSE smaller than 0.55 m horizontally and 0.15 m vertically over unvegetated ground with gentle slopes [32]. For example, a small-footprint airborne LIDAR device used to acquire data for forest biomass estimation in lowland tropical forest in Indonesia achieved an absolute vertical accuracy of ±0.15 m with an RMSE of ±0.50 m [34]. The FGDC geospatial accuracy standard for the vertical and horizontal accuracy of spatial products is largely based on the calculation of RMSE (Table 3).

Formula: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\Delta_i^{2}}$
Description: ∆ is the difference between an in situ checkpoint measurement and the measurement obtained from remote sensing at the same site; n is the total number of tested checkpoints.
According to Jensen [35], the vertical and horizontal accuracy equations assume that the errors in x and z are normally distributed, with n > 20. The checkpoints should be more accurate than the remote-sensing-derived product under investigation.
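A minimal Python sketch of the RMSE computation over checkpoint differences, extended with the 95% confidence multipliers of the FGDC National Standard for Spatial Data Accuracy (NSSDA, FGDC-STD-007.3-1998): 1.9600 × RMSE_z for vertical accuracy and 1.7308 × RMSE_r for horizontal accuracy, the latter valid when RMSE_x and RMSE_y are similar. The function names and the warning on small samples (n > 20, following the note above) are our own illustrative choices.

```python
import math
import warnings


def rmse(deltas):
    """Root mean square error of checkpoint differences (remote sensing minus in situ)."""
    if len(deltas) == 0:
        raise ValueError("at least one checkpoint is required")
    return math.sqrt(sum(d * d for d in deltas) / len(deltas))


def _check_n(n):
    # The normal-error assumption behind the 95% multipliers needs enough checkpoints
    if n <= 20:
        warnings.warn("NSSDA-style statistics assume more than 20 checkpoints")


def vertical_accuracy_95(dz):
    """NSSDA vertical accuracy at 95% confidence; assumes normally distributed errors."""
    _check_n(len(dz))
    return 1.9600 * rmse(dz)


def horizontal_accuracy_95(dx, dy):
    """NSSDA horizontal accuracy at 95% confidence; assumes RMSE_x and RMSE_y are similar."""
    _check_n(len(dx))
    rmse_r = math.sqrt(rmse(dx) ** 2 + rmse(dy) ** 2)
    return 1.7308 * rmse_r


# Hypothetical example: 25 checkpoints, each 0.15 m off vertically
dz = [0.15] * 25
print(round(rmse(dz), 3))                  # 0.15
print(round(vertical_accuracy_95(dz), 3))  # 0.294
```

Note that the two quantities answer different questions: an RMSE of 0.15 m meets a 0.15 m vertical RMSE criterion, while the corresponding 95% accuracy statement is about 0.29 m.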
Table 4 presents a summary of LIDAR measurement accuracies retrieved from a range of publications reviewed.It was observed that the majority of the authors did not provide information on the horizontal and vertical accuracies of the airborne LIDAR data utilized for their research.Of those reported, the horizontal accuracies of the airborne LIDAR data were generally consistent with standards set by NGS-58 (NOAA, 1997) and FGDC-STD-007 with minimum horizontal accuracy ranging from ±0.10 m to ±0.50 m.In contrast, most of the reported vertical accuracies did not meet these same standards.Only one author reported a vertical accuracy of ±0.15 m with the rest ranging from ±0.18 m to ±0.30 m.

Height Metrics for Biomass Model
The choice of LIDAR height metrics used to construct forest biomass models is essential to the models' broad applicability [26]. These metrics are extracted with either individual tree-based or area-based approaches [41]. The individual tree-based method identifies tree features such as crown boundaries, treetops, or crown radius [38], and it is used to extract single-tree information where high-density LIDAR data are available. In recent years, numerous researchers have used semi-automatic and automated algorithms to extract single-tree attributes. These algorithms include the local curvature approach [42], the local maximum-based approach [43], and the watershed approach [44], among others.
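As an illustration of the local maximum-based idea, the Python sketch below flags a CHM cell as a treetop candidate when it is the highest cell in its square moving window and exceeds a minimum height. The window size and the 2 m height threshold are hypothetical defaults; operational implementations typically use variable window sizes tied to expected crown width, and this is not the specific algorithm of [43].

```python
import numpy as np


def local_maxima_treetops(chm, window=3, min_height=2.0):
    """Return (row, col) of CHM cells that are maxima of their window x window neighbourhood.

    `window` should be odd so the window is centred on the cell.
    """
    chm = np.asarray(chm, dtype=float)
    r = window // 2
    # Pad with -inf so edge cells are compared only with real neighbours
    padded = np.pad(chm, r, mode="constant", constant_values=-np.inf)
    tops = []
    for i in range(chm.shape[0]):
        for j in range(chm.shape[1]):
            h = chm[i, j]
            if h < min_height:
                continue  # ignore ground and low vegetation
            if h >= padded[i:i + window, j:j + window].max():
                tops.append((i, j))  # ties on flat crowns may yield several candidates
    return tops


chm = [[0, 1, 0, 0, 0],
       [1, 5, 1, 0, 0],
       [0, 1, 0, 3, 4],
       [0, 0, 0, 2, 1]]
print(local_maxima_treetops(chm))  # [(1, 1), (2, 4)]
```

The detected treetops are often used as seeds for a subsequent crown segmentation step, such as the watershed approach mentioned above.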
In the area-based approach, statistical variables or metrics are generated from the canopy height model or from the laser returns [11]. Analyzing the point cloud and echo heights for each plot yields metrics such as dominant height, mean height, point-cloud density at various height percentiles, canopy cover, kurtosis, and skewness [40]. Examples of LIDAR-based metrics retrieved from the literature for this review are presented in Table 5. These point-cloud metrics are related to plot-level field data to estimate stand-level attributes [45]. For instance, Pascual et al. [36] computed LIDAR-based metrics such as the standard deviation, mean, and median of airborne laser scanning return heights in grid cells matching the pixel dimensions of the remotely sensed data over a test plot in central Spain.
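The area-based metrics listed above can be computed from the normalized return heights of one plot or grid cell. In the Python sketch below, the input heights are assumed to be height-normalized first returns, canopy cover is taken as the share of returns above a 2 m threshold, and kurtosis is the non-excess (Pearson) form; these are common but not universal conventions, and the choice of percentiles is arbitrary.

```python
import numpy as np


def area_based_metrics(z, cover_threshold=2.0):
    """Plot-level height metrics from normalised return heights z (m)."""
    z = np.asarray(z, dtype=float)
    mean = z.mean()
    dev = z - mean
    m2 = np.mean(dev ** 2)  # population variance
    metrics = {
        "mean": mean,
        "sd": np.sqrt(m2),
        "skewness": np.mean(dev ** 3) / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": np.mean(dev ** 4) / m2 ** 2 if m2 > 0 else 0.0,
        "cover": np.mean(z > cover_threshold),  # share of returns above the threshold
    }
    for p in (25, 50, 75, 95):
        metrics[f"p{p}"] = np.percentile(z, p)  # height percentiles
    return metrics


m = area_based_metrics([0.0, 2.0, 4.0, 6.0, 8.0])
print({k: round(float(v), 3) for k, v in m.items()})
```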
Similarly, in Canada, Matasci et al. [46] used LIDAR-based metrics such as the standard deviation, mean, 95th percentile, and return proportions, among others, to calibrate and validate models across broad forested areas. Using a random forest machine learning algorithm, they developed models, validated on several plots, for key response variables such as aboveground biomass, canopy cover, basal area, stand height, and stem volume. Their results demonstrated that LIDAR-based metrics combined with Landsat-derived products can produce forest structure maps over large areas. Discussing the use of height metrics for forest biomass estimation, Wang and Weng [9] noted that it is important to choose or extract metrics that consistently relate to biomass across a broad range of vegetation or forest conditions.
According to Dong and Chen [32], the canopy height model (CHM) forms the basis from which other tree-level information can be derived. This is demonstrated in a study by Li et al. [48], in which a CHM generated from a digital surface model and a digital terrain model during data pre-processing was used to extract individual Eucalyptus globulus trees. The extraction results were then used to assess eucalyptus biomass estimation with multiple stepwise regression and machine learning algorithms such as support vector machine, random forest, and decision tree. Similarly, Michez et al. [49] argued that dominant canopy height estimated from a CHM is a key indicator widely used by forest managers, particularly in forests with a homogeneous structure. They further explained that, combined with stand age, dominant canopy height gives a clear signal of productivity and would be beneficial for estimating forest biomass.
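The CHM is conventionally obtained by subtracting the digital terrain model from the digital surface model cell by cell. A minimal Python sketch, assuming both rasters are already co-registered NumPy arrays on the same grid; small negative differences (interpolation noise over bare ground) are clipped to zero, and the 95th percentile used as a dominant-height proxy is an illustrative choice rather than the definition used by Michez et al. [49].

```python
import numpy as np


def canopy_height_model(dsm, dtm):
    """CHM = DSM - DTM, with negative values (noise over bare ground) clipped to 0."""
    dsm = np.asarray(dsm, dtype=float)
    dtm = np.asarray(dtm, dtype=float)
    if dsm.shape != dtm.shape:
        raise ValueError("DSM and DTM must share the same grid")
    return np.clip(dsm - dtm, 0.0, None)


def dominant_height_proxy(chm, q=95):
    """Hypothetical dominant-height proxy: the q-th percentile of CHM cells."""
    return float(np.percentile(chm, q))


dsm = [[10.0, 12.0], [5.0, 3.0]]
dtm = [[2.0, 2.0], [4.0, 4.0]]
chm = canopy_height_model(dsm, dtm)
print(chm.tolist())  # [[8.0, 10.0], [1.0, 0.0]]
```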
For tropical forest biomass, Jubanski et al. [34] also reported mean top canopy height (MCH) to be a good explanatory variable. This was demonstrated by Meyer et al. [27], who investigated the possibility of quantifying changes in tropical forest biomass over an extended period in forested areas of Panama; in an analysis involving five different metrics, MCH made the highest relative contribution to forest biomass estimation. Similarly, Yang et al. [50], who developed a new methodology for estimating forest biomass in various forest types via allometric relationships, identified MCH as the most effective height metric for estimating forest biomass in coniferous forests at plot scale, followed, in decreasing order of effectiveness, by subtropical broadleaf forests, mixed coniferous and broadleaf forests, and tropical broadleaf forests. In addition, Lefsky [25] noted that MCH obtained from small-footprint airborne LIDAR can capture both the heights of trees within a field plot and the spatial extent of their crowns. He further explained that MCH in principle integrates the tree heights and crown areas within a given area, and it therefore links well to basal area and aboveground biomass (AGB). MCH differs from the simple average of ground-measured tree heights; its field counterpart is Lorey's height of the forest plot, the mean height of the trees within the plot weighted by their basal area.
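The distinction between a LIDAR-derived mean canopy height and Lorey's height can be made concrete in a short Python sketch. Basal area is computed from DBH as π(DBH/2)²; the MCH here is simply the mean of CHM cells, which is an assumption, since studies differ in how MCH is derived.

```python
import math


def basal_area_m2(dbh_cm):
    """Stem cross-sectional area at breast height (m^2) from DBH in cm."""
    radius_m = dbh_cm / 200.0  # cm diameter -> m radius
    return math.pi * radius_m ** 2


def loreys_height(dbh_cm, height_m):
    """Basal-area-weighted mean tree height (m) of a plot."""
    ba = [basal_area_m2(d) for d in dbh_cm]
    return sum(b * h for b, h in zip(ba, height_m)) / sum(ba)


def mean_canopy_height(chm_cells):
    """Simple MCH: mean of canopy height model cells (m), gaps included."""
    return sum(chm_cells) / len(chm_cells)


# Two trees: the larger stem (4x the basal area) dominates Lorey's height
print(loreys_height([20.0, 40.0], [10.0, 20.0]))    # approx. 18 (arithmetic mean would be 15)
print(mean_canopy_height([10.0, 20.0, 0.0, 30.0]))  # 15.0
```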
The quadratic mean height (QMH) of canopy returns is another metric reported by Chen et al. [51] as a strong explanatory variable for forest biomass estimation. It is estimated as

$$\mathrm{QMH} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} h_i^{2}}$$

where $h_i$ is the aboveground height of point $i$ and $n$ is the total number of laser points.
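A short Python sketch of the QMH formula above, compared with the arithmetic mean of the same return heights; the example heights are made up. Because heights are squared before averaging, QMH is never smaller than the mean and pulls further ahead of it when tall returns are present.

```python
import math


def quadratic_mean_height(heights):
    """QMH = sqrt(sum(h_i^2) / n) over the aboveground heights of n laser points."""
    if not heights:
        raise ValueError("no laser points")
    return math.sqrt(sum(h * h for h in heights) / len(heights))


low = [3.0, 4.0]
tall = [3.0, 30.0]
for pts in (low, tall):
    mean = sum(pts) / len(pts)
    print(f"mean={mean:.2f}  QMH={quadratic_mean_height(pts):.2f}")
```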
Squaring the heights gives more weight to taller returns, reflecting the expectation that biomass is greater in stands or plots with tall trees. Brown et al. [52] further explained that, because most allometric models follow a power relationship with DBH, which is strongly related to height, the relationship between tree height and biomass is expected to be nonlinear, with taller trees holding disproportionately more biomass. Moreover, Lu et al. [53] argued that QMH has been rated among the best predictors of biomass in most studies because it incorporates this nonlinearity. In coniferous forests on the Pacific coast, Means et al. [54] found that their best model included QMH, although QMH could be negatively related to biomass at plot scale. In contrast, in mixed-conifer plots in California, Li et al. [55] found compelling evidence that QMH was the best predictor of forest biomass, with QMH positively correlated with biomass at the same scale.

Forests 2023, 14, 2095 9 of 22

Discrete-return LIDAR systems can acquire enough points to determine individual tree height precisely under open canopies such as those of savanna forests [39]. Through segmentation of the point cloud, individual trees and their heights can be separated [31] for forest biomass studies. Several novel methods have been used to delineate individual trees for biomass estimation. One study applied multiscale smoothing to a canopy height model and fitted parabolic surfaces to individual crowns to determine the optimal scale for delineating them; the resulting segmented crowns allowed spruce and pine to be separated effectively on the basis of crown shape [56]. In another study, Reitberger et al. [57] extracted individual tree crowns from LIDAR data with the normalized cut method.
Canopy density measures the amount of vegetation observed from the air or from space relative to the ground surface. Canopy density models are valuable because they can assess degradation and deforestation by recording changes in forest areas, which allows assessment of forest patterns and of direct impacts on carbon stocks [58]. Canopy density is among the most crucial factors to consider when developing and conducting a rehabilitation program [59]. LIDAR's ability to measure 3D structure makes it a more dependable means of modeling forest canopy density. Canopy density models have a variety of uses in forestry, including quantifying the crown fuel layer [60], predicting vegetation biomass, and mapping different plant species [61].
Apart from the height metrics above, prior research has shown a connection between forest biomass and explanatory variables such as wood density (WD) and diameter at breast height (DBH) [62]. How well allometric models fit the data depends strongly on the explanatory variables used. The most frequently used explanatory variables are tree height, DBH, and WD, and allometric models developed with these three variables may predict well both in the areas where they were developed and in other locations [32].
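For illustration, one widely cited pantropical model that combines all three variables is that of Chave et al. (2014), AGB = 0.0673 × (ρD²H)^0.976, with AGB in kg, wood density ρ in g/cm³, D (DBH) in cm, and H in m. The Python sketch below simply evaluates that published form; it is an example of a model using DBH, height, and WD together, not the equation recommended in this review, and its coefficients should be checked against the original publication before use.

```python
def agb_chave2014(wood_density_g_cm3, dbh_cm, height_m):
    """Aboveground biomass (kg) from the Chave et al. (2014) pantropical model.

    AGB = 0.0673 * (rho * D^2 * H) ** 0.976, with rho in g/cm^3, D in cm, H in m.
    """
    if min(wood_density_g_cm3, dbh_cm, height_m) <= 0:
        raise ValueError("inputs must be positive")
    return 0.0673 * (wood_density_g_cm3 * dbh_cm ** 2 * height_m) ** 0.976


# Hypothetical tree: rho = 0.6 g/cm^3, DBH = 30 cm, H = 25 m
print(round(agb_chave2014(0.6, 30.0, 25.0)))  # kg of dry aboveground biomass
```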
DBH is a well-known parameter for forest biomass estimation. It is used across various environments (e.g., agroforestry and forestry) and growth forms (e.g., lianas, shrubs, and trees). Compared with other variables, DBH is strongly correlated with forest biomass and easy to measure. However, DBH alone may be inadequate where tree geometry varies. It is therefore important to combine DBH with other variables for forest biomass estimation, especially where tree geometry differs because of species diversity and site quality [63].
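When DBH is the only predictor, the power relationship mentioned earlier (AGB = a·DBH^b) is commonly fitted by ordinary least squares on log-transformed data. The Python sketch below does this with NumPy and computes the usual exp(σ²/2) correction factor for back-transformation bias; the synthetic data and function names are hypothetical.

```python
import numpy as np


def fit_power_allometry(dbh_cm, agb_kg):
    """Fit AGB = a * DBH**b via ln(AGB) = ln(a) + b ln(DBH); returns (a, b, correction_factor)."""
    x = np.log(np.asarray(dbh_cm, dtype=float))
    y = np.log(np.asarray(agb_kg, dtype=float))
    b, ln_a = np.polyfit(x, y, 1)  # slope, intercept
    resid = y - (ln_a + b * x)
    dof = max(len(x) - 2, 1)
    sigma2 = float(np.sum(resid ** 2) / dof)  # residual variance on the log scale
    cf = float(np.exp(sigma2 / 2.0))  # multiply back-transformed predictions by cf
    return float(np.exp(ln_a)), float(b), cf


# Synthetic, noise-free data generated from a = 0.1, b = 2.5
dbh = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
agb = 0.1 * dbh ** 2.5
a, b, cf = fit_power_allometry(dbh, agb)
print(round(a, 3), round(b, 3), round(cf, 3))  # 0.1 2.5 1.0
```

With real, noisy data the correction factor exceeds 1, reflecting that the exponential of a mean log-prediction underestimates the mean biomass.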
Wood density is widely used as an explanatory variable in conjunction with DBH for estimating vegetation biomass [64]; it is estimated as oven-dry weight per green volume, or as the amount of carbon stored per unit volume of the stem [65]. Although WD values carry uncertainties, the carbon concentration of wood is generally assumed to be around 50%. The quantity of carbon stored in wood is influenced by both species and location, as species with more lignin tend to have a higher carbon content [66]; the carbon content may therefore vary with the successional status of the species [66]. Wood density is not usually measured directly in the field; its values are instead obtained from published or reported data, which carry significant uncertainty because of differences in measurement procedures, sample geographical location, and sample size [67].
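The two conversions in this paragraph are simple ratios: wood density as oven-dry mass over green volume, and carbon as a fixed fraction of dry biomass. A minimal Python sketch, with the 0.5 carbon fraction taken from the text as a default assumption that should be replaced by species- or site-specific values where available.

```python
def wood_density(oven_dry_mass_g, green_volume_cm3):
    """Basic wood density (g/cm^3) = oven-dry mass / green (fresh) volume."""
    if green_volume_cm3 <= 0:
        raise ValueError("green volume must be positive")
    return oven_dry_mass_g / green_volume_cm3


def carbon_stock(dry_biomass, carbon_fraction=0.5):
    """Carbon in dry biomass (same units as the input), assuming ~50% carbon by default."""
    return carbon_fraction * dry_biomass


# Hypothetical core sample and tree
print(wood_density(6.0, 10.0))  # 0.6 g/cm^3
print(carbon_stock(1200.0))     # 600.0 (kg C for 1200 kg dry biomass)
```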

LIDAR Metrics for Closed-Canopy and Open-Canopy Conditions
Existing research recognizes the critical role of LIDAR metrics in AGB estimation under both closed-canopy and open-canopy forest conditions. For instance, Ediriweera et al. [68] used LIDAR measurements to estimate the structural characteristics of subtropical rainforest and eucalypt-dominated open-canopy forest landscapes in northeastern Australia. They analyzed a total of 31 LIDAR metrics to assess forest structural parameters. For the subtropical rainforest, their regression models accounted for 62% of the variability in basal area, 66% in mean diameter at breast height, 61% in dominant height, and 60% in foliage projective cover. In contrast, for the eucalypt-dominated open-canopy forest, the models yielded the most precise predictions for mean height and dominant height. The authors therefore concluded that structural parameters were predicted less accurately from LIDAR metrics in closed-canopy subtropical rainforests with high species diversity than in less dense canopies with low species diversity.
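Statements such as "accounted for 62% of the variability" refer to the coefficient of determination (R²) of the fitted regression. The Python sketch below computes R² as 1 - SS_res/SS_tot from observed and predicted values; the numbers are made up and unrelated to the data of Ediriweera et al. [68].

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: share of variance in `observed` explained by `predicted`."""
    y = np.asarray(observed, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


obs = [10.0, 20.0, 30.0, 40.0]
print(r_squared(obs, obs))                       # 1.0
print(r_squared(obs, [25.0, 25.0, 25.0, 25.0]))  # 0.0
print(r_squared(obs, [12.0, 18.0, 33.0, 37.0]))
```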
Drake et al. [69] investigated the general relationships between LIDAR metrics and forest structural attributes, such as aboveground biomass, in closed-canopy tropical forests. They extracted canopy metrics (viz., canopy heights) from airborne LIDAR data and analyzed them together with basal area and mean stem diameter. The team found significant correlations between the LIDAR metrics and mean stem diameter, basal area, and aboveground biomass in the study area. They concluded that crucial forest structural characteristics, such as aboveground biomass, can be accurately estimated in closed-canopy tropical forests using metrics such as canopy height, mean stem diameter, and basal area.
Asner and Mascaro [70] demonstrated another significant application of LIDAR metrics in closed-canopy conditions. Using a network of 804 field inventory plots spanning a wide range of tropical vegetation types, they related LIDAR top-of-canopy height (TCH) to AGB in tropical forests, incorporating regional-scale inputs of basal area and wood density. The study revealed that TCH alone accounted for a portion of the variation in AGB across different tropical regions and conditions, and that adding plot-aggregate estimates of basal area and basal-area-weighted wood density explained almost all of the remaining variation. The authors therefore argued that, in closed-canopy conditions, a generalized plot-aggregate allometry can be applied by quickly surveying basal area and gathering information about the dominant species to estimate wood density within the surveyed LIDAR area.
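The plot-aggregate inputs described above can be computed from a simple stem list. The sketch below, assuming DBH in centimetres and species-level WD values taken from the literature, derives stem basal area and basal-area-weighted wood density. The three stems are hypothetical, and expressing basal area per hectare would additionally require the plot area.

```python
import math


def basal_area_m2(dbh_cm: float) -> float:
    """Stem basal area (m^2) from diameter at breast height in cm."""
    radius_m = dbh_cm / 200.0  # cm diameter -> m radius
    return math.pi * radius_m ** 2


def ba_weighted_wood_density(dbh_cm: list[float], wd: list[float]) -> float:
    """Plot-aggregate wood density: each stem's WD weighted by its basal area."""
    if len(dbh_cm) != len(wd) or not dbh_cm:
        raise ValueError("need matching, non-empty DBH and WD lists")
    ba = [basal_area_m2(d) for d in dbh_cm]
    return sum(b * w for b, w in zip(ba, wd)) / sum(ba)


# Hypothetical plot: DBH (cm) and species-level WD (g/cm^3) for three stems.
dbh = [10.0, 30.0, 60.0]
wd = [0.45, 0.60, 0.70]
print(f"plot basal area = {sum(basal_area_m2(d) for d in dbh):.3f} m^2")
print(f"BA-weighted WD = {ba_weighted_wood_density(dbh, wd):.3f} g/cm^3")
```

Because large stems dominate basal area, the weighted value (0.675 g/cm³ here) sits closer to the WD of the largest tree than a simple average of species values would.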

Metrics Selection for LIDAR Biomass Model
Lu et al. [11] described numerous techniques for choosing appropriate metrics for vegetation biomass modeling, including, but not limited to, variable selection based on expert experience and knowledge, stepwise regression analysis, feature extraction, correlation analysis, neural networks, and the random forest algorithm. In general, metrics that correlate strongly with vegetation biomass and weakly with the other predictor variables are recommended. In addition, Wang and Weng [9] explained that model generalizability should be considered at the metrics selection stage of the biomass estimation process; that is, the selected metrics should relate to biomass across a broad range of vegetation conditions. Table 6 outlines the metrics and other key information identified in this literature review.
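The correlation criterion above can be turned into a simple greedy screen: rank metrics by their absolute correlation with AGB, then keep a metric only if it is weakly correlated with the metrics already kept. This is one possible implementation, not a procedure prescribed by Lu et al. [11]; the thresholds (0.5 and 0.8), metric names, and plot values are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_metrics(metrics: dict[str, list[float]], agb: list[float],
                   min_r_agb: float = 0.5, max_r_between: float = 0.8) -> list[str]:
    """Greedy screen: strong |r| with AGB, weak |r| with metrics already kept.
    The default thresholds are illustrative assumptions, not published values."""
    r_agb = {name: abs(pearson(values, agb)) for name, values in metrics.items()}
    chosen: list[str] = []
    for name in sorted(r_agb, key=r_agb.get, reverse=True):
        if r_agb[name] < min_r_agb:
            break  # every remaining metric is even less correlated with AGB
        if all(abs(pearson(metrics[name], metrics[c])) < max_r_between for c in chosen):
            chosen.append(name)
    return chosen


# Hypothetical plot data: AGB (Mg/ha) and four candidate LIDAR metrics.
agb = [50.0, 80.0, 120.0, 160.0, 210.0]
metrics = {
    "h_mean": [10.0, 13.0, 17.0, 20.0, 25.0],  # mean return height (m)
    "h_p95": [14.0, 16.0, 22.0, 24.0, 30.0],   # 95th height percentile (m)
    "cover": [70.0, 50.0, 80.0, 60.0, 90.0],   # canopy cover (%)
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],        # weakly related metric
}
print(select_metrics(metrics, agb))  # ['h_mean', 'cover']
```

Here `h_p95` is almost as well correlated with AGB as `h_mean`, but it is dropped because it is nearly collinear with `h_mean`; `cover` is kept because it adds largely independent information.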

Biomass Estimation Methods
Vegetation biomass is usually estimated by destructive or non-destructive methods. The destructive method is the most direct and accurate way of estimating tree biomass. It involves harvesting the trees in an area and measuring the weight of their various parts [81], that is, weighing the components (viz., branches, trunk, and leaves) after they have been oven-dried [82]. Although useful, this approach is costly, labor-intensive, and almost impossible to apply at the regional or global level.
Non-destructive approaches evaluate forest biomass without harvesting trees. Allometric equations are used to convert LIDAR measurements of forest height, together with other variables, into AGB [26]. Allometry describes how the size of one structure in an organism relates to the size of another structure in the same organism; consequently, biomass relationships derived from tree height, age, and diameter can be extended to broader areas with the same characteristics [83]. Allometric models have been developed for several forest types around the world to convert LIDAR measurements of forest structure into forest biomass [70].
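To illustrate how an allometric equation turns tree measurements into biomass, the sketch below uses one widely cited published form, the pantropical model of Chave et al. (Global Change Biology, 2014): AGB = 0.0673 × (ρD²H)^0.976, with wood density ρ in g/cm³, D in cm, H in m, and AGB in kg. It is shown only as an example of the approach, not as the equation used by the studies reviewed here; the trees and plot area are hypothetical.

```python
def agb_chave2014_kg(wd_g_cm3: float, dbh_cm: float, height_m: float) -> float:
    """Tree AGB (kg) from the pantropical model of Chave et al. (2014):
    AGB = 0.0673 * (WD * D^2 * H) ** 0.976."""
    return 0.0673 * (wd_g_cm3 * dbh_cm ** 2 * height_m) ** 0.976


def plot_agb_mg_ha(trees: list[tuple[float, float, float]], plot_area_ha: float) -> float:
    """Sum tree AGB over a plot and express it in Mg/ha (1 Mg = 1000 kg)."""
    total_kg = sum(agb_chave2014_kg(wd, d, h) for wd, d, h in trees)
    return total_kg / 1000.0 / plot_area_ha


# Hypothetical 0.1 ha plot: (wood density g/cm^3, DBH cm, height m) for each tree.
trees = [(0.60, 30.0, 25.0), (0.55, 18.0, 17.0), (0.72, 45.0, 31.0)]
for wd, d, h in trees:
    print(f"D = {d:4.1f} cm, H = {h:4.1f} m -> AGB = {agb_chave2014_kg(wd, d, h):7.1f} kg")
print(f"plot AGB = {plot_agb_mg_ha(trees, 0.1):.1f} Mg/ha")
```

In a LIDAR workflow, plot totals such as these typically serve as the field reference against which LIDAR metrics are regressed.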

Model Development
The growing interest in the estimation of vegetation biomass and its variation with respect to time has necessitated the development of LIDAR biomass models.Parametric and nonparametric approaches for LIDAR biomass model development will be discussed.

Parametric-Based Model
Conventional regression models estimate AGB from LIDAR-derived metrics and other variables. Because most of the allometric equations used to determine forest biomass are power models [84], LIDAR metrics and biomass usually undergo a logarithmic transformation before the regression model is fitted [85]. After this transformation, the simple power model corresponds to a simple linear regression, whereas the multiplicative power model corresponds to a multiple regression model [11], as shown in Table 7. According to Lu [86], the accuracy of conventional regression models may be low if the number of sample plots is inadequate or if the linear correlation between the variables and biomass is weak.
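Taking logarithms of a simple power model, AGB = a·x^b, gives ln(AGB) = ln(a) + b·ln(x), which is a simple linear regression in the log-transformed variables; a multiplicative model with several metrics becomes a multiple regression in the same way. A minimal sketch for one LIDAR metric x (for example, mean canopy height), using synthetic data:

```python
import math


def fit_power_model(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y = a * x**b by ordinary least squares on ln(y) = ln(a) + b * ln(x).
    Returns (a, b). All x and y values must be positive."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)  # back-transform the fitted intercept ln(a)
    return a, b


# Synthetic example: AGB generated exactly from AGB = 2 * H**1.5.
heights = [5.0, 10.0, 15.0, 20.0, 30.0]
agb = [2.0 * h ** 1.5 for h in heights]
a, b = fit_power_model(heights, agb)
print(f"a = {a:.3f}, b = {b:.3f}")  # a = 2.000, b = 1.500
```

Because the fit is made on the log scale, back-transformed predictions are often multiplied by a bias-correction factor such as exp(s²/2), where s is the residual standard error of the log-scale regression; that step is omitted here for brevity.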

Table 7.
Regression | Model | Description | Sample | Ref.
Simple Linear Regression | | | |

Nonparametric-Based Model
Nonparametric algorithms derive the model structure from the input data rather than specifying it explicitly in advance. Because of this flexibility, nonparametric methods are better able to represent complex, nonlinear biomass relationships [53]. Of the many nonparametric algorithms available, this review focuses on three, which are outlined in Table 8 below.
We noted in this review that parametric-based models were used far more often than nonparametric-based models. Among the articles that provided relevant model details, approximately 67% employed parametric-based models, 20% used nonparametric-based models, and 13% considered both. RF was the most frequently used nonparametric model for forest biomass estimation, followed by SVM and k-nearest neighbor (KNN).
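KNN, the third nonparametric model named above, shows most directly what it means for the model structure to come from the data: a prediction is simply the average AGB of the k training plots whose LIDAR metrics are closest to the target. A minimal sketch with hypothetical plots described by two metrics (mean height in m and canopy cover in %); in practice the metrics would normally be standardized first, because Euclidean distance depends on their scales.

```python
import math


def knn_predict(train_x: list[list[float]], train_y: list[float],
                query: list[float], k: int = 3) -> float:
    """k-nearest-neighbour regression: mean AGB of the k training plots closest
    to `query` in (Euclidean) LIDAR-metric space. No model form is assumed."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training plots")
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical training plots: [mean height (m), canopy cover (%)] and AGB (Mg/ha).
plots = [[10.0, 60.0], [15.0, 70.0], [20.0, 80.0], [25.0, 85.0]]
agb = [60.0, 100.0, 150.0, 200.0]
print(knn_predict(plots, agb, [19.0, 78.0], k=2))  # 125.0
```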

Model | Description | Merits | Demerits | Ref.
Artificial neural network (ANN) | ANN imitates the way the human nervous system acquires knowledge and processes information. It has proven effective for ecological applications and data modeling. | Collinearity does not affect ANN results. This sets ANN apart from conventional statistical methods and is a significant reason why ANN is preferred. | The lack of transparency about the network's internal operations can make it challenging to identify and address potential overfitting. | [88,89]
Random forest (RF) | RF is an ensemble of decision trees, each a set of rule-based binary splits that determine the relationship between the response and its explanatory variables. | Regression trees can accurately depict complex associations between variables at different magnitudes. | Overfitting is often encountered with large, noisy data samples. | [11,90]
Support vector machine (SVM) | SVM assumes that each group of input parameters has a unique relationship with its corresponding dependent variable, and that relating these predictors to each other is sufficient to find rules for forecasting biomass from a set of inputs. | SVM has been shown to reduce overfitting, which otherwise hinders a model's ability to characterize new, unseen data. | Creating a good model is challenging when there are many training samples. | [11,91]

Biomass Model Assessment
Multiple studies identified in this review used model assessment criteria to select allometric equations, as presented in Table 6. For example, Cao et al. [37] used model assessment criteria to select allometric equations for AGB estimation in northwestern China. They developed five different models using various machine learning algorithms and adopted the coefficient of determination (R²) and root mean square error (RMSE) as the criteria for selecting the best one. On this basis, the random forest model, with an R² of 0.899 and an RMSE of 14 t/ha, was selected as the best model.
To determine the optimal method for computing field biomass in some forested areas of India, Deb et al. [80] conducted a pilot study that used SAR data and the Normalized Difference Vegetation Index (NDVI) to compare conventional linear and nonlinear models with a nonparametric ANN model. The models were evaluated using several reliability measures, including the Bayesian information criterion (BIC), Akaike information criterion (AIC), residual standard error (RSE), and coefficient of determination (R²). The power model (BIC = 392.1, AIC = 280.4, R² = 0.94, RSE = 20.9) fitted better than the exponential model (BIC = 398.9, AIC = 294.4, R² = 0.90, RSE = 26.2). However, when compared with these nonlinear models, the ANN model proved to be a much better option for fitting NDVI data to field-estimated biomass, with a substantially lower BIC (54.9), AIC (32.0), and RSE (0.007) and a higher R² (0.98).
In describing the use of various models for forest biomass estimation, Sileshi [92] noted that the choice among possible biomass model forms may affect regional, national, and global biomass estimates. Similarly, Dong and Chen [32] argued that selecting an appropriate allometric model for field-based biomass estimation is critical to the biomass estimation process. This section therefore outlines the applications and weaknesses of selected biomass model assessment criteria.

Coefficient of Determination
R², the coefficient of determination (in multiple regression analysis, the square of the multiple correlation coefficient), measures the proportion of the variation in the dependent variable that is explained by the independent variables included in the biomass model [93]. It is a statistical measure of how closely the data are distributed around the fitted regression line. As shown in Table 6, R² appears to be the most widely used model assessment criterion in this literature review, used by about 21% of the authors. For instance, Lau et al. [30] used R² as a model assessment criterion when developing allometric models from several tree attributes obtained with a terrestrial LIDAR system to estimate forest biomass in Guyana. They reported that their best models estimated aboveground biomass accurately, with R² values of 0.92 to 0.93, compared with R² values of 0.85 to 0.89 for traditional pantropical models.
Even though R² appears to be the most widely used model assessment criterion, its weaknesses, outlined in Table 9, cannot be overlooked. Using an untransformed dataset from the analysis of Henry et al. [94], Sileshi [92] demonstrated one such weakness by sequentially adding polynomial terms of "d" to the fitted models: the value of R² kept rising with each added polynomial term.
Table 9. Limitations of selected biomass model assessment criteria found in this review.

Criterion | Application | Weaknesses | Ref.
Coefficient of determination (R²) | R² = 1 implies that all the variability in the dependent variable can be explained by variation in the independent variable(s). | When comparing the quality of one model with another, R² increases automatically when a polynomial term is added to the model. | [92]
 | R² = 0 implies that none of the variation in the dependent variable can be explained by variation in the independent variable(s). | R² automatically increases when new independent variables are added to the model. | [87]
Root mean square error (RMSE) | A model with the smallest RMSE value is usually preferred. | When comparing the quality of one model with another, RMSE decreases automatically as R² increases. | [92]
 | | Over-fit models often show small RMSE values. RMSE is ineffective for comparing models with collinear variables. |
Akaike information criterion (AIC) | A model with the smallest AIC is usually preferred. | AIC effectively treats all candidate models as reasonable reflections of reality: it does not presume that the correct model is among the models being assessed, so a model will always be selected, even from a set of poor ones. | [92]
Cross-validation | Used to estimate the performance of biomass estimation models with an independent dataset; often used to curb overfitting. | The greater the number of folds used, the higher the variance. | [92]

Similarly, Sileshi [92] demonstrated the weakness of R² by adding a new independent variable q to the following fitted models; R² increased with the addition of q:

$$M = -2.04 + 0.28\,d - 0.24\,h, \qquad R^2 = 0.6220$$

$$M = -9.60 + 0.31\,d - 0.32\,h + 14.45\,q, \qquad R^2 = 0.6440 \tag{6}$$

where d, h, and q are independent variables and M is the dependent variable.
These illustrations strongly suggest that relying on R² alone may compromise the assessment of LIDAR biomass models.
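The behavior Sileshi [92] describes follows from ordinary least squares: adding a term to a model can never increase the residual sum of squares, so R² cannot fall. The sketch below reproduces the effect on synthetic data (not the Henry et al. [94] dataset) by fitting polynomials of increasing degree in d, even though the data were generated from a quadratic relationship. The solver is a small hand-written Gaussian elimination so that the example needs only the standard library.

```python
import random


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_r2(d: list[float], y: list[float], degree: int) -> float:
    """R^2 of an ordinary least-squares polynomial in d (via the normal equations)."""
    X = [[di ** p for p in range(degree + 1)] for di in d]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data: response roughly quadratic in d, plus random noise.
rng = random.Random(1)
d = [rng.uniform(0.5, 3.0) for _ in range(40)]
y = [2.0 * di ** 2 + rng.gauss(0.0, 1.0) for di in d]

for degree in range(1, 5):
    print(f"degree {degree}: R^2 = {poly_r2(d, y, degree):.4f}")
```

The gains beyond degree 2 come from fitting noise, which is why penalized criteria or cross-validation are used alongside R².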

Root Mean Square Error
RMSE measures the typical distance between observed values and model estimates and is a preferred performance evaluation measure in regression analysis [48]. It is closely related to the residual standard error (RSE) reported in other literature; the two differ only in that RSE divides the residual sum of squares by the residual degrees of freedom rather than by the sample size. About 16% of the authors in this review adopted RMSE as a model assessment criterion. However, the weaknesses of RMSE listed in Table 9 suggest that using RMSE alone as a model selection criterion could compromise the model selection process.
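The difference between RMSE and RSE lies only in the denominator. A minimal sketch, assuming n observed and predicted AGB values and a model with `n_params` fitted coefficients (the values are hypothetical):

```python
import math


def rmse(obs: list[float], pred: list[float]) -> float:
    """Root mean square error: sqrt(RSS / n)."""
    rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return math.sqrt(rss / len(obs))


def rse(obs: list[float], pred: list[float], n_params: int) -> float:
    """Residual standard error: sqrt(RSS / (n - n_params))."""
    rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return math.sqrt(rss / (len(obs) - n_params))


# Hypothetical observed vs. predicted plot AGB (Mg/ha) from a two-coefficient model.
obs = [100.0, 150.0, 200.0, 250.0]
pred = [110.0, 140.0, 205.0, 245.0]
print(f"RMSE = {rmse(obs, pred):.2f}, RSE = {rse(obs, pred, 2):.2f}")  # RMSE = 7.91, RSE = 11.18
```

With few plots the two can differ substantially, so it matters which one a study reports when results are compared.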

Information Criterion
The information criteria most commonly used for biomass model assessment are the BIC and the AIC, although BIC is used less often than AIC in most of the biomass model-fitting literature [87]. AIC evaluates the relative quality of statistical models fitted to a specific dataset: for a given set of models, it ranks each model's quality relative to the others. This criterion is useful because superfluous parameters are penalized explicitly through the addition of 2(p + 1) to the deviance [95]. However, AIC does not presume that the correct model is among the models being assessed; therefore, a model will always be selected, even from a set of poor candidates [92].
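For least-squares models with normally distributed residuals, the deviance reduces (up to a constant that is identical for every model fitted to the same data) to n·ln(RSS/n), so AIC and BIC can be computed from the residual sum of squares. The sketch follows the 2(p + 1) penalty given above, assuming p is the number of predictors and the extra 1 counts the intercept; the two candidate models and their RSS values are hypothetical. Counting the residual variance as a further parameter would add the same amount to every model and would not change the ranking. Such comparisons are only meaningful between models fitted to the same response (for example, not between log-transformed and untransformed AGB).

```python
import math


def aic_bic(rss: float, n: int, p: int) -> tuple[float, float]:
    """AIC and BIC for a Gaussian least-squares model with p predictors plus an
    intercept, dropping the additive constant shared by all models on the same data."""
    dev_term = n * math.log(rss / n)  # deviance up to a common constant
    n_coef = p + 1
    return dev_term + 2 * n_coef, dev_term + math.log(n) * n_coef


# Two hypothetical candidate models fitted to the same 50 plots.
aic_a, bic_a = aic_bic(rss=4000.0, n=50, p=2)
aic_b, bic_b = aic_bic(rss=3900.0, n=50, p=5)
print(f"model A: AIC = {aic_a:.1f}, BIC = {bic_a:.1f}")
print(f"model B: AIC = {aic_b:.1f}, BIC = {bic_b:.1f}")
```

Model B's slightly lower RSS (and therefore higher R²) does not offset its three extra coefficients, so both criteria favour model A, whereas R² alone would favour model B.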

Cross-Validation
Flores-Anderson et al. [26] defined cross-validation as a method for assessing the consistency of statistical learning by using a dataset separate from the training sample. Model performance can be assessed with several cross-validation schemes, including the Monte Carlo approach, twofold cross-validation, the leave-one-out approach, and K-fold cross-validation. Sileshi [92] described K-fold cross-validation as a commonly used approach in which the initial dataset is partitioned into K subsets (folds) of approximately equal size; each fold in turn is held out for validation while the model is fitted to the remaining K − 1 folds. In addition, Lu et al. [11] noted that cross-validation has the advantage of improving the reliability of the accuracy assessment. However, this approach does not satisfy the requirement that the data used for accuracy assessment be independent of those used to develop the model.
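The K-fold procedure described above can be sketched as follows. The function computes a cross-validated RMSE for any learner that takes training data and returns a prediction function; `fit_line`, a simple linear regression of AGB on a single LIDAR metric, is a hypothetical stand-in, and folds are assigned by a random shuffle with a fixed seed.

```python
import math
import random
from typing import Callable

Predictor = Callable[[float], float]


def fit_line(xs: list[float], ys: list[float]) -> Predictor:
    """Hypothetical learner: simple linear regression of AGB on one LIDAR metric."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def k_fold_rmse(x: list[float], y: list[float],
                fit: Callable[[list[float], list[float]], Predictor],
                k: int = 5, seed: int = 0) -> float:
    """Cross-validated RMSE: each of k folds is held out once while `fit`
    is trained on the remaining k - 1 folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of (nearly) equal size
    sq_err: list[float] = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        sq_err.extend((y[i] - model(x[i])) ** 2 for i in held_out)
    return math.sqrt(sum(sq_err) / len(sq_err))


# Hypothetical data: AGB (Mg/ha) roughly linear in mean canopy height (m).
rng = random.Random(7)
heights = [8.0 + i for i in range(20)]
agb = [7.0 * h - 20.0 + rng.gauss(0.0, 10.0) for h in heights]
print(f"5-fold CV RMSE = {k_fold_rmse(heights, agb, fit_line, k=5):.1f} Mg/ha")
```

Every prediction in the returned RMSE comes from a model that did not see that plot during fitting, although all folds are still drawn from the same dataset.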

Uncertainty Analysis
Uncertainty analysis is a more rigorous method of assessing LIDAR biomass models. In line with the IPCC's recommendations for national greenhouse gas inventories, all uncertainties surrounding biomass estimation must be considered in AGB and carbon assessments at the national and project levels. The guidelines further stipulate that, in addressing uncertainty in biomass estimation, the following steps should be taken: (1) identify and assess the sources of uncertainty; (2) whenever possible, reduce uncertainty using cost-effective techniques; and (3) quantify the residual uncertainties [96].
Several sources of uncertainty affect AGB estimation. Examining the available literature on the use of allometric equations for AGB estimation, Chave et al. [97] recognized four categories of uncertainty that could affect forest biomass estimation: errors due to the selection of allometric equations, errors in tree parameter estimation, sampling errors related to sample plot dimensions, and errors arising from how well the research plots represent the whole landscape. They concluded that the main source of uncertainty is the choice of allometric equation. Lu et al. [53] further argue that identifying the sources of uncertainty, modeling uncertainty propagation and accumulation, and quantifying the amount of uncertainty are the key steps required to minimize uncertainty in forest biomass estimation.
Multiple studies have found pixel-based uncertainty analysis to be a reliable technique for assessing the spatial variation of uncertainty. For instance, Chen et al. [98] devised a technique for assessing the uncertainty of plot-level forest biomass. The technique propagates errors from individual trees to the plot level by accounting for errors throughout the entire biomass estimation process: field measurement of tree parameters, development of allometric models, plot-level biomass estimation, plot-level biomass model development, tree-level biomass prediction from remotely sensed data, feature extraction from remotely sensed data, and pixel-level forest biomass prediction. When this approach was applied to tree AGB data obtained from airborne LIDAR imaging of tropical forests in Ghana, the predicted AGB error exceeded 20% at a spatial resolution of 1 ha, which was higher than the errors reported in other studies of different tropical forests.
In 2016, the uncertainty analysis framework was extended from the pixel level to areas made up of multiple pixels. The methodology was piloted over a forest area of about 69,508 km² in northern Minnesota for forest biomass mapping and prediction by combining airborne LIDAR data, in situ measurements, and national forest inventory (NFI) plots. The pixel-level AGB prediction error was found to be dominated by the residual error of the LIDAR-based biomass model at spatial resolutions finer than about 380 m, whereas errors in the estimated model parameters dominated at coarser resolutions. At a spatial resolution of 100 m (i.e., the 1 ha scale), the relative error of LIDAR-based forest biomass prediction across the pilot area decreased to about 11% [99].
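The idea of propagating errors from individual trees to a plot total can be illustrated with a simple Monte Carlo simulation. This is a simplified sketch, not the framework of Chen et al. [98]: it assumes a hypothetical power-law allometry AGB = a·D^b, independent normal errors in ln(a), b, and each DBH measurement, and a normal log-scale model residual for each tree (real parameter estimates are usually correlated), and it stops at the plot level rather than continuing to remotely sensed predictions. All input values are hypothetical.

```python
import math
import random
import statistics


def mc_plot_agb(dbh_cm: list[float], a: float, b: float, sd_ln_a: float, sd_b: float,
                sd_dbh_cm: float, sd_resid_ln: float, n_sim: int = 5000,
                seed: int = 42) -> tuple[float, float]:
    """Monte Carlo propagation of errors from trees to a plot total.
    Each draw perturbs (i) the allometric parameters, (ii) every DBH measurement,
    and (iii) each tree's log-scale residual, then sums tree AGB (kg).
    Returns (mean, standard deviation) of the simulated plot AGB."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sim):
        ln_a_s = math.log(a) + rng.gauss(0.0, sd_ln_a)  # parameter uncertainty
        b_s = b + rng.gauss(0.0, sd_b)
        total = 0.0
        for d in dbh_cm:
            d_s = max(d + rng.gauss(0.0, sd_dbh_cm), 0.1)  # field measurement error
            ln_agb = ln_a_s + b_s * math.log(d_s) + rng.gauss(0.0, sd_resid_ln)  # model residual
            total += math.exp(ln_agb)
        totals.append(total)
    return statistics.fmean(totals), statistics.stdev(totals)


# Hypothetical plot: DBH (cm) of six trees; hypothetical allometry AGB = 0.1 * D**2.4 (kg).
dbh = [12.0, 18.0, 25.0, 31.0, 40.0, 52.0]
mean, sd = mc_plot_agb(dbh, a=0.1, b=2.4, sd_ln_a=0.05, sd_b=0.02,
                       sd_dbh_cm=0.5, sd_resid_ln=0.3)
print(f"plot AGB = {mean:.0f} kg, SD = {sd:.0f} kg, relative error = {100 * sd / mean:.1f}%")
```

The ratio of the standard deviation to the mean gives a relative error comparable in kind to the percentages quoted above; switching individual error sources off (setting their standard deviation to zero) shows how much each contributes.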

LIDAR Technology for Biomass Studies: Emerging Trends
Several studies in this review outline emerging trends in LIDAR technology for biomass studies. Queinnec et al. [79] analyzed single-photon LIDAR (SPL) data acquired over boreal forests in Canada to implement a forest inventory, using a structurally guided sampling (SGS) design, a random forest machine learning approach, and principal component analysis. Their research provided strong evidence that SPL technology can support the development of forest inventories over wide forested areas for biomass quantification.
Brown [100] also highlighted a sophisticated airborne LIDAR system that combined differential GPS, a laser pulse finder, and a dual-camera digital video compartment capable of zooming at different angles and distances. Used to collect 3D features of the Earth's surface in the United States, this system captured tree height, tree crown area, and crown density at an unprecedented rate for forest biomass studies.

Conclusions
Our findings reveal that LIDAR technology has emerged as a leading data collection tool for aboveground biomass estimation. With various LIDAR platforms and systems available, the technology can effectively measure vertical forest structure to support aboveground biomass mapping. Most studies in this review used airborne LIDAR data for plot-level analysis at a local scale and employed parametric-based rather than nonparametric-based models for forest biomass estimation. The horizontal accuracies of the airborne LIDAR data, ranging from ±0.1 m to ±0.5 m, were generally consistent with the standards set by NGS-58 (NOAA 1997) and FGDC-STD-007, whereas most of the reported vertical accuracies did not meet these standards. Evidence from this review also suggests that mean top canopy height and quadratic mean height are strong predictors of AGB. R² and RMSE were the most commonly used model assessment criteria; however, they have limitations: R² increases with the addition of polynomial terms or new independent variables, and RMSE decreases as R² rises. Future studies should investigate the impact of these limitations on forest biomass estimation. Finally, pixel-based uncertainty analysis proved to be a reliable method for assessing the spatial uncertainty of biomass models at each pixel in the study area, and can help enhance the accuracy of LIDAR biomass estimation.

Figure 1. Illustration of the various remote sensing platforms for forest structure measurements.

Table 2. Summary of selected literature.

Table 4. A summary of LIDAR measurement accuracies identified in this review.

Table 5. LIDAR metrics for forestry studies.

Table 6. Metrics and other essential information identified in this literature review.

Table 8. Merits and demerits of three nonparametric-based models.