Investigating Dual-Source Satellite Image Data and ALS Data for Estimating Aboveground Biomass

Abstract: Accurate estimation of above-ground biomass (AGB) in forested areas is essential for studying forest ecological functions, surface carbon cycling, and global carbon balance. Over the past decade, models that harness the distinct features of multi-source remote sensing observations for estimating AGB have gained significant popularity. It is worth exploring the differences in model performance between single-source and fused data. Additionally, quantitative estimation of the impact of high-cost laser point clouds on satellite imagery of varying costs remains largely unexplored. To address these challenges, model performance and cost must be considered comprehensively. We propose a comprehensive assessment based on three perspectives (i.e., performance, potential and limitations) for four typical AGB-estimation models. First, different variables are extracted from the multi-source and multi-resolution data. Subsequently, the performance of four regression methods is tested for AGB estimation with diverse indicator combinations. Experimental results show that the combination of multi-source data provides a highly accurate AGB regression model. The proposed regression and variable rating approaches can flexibly integrate other data sources for modeling. Furthermore, the data cost is discussed against the AGB model performance. Our study demonstrates the potential of using low-cost satellite data to provide a rough AGB estimation for larger areas, which can allow different remote sensing data to meet different needs of forest management decisions.


Introduction
Carbon peaking and carbon neutrality goals are currently prominent topics worldwide [1]. Forests account for a high proportion of global above-ground carbon stocks [2]. Accurate AGB estimation provides the data basis for studying forest carbon stocks and investigating their impact on climate change and ecosystem functions. However, traditional forest inventory for biomass estimation has inevitable drawbacks for large-scale measurements [3], such as long survey times and high costs, which make it difficult to apply widely. Remote sensing data help to obtain spatially contiguous, high-precision data [4,5]. The cost and performance of the AGB estimation model are the main factors determining the selection of remote sensing data sources.
The observational advantages of remote sensing, such as wide coverage and fast operation, compensate for the shortcomings of traditional methods of measuring forest biomass. Current remote sensing methods for estimating AGB mainly utilize optical remote sensing and light detection and ranging (LiDAR). Optical images contain spectral information of the forest on a large scale. The spectral information obtained from satellite images correlates well with AGB, which indicates its potential to supply predictors for AGB regression. However, if the tree density is high, it can be difficult to quantify a reliable statistical relationship between spectral information and biomass [6]. Unlike optical image systems, LiDAR systems capture the vertical distance between the sensor and the object. The 3D structural data of vegetation generated by LiDAR have been adopted in many applications [7]. Synthetic Aperture Radar (SAR) at lower frequencies (L and P bands) can penetrate deep into the forest and interact with the trees, providing a better correlation with AGB [8]. However, SAR is subject to stronger ground effects than LiDAR. Owing to their distinct features, optical images and laser scanning point clouds can be combined to monitor forest AGB changes.
The synthesis of multi-source features has been widely used for AGB estimation [9]. For example, optical image data have been combined with ALS data [10]. de Almeida et al. [4] explored the combination of ALS and hyperspectral data, which significantly improved performance in Amazon AGB estimation. It has been shown that remote sensing data sources (ALS, airborne hyperspectral imagery, or their combination) have a greater impact on modeling outcomes than regression methods. In addition, there are many challenges in the selection of suitable metrics, as well as many different models [9]. In contrast to deep learning methods, various machine learning approaches have been explored to investigate the importance of different variables in the estimation of AGB. The choice between parametric and non-parametric statistical models is crucial for AGB estimation, as performance varies significantly across modeling methods [3]. Previous studies have claimed that their AGB estimation models can achieve satisfactory results using variable selection, different regression models, and multi-source and multi-resolution data. However, to the best of our knowledge, there has been little prior research on the fusion of image data with different resolutions and ALS data in AGB modeling.
To address these knowledge gaps, our work combines medium- and high-resolution satellite data with airborne point clouds. The impact of different resolutions of satellite imagery on modeling performance is investigated. Satellite data have large coverage and are publicly available, while ALS data can capture 3D tree morphology information but usually cover a smaller area owing to budgetary limitations. This study proposes a framework to leverage the different advantages of satellite imagery and ALS data in estimating aboveground biomass. Specifically, instead of developing a new algorithm, we assess several regression models using dual-source data. This assessment helps us to identify the best regression model based on model performance and cost, which is then used to estimate the aboveground biomass over large areas. AGB modeling is explored in this paper to answer the following questions: (1) What impact does the integration of ALS data variables and satellite imagery variables have on the precision and bias of AGB estimations? (2) To what extent do the different regression methods affect the accuracy and bias of AGB estimation? (3) How does the spatial resolution of satellite imagery influence the estimation of AGB?
The remainder of this paper is organized as follows. Section 2 describes the multi-source data, the extracted variables, and the methods used to explore the AGB regression modeling for forested areas. First, the vertical structure of the forest sample plots is obtained by applying terrestrial laser scanning (TLS) and unmanned aerial vehicle laser scanning (ULS) data on field plots. The extracted diameter at breast height (DBH) and tree height of individual trees are then used to estimate plot-level AGB through allometric model equations to obtain the AGB as a dependent variable. Independent variables are extracted from the ALS data and images with different resolutions in order to explore the correlation between the different variables and forest AGB. Section 3 describes the experimental results and evaluation. A discussion is presented in Section 4, in which AGB models built by the different methods and data sources are compared and analyzed. Finally, the paper is summarized in Section 5. This study explores the performance of AGB modeling with a fusion data source and investigates the impact of integrating spectral information with different cost considerations into high-cost ALS data.

Materials and Methods
As illustrated in Figure 1, the proposed method includes the following main steps:

Study Areas and Datasets
Guangxi province, China, has a subtropical monsoon climate with abundant rainfall and heat. Most of the forest plots consist of planted trees. In this study, four forested areas in Guangxi province were selected, as listed in Table 1 and shown in Figure 2. In the experiment, a total of 68 plots of 15 × 15 m were set up. The experimental data included ULS data, TLS and ALS data, Landsat 8 (LS8) data, and Gaofen2 (GF2) image data. The ULS and TLS data were collected in June 2020. The ALS data and satellite images were collected in 2019 and 2020, respectively.
The ULS, airborne, and terrestrial LiDAR data allowed for complete scanning of the internal and upper structure of the forest. A Riegl VZ-400 3D laser scanning system (Austria), which scans 122,000 points per second, was used for the TLS data acquisition in the experiment. An Aerialtronics Altura AT8 ULS (Netherlands), flown at a height of about 50-70 m above the ground, was used for the unmanned aerial vehicle (UAV) data acquisition. The ALS data were acquired using a fixed-wing manned P750 aircraft equipped with a Riegl VQ-1560i airborne LiDAR scanning system (Austria). The ALS system was operated at an average flight altitude of 2500 m above ground, with a full-area point density of over three points/m². The ALS data were preprocessed to produce a digital terrain model (DTM) and a normalized point cloud.
For the sample plots, the GF2 satellite images were downloaded from the China Center for Resources Satellite Data and Application, which provides cloud-free images at the sub-meter level. The images have four spectral bands with a spatial resolution of 4 m (wavelength 0.45-0.9 µm) and one panchromatic band with a spatial resolution of 1 m. Similarly, the LS8 images have nine multispectral bands with a spatial resolution of 30 m along with a panchromatic band with a resolution of 15 m. Pansharpening was performed on the GF2 and LS8 data using the software Esri ENVI 5.3 from Davis, CA, USA. After pansharpening, the images were cut and mosaicked to obtain the preprocessed satellite images of the study area.

Construction of Field Plots
As listed in Table 1, TLS and ULS data were collected in four forests in three cities. The distribution of sample plots is shown in Figure 3. The collected point clouds can be used to segment individual trees and calculate their parameters. The accuracy of the point cloud method for obtaining tree parameters has been verified in [11]. The AGB of an individual tree is the sum of its parts, including the stems, branches, and leaves. The AGB of the sample plots was assessed based on the existing allometric equations [12], as shown in Table 2. TLS obtains rich information on the bottom part of the tree, based on which the DBH can be accurately calculated. ULS is able to capture information on the top of the tree, the ground height, and the highest point of the tree, which are important for correctly calculating the tree height (H). The combination of TLS and ULS data can be used to obtain complete information on the sample plots. First, the collected multi-station TLS point clouds were co-registered to the same coordinate system, then the TLS and ULS data were co-registered using the Random Sample Consensus (RANSAC) method [13]. The average registration residual for the TLS-to-TLS scenario was 0.049 m, while for the ULS-to-ALS scenario it was 0.299 m. The fused point cloud was then used to automatically perform an accurate individual tree extraction [14]. In addition, we used the TLS-to-ALS registration from [15]. The average registration residual for the TLS-to-ALS scenario was 0.049 m. The conversion of ALS data to 1 m resolution imagery allowed for geographic registration of the ALS and GF2 imagery using the geographic registration method in ENVI 5.3. Satellite image plots with the same geographic extent as the TLS point cloud plots were then obtained.

Satellite Image Processing and Metric Extraction
Several spectral variables and transformation results derived from imagery have been proposed as potential AGB predictors. Based on the LS8 and GF2 images, the optical image variables were obtained by principal component analysis (PCA) transformation, minimum noise fraction (MNF) transformation, and the calculation of various vegetation indices [5]. Descriptions of the variables and the calculation equations are provided in Table 3. All the satellite image variables were obtained at the plot level.
For GF2 data and LS8 data, the raw bands have different spectral characteristics in terms of the absorption and reflectance of vegetation, and are used as candidate variables, as shown in Table 3. Here, ρ_B, ρ_G, ρ_R, and ρ_NIR correspond to Bands 1 to 4, respectively. PCA is able to extract the most important components of the image, and is commonly used to remove noise from satellite data. The first three principal components are not correlated with each other after transformation. The data variance of the first principal component (PC1) is the largest, and those of PC2 and PC3 decrease in turn. The three retained MNF components are ordered from largest to smallest signal-to-noise ratio. In addition, the raw bands can be combined to calculate vegetation indices in order to highlight the vegetation. For example, the NDVI is used to provide an indication of the health and growth of the vegetation; the EVI represents an improvement over the NDVI in terms of decoupling the canopy background signal and reducing atmospheric influences. The RVI and DVI reflect the difference between the reflection of vegetation in the visible and near-infrared bands. The SAVI and MSAVI are used to reduce the effect of the soil. The ARVI is insensitive to aerosols, and is particularly suitable for monitoring areas with high atmospheric aerosol levels [16].
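The vegetation indices named above can be illustrated with a minimal sketch. It uses the commonly published forms of each index, which are assumed here and may differ in detail from the equations in Table 3; the function name, the soil adjustment factor of 0.5, and the ARVI weighting of 1 are illustrative choices rather than values taken from this study.

```python
import math


def vegetation_indices(blue, red, nir, soil_factor=0.5):
    """Compute common vegetation indices from plot-level reflectances (0-1).

    The formulas are the widely used textbook forms (an assumption; the
    paper's Table 3 is the authoritative definition).
    """
    ndvi = (nir - red) / (nir + red)
    rvi = nir / red
    dvi = nir - red
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    savi = (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)
    msavi = (2.0 * nir + 1.0
             - math.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0
    # ARVI replaces red with a blue-corrected red term (weighting assumed = 1).
    red_blue = 2.0 * red - blue
    arvi = (nir - red_blue) / (nir + red_blue)
    return {"NDVI": ndvi, "RVI": rvi, "DVI": dvi, "EVI": evi,
            "SAVI": savi, "MSAVI": msavi, "ARVI": arvi}


# Hypothetical mean reflectances for one densely vegetated plot.
indices = vegetation_indices(blue=0.04, red=0.05, nir=0.45)
```

In practice the indices would be computed per pixel and then averaged within each 15 × 15 m plot, or computed from plot-mean reflectances as in this sketch; the text above does not state which order was used.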

ALS Data Processing and Variable Extraction
A range of ALS variables are used to characterize the forest canopy and vertical structures, including height percentile variables, canopy cover variables, and tree height variables at the plot level. To construct the height percentile variables within a sample plot, the height at a given percentile is extracted from the normalized point cloud. The height percentile variables show the vertical division of the forest. All the extracted ALS variables are listed in Table 4. In general, the first return points of ALS point clouds are the canopy points in the forest, while the last return points are the ground points. The percentage of first return points to total points (FR) variable describes the proportion of canopy points to the total number of points in the forest. Further extraction of the canopy height variables can describe the vertical distribution and variability of the forest canopy. Canopy cover is defined as the percentage of vegetation returns to the total number of returns, which describes the planting density of trees at the horizontal level.
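The plot-level ALS variables described above can be sketched as follows. This is an illustrative example only: the 2 m threshold separating vegetation returns from ground and understory, the restriction of height statistics to vegetation returns, the interpolated percentile, and the names `h_p50` and `h_p95` are assumptions that are not specified in the text.

```python
import statistics


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0-100) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower])


def als_plot_metrics(points, veg_threshold=2.0):
    """Summarize one plot of a normalized ALS point cloud.

    points: list of (height_above_ground_m, return_number) tuples.
    veg_threshold: assumed height (m) above which a return is vegetation.
    """
    veg = sorted(h for h, _ in points if h > veg_threshold)
    h_mean = statistics.mean(veg)
    first_returns = sum(1 for _, ret in points if ret == 1)
    return {
        "h_max": veg[-1],
        "h_mean": h_mean,
        "h_cv": statistics.pstdev(veg) / h_mean,  # coefficient of variation
        "h_p50": percentile(veg, 50),
        "h_p95": percentile(veg, 95),
        "C": 100.0 * len(veg) / len(points),       # canopy cover, %
        "FR": 100.0 * first_returns / len(points),  # first-return share, %
    }


# Tiny hypothetical plot: (normalized height in m, return number).
plot = [(0.0, 2), (0.5, 1), (4.0, 1), (8.0, 1), (12.0, 2), (16.0, 1)]
metrics = als_plot_metrics(plot)
```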

Selection for Satellite Imagery and ALS Data Variables
As the extracted variables contain redundant information, variable selection is a key issue. Importance ranking of variables and Pearson correlation analysis were used to filter the variables. Finally, suitable optical image and ALS data variables, along with their combinations, were added to the regression model.
After extracting variables from the preprocessed GF2, LS8, and ALS data, the random forest (RF) importance ranking of the variables was calculated. The percentage increase in the mean squared error (%IncMSE) can be considered as the contribution of a variable to the AGB prediction and used to assess its importance. In general, the %IncMSE is interpreted as the decrease in the precision of the AGB prediction when the variable is removed. When %IncMSE is higher than 4%, the variable is retained. Meanwhile, the correlation between the extracted variables was calculated using Pearson correlation analysis. In this paper, 0.7 was taken as the threshold for a high correlation coefficient. If the correlation coefficient between two variables exceeds 0.7, only the more important variable is kept. In this way, those variables with high importance and low correlation are selected for modeling.
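The two-stage filter described above can be sketched as follows, assuming the %IncMSE scores have already been computed elsewhere (for example by a random forest package). The greedy order (most important variable first) and the use of the absolute value of the correlation coefficient are assumptions; the function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def select_variables(columns, importance, min_importance=4.0, max_abs_r=0.7):
    """Keep important variables that are not highly inter-correlated.

    columns: {name: list of plot-level values}
    importance: {name: %IncMSE score}, assumed precomputed
    """
    ranked = sorted(
        (name for name in columns if importance[name] > min_importance),
        key=lambda name: importance[name],
        reverse=True,
    )
    kept = []
    for name in ranked:
        # Drop the candidate if it duplicates a more important kept variable.
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_r
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical example: "b" nearly duplicates "a"; "d" is unimportant.
columns = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10.5],
           "c": [5, 1, 4, 2, 3], "d": [1, 0, 1, 0, 1]}
importance = {"a": 10.0, "b": 8.0, "c": 6.0, "d": 2.0}
selected = select_variables(columns, importance)
```

A constant column would produce a division by zero in `pearson`; such variables carry no information and would need to be removed beforehand.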

Regression Models
The regression methods considered in this study included both parametric and non-parametric methods, namely, the stepwise regression method (SRM), support vector machine (SVM), boosting tree, and bagging tree approaches. Ten-fold cross-validation [24] over all 68 plots was employed to limit overfitting of the models.
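As a rough illustration of how ten-fold cross-validation partitions the 68 plots, the sketch below generates the train and test indices for each fold. The random shuffling, the seed, and the function name are assumptions; the text does not state how the folds were formed.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: plots assigned at random
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, test


# With 68 plots, eight folds hold 7 plots and two folds hold 6.
splits = list(k_fold_indices(68, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
```

Each plot is predicted exactly once by a model that never saw it during fitting, so the pooled predictions can be compared with the field-based AGB values.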
Multiple variable regression analysis is a commonly used method in biomass modeling [25]. In this study, the SRM method was used to describe the relationship between the independent variables and the dependent variable (AGB). When a new independent variable was introduced, the original variables were tested one at a time. Variables were retained (p < 0.05) or removed depending on their significance level. The independent variables from Tables 3 and 4 were filtered via SRM to build the final model, as shown in Equation (1):
$$\mathrm{AGB} = b + \sum_{n} a_n \cdot \mathrm{variable}_n \qquad (1)$$
where $b$ is the intercept and $a_n$ is the parameter fitted for $\mathrm{variable}_n$ in the SRM. Here, $\mathrm{variable}_n$ is a metric from Table 3 or Table 4, extracted from the images or the ALS data, that is retained after feature selection and the SRM.
The other three methods (SVM, boosting tree, and bagging tree) are all non-parametric methods, and were run with default parameters. The same type of learner was used in both the bagging tree and boosting tree methods; they differ in terms of sample selection, sample weight, prediction function, and parallel computation. SVM was originally formulated as a binary classification method that separates two classes by seeking the optimal decision boundary with the maximum margin; here it is applied to the regression of AGB.
The precision and bias vary with the regression model. To evaluate the performance of the different regression models, the coefficient of determination (R²), root-mean-square error (RMSE), and bias were calculated. The indicators are defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$\mathrm{bias} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$
where $n$ is the number of samples, $y_i$ is the ground truth of the $i$-th sample plot, $\hat{y}_i$ is the predicted value for the $i$-th sample plot, and $\bar{y}$ is the mean of the ground truth of the sample plots.
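A minimal sketch of these indicators is given below. Two conventions are assumed rather than taken from the text: bias is computed as predicted minus observed (so a negative value means underestimation), and the relative RMSE (rRMSE) reported in the results is taken as the RMSE divided by the mean observed AGB, in percent.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, rRMSE (%) and bias for paired AGB values (Mg/ha)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "rRMSE": 100.0 * rmse / mean_obs,  # assumed normalization
        "bias": sum(p - y for y, p in zip(observed, predicted)) / n,
    }


# Hypothetical plot-level values, for illustration only.
scores = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 290.0])
```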

Variables Selection Results
Essentially, a simple model should be more stable and transferable than a complex one. Therefore, variable selection is necessary before employing regression models for AGB, as the complexity of the regression models depends on the number of input variables. Potential variables were further selected from Tables 3 and 4 for AGB regression based on the RF importance ranking and Pearson correlation analysis.
The variables selected for the GF2 data are PC3, EVI, MSAVI, PC1, and RVI. For the LS8 data, the selected variables are DVI, EVI, MNF2, MNF3, and PC3. For the ALS data, the selected variables are h_cv, t_max, FR, h_mean, t_min, C, and h_max. The GF2-ALS model incorporates h_cv, PC3, t_max, h_mean, EVI, RVI, and MNF3, while the LS8-ALS model integrates h_cv, t_max, PC2, h_mean, FR, EVI, DVI, t_min, and C. The importance rankings and correlations of the selected satellite image variables, ALS variables, and fused variables are illustrated in Figures A1-A3, respectively, in Appendix A.

AGB Prediction Using Only Image Variables
Following variable selection based on correlation and importance, the performance of multi-source data modeling was evaluated. In this paper, the AGB regressions using only the GF2 variables or only the LS8 variables are called the GF2 model and the LS8 model, respectively. The results of each regression model and the associated statistical performance indicators are shown in Table 5, where bold text marks the best result for each data source. For the GF2 model, R² ranges between 0.54 (boosting) and 0.61 (SVM). The boosting model shows the largest difference between the predicted values and the ground truth (bias = −4.38 Mg/ha and RMSE = 44.21 Mg/ha), while the lowest bias is obtained by SRM (−0.0003 Mg/ha) and the lowest RMSE by SVM (40.72 Mg/ha). Figure 4 shows the cross-validated AGB predictions of the four models against the AGB estimates at the sample plot level. Black linear trendlines and R² values are included in Figure 4 to depict the relationship between predicted and measured values. As the slope of the trendline approaches unity and its intercept diminishes, the coefficient of determination increases, indicating an improved fit. In Figure 4, the observations for the SVM method are relatively concentrated and lie closest to the trendline. Figure 5 shows the residual distributions of the four models for the different data sources. The randomness of the residuals can be checked using residual-versus-fitted-value plots, from which the presence and magnitude of heteroskedasticity can be assessed. An even distribution of the residuals on both sides of the x-axis indicates limited heteroskedasticity, while a conspicuously non-uniform distribution indicates pronounced heteroskedasticity. As can be inferred from Figure 5, the SVM method shows fewer outlying points than the other three methods, along with a more tightly concentrated residual distribution.
Similarly, the five extracted LS8 variables were used for modeling with the four regression methods. The results are shown in Figures 4 and 5. Compared with the GF2 model in Table 5, the rRMSE is higher, exceeding 30%. The proportion of explained variation decreases, and the best fitting method is bagging, with an R² of 0.52. The lowest R² of all the models is 0.42, resulting from the SVM method. The lowest values of RMSE are obtained from the bagging (45.40 Mg/ha) and SRM (50.01 Mg/ha) methods. In Figure 4, the observations of the bagging method for the LS8 model are clustered notably closer to the trendline than those of the other methods. Meanwhile, the residual distributions of the various methods are similar in Figure 5, with an uneven distribution of points on both sides of the x-axis.

AGB Prediction Using Only ALS Variables
The ALS model is a biomass regression model constructed using only ALS data. The results of all four regression models are listed in Table 5 and shown in Figures 4 and 5. As can be seen from Table 5, among the ALS models regressed by the four methods, SRM achieves the best model performance, with the highest R² of 0.78 and the lowest RMSE of ±31.85 Mg/ha. The boosting method has the largest absolute bias (−5.09 Mg/ha), while SRM has the smallest (0.00008 Mg/ha). From Figures 4 and 5, it can be observed that the SRM approach with the ALS model exhibits the best fit and the most favorable residual distribution. In Figure 4, the observation points of the SRM method are the closest to the trendline. The fitted line has a slope of 1, and the intercept is minimal. However, this method tends to underestimate values when AGB estimates are below 50 Mg/ha, as shown in Figure 4. In Figure 5, only the SVM method exhibits a noticeable imbalance in the distribution of observation points on both sides of the x-axis. Meanwhile, the SRM method displays the smallest range of variation along the y-axis.

AGB Prediction Using Multi-Source Data
The GF2-ALS and LS8-ALS models combine ALS data with GF2 and LS8 data, respectively. For the models built from these two fused data sets, as shown in Table 5, the rRMSE does not exceed 25%, except for the LS8-ALS model using the SVM method. For the GF2-ALS and LS8-ALS models, respectively, R² ranges from 0.66 (SVM) to 0.82 (SRM) and from 0.65 (SVM) to 0.75 (SRM); the bias ranges in absolute value from 0.0002 (SRM) to −4.05 Mg/ha (SVM) and from 0.00008 (SRM) to −6.27 Mg/ha (boosting); and the RMSE spans ±28.85 (SRM) to ±38.29 Mg/ha (SVM) and ±31.85 Mg/ha to ±39.20 Mg/ha (SVM). The SRM achieves the best performance on both fused data sets. For both fused models, the observations of the SRM method are tightly clustered around the trendline, which is also the best-fitting trendline. As shown in Figure 4, when the field-estimated AGB exceeds 250 Mg/ha, all models from all data sources tend to underestimate AGB. Both fused data models also show a tendency for underestimation by the SRM method for AGB measurements below 50 Mg/ha. Furthermore, Figure 5 shows that the SRM method yields a more favorable residual distribution in the GF2-ALS model than in the LS8-ALS model. Compared to the other three methods, the residuals of SRM are more evenly distributed on both sides of the x-axis, with a narrower range of residual values.

Cost and Performance of AGB Modeling
The model performance and costs of the proposed method were compared with values reported in the literature [4,9,26,27], with the results shown in Figure 6 and Table 6. In Table 6, the cost of satellite imagery per hectare was derived from [28]. The cost per hectare for ALS point clouds represents the current cost incurred for data acquisition. In Figure 6, the data sources in Table 6 are classified into satellite image data, ALS data, and multi-source data. The five models presented in this paper are depicted using points in their respective colors, marked with red circles for emphasis.

Data Source Selection for AGB Modeling
Appropriate choice of data sources for AGB modeling is one of the goals of this paper. The performance of ALS data is found to be better for AGB modeling than that of GF2 and LS8 data, as indicated in Table 5. Although good model performance has been achieved in this study, problems with ALS data, such as cost and flight conditions, could affect the further application of ALS in forestry. Satellite images have the advantages of large coverage and frequent monitoring. The issues worth investigating are whether satellite imagery can be used as a single data source for the estimation of AGB, and the impact of resolution improvement on AGB modeling with single-source and fused data. We hypothesize that if the image resolution can be further increased, this will provide richer spectral details for the model and may significantly improve the results. In this paper, GF2 data with 1 m resolution and LS8 data with 30 m spatial resolution were used for regression modeling. The spatial resolution of satellite imagery plays an important role in AGB prediction; all of our models that used GF2 as the input achieved better performance than those using LS8, as shown in Table 5. The experimental results demonstrate that GF2 imagery with 1 m resolution can provide more abundant spatial details than LS8 imagery with 30 m resolution.
LiDAR data are expensive, while optical imagery is susceptible to saturation. Therefore, the use of fused data is explored in this paper, with the results shown in Table 6 and Figure 6. Rana et al. [27] used ALS data, RapidEye imagery, and Landsat 5 TM imagery to predict AGB. RapidEye information at 5 m spatial resolution and Landsat 5 (TM) images at 30 m spatial resolution showed little improvement of the regression results, as shown in Table 6, while the ALS data clearly enhanced AGB regression. In particular, the R² value improved by nearly 40%. Han et al. [26] tested combining the LiDAR data with image data, but found no significant effect over the use of the GF1 image data alone when including Sentinel-1 data. This may be because the forest canopy roughness provided by SAR data has little effect on AGB prediction. On this basis, our study confirms the reliability of combining satellite imagery and LiDAR point clouds in AGB estimation, which is consistent with previous results [30]. Compared with the multi-source data modeling results of de Almeida et al. [4] and Han et al. [26], we achieved better AGB estimation results, as illustrated in Figure 6. Therefore, to further improve the precision of ALS estimation of AGB, imagery with a high spatial resolution is needed in order to provide finer spatial detail.

Variable Selection for AGB Modeling
Another contribution of this work is the variable selection step using importance ranking and correlation analysis. Remote sensing variables (spectral information, vegetation indices, structural information) are used to correlate with AGB biophysical variables. Variable selection reduces high correlations between two or more predictors and identifies valuable remote sensing variables.
In previous studies, spectral bands, vegetation indices, and texture [5] have been the main variables derived for use in AGB estimation in optical remote sensing. Height variables [29] and laser penetration rate [3] are important variables for LiDAR. Based on our exploration of the importance of different variables for AGB modeling, PC3 and h_cv are important and reliable variables with good generalizability. This is consistent with previous research [31]. For satellite images, PC2 and PC3 generated by PCA perform best, while PC1 performs poorly, as shown in the Appendix (see Figure A1). For ALS data, h_cv indicates the degree of dispersion of the data, which is more informative than the numerical variation of the height. The important variables identified in this paper for the estimation of AGB are comparable to the variables used in other research, such as height percentiles [30], height variables [10], and canopy cover [3]. The type and complexity of the forest and the growth of the trees may lead to differences in the structure and spectral information in data from different areas. In future research, more types of variables could be used for estimating AGB; the addition of new variables and groups of multiple variables are topics worth exploring.

Improvement of AGB Modeling Performance
Model selection has been explored to find the most appropriate regression methods for AGB modeling. de Almeida et al. [4] found that for high spectral resolution data, the choice of regression model has almost no effect. They used six methods, including linear models with (LMR) and without (LM) regularization, Support Vector Regression (SVR), Stochastic Gradient Boosting (SGB), RF, and Cubist (CB), to regress the Brazilian Amazon AGB. Their results were not consistent with previous research findings. They explain that the reasons for the differences could include the number and type of indicators selected as potential input data, the type of vegetation studied, and the quality of the field and remote sensing data used to obtain the model. In addition, deep learning (DL) methods have become very popular in recent years [32], although there are still several limitations, such as high data requirements, model complexity, and low interpretability. In this paper, there was no obvious difference between different regression models with the same data source input, except that the SRM model achieved the best performance, with the highest R² along with the lowest RMSE and rRMSE among the twelve regression models with ALS input. However, SRM is more effective only when the ALS variables are added. In the image data source models, SRM has average performance, as shown in Table 5 and Figure 4. For the models with LiDAR variables, the regression performance of the AGB model constructed by SRM is greatly improved, which has not been reported before. SRM adds the ALS variables that are important for AGB modeling and removes the insignificant variables. The ALS variables are more correlated with AGB; therefore, further screening of the variables could significantly improve the performance of the model. Future studies should additionally consider modeling errors and increasing the number of plots.

Cost vs. Performance
Cost is a critical factor in practical applications, as solutions that balance accuracy against minimum cost are often preferred. While airborne LiDAR provides high accuracy, how much it improves the accuracy of image-based models at different resolutions in the same scene remains unknown. The key question is therefore how data cost affects the accuracy of biomass estimation. All data sources are categorized into satellite imagery, airborne LiDAR, and fused data in Figure 6, and their respective performance and cost are shown in Figure 6 and Table 6. For satellite imagery, higher data quality typically corresponds to a higher image cost. Figure 6 shows that model performance improves as data cost increases. Among the free remote sensing imagery used in this study, the best results are obtained with the LS8 model, whereas among paid imagery the best performance is achieved with the GF2 model. Moving from LS8 data to GF1 data raises the cost by USD 0.02 per hectare and improves R² by 0.12. Moving from GF1 imagery to GF2 imagery raises the cost by a further USD 0.03 per hectare, with an R² improvement of only 0.01. Although the fused-data models achieved the best performance, their cost must be considered. Fused data built from SAR data and GF1 imagery cost USD 3.53 per hectare less than the GF2-ALS model; R² decreases by 0.12 but still reaches 0.7. Furthermore, the free Sentinel-1 imagery is more cost effective than ALS data. Choosing a data source on the basis of both model performance and cost, as done here, can serve as a reference for future research. It should be emphasized that model performance is closely linked to variables such as tree species and planting density. Consequently, Figure 6 serves as an initial visual comparison, and more detailed analysis remains necessary.
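The trade-off described above can be made concrete by dividing each cost difference by the corresponding R² difference. The following is a minimal sketch that uses only the differences quoted in this section; the `Transition` class, its field names, and the "USD per hectare per 0.01 R²" metric are illustrative assumptions, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """A move from a cheaper data source to a more expensive one (hypothetical helper)."""
    label: str
    delta_cost_usd_per_ha: float  # additional cost, USD per hectare
    delta_r2: float               # gain in R2

    def cost_per_r2_point(self) -> float:
        """USD per hectare paid for each 0.01 gained in R2."""
        return self.delta_cost_usd_per_ha / (self.delta_r2 * 100)


# Differences as quoted in the text above; no other values are assumed.
TRANSITIONS = [
    Transition("LS8 -> GF1", 0.02, 0.12),
    Transition("GF1 -> GF2", 0.03, 0.01),
    Transition("SAR-GF1 -> GF2-ALS", 3.53, 0.12),
]

for t in TRANSITIONS:
    print(f"{t.label}: {t.cost_per_r2_point():.4f} USD/ha per 0.01 R2")
```

Under this simple metric, the step from LS8 to GF1 is the cheapest per unit of accuracy gained, while the step to ALS-based fusion is the most expensive. The ratio ignores processing effort and differences in coverage, so it should be read as a rough guide only.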

Potential of Large Scale AGB Modeling
The current carbon sequestration potential of forests and their resilience against geographical and climatic changes are often assessed on the basis of AGB levels, which reflect forest productivity and resilience. Extending forest AGB estimation to larger scales requires additional factors to be considered. Building on the current experiments, future work should examine tree species diversity and heterogeneous forests for AGB prediction. In addition, the type of dataset largely determines the final performance of AGB estimation. Model precision is not the only factor in data source selection; the update frequency, coverage, and cost of the data are important as well [33]. Satellite imagery has a fixed revisit cycle, and low- and medium-resolution satellite imagery such as the Landsat series can be obtained for free.
The large-scale collection of ALS data depends on the data update plans of the local forestry or surveying and mapping departments. Although in this paper the model built using satellite imagery alone was less accurate than the best model built using ALS data, satellite imagery is widely available and inexpensive, and may even be available free of charge. With technological innovations, the cost of airborne point cloud data acquisition has been decreasing. ALS data production for ALTM 1210 acquisition cost about USD 1100 per km² in 2004 [34] and about USD 500 per km² in 2010 [35]. Nevertheless, ALS data costs remain high compared to satellite imagery. For the selection of image data sources for large-scale biomass modeling, the results in Table 6 can serve as a starting point. Our study is the first to use a combination of medium- and high-resolution satellite imagery and ALS data to investigate the cost and performance of multi-source data for AGB modeling. This study demonstrates the potential of using low-cost satellite data to provide a rough estimation of AGB at the national scale, and as such can guide future forest management decisions.
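Because this section quotes costs both per hectare and per square kilometer, a small conversion helps when comparing figures. The sketch below converts the two ALS cost figures cited above to USD per hectare and derives an indicative average annual rate of change between them. The function names are hypothetical, and the rate assumes a constant compound change between 2004 and 2010 from two values taken from different sources, so it is only a rough indication.

```python
HA_PER_KM2 = 100  # 1 square kilometer = 100 hectares


def usd_per_km2_to_usd_per_ha(cost_per_km2: float) -> float:
    """Convert a cost in USD per square kilometer to USD per hectare."""
    return cost_per_km2 / HA_PER_KM2


def annual_change_rate(start_cost: float, end_cost: float, years: int) -> float:
    """Average compound annual change; assumes a constant rate (a simplification)."""
    return (end_cost / start_cost) ** (1 / years) - 1


# ALS production costs as cited in the text ([34] for 2004, [35] for 2010).
als_2004_per_km2 = 1100.0
als_2010_per_km2 = 500.0

print(usd_per_km2_to_usd_per_ha(als_2004_per_km2))  # USD per hectare in 2004
print(usd_per_km2_to_usd_per_ha(als_2010_per_km2))  # USD per hectare in 2010

rate = annual_change_rate(als_2004_per_km2, als_2010_per_km2, years=2010 - 2004)
print(f"Indicative annual change: {rate:.1%}")
```

Expressed per hectare, the cited ALS costs can be set directly against the per-hectare image cost differences discussed in the previous section.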

Conclusions
In this study, medium-resolution and high-resolution satellite imagery were combined with ALS data to evaluate AGB model performance and data cost. First, TLS and ULS data were used to derive individual tree parameters and plot-level AGB, replacing traditional ground-measured data. The combination of ALS point clouds and satellite images was then used to obtain structural and spectral information on the forested areas. Satellite imagery, ALS, and fused features were extracted and filtered for AGB modeling using four regression methods. The most important part of this paper is the exploration of the performance of the different data sources, especially the performance of AGB estimation that combines medium-to-high resolution imagery with ALS data. The ALS model performed the best, followed by the GF2 model, while the LS8 model performed more poorly. Among the different methods, the GF2-ALS model developed using SRM performed the best, with an R² of 0.82. Cost needs to be considered during data selection as well; thus, remote sensing data with different costs were used to explore the potential of AGB estimation, including free and low-cost satellite data as well as more expensive airborne point cloud data. Using imagery alone as the data source, the GF2 model increased R² by 0.1 compared to the LS8 model, at an additional cost of USD 1 per square kilometer. When using fused data, the R² of the GF2-ALS model increased by 0.2, while the cost increased by USD 351 per square kilometer. Combining high-spatial-resolution imagery with ALS data significantly improved the performance of the AGB model. Overall, both cost and model performance must be considered for large-scale estimation of AGB.

(1) tree parameter and AGB calculation at the sample plot level; (2) extraction and selection of variables from satellite images and ALS data; and (3) AGB regression model construction and evaluation using four regression methods and multi-source data.

Figure 1. The workflow of the proposed method.

Figure 3. Distribution of sample sites and TLS collection locations: (a) Masson pine forest in Guigang municipality; (b) Masson pine forest in Laibin municipality; (c) Eucalyptus forest in Laibin municipality; and (d) Masson pine forest in Qinzhou municipality.

Figure 4. Scatterplots of the predicted and estimated AGB for the four fitting methods.

Figure 6. Scatterplot of R² and cost. Cost results are for the multispectral products available from the Apollo Mapping archives (Standard Tasking) [28].

Figure A1. Satellite imagery variables selection. Importance ranking of the variables in (a) the GF2 model and (c) the LS8 model. Correlation analysis for (b) the GF2 model and (d) the LS8 model. Dark green and red indicate highly negative and positive correlations, respectively.

Figure A2. ALS variables selection. (a) Importance ranking of the variables in the ALS model. (b) Correlation analysis for the ALS model. Dark green and red indicate highly negative and positive correlations, respectively.

Figure A3. Fused data variables selection. Importance ranking of the variables in (a) the GF2-ALS model and (c) the LS8-ALS model. Correlation analysis for (b) the GF2-ALS model and (d) the LS8-ALS model. Dark green and red indicate highly negative and positive correlations, respectively.

Table 1. Study area description.

Figure 2. The study areas in Guangxi Province, China (legend: Eucalyptus plots, Masson pine plots; municipalities: Qinzhou, Laibin, Guigang).

Table 3. Metrics calculated from the satellite image data.

Table 4. Summary of the ALS metrics.

Table 5. AGB modeling performance of the ten-fold cross-validated regression methods.

Table 6. Comparison of image data source modeling performance and cost.