Regional Forest Volume Estimation by Expanding LiDAR Samples Using Multi-Sensor Satellite Data

Abstract: Accurate information on forest volume plays an important role in assessing afforestation, timber harvesting, and forest ecological services. Traditionally, estimating forest growing stock volume from field measurements is labor-intensive and time-consuming. Recently, remote sensing technology has emerged as a time- and cost-efficient method for forest inventory. In the present study, we adopted three procedures: sample expansion, feature selection, and result generation and evaluation. Extrapolating the samples derived from Light Detection and Ranging (LiDAR) scanning is the most important step, because it satisfies the sample-size requirement of nonparametric methods and improves accuracy. The mean decrease Gini (MDG) measure embedded in the Random Forest (RF) algorithm served as the feature selector; RF and K-Nearest Neighbor (KNN) were then adopted for forest volume prediction. The results show that the retrieved forest volume across the entire area was in the range of 50–360 m³/ha, and the results from the two models were most consistent when using the sample combination extrapolated with the optimal threshold value (2 × 10⁻⁴), leading to the best performance of RF (R² = 0.618, root mean square error (RMSE) = 43.641 m³/ha, mean absolute error (MAE) = 33.016 m³/ha), followed by KNN (R² = 0.617, RMSE = 43.693 m³/ha, MAE = 32.534 m³/ha). The detailed analysis in the present paper shows that expanding image-derived LiDAR samples helps refine the prediction of regional forest volume when using satellite data and nonparametric models.


Introduction
Forests, as one of the essential terrestrial ecosystems, provide indispensable ecological and social services [1,2] and are an important carbon sink [3][4][5]. The growing stock volume is recognized as one of the most important forest attributes for monitoring forest growth, assessing the timber yield of plantations and natural forests, and estimating forest biomass. It is especially important for stand-level timber harvest operations (i.e., rotation, method, and allowable size), which require accurate information on tree size. Thus, forest stock information, which can improve the efficiency of forest management and reduce the cost of time and labor, is in high demand in the industry sector [6]. Regional tree size used to be roughly quantified with diameter-at-breast-height (DBH)-based allometry applied to forest inventory data, which are collected every five years. Currently, remote sensing has emerged as an important tool for forest inventory, providing continuous and up-to-date information on forest volume, which can help in forest resource management and forest growth observation [7][8][9]. Earlier studies that used remote sensing for forest volume prediction can be divided into two categories: those using only airborne Light Detection and Ranging (LiDAR) for accurate and convenient acquisition of small-scale forest volume distributions; and those adopting wide-coverage Synthetic Aperture Radar (SAR) or multi-spectral satellite images, or combining multi-sensor data, to map forest volume region-wide.
LiDAR directly captures three-dimensional information on forest structure by actively transmitting laser pulses that interact with the forest and then receiving the return signals. Since the end of the last century, it has evolved into the preeminent remote sensing technology for characterizing forest attributes in spatial detail [10][11][12][13][14][15][16][17][18], owing to its high precision and flexibility, which are critical for operational forest management. It has been extensively adopted for monitoring the attributes of artificial and natural forests at the individual-tree level [19][20][21] and at small scales [22][23][24]. Moreover, forest volume, as one of the structural parameters, has often been accurately estimated with LiDAR [25]. For example, Clementel et al. [26] combined statistical models with medium-resolution LiDAR to produce timber volume maps, and Lo et al. [19] demonstrated that a tree growth competition index (LCI), derived from LiDAR scanning with a rasterized canopy height model (multilevel morphological active-contour algorithm), was a key factor for forest volume estimation. Additionally, the relationship between volume and height and the sensitivity of tree volume estimation to LiDAR trajectory error have been investigated [9,27].
Even though LiDAR can acquire horizontal and vertical information on forest structure at high spatial resolution and vertical accuracy, it has mainly been applied to small- and moderate-scale forest applications, owing to cost constraints [28]. Optical and SAR remote sensing are instead more applicable to depicting forest structure over large areas. Since the 20th century, airborne and spaceborne imaging technologies (e.g., the Airborne Imaging Spectrometer, AIS, and the Landsat Thematic Mapper, TM) have been introduced into forest inventory [29][30][31]. Multi-spectral images, with their high quality and availability, were used to predict regional forest volume before other remote sensing technologies [8,32,33]. Since the launch of the Sentinel-2A satellite in 2015, Sentinel-2 data have emerged as a popular data source for stock volume mapping, owing to their higher resolution and the unique red-edge band, which is sensitive to vegetation [34,35]. Optical remote sensing still has some drawbacks: it is prone to spectral saturation, which leads to underestimation of forest structure parameters in dense forest because of insufficient canopy penetration; in addition, weather can cause data gaps on cloudy and rainy days. By contrast, SAR remote sensing has all-weather, day-and-night capabilities and is more sensitive to forest structure than optical imaging. SAR satellite data, such as ERS, RADARSAT, JERS, and PALSAR, have been widely adopted to derive forest growing stock volume [36][37][38][39]; additionally, the synergy of optical and radar data has been extensively studied in volume extraction, yielding more reliable precision [37,40,41]. To improve the accuracy of forest volume mapping over large areas (e.g., municipal, provincial, and even larger scales), combining high-precision LiDAR scanning with other satellite data has proven to be an important approach [42]. Hawrylo et al. [43] separately used multiple linear regression (LM) and random forest (RF) to predict stock volume in Scots pine stands using Sentinel-2 combined with airborne point clouds, and unmanned aerial vehicle (UAV) data combined with Sentinel-2 were also studied for stock volume estimation through a hierarchical model-based mode of inference [44]. In addition, Landsat and LiDAR composites [45], as well as combinations of more than two sensors (LiDAR, Landsat, and PALSAR) [40], were used to predict forest volume distribution.
Although the earlier studies have provided various solutions to forest volume prediction, they still suffer from some limitations. The main limitation is that LiDAR-only results cannot cover a large area, while the image-derived LiDAR points were directly used as training and testing samples for regional forest volume inference. When the whole study region is much larger than the LiDAR scanning extent, the LiDAR samples can no longer characterize the distribution of forest volume across the entire region [46,47]. The objectives of the present study are: to expand the original samples derived from LiDAR data so that they are more reasonably distributed throughout the study area, that is, to intensify the plots to increase the number and spatial frequency of the training samples; to find the optimal threshold value for sample expansion; to estimate the impact of the additionally extrapolated LiDAR samples on prediction accuracy using Random Forest (RF) and K-Nearest Neighbor (KNN); and to generate a regional forest volume map at ~30 m resolution. The detailed analysis shows a better approach to mapping forest volume with multi-source remotely sensed data.

Chifeng City, Inner Mongolia
The study area is located in the east of Inner Mongolia and covers four administrative regions of Chifeng City (the municipal district, Aohanqi, Kalaqinqi, and Ningcheng County) (Figure 1). It has complex and diverse topography, with elevations ranging from 300 m to 2000 m, small flat areas among the mountains, and alluvial plains scattered along the rivers. The area lies in the temperate semi-arid continental monsoon climate zone; the average annual temperature in most areas is 0–7 °C, increasing from northwest to southeast, and the average annual precipitation is 381 mm. Chifeng also has abundant forest resources, and plantations of pine, poplar, and some shrubs dominate the study area.
Figure 1. The study area, covering four administrative regions, located in Chifeng, Inner Mongolia ((a), (b), and (c) denote the study area, the Wangyedian forest district in which the LiDAR scanning result is located, and a partial enlargement of the LiDAR data, respectively).

Landsat-8 OLI & Sentinel-1A
Optical and SAR satellite images were used for the regional remote sensing application. The image collection consists of Landsat-8 and Sentinel-1A data, which were acquired for the study area through the Google Earth Engine (GEE) cloud platform (https://code.earthengine.google.com/) [48]. GEE provides not only powerful computing capabilities and convenient access to satellite data, but also mature machine learning algorithms [49][50][51]. We generated Landsat-8 surface reflectance (SR) image composites, atmospherically corrected using the Land Surface Reflectance Code (LaSRC) implemented in GEE [52]. Four visible and near-infrared (VNIR) bands and two short-wave infrared (SWIR) bands were selected for subsequent analysis, because these bands have been found to depict forest characteristics in a number of studies [53][54][55]. Sentinel-1A radar backscatter composites from the dual-polarization C-band SAR instrument were processed using the Sentinel-1 Toolbox to generate a calibrated, ortho-corrected product for 2017. Each SR composite value was taken from the greenest pixel among the eight scenes, where the greenest pixel is the one with the highest Normalized Difference Vegetation Index (NDVI) (Table 1).
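The greenest-pixel rule can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the GEE implementation used in the study: the function name `greenest_pixel_composite`, the array layout (scene, band, row, column), and the toy values are all hypothetical.

```python
import numpy as np

def greenest_pixel_composite(red, nir, bands):
    """Pick, per pixel, the scene with the highest NDVI (hypothetical helper).

    red, nir : arrays of shape (n_scenes, H, W) holding surface reflectance
    bands    : array of shape (n_scenes, n_bands, H, W) to be composited
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    bands = np.asarray(bands, dtype=float)
    denom = nir + red
    # NDVI = (NIR - red) / (NIR + red); a zero denominator is scored as -inf
    ndvi = np.full(red.shape, -np.inf)
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)
    best_scene = np.argmax(ndvi, axis=0)  # (H, W) index of the greenest scene
    picked = np.take_along_axis(bands, best_scene[None, None, :, :], axis=0)
    return picked[0]  # (n_bands, H, W)

# Toy example: two scenes, two bands, a single pixel
red = [[[0.2]], [[0.1]]]
nir = [[[0.4]], [[0.5]]]
bands = [[[[1.0]], [[2.0]]], [[[3.0]], [[4.0]]]]
composite = greenest_pixel_composite(red, nir, bands)
```

In the toy example the second scene has the higher NDVI, so its band values are the ones carried into the composite.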

Forest Volume Maps of LiDAR and Field Plots
The forest volume map was extracted from UAV LiDAR data acquired from 22 to 24 September 2017, a date close to the sensing dates of the optical and radar images. The data consist of four patches distributed in the Wangyedian forest district (Figure 1) and were used as reference and validation data for further analysis. A total of 17 field plots, measured from 21 to 28 September 2017 alongside the UAV LiDAR acquisitions, were also obtained to assess the accuracy of the original LiDAR scanning, which is the critical basis of this study. The consistent timing ensures the reliability of the verification results.

Topographic Data
The Shuttle Radar Topography Mission (SRTM) (http://srtm.csi.cgiar.org/srtmdata/) was a National Aeronautics and Space Administration (NASA) mission undertaken in 2000 to provide elevation data from 56° S to 60° N (over 119 million km², covering more than 80% of the global land surface). It produced high-precision (one arc-second, or around 30 m) digital surface elevation models (DSM) from 9.8 terabytes of C-band radar images obtained by the Space Shuttle Endeavour from 11 to 22 February 2000 using radar mapping technology [56]. As one of the world's most complete high-precision terrain datasets, it has provided a reliable source of data for geoscience analysis at global and regional scales [57][58][59]. The topographic data were clipped to the administrative boundaries of the study area using ArcGIS software. Elevation in the west of the study area is higher than in the east, and the highest elevation from the SRTM data is about 2000 m (Figure 2).

Auxiliary Data
Finer Resolution Observation and Monitoring of Global Land Cover (FROM-GLC30) is the first 30 m resolution global land cover map, produced by Tsinghua University using Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) data [60]. The land cover classification, detailed in Table 2, was downloaded from the website (http://data.ess.tsinghua.edu.cn/fromglc2017v1.html) and used as a mask layer, which gave direct access to the spatial distribution of forest and restricted the other data to the forest layer.

Methods
The overall workflow consists of the following four processes: (1) Extracting features from Landsat-8 and Sentinel-1; (2) intensifying points of forest volume that were derived from LiDAR mapping to generate training and validation samples; (3) feature selection and model training for RF and KNN algorithm; and, (4) estimating forest volume and assessing the accuracy of outcomes. Figure 3 shows the specific technical process.

Features Extraction from the Satellite Data
The surface reflectance and radar backscatter were further processed in ENVI software into spectral and texture indicators more relevant to forest volume. Six vegetation indices (NDVI, EVI, DVI, RVI, SAV, and MASI) were extracted from the Landsat-8 surface reflectance data to reflect the spectral characteristics of forest volume. Eight texture predictor variables (mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, and correlation) were obtained for each band of the Landsat-8 and Sentinel-1 data using the co-occurrence method with 3 × 3 windows in ENVI, which resulted in a total of 64 texture features: 48 from the six Landsat-8 bands and 16 from the two Sentinel-1 bands. We also applied Principal Component Analysis (PCA) to the six Landsat-8 bands and retained the first two components. For topography, three features (elevation, slope, and aspect) were derived from the 30 m resolution SRTM data. Slope and aspect were reclassified based on prior knowledge from forestry investigation that has been used in many field works (https://wenku.baidu.com/view/f4f111280066f5335a812134.html): the slope was divided into six levels, and the aspect was reclassified into three categories (Table 3).
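As an illustration of the spectral indicators, the sketch below computes four of the listed vegetation indices from reflectance values using their commonly cited definitions. It is not the ENVI workflow used in the study; the function name `vegetation_indices` and the sample reflectances are hypothetical, RVI is assumed here to be the NIR/red ratio, and SAV and MASI are left out because the text does not give their formulas.

```python
import numpy as np

def vegetation_indices(blue, red, nir):
    """Compute NDVI, DVI, RVI and EVI from surface reflectance (hypothetical helper).

    Inputs may be scalars or arrays; reflectance is assumed to be positive,
    so no zero-division guard is included in this sketch.
    """
    blue, red, nir = (np.asarray(b, dtype=float) for b in (blue, red, nir))
    ndvi = (nir - red) / (nir + red)   # normalized difference vegetation index
    dvi = nir - red                    # difference vegetation index
    rvi = nir / red                    # ratio vegetation index (assumed NIR/red)
    # Enhanced vegetation index with the commonly used coefficients
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    return {"NDVI": ndvi, "DVI": dvi, "RVI": rvi, "EVI": evi}

# Toy reflectances for a single vegetated pixel
indices = vegetation_indices(blue=0.05, red=0.10, nir=0.50)
```

The same function works on whole band rasters, since every operation is element-wise.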

Expanding of LiDAR Samples
Forest volume was extracted from the LiDAR data in the Wangyedian forest district to produce training and validation samples. We first assessed the accuracy of the original LiDAR scanning results using the 17 field plots. We then used the Create Random Points tool in ArcGIS to generate 666 forest reference plots from the 30 m resolution LiDAR forest volume map, of which 134 plots served as validation samples for assessing the results of the further analysis, and 532 plots functioned as field plots (FPs, i.e., plots obtained by sampling the LiDAR images and treated as real field plots) used for the subsequent generation of virtual plots (VPs, i.e., the intensified plots derived from the FPs) (Figure 4). We expanded the original FPs based on the raster features to obtain more plots for studying the correlation between the satellite image features and the FPs. Two types of variables were involved in the similarity criterion: environment variables, made up of the reclassified aspect, the reclassified slope, and the forest mask layer from FROM-GLC30; and remotely sensed variables, consisting of the four VNIR bands. First, we extracted the values of the environment and remote sensing features at the pixel overlapping each FP, and we then searched for the most similar pixels (MSPs) throughout the study area. An MSP has to satisfy two conditions with respect to its FP: its environmental values must equal those of the FP, which means that it belongs to the same surroundings as the given FP, and its remotely sensed values must be close enough to those of the FP, which means that the spectral difference cannot exceed the given threshold value.
Such conditions indicate that the spectral and texture information reflected by the satellite images is similar to that of the given FP [46]:

$$e_i^{\mathrm{MSP}} = e_i^{\mathrm{FP}}, \qquad \left| r_j^{\mathrm{MSP}} - r_j^{\mathrm{FP}} \right| \le T,$$

where $e$ and $r$ represent the environment and remotely sensed variables, respectively, and the subscripts $i$ and $j$ indicate their corresponding index numbers. $T$ denotes the threshold value used to quantify spectral closeness. To find the optimal T value, we varied the threshold over the interval [0.0001, 0.001], yielding a total of 10 threshold values. A range was used to avoid the one-sidedness and subjectivity of a single threshold, and it was inspired by a previous related study on measuring the spectral closeness of Landsat-5 data, in which a fixed threshold (0.01) was used [47]. If the threshold is too lenient, the VPs become very error-prone. Therefore, we treated 0.01 as the upper bound and searched for a better threshold below it. When all of the MSPs had been identified for a given FP, the volume value of the FP was directly assigned to the MSPs, which then became VPs. In the end, a total of 10 sets of VPs, corresponding to the 10 T values, were obtained.
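The search for MSPs around one FP can be sketched as a pair of raster masks. This is a minimal illustration, assuming that "spectral difference" means the per-band absolute reflectance difference; the exact distance measure is not stated in the text. The function name `find_most_similar_pixels`, the array layout, and the toy rasters are hypothetical.

```python
import numpy as np

def find_most_similar_pixels(env, spec, fp_row, fp_col, threshold):
    """Return a boolean (H, W) mask of MSPs for one field plot (hypothetical helper).

    env  : (n_env, H, W) categorical layers (e.g. slope class, aspect class, forest mask)
    spec : (n_spec, H, W) reflectance layers (e.g. the four VNIR bands)
    """
    env = np.asarray(env)
    spec = np.asarray(spec, dtype=float)
    fp_env = env[:, fp_row, fp_col][:, None, None]
    fp_spec = spec[:, fp_row, fp_col][:, None, None]
    # Condition 1: every environment variable equals that of the FP
    same_env = np.all(env == fp_env, axis=0)
    # Condition 2 (assumed form): every band differs from the FP by at most T
    close_spec = np.all(np.abs(spec - fp_spec) <= threshold, axis=0)
    mask = same_env & close_spec
    mask[fp_row, fp_col] = False  # the FP itself is not a virtual plot
    return mask

# Toy 2 x 2 raster with one environment layer and one spectral band
env = [[[1, 1], [2, 1]]]
spec = [[[0.10000, 0.10005], [0.10000, 0.20000]]]
mask = find_most_similar_pixels(env, spec, 0, 0, threshold=1e-4)

fp_volume = 150.0  # hypothetical FP volume in m3/ha
vp_volume = np.where(mask, fp_volume, np.nan)  # FP value assigned to its MSPs
```

Only the pixel at row 0, column 1 passes both conditions in the toy raster: the pixel at row 1, column 0 has a different environment class, and the pixel at row 1, column 1 is spectrally too far from the FP.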

Feature Selection and Modeling
RF [61,62] and KNN [63] were adopted to model the relationship between forest volume and the remotely sensed features. RF is based on a large number of decision trees and is regarded as an easy-to-use algorithm, because its intuitive and understandable hyper-parameters usually produce a good prediction result by default. KNN is also a mature and simple machine learning algorithm: it finds the k nearest neighbors of a sample and assigns the average of their properties to that sample. For training, the 10 sets of VPs generated with the different threshold values, which differ in size, were used together with the one set of FPs, yielding 21 sample combinations (the numbers of FPs, FPs + VPs, and VPs combinations were one, 10, and 10, respectively).
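The averaging principle behind KNN regression can be shown in plain Python. This is a sketch of the idea only, not the scikit-learn implementation used in the study; the function name `knn_predict`, the Euclidean distance, the unweighted mean, and the toy samples are assumptions made for illustration.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the mean of the k nearest training targets (hypothetical helper)."""
    # Rank training samples by Euclidean distance to the query point
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Toy feature vectors with forest volumes in m3/ha
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
train_y = [100.0, 120.0, 140.0, 300.0]
prediction = knn_predict(train_X, train_y, query=[0.1, 0.1], k=3)
```

With k = 3 the distant fourth sample is ignored, so the prediction is the mean of the three nearby volumes.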
Feature selection and model training were carried out for these combinations using the "scikit-learn" package in the Python programming language. Feature selection was performed on 83 candidate features, and the optimal 10 features were selected for subsequent modeling by ranking their importance according to the mean decrease Gini (MDG) criterion. The predictor variable with the largest MDG value is the most important. We used both RF and KNN models and applied grid search to find the optimal values of the key hyper-parameters of the two models, in order to estimate a reliable forest volume distribution. Table 4 describes the configuration for key hyper-parameter optimization.
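A possible shape of this step in scikit-learn is sketched below. Several points are assumptions rather than the study's configuration: the ranking uses the impurity-based `feature_importances_` of `RandomForestRegressor` (for regression forests the impurity is variance-based rather than Gini), the hyper-parameter grids are placeholders because Table 4 is not reproduced here, and the helper names `select_top_features` and `tune_models` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

def select_top_features(X, y, n_keep=10, seed=0):
    """Rank features by impurity-based RF importance and keep the best n_keep."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    return order[:n_keep]

def tune_models(X, y, seed=0):
    """Grid-search RF and KNN; the grids below are illustrative placeholders."""
    searches = {
        "RF": GridSearchCV(
            RandomForestRegressor(random_state=seed),
            {"n_estimators": [50, 100], "max_features": [0.5, 1.0]},
            scoring="neg_root_mean_squared_error",
            cv=3,
        ),
        "KNN": GridSearchCV(
            KNeighborsRegressor(),
            {"n_neighbors": [3, 5, 9], "weights": ["uniform", "distance"]},
            scoring="neg_root_mean_squared_error",
            cv=3,
        ),
    }
    # Fit each search and return the refitted best estimator per model
    return {name: search.fit(X, y).best_estimator_ for name, search in searches.items()}
```

In use, `select_top_features` would be called once per sample combination on the 83-column feature matrix, and `tune_models` would then be fitted on the retained columns only.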

Forest Volume Mapping and Validation
A total of 20 regional forest volume maps were generated for the 10 T values based on the trained models and the best 10 feature variables, and the mapping accuracy was evaluated using an independent validation dataset composed of 134 LiDAR points (Figure 4). We also computed the root mean square error (RMSE) and mean absolute error (MAE) to quantify the error in the estimation of forest volume:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

and

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,$$

where $y_i$ and $\hat{y}_i$ denote the measured and predicted volume values of point $i$, and $\bar{y}$ represents the mean value of all of the measured validation points (used when computing R²). $n$ is the number of validation samples and equals 134 here. The RMSE and MAE were compared between the RF and KNN models.
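These accuracy measures are straightforward to compute. The sketch below uses the standard definitions of RMSE and MAE and assumes that R² is the coefficient of determination (one minus the ratio of residual to total sum of squares), which the text does not state explicitly; the function name `regression_metrics` and the toy values are hypothetical.

```python
import math

def regression_metrics(measured, predicted):
    """Return R2, RMSE and MAE for paired measured/predicted values (hypothetical helper)."""
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot  # assumed definition of R squared
    return {"R2": r2, "RMSE": rmse, "MAE": mae}

# Toy validation set with volumes in m3/ha
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

RMSE penalizes large individual errors more strongly than MAE, which is why the two are usually reported together.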

Expanding of LiDAR Samples
More abundant samples were produced from the FPs derived from LiDAR mapping, combined with the remote sensing and topographic images. Table 5 gives the accuracy assessment of the original LiDAR scanning: the statistical values (RMSE and R²) show the accuracy of the LiDAR forest volume estimates against ground truth, and the contingency matrix shows that the LiDAR estimation accuracy is above 84%. The intensified samples were distributed throughout the entire study area (Figure 5), which makes them more reasonable and representative for regional forest volume estimation than the original LiDAR samples, which were concentrated in one forest district. As Figure 5 shows, the spatial frequency of VPs increased with increasing threshold value. Additionally, we plotted a bar chart, with the horizontal and vertical axes denoting the expanded sample number and the threshold value, to show how the number of expanded points varies across thresholds.

Feature Importance and Modeling
We computed importance statistics for the 83 candidate features using the mean decrease Gini (MDG) measure embedded in the RF algorithm. The sample expansion procedure produced a total of 21 sample combinations, but we plotted only the 10 most important variables (those with the highest MDG values) for three combinations, using line charts in which the vertical axis lists the feature names and the horizontal axis represents the MDG score. When only the FPs samples were used, the most significant of all the input variables was a topographic variable, Elevation (left subgraph). In contrast, the blue band of Landsat-8 contributed more to the volume prediction than the other variables for the two sample combinations generated by expanding the LiDAR samples at the threshold value of 2 × 10⁻⁴, i.e., FPs + VPs (T = 2 × 10⁻⁴) and VPs (T = 2 × 10⁻⁴). Besides, the two combinations yielded the same top three important features (B1, Elevation, and RVI, respectively) and reached a similarity of 70% (middle and right subgraphs). In addition, the optimal features for FPs contained more texture variables than those of the other two combinations (Figure 6).
Figure 6. Ranking of the optimal 10 variables for forest volume estimation selected using the MDG algorithm in RF. The ordinate lists the variable names, and importance gradually decreases upward. (a)-(c) denote the results for the sample combinations field plots (FPs), FPs + VPs (T = 2 × 10⁻⁴), and VPs (T = 2 × 10⁻⁴), respectively; B1 represents the blue spectral band of Landsat-8 OLI.
The RF and KNN models were trained on the previously selected 10 features, and grid search determined the optimal hyper-parameter values of the models under the 10 threshold conditions (i.e., from 1 × 10⁻⁴ to 1 × 10⁻³ in steps of 1 × 10⁻⁴). The resulting optimal models were used for volume estimation.
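The threshold grid and the sample combinations it implies can be enumerated directly. The snippet is only a bookkeeping sketch; the variable names and the labels "FPs", "FPs+VPs", and "VPs" are chosen here for illustration.

```python
# Ten thresholds from 1e-4 to 1e-3 in steps of 1e-4
thresholds = [k * 1e-4 for k in range(1, 11)]

# One FPs-only combination plus, for each threshold, FPs+VPs and VPs-only
combinations = [("FPs", None)]
for t in thresholds:
    combinations.append(("FPs+VPs", t))
    combinations.append(("VPs", t))

n_combinations = len(combinations)
```

The count reproduces the 21 sample combinations mentioned in the text: one without expansion and two for each of the ten thresholds.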

Optimal Forest Volume Estimation and Validation
The forest volume maps were produced using the two trained models, and the results were evaluated using the validation samples (Figure 4). All of the selected sample combinations were separately entered into the RF and KNN models for estimation. The variation of the R² value obtained from accuracy validation is shown in Figure 7. All of the variations show an increase after decreasing rapidly, and the highest accuracies were obtained at the peak point of T = 2 × 10⁻⁴, where the R² values exceeded 0.6, higher than with the FPs-only combination (two horizontal lines). In addition, the accuracy of a given model, taking the RF model as an example, was consistent across the two expanded sample combinations, i.e., FPs + VPs (FVP) and VPs only (VP); adding the original samples (FVP, red curve) performed better than using only the extended samples (VP, yellow curve). Compared with the KNN model, the accuracy difference between sample combinations was smaller for the random forest model, indicating that the random forest model was more stable and robust for forest volume estimation (Figure 7). The variations of the root mean square error (RMSE) and mean absolute error (MAE) agree with those of the R² value. We used the RF and KNN models to produce the highest-accuracy forest volume maps for three groups of samples, each consisting of two maps from the corresponding models (Figure 8), and we also conducted an accuracy assessment using the validation samples (Figure 9). The results show that the forest volume across the study area varies from 50 to 360 m³/ha, and the spatial distribution differs between using the original samples (FPs only) and additionally adopting the expanded samples, i.e., VPs (T = 2 × 10⁻⁴) only and FPs + VPs (T = 2 × 10⁻⁴).
For the former, the high values were mainly concentrated in the west of Kalaqinqi and Ningcheng County, whereas with the addition of VPs they were also distributed in the south of Aohanqi. The scatter plots show that expanding the FPs collection can improve the accuracy of the outcome compared with using only FPs. The RF model trained on the VPs-only combination was more accurate than that trained on only FPs; the result was further improved (R² = 0.618, RMSE = 43.641 m³/ha, and MAE = 33.016 m³/ha) by combining FPs with VPs (i.e., FPs + VPs). The accuracy trend of the KNN model was similar to that of RF, and the accuracies of the two models were almost the same for the same sample combination, with KNN reaching its highest accuracy at R² = 0.617, RMSE = 43.693 m³/ha, and MAE = 32.534 m³/ha. Overall, the accuracy of the forest volume distribution was almost the same for the two models (R² = 0.62, RMSE = 43 m³/ha, and MAE = 32 m³/ha).

Discussion
The sample expansion produced more abundant forest volume samples that were reasonably distributed throughout the entire study area, which can especially alleviate the shortage of ground-truth samples when forest attributes are calculated over large areas using machine learning [47]. We achieved regional forest volume estimation using the RF and KNN algorithms, which have been widely used in both classification and regression [64][65][66][67]. A related study directly imputed forest volume by combining Landsat images and forest inventory data with a KNN model, and its most accurate result had an RMSE of 74 m³/ha [68]. The accuracy of forest volume estimation could be improved (RMSE = 62 m³/ha) when LiDAR was used with other multi-source remote sensing data [40]. Compared with this existing research, our method achieves a further accuracy improvement (RMSE = 44 m³/ha) and reduces the dependence on the amount of LiDAR data. We identified two critical procedures, sample expansion and feature selection, which are discussed in the following sections.

Samples Extrapolation
Acquiring enough forest field measurements is costly and time-consuming, yet such measurements are important for regional forest applications, especially when a nonparametric method is considered. We considered three satellite-derived topographic variables (elevation, aspect, and slope), unlike earlier studies, in which additional climatic variables (temperature, precipitation, and solar radiation) were required [46]. The threshold value used in the spectral similarity measure to find MSPs for a given original point was not a single fixed value; instead, it was varied within an interval to identify the optimal one, which resulted in more sample combinations for the subsequent procedures discussed earlier. The larger the threshold, the more extended samples were produced (Figure 5); however, the change in accuracy was not linearly and positively correlated with the number of extended samples (Figures 7 and 10), which means that the accuracy of the outcomes depended not only on the quantity of samples but also on the correlation among them. As satellite auxiliary data, we used the surface reflectance of the four VNIR bands of Landsat-8, from which many vegetation indices are derived; in future studies, we will try to involve more data, such as L-band radar backscatter, which has been found to be more sensitive to forest attributes [69].

Feature Importance Measure
It is of great importance to select variables before fitting nonparametric models, owing to the existence of covariates in satellite data [70]. The contribution of each feature to forest volume was evaluated using the MDG measure embedded in the RF algorithm, in which the changes in the Gini impurity metric are calculated and ranked. The MDG methodology has been applied to select features derived from airborne and spaceborne images and has been shown to be popular, robust, and stable for measuring feature importance [71][72][73][74][75][76]. Moreover, the feature selection process was performed for each sample combination, and the variation in the results (Figure 6) shows that the sample quantity affected the variable importance measurement. Additional criteria will be needed to improve feature selection in our future studies.

Implications and Future Work
The overall accuracy (Figures 8 and 9) of this study demonstrates the feasibility of the proposed method, which would be very helpful in the acquisition of forest structure parameters. Thus, future work could acquire reliable regional forest volume distributions at a relatively low cost by intensifying UAV-LiDAR data. The following aspects could affect the results of this study. First, positioning errors in the original LiDAR plots and the remotely sensed data would cause some deviations in the outcomes; reducing them depends on the equipment and data processing methods. Besides, adding ground-measured samples to the accuracy verification could further improve the reliability of the results. Additionally, the validity of the features directly affects the results: aspect and slope were input directly into the models, whereas sine and cosine transformations have also been used in related studies [77]. For non-parametric models, the initialization of hyper-parameters has a significant influence on the model output; the key hyper-parameters of RF and KNN used in this study are given in Table 4, but they vary with the specific problem being dealt with. Thus, our further work will consider using additional ground-truth field plots to validate the results and will analyze whether the interactions between different topographic variables and terrain components affect the results. In addition, we will also pay attention to hyper-parameter tuning of the machine learning algorithms and further analyze its influence on the final model output.

Conclusions
In the present study, regional forest volume maps at a resolution of ~30 m were produced from point-specific plots extracted from LiDAR scanning images, combined with the expanded samples and with Landsat-8, Sentinel-1, and topographic images. Using LiDAR data with a limited spatial extent to generate quantitative forest attributes at a moderate geographic scale is challenging but rewarding. The forest volume was in the range of 50–360 m³/ha. The highest-accuracy results show a stable performance (R² = 0.62, RMSE = 43 m³/ha, and MAE = 32 m³/ha) across the RF and KNN models. Meanwhile, the optimal threshold value was found to be 2 × 10⁻⁴ within the range from 1 × 10⁻⁴ to 1 × 10⁻³ at intervals of 1 × 10⁻⁴.
The present study proposed a promising approach for producing forest volume mapping combined with multi-source remotely sensed data via expanding the original samples that have a limited number or distribution. Therefore, the proposed methodology could be incorporated into forest resource survey and monitoring programs to assist in the quantitative measurement of forest properties to fully exploit the advantages of satellite technology and reduce the time and labor costs of traditional surveys.