Integration of Multi-Sensor Data to Estimate Plot-Level Stem Volume Using Machine Learning Algorithms–Case Study of Evergreen Conifer Planted Forests in Japan

: The development of new methods for estimating precise forest structure parameters is essential for the quantitative evaluation of forest resources. Conventional use of satellite image data, increasing use of terrestrial laser scanning (TLS), and emerging trends in the use of unmanned aerial systems (UASs) highlight the importance of modern technologies in the realm of forest observation. Each technology has di ﬀ erent advantages, and this work seeks to incorporate multiple satellite, 0.11, RMSE = 109.23 m 3 / ha and rRMSE = 17.71%. Model UAS + SAR improves overall (R 2 = 0.51, RMSE = 80.90 m 3 / ha and rRMSE = 15.52%) and it is the only useful model compared to the former two, although majority of the validation plots were still overestimated. that signiﬁcantly partitioned the data.


Study Area
The study area was in the Kiryu Experimental Watershed located on the southern side of Lake Biwa in Otsu City, Shiga Prefecture, Japan. The center coordinates of the test location are 135.9943°E, 34.9635°N. The site is managed by the Department of Agriculture, Kyoto University, which has used the site for multiple purposes, including the monitoring of the ecosystem dynamics related to forest hydrology and gas exchange [41]. It is important to quantify the forest structures within the area for evaluating the forest management and ecosystem dynamics. The test site covered an area of approximately 80 m by 80 m, and a flux tower was present near the center. The target trees were Japanese cypress (Chamaecyparis obtuse (Sieb. et Zucc.)) covering the subwatershed area, which were mostly planted in 1959 but included some older trees more than 100 years old. Japanese cypress is the second most common evergreen conifer tree planted in the country, so it is a good choice as the subject of this analysis. The elevation of the site ranges from approximately 190-255 m a.s.l. (above mean sea level), and the annual mean temperature and rainfall are 13.5 °C and 1,682 mm, respectively (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017). The heterogeneous microtopography in the site shows the characteristics of headwater channels, including steep valley-side slopes, degraded headwater channel beds, and incised channels downstream. Many forested areas within Japan feature such rugged topography, and it can be challenging to quantify forest resources in such places through remote sensing approaches ( Figure  1). Figure 1. Characteristics of the Kiryu Experimental Watershed. Moderately dense trees are seen at the site with heterogeneous microtopography. The middle image shows the flux tower at the site from the surface, and the right image shows the flux tower and the surrounding landscape as seen from the unmanned aerial system (UAS) (Photo by author: 19 December, 2017). Detail of geographical location is shown by Iizuka et al. [34].

Methodology
A flowchart of the methodology is shown in Figure 2. The concept of this work was to determine whether multiple remote sensing data sets can improve the estimation of the stem volume through different regression analyses. Radar data and processed data from UAS photogrammetry were used as variables in the estimation, while the TLS data collected in situ were treated as the ground truth information. Moderately dense trees are seen at the site with heterogeneous microtopography. The middle image shows the flux tower at the site from the surface, and the right image shows the flux tower and the surrounding landscape as seen from the unmanned aerial system (UAS) (Photo by author: 19 December, 2017). Detail of geographical location is shown by Iizuka et al. [34].

Methodology
A flowchart of the methodology is shown in Figure 2. The concept of this work was to determine whether multiple remote sensing data sets can improve the estimation of the stem volume through different regression analyses. Radar data and processed data from UAS photogrammetry were used as variables in the estimation, while the TLS data collected in situ were treated as the ground truth information. Figure 2. Flowchart of the methodology. Terrestrial laser scanning (TLS) was used for ground truthing and generation of the digital terrain model (DTM). A UAS was utilized for collecting aerial images, which were then processed to obtain a digital surface model (DSM) and orthoimages. Phased Array type L-band Synthetic Aperture Radar-2 (PALSAR-2) and Sentinel-1 data were preprocessed and used as independent variables. All variables were incorporated in the end with different regression analysis for the test.

PALSAR2 Mosaic Data and Sentinel-1 TOPSAR Preprocessing
Two different SAR data sets were collected. One was the data from Advanced Land Observing Satellite-2 (ALOS-2) Phased Array type L-band Synthetic Aperture Radar-2 (PALSAR-2). The global mosaic data set from 2017 was downloaded from the Japan Aerospace Exploration Agency (JAXA) webpage. The advantage of L-band SAR is that the long wavelength can penetrate the canopy layers and interact with stems and branches [18]; thus, we can collect information from the lower layer of forests. The HH (horizontally transmitted, horizontally received) and HV (horizontally transmitted, vertically received) data collected were processed with the following formula to convert the information from digital numbers (DN) to backscattering intensity information. σ = 10 log ( ) CF (1) where σ 0 is the backscattering intensity in decibel units and CF is the calibration factor: −83.0 dB [42]. The global mosaic data are already radiometric corrected and in a form of the plane perpendicular to the line of sight from sensor to the ground surface [43]. The second data set was from the Sentinel-1 Terrain Observation with Progressive Scans SAR (TOPSAR) developed by the European Space Agency (ESA). The C-band wavelength is shorter can interact more within the canopy [44], and is thus able to collect information from the top layers of the forests. The data were downloaded via the webpage, and the Level-1 Ground Range Detected (GRD) product observed in the Interferometric Wideswath (IW) mode, containing dual polarized (VV, VH) data, is used. The data were observed on 22 nd June, 2017. The data were processed with thermal noise removal and radiometric slope correction using the 5 m digital elevation model published by the Geospatial Information Authority of Japan and Range Doppler terrain correction to convert the map into geographical coordinates. The terrain-corrected data were processed with a 3-by-3 Lee filter [45] to reduce the artifacts from the conversion. Only a subset of the processed data corresponding to the surroundings of the study site was extracted. The processing of Sentinel-1 uses SNAP (Sentinel-1 Toolbox ver. 6.0.0) developed by the ESA. Sentinel-1 was available only for VV and VH, which is due to the acquisition plans for this region.

UAS Observation and Image Processing
On 19 December, 2017, a flight was conducted over the Kiryu Experimental Watershed to collect aerial photographs of the test site. A Phantom 4 Pro (DJI, Shenzhen, China) multicopter UAS was Figure 2. Flowchart of the methodology. Terrestrial laser scanning (TLS) was used for ground truthing and generation of the digital terrain model (DTM). A UAS was utilized for collecting aerial images, which were then processed to obtain a digital surface model (DSM) and orthoimages. Phased Array type L-band Synthetic Aperture Radar-2 (PALSAR-2) and Sentinel-1 data were preprocessed and used as independent variables. All variables were incorporated in the end with different regression analysis for the test.

PALSAR2 Mosaic Data and Sentinel-1 TOPSAR Preprocessing
Two different SAR data sets were collected. One was the data from Advanced Land Observing Satellite-2 (ALOS-2) Phased Array type L-band Synthetic Aperture Radar-2 (PALSAR-2). The global mosaic data set from 2017 was downloaded from the Japan Aerospace Exploration Agency (JAXA) webpage. The advantage of L-band SAR is that the long wavelength can penetrate the canopy layers and interact with stems and branches [18]; thus, we can collect information from the lower layer of forests. The HH (horizontally transmitted, horizontally received) and HV (horizontally transmitted, vertically received) data collected were processed with the following formula to convert the information from digital numbers (DN) to backscattering intensity information.
where σ 0 is the backscattering intensity in decibel units and CF is the calibration factor: −83.0 dB [42]. The global mosaic data are already radiometric corrected and in a form of the plane perpendicular to the line of sight from sensor to the ground surface [43]. The second data set was from the Sentinel-1 Terrain Observation with Progressive Scans SAR (TOPSAR) developed by the European Space Agency (ESA). The C-band wavelength is shorter can interact more within the canopy [44], and is thus able to collect information from the top layers of the forests. The data were downloaded via the webpage, and the Level-1 Ground Range Detected (GRD) product observed in the Interferometric Wideswath (IW) mode, containing dual polarized (VV, VH) data, is used. The data were observed on 22 June 2017. The data were processed with thermal noise removal and radiometric slope correction using the 5 m digital elevation model published by the Geospatial Information Authority of Japan and Range Doppler terrain correction to convert the map into geographical coordinates. The terrain-corrected data were processed with a 3-by-3 Lee filter [45] to reduce the artifacts from the conversion. Only a subset of the processed data corresponding to the surroundings of the study site was extracted. The processing of Sentinel-1 uses SNAP (Sentinel-1 Toolbox ver. 6.0.0) developed by the ESA. Sentinel-1 was available only for VV and VH, which is due to the acquisition plans for this region.

UAS Observation and Image Processing
On 19 December, 2017, a flight was conducted over the Kiryu Experimental Watershed to collect aerial photographs of the test site. A Phantom 4 Pro (DJI, Shenzhen, China) multicopter UAS was assigned an automated flight plan with DJI Ground Station (GS) Pro. The UAS took off from the top of the flux tower (approximately 29 m) and ascended an additional 70 m in altitude. The flight was designed as a grid, with the camera oriented vertically downward toward the surface, and the forward and side overlaps of each image were set to 85%. The aerial photographs collected were processed with the SfM method [46]

Canopy Segmentation
In a previous study [34], we found that the shape of the tree canopy can correlate with the diameter at breast height (DBH) and is a parameter that can be used to identify individual tree stems. The watershed segmentation [34] was implemented here to collect the individual tree canopy shapes, which could be used to estimate other forest parameters. The raw dense point cloud generated from Photoscan was imported into the CloudCompare software, an open source software for handling 3D point cloud and mesh processing [47]. The raw dense points forming the canopy shape have a bumpy, unsmoothed shape, which leads to errors when segmenting the shape of the canopy of an individual tree. Thus, in CloudCompare, the dense points were processed with a moving least squares (MLS) method in the Point Cloud Library (PCL) wrapper plugin (http://pointclouds.org/) to smooth the surface of the objects (Figure 4a,b). The smoothed alternative model was used for generating a DSM and further performed with watershed segmentation using System for Automated Geoscientific Analysis (SAGA) GIS software ver. 6.0.0 [48]. The module needed two parameters to be chosen: selected local maxima and not to assign a threshold for joining. Other parameters were left as the default values. The segmented result was overlaid with the shadow mask image, generated from the orthoimagery, to exclude areas of shadows and canopy gaps (Figure 4c

Canopy Segmentation
In a previous study [34], we found that the shape of the tree canopy can correlate with the diameter at breast height (DBH) and is a parameter that can be used to identify individual tree stems. The watershed segmentation [34] was implemented here to collect the individual tree canopy shapes, which could be used to estimate other forest parameters. The raw dense point cloud generated from Photoscan was imported into the CloudCompare software, an open source software for handling 3D point cloud and mesh processing [47]. The raw dense points forming the canopy shape have a bumpy, unsmoothed shape, which leads to errors when segmenting the shape of the canopy of an individual tree. Thus, in CloudCompare, the dense points were processed with a moving least squares (MLS) method in the Point Cloud Library (PCL) wrapper plugin (http://pointclouds.org/) to smooth the surface of the objects (Figure 4a,b). The smoothed alternative model was used for generating a DSM and further performed with watershed segmentation using System for Automated Geoscientific Analysis (SAGA) GIS software ver. 6.0.0 [48]. The module needed two parameters to be chosen: selected local maxima and not to assign a threshold for joining. Other parameters were left as the default values. The segmented result was overlaid with the shadow mask image, generated from the orthoimagery, to exclude areas of shadows and canopy gaps (Figure 4c,

TLS Survey
On 19th December, 2017, in situ measurements were carried out using TLS. A lightweight, phase-based short-range scanner (Trimble TX5, Trimble Inc., Sunnyvale, US) was used. The device uses 905 nm wavelength infrared light, and its point acquisition has the ability to measure up to 120 m in distance at a scan frequency of up to 900,000 points per second [49]. Scans were performed not only at distance intervals but also at multiple locations on the neighboring slope. For instance, one scan was collected at the bottom of the slope, while another scan was performed at the top of the neighboring slope. The scanning time was set to approximately 4 minutes, and the device rotates while scanning a 360-degree view. The scanning resolution was set to a point spacing of 12.272 mm at 10 m distance. Initial processing of the raw point cloud data and postprocessing were carried out using the Trimble RealWorks ver. 8.1 (Trimble Inc., Sunnyvale, USA). Using the cloud-based registration, the point clouds from different scanning positions were registered to each other based on the iterative closest point (ICP) algorithm to merge all points. A total of 22 scans were performed for the TLS. The merging process had an average registration accuracy of 0.6 mm for all 22 scans. Figure 5a shows an example of one of the scanning scenes.
Because the scanned TLS data are in a format of relative coordinates, the dense point cloud generated from the UAS SfM method was combined with the TLS data to determine the geocoordinates. Conventionally, ground control points (GCPs) may be used to locate the points with coordinate data, however, the densely forested area lead to difficulties in receiving sufficient GNSS (Global Navigation Satellite System) signals for use with GCPs. The flux tower was the reference object to preliminary match the top layer (SfM point cloud) and bottom layer (TLS point cloud), then the cloud to cloud registration was used same as merging the TLS scans. Registration accuracy resulted approximately 20 cm. A polygon was generated with 10 m grids, and each area was extracted ( Figure 5b).

TLS Survey
On 19th December, 2017, in situ measurements were carried out using TLS. A lightweight, phase-based short-range scanner (Trimble TX5, Trimble Inc., Sunnyvale, US) was used. The device uses 905 nm wavelength infrared light, and its point acquisition has the ability to measure up to 120 m in distance at a scan frequency of up to 900,000 points per second [49]. Scans were performed not only at distance intervals but also at multiple locations on the neighboring slope. For instance, one scan was collected at the bottom of the slope, while another scan was performed at the top of the neighboring slope. The scanning time was set to approximately 4 minutes, and the device rotates while scanning a 360-degree view. The scanning resolution was set to a point spacing of 12.272 mm at 10 m distance. Initial processing of the raw point cloud data and postprocessing were carried out using the Trimble RealWorks ver. 8.1 (Trimble Inc., Sunnyvale, USA). Using the cloud-based registration, the point clouds from different scanning positions were registered to each other based on the iterative closest point (ICP) algorithm to merge all points. A total of 22 scans were performed for the TLS. The merging process had an average registration accuracy of 0.6 mm for all 22 scans. Figure 5a shows an example of one of the scanning scenes.
Because the scanned TLS data are in a format of relative coordinates, the dense point cloud generated from the UAS SfM method was combined with the TLS data to determine the geocoordinates. Conventionally, ground control points (GCPs) may be used to locate the points with coordinate data, however, the densely forested area lead to difficulties in receiving sufficient GNSS (Global Navigation Satellite System) signals for use with GCPs. The flux tower was the reference object to preliminary match the top layer (SfM point cloud) and bottom layer (TLS point cloud), then the cloud to cloud registration was used same as merging the TLS scans. Registration accuracy resulted approximately 20 cm. A polygon was generated with 10 m grids, and each area was extracted ( Figure 5b). Remote Sens. 2020, 12,

Non-Destructive Biophysical Parameter Retrieval for Ground Truth
The combined TLS point cloud was further used for measuring the forest's biophysical parameters. Each of the plots was examined, and the tree height and the DBH were measured using the measuring tool in the RealWorks software. Tree parameters were recorded within each 10 m grid. The measured DBH and tree height data were converted into stem volume data using the Yamamoto equation [50] or the Schumacher and Hall equation [51]: where V is the stem volume (m 3 ), D is DBH (cm) and H is tree height (m). The parameters a, b and c are coefficients, and for the cypress trees in this region, their values are as follows: a = −4.31101, b = 1.83546, c = 1.10655 [52]. Figure 6a,b shows the sum of the volume for each grid and the relationship between the DBH and tree height of individual trees within all grid areas. A total of 33 plots were examined, and all of the individual trees within each plot were measured. The cross and circles on Figure 6a indicate the plots further used for training (21 plots) and validation (12 plots). The total number of trees measured for all plots was n = 403. The volume data per plot was multiplied by 100 to compute the volume per unit area (m 3 /ha).

Non-Destructive Biophysical Parameter Retrieval for Ground Truth
The combined TLS point cloud was further used for measuring the forest's biophysical parameters. Each of the plots was examined, and the tree height and the DBH were measured using the measuring tool in the RealWorks software. Tree parameters were recorded within each 10 m grid. The measured DBH and tree height data were converted into stem volume data using the Yamamoto equation [50] or the Schumacher and Hall equation [51]: where V is the stem volume (m 3 ), D is DBH (cm) and H is tree height (m). The parameters a, b and c are coefficients, and for the cypress trees in this region, their values are as follows: a = −4.31101, b = 1.83546, c = 1.10655 [52]. Figure 6a,b shows the sum of the volume for each grid and the relationship between the DBH and tree height of individual trees within all grid areas. A total of 33 plots were examined, and all of the individual trees within each plot were measured. The cross and circles on Figure 6a indicate the plots further used for training (21 plots) and validation (12 plots). The total number of trees measured for all plots was n = 403. The volume data per plot was multiplied by 100 to compute the volume per unit area (m 3 /ha).

Generating Remotely Sensed Variables
To compute a terrain variable for the estimation, the combined point cloud data were used. The cloth simulation filter (CSF) method was implemented to extract only the terrain surface point clouds [53]. In general, the CSF method flips the 3D surface model (point cloud) upside down (inverted), and a cloth is placed on top of the surface. Depending on the use of a hard cloth or a soft cloth, the cloth surface fills the surface of the inverted 3D model. If this cloth is relatively hard (coarser resolution), the process will produce a rougher terrain surface, while a softer (finer resolution) cloth will generate topography that is more precise. By inverting back to the original axis, the cloth will form the topography of the area excluding all the top surface objects (e.g., houses, trees, etc.) ( Figure 7). The parameters used for the CSF method were in steep slope mode, the cloth resolution was 0.5 m, the iteration number was 500, and the threshold was 0.2. The terrain data extracted i.e., DTM with a resolution of 3.4 mm, were then processed further to generate relative slope position (RSP) due to the heterogeneous microtopography of the test site, which might have some influence on the growth trend of the trees because of hydrological aspects [54]. A value of 1 indicates the peak of the slope, while a value of 0 indicates the bottom of the slope [55].
Remote Sens. 2020, 12, x; doi: FOR PEER REVIEW www.mdpi.com/journal/remotesensing and c are coefficients, and for the cypress trees in this region, their values are as follows: a = −4.31101, b = 1.83546, c = 1.10655 [52]. Figure 6a,b shows the sum of the volume for each grid and the relationship between the DBH and tree height of individual trees within all grid areas. A total of 33 plots were examined, and all of the individual trees within each plot were measured. The cross and circles on Figure 6a indicate the plots further used for training (21 plots) and validation (12 plots). The total number of trees measured for all plots was n = 403. The volume data per plot was multiplied by 100 to compute the volume per unit area (m 3 /ha).

Generating Remotely Sensed Variables
To compute a terrain variable for the estimation, the combined point cloud data were used. The cloth simulation filter (CSF) method was implemented to extract only the terrain surface point clouds [53]. In general, the CSF method flips the 3D surface model (point cloud) upside down (inverted), and a cloth is placed on top of the surface. Depending on the use of a hard cloth or a soft cloth, the cloth surface fills the surface of the inverted 3D model. If this cloth is relatively hard (coarser resolution), the process will produce a rougher terrain surface, while a softer (finer resolution) cloth will generate topography that is more precise. By inverting back to the original axis, the cloth will form the topography of the area excluding all the top surface objects (e.g., houses, trees, etc.) ( Figure  7). The parameters used for the CSF method were in steep slope mode, the cloth resolution was 0.5 m, the iteration number was 500, and the threshold was 0.2. The terrain data extracted i.e. DTM with a resolution of 3.4 mm, were then processed further to generate relative slope position (RSP) due to the heterogeneous microtopography of the test site, which might have some influence on the growth trend of the trees because of hydrological aspects [54]. A value of 1 indicates the peak of the slope, while a value of 0 indicates the bottom of the slope [55]. In total, the variables were canopy areaavg (CAavg), canopy areamin (CAmin), canopy areamax (CAmax), CHMavg (m), CHMmin, CHMmax, and fraction of canopy cover (FCC) (%). First, for the canopy area, the rasterized segmentation results were used as the base information. A mask image for shadows and canopy gaps was generated using the orthoimages. The mask was applied to the segmentation images to filter out areas that did not require computation. The segmentation image was processed to calculate the average, minimum and maximum value of the canopy area (in m 2 ) within each 10 m by 10 m grid. The second variable was the tree height, expressed as the canopy height model (CHM). The DSM generated from the SfM method was used with the DTM data computed from the TLS data. By subtracting the DTM from the DSM, we obtained the height of the forested area (CHM). Additionally, the average, minimum, and maximum values of CHM were computed for each grid. Thirdly, for the FCC, the total area of canopy coverage within the grid area was computed (as a percentage).
For the radar data, γ 0 HH, γ 0 HV, γ 0 VV, and γ 0 VH backscattering intensity (dB) was prepared. Both Sentinel-1 and PALSAR have different ground resolutions of 10 m and 25 m, respectively. Therefore, to match the UAS data, both radar images were resampled to the DSMs using the cubic convolution method for further processing purposes.

Machine Learning Algorithms
RFR [56] and SVR [57] are the machine learning methods that have been used most often in image classification [58,59], and their applications have expanded to include forest attributes [60] and biomass modeling [61]. Machine learning methods are becoming an important approach in addition to traditional linear regression, because RFR and SVR can characterize the trends between the variables even when the samples are not linearly separable [62]. For the final estimation of the stem In total, the variables were canopy area avg (CA avg ), canopy area min (CA min ), canopy area max (CA max ), CHM avg (m), CHM min , CHM max , and fraction of canopy cover (FCC) (%). First, for the canopy area, the rasterized segmentation results were used as the base information. A mask image for shadows and canopy gaps was generated using the orthoimages. The mask was applied to the segmentation images to filter out areas that did not require computation. The segmentation image was processed to calculate the average, minimum and maximum value of the canopy area (in m 2 ) within each 10 m by 10 m grid. The second variable was the tree height, expressed as the canopy height model (CHM). The DSM generated from the SfM method was used with the DTM data computed from the TLS data. By subtracting the DTM from the DSM, we obtained the height of the forested area (CHM). Additionally, the average, minimum, and maximum values of CHM were computed for each grid. Thirdly, for the FCC, the total area of canopy coverage within the grid area was computed (as a percentage).
For the radar data, γ 0 HH , γ 0 HV , γ 0 VV , and γ 0 VH backscattering intensity (dB) was prepared. Both Sentinel-1 and PALSAR have different ground resolutions of 10 m and 25 m, respectively. Therefore, to match the UAS data, both radar images were resampled to the DSMs using the cubic convolution method for further processing purposes.

Machine Learning Algorithms
RFR [56] and SVR [57] are the machine learning methods that have been used most often in image classification [58,59], and their applications have expanded to include forest attributes [60] and biomass modeling [61]. Machine learning methods are becoming an important approach in addition to traditional linear regression, because RFR and SVR can characterize the trends between the variables even when the samples are not linearly separable [62]. For the final estimation of the stem volume, two regression tests were implemented to quantify the estimation capability. The stem volume data was used as the dependent variable, while all the other remotely sensed data were used as explanatory variables. For the test, the RFR and SVR library (library "randomForest" [63] and "e1071" [64]) for the R Studio (Integrated Development for R. RStudio, Inc., Boston, MA) and the "raster" and "rgdal" packages were installed to read and process the raster images that were generated.

Support Vector Regression (SVR)
The type of SVR method implemented in this study was the epsilon SVR (ε-SVR). Further instruction has been provided by Mounce et al. [65] and Marabel and Alvarez-Taboada [61].
To determine the most suitable parameter, a hyperparameter optimization was performed to find the best ε, cost parameter, where ε affects the fitting of the model to the training data. A higher ε is associated with an increased error zone, which might result in fewer support vectors. The cost parameter determines the decision boundaries of data sets. A lower cost parameter allows for more errors (more general flat model), while a higher cost parameter provides tight fitting to the model (a more complex model). Training samples were collected from the "training plots" (21 plots) (indicated in Figure 6a). Stratified random sampling was used to generate 2000 samples within the area and extracted values for each variables. SVR was trained using bootstrapping sampling with a sample size of 90% and iteration = 5. A grid search method [66] was implemented to train various numbers of models with slightly different parameters. This study considered ε parameter values 0, 0.1, 0.2 . . . 1 and cost parameter values of 2 2 , 2 3 , 2 4 . . . 2 7 . More than 60 models were trained first, and from the preliminary result adjustment were made for the finer tuning of the model. The best model had values of ε = 0.01, cost = 64, as well as a radial basis function kernel, gamma = 0.1. Remaining plot area (12 plots) was used for the validation. The predicted values were averaged within the plot area and compared with the reference value.

Random Forest Regression (RFR)
The RFR model was developed in the same way as the procedure for SVR. The necessary parameters needed are the size of the minimum trees (node size), number of available parameters for splitting at each tree node (mtry) and the numbers of trees to grow (ntree). Again, a grid search method is used for training each models: node size = 10, 20 . . . 50; mtry = 1, 2 . . . 5; ntree = 500, 600 . . . 1500. The best model had values of node = 20, mtry = 3 and ntree = 1300. The trained model was used with the imported raster variables, and the results were computed.

Correlation of Each Variables and Validating the Predicted Models
Correlation matrix table between each of the 12 variables and stem volume were computed with Pearson's correlation coefficients (r) and p-value for the additional check. The validation of each predicted model was considered by computing the correlation of determination (R 2 ), root mean square error (RMSE) and relative RMSE (rRMSE) between the predicted value and the reference values. The RFR models were further tested to compute the importance of each variables, whether or not they significantly contributed to the prediction of the model. The rfPermute R-package [67] was used, which assess a null distribution of variable's importance against the observed distribution. Each variable was shown if it was significantly different from the null distribution, considering a 5% significance level. We would like to emphasize that the R 2 here is considered as how well the predicted model fits to the 1:1 line, and not the proportion of the variance in the dependent variable that is predictable from the independent variable. To avoid confusion, it will be denoted as fitting R 2 . Fitting R 2 was computed from the following equation: where y is the reference value,ŷ is the predicted value and y is the mean of the reference values. Alexander et al. [68] recommends the criteria of fitting R 2 > 0. 6

Prediction Power of RFR and SVR
Using multiple regression showed lower accuracy for SVR, and the RFR approach outperformed the SVR approach (Figure 9). Considering the numbers of models tested (Table 1), the best model for RFR was Model RFR18 (fitting R 2 = 0.665, RMSE = 66.87 m 3 /ha, rRMSE = 11.95%), which considered CHM min , CHM avg , γ 0 VH , FCC, γ 0 VV , CA min and CA avg variables. The SVR method showed lower accuracy compared to RFR, where the best model was Model SVR2 (fitting R 2 = 0.519, RMSE = 80.12 m 3 /ha, rRMSE = 12.67%), which considered CHM avg variable. Figure 10 shows Figure 11 summarizes the fitting R 2 , RMSE, and rRMSE values of each model computed. As mentioned, a forward stepwise selection method was implemented to determine how the model behaves with the addition of each variable. For the RFR, clearly the addition of each variable enhances both the fitting R 2 and RMSE. The fitting R 2 and RMSE both showed dramatic increase in accuracy, however beyond Model18, the results started to saturate. The results also show that using only the  Figure 11 summarizes the fitting R 2 , RMSE, and rRMSE values of each model computed. As mentioned, a forward stepwise selection method was implemented to determine how the model behaves with the addition of each variable. For the RFR, clearly the addition of each variable enhances both the fitting R 2 and RMSE. The fitting R 2 and RMSE both showed dramatic increase in accuracy, however beyond Model18, the results started to saturate. The results also show that using only the  Figure 11 summarizes the fitting R 2 , RMSE, and rRMSE values of each model computed. As mentioned, a forward stepwise selection method was implemented to determine how the model behaves with the addition of each variable. For the RFR, clearly the addition of each variable enhances both the fitting R 2 and RMSE. The fitting R 2 and RMSE both showed dramatic increase in accuracy, however beyond Model 18 , the results started to saturate. The results also show that using only the second variable (CHM avg ) can perform estimation fairly well (fitting R 2 = 0.173, RMSE = 105.09 m 3 /ha, rRMSE = 20.6%), from which we can presume that the volume is contributed more by the vertical structure (i.e., tree height). Variables related to canopy area (Model 6-8 ) show higher errors when used alone, which is understandable since the volume is difficult to estimate only from the horizontal information. There are differences of results for each variables, though in general it can be said that multiple regression performs better than single variable regression. SVR showed a similar trend of reducing errors in RMSE with increasing variables, although not much improvement was seen for the fitting R 2 . The RMSE showed lower errors when all variables were combined (Model SVR23 : fitting R 2 = 0.126, RMSE = 108.01 m 3 /ha, rRMSE = 19.92%), however it achieved a better result just by using the second variable (Model SVR2 : fitting R 2 = 0.519, RMSE = 80.12 m 3 /ha, rRMSE = 12.67%). Some combinations, such as from Model SVR15 , failed to delineate the information from each variable, resulting in low fitting R 2 (negative value) and high RMSE and rRMSE, but gradually improve towards the full model. SVR can have better chances using the single influential variable and needs care when selecting multiple variables. (a) (b) Figure 11. Summary of the results for all regression analyses tested. Each result for the correlation of determination (R 2 ), root mean square error (RMSE), and relative RMSE (rRMSE) is shown for (a) RFR and (b) SVR. Only positive R 2 is drawn and the models and without R 2 plots indicates all negative fitting R 2 , hence erroneous models.

Importance and Significance of Variables (RFR)
The most important predictive variable for estimating volume (ModelRFR23) was CHMmin, shown in Figure 12 for the Gini importance measure. The variables following were CHMavg, FCC, CAavg, CAmin, CAmax and CHMmax, which were all indicated to be significant for partitioning the data (red bars). The additional models ModelUAS, ModelSAR, and ModelUAS+SAR are shown also. For ModelUAS+SAR, the UAS variables show that the influencing parameter for the estimation is significant, while the SAR variables, although shown to have some influence, were not significant. Using both variable sets on their own (ModelUAS and ModelSAR) showed that all the variables were significant in partitioning the data. The ranking of the ModelSAR variables changed from ModelRFR23, where γ 0 VV influenced more than γ 0 VH. Figure 11. Summary of the results for all regression analyses tested. Each result for the correlation of determination (R 2 ), root mean square error (RMSE), and relative RMSE (rRMSE) is shown for (a) RFR and (b) SVR. Only positive R 2 is drawn and the models and without R 2 plots indicates all negative fitting R 2 , hence erroneous models.

Importance and Significance of Variables (RFR)
The most important predictive variable for estimating volume (Model RFR23 ) was CHM min , shown in Figure 12 for the Gini importance measure. The variables following were CHM avg , FCC, CA avg , CA min , CA max and CHM max , which were all indicated to be significant for partitioning the data (red bars). The additional models Model UAS , Model SAR , and Model UAS+SAR are shown also. For Model UAS+SAR , the UAS variables show that the influencing parameter for the estimation is significant, while the SAR variables, although shown to have some influence, were not significant. Using both variable sets on their own (Model UAS and Model SAR ) showed that all the variables were significant in partitioning the data. The ranking of the Model SAR variables changed from Model RFR23 , where γ 0 VV influenced more than γ 0 VH .  Red color bars indicates the parameter that significantly partitioned the data.

Challenges of Multiple Regression Analysis with Multi-Sensor Data
Both the RFR and SVR methods showed good prediction of the stem volume. Our method resulted in fitting R 2 = 0.665, RMSE = 66.87 m 3 /ha (rRMSE = 11.95%) for RFR and R 2 = 0.519 and RMSE = 80.12 m 3 /ha (rRMSE = 12.67%) for SVR. They could perform better compared to the predictive power of multilinear regression (MLR) [69][70][71]; in contrast to other works, this work focused on a wide range of volumes, from approximately 100 to 800 m 3 /ha. If we look at a case with a stem volume range similar to that of Adbullahi et al. [71] (up to 1049 m 3 /ha), this work produces a better plot level estimate (although it focuses on mixed forests and the plot size is different: 500 m 2 circular plots) compared to using only X-band SAR data with its best accuracy of 155 m 3 /ha (rRMSE = 41.90%). The author discusses the possible saturation effect of the radar signals limiting the prediction. SAR saturation effects are always in discussion, and this could be caused from the large spatial unit aggregating the variations in backscattering signals (other than the limitation of radar frequency). For example, Iizuka et al. [72] tested the NDVI response to FCC with different grid resolutions. With coarser resolution, the range of the NDVI became more limited, indicating that variations were aggregated in larger units. Similar results might be shown with SAR, which limits extracting the feature information at a limiting area with lower resolution data. It was not clear in our results whether such effects were causing limitations, nevertheless, they should be noted when SAR is used in dense forest regions. This work also shows improvements compared to the works by Shataeea et al. [73] by using airborne laser scanner (ALS) and Landsat-TM data at plot level (plot size of the grid of TM data) with RF yielding 179.39 m 3 /ha (rRMSE = 42.93%), expressing that by adding the third dimension (i.e. tree height) data to the variable can enhance in the estimation. This was done by adding CHM information in our work. Although the ALS can produce data at a larger spatial extent, the footprints of laser scanning density may result in coarser tree information (e.g. crown delineation), although this can be overcome when utilizing UAS information. Unfortunately, similar work with the same observational scale and the integration of a multi-sensor approach was not found. Therefore, it is difficult to directly compare our results with others. Navarro et al. [40] showed the integration of UAS and Sentinel-1 and -2 for above ground biomass (AGB) estimation of mangroves in Senegal. Although it is expressed as integration, the UAS-based measurements were used as a reference for estimation with Sentinel data, and only Sentinel-1 and -2 were used. Their SVR approach using both data sets shows slightly lower validation RMSE (RMSE = 2.35 Mg/ha) compared with using only Sentinel-1 (RMSE = 2.22 Mg/ha), but improved from using only Sentinel-2 data (RMSE = The upper shows more influential variables used in the random forest model for estimating stem volume. Red color bars indicates the parameter that significantly partitioned the data.

Challenges of Multiple Regression Analysis with Multi-Sensor Data
Both the RFR and SVR methods showed good prediction of the stem volume. Our method resulted in fitting R 2 = 0.665, RMSE = 66.87 m 3 /ha (rRMSE = 11.95%) for RFR and R 2 = 0.519 and RMSE = 80.12 m 3 /ha (rRMSE = 12.67%) for SVR. They could perform better compared to the predictive power of multilinear regression (MLR) [69][70][71]; in contrast to other works, this work focused on a wide range of volumes, from approximately 100 to 800 m 3 /ha. If we look at a case with a stem volume range similar to that of Adbullahi et al. [71] (up to 1049 m 3 /ha), this work produces a better plot level estimate (although it focuses on mixed forests and the plot size is different: 500 m 2 circular plots) compared to using only X-band SAR data with its best accuracy of 155 m 3 /ha (rRMSE = 41.90%). The author discusses the possible saturation effect of the radar signals limiting the prediction. SAR saturation effects are always in discussion, and this could be caused from the large spatial unit aggregating the variations in backscattering signals (other than the limitation of radar frequency). For example, Iizuka et al. [72] tested the NDVI response to FCC with different grid resolutions. With coarser resolution, the range of the NDVI became more limited, indicating that variations were aggregated in larger units. Similar results might be shown with SAR, which limits extracting the feature information at a limiting area with lower resolution data. It was not clear in our results whether such effects were causing limitations, nevertheless, they should be noted when SAR is used in dense forest regions. This work also shows improvements compared to the works by Shataeea et al. [73] by using airborne laser scanner (ALS) and Landsat-TM data at plot level (plot size of the grid of TM data) with RF yielding 179.39 m 3 /ha (rRMSE = 42.93%), expressing that by adding the third dimension (i.e., tree height) data to the variable can enhance in the estimation. This was done by adding CHM information in our work. Although the ALS can produce data at a larger spatial extent, the footprints of laser scanning density may result in coarser tree information (e.g., crown delineation), although this can be overcome when utilizing UAS information. Unfortunately, similar work with the same observational scale and the integration of a multi-sensor approach was not found. Therefore, it is difficult to directly compare our results with others. Navarro et al. [40] showed the integration of UAS and Sentinel-1 and -2 for above ground biomass (AGB) estimation of mangroves in Senegal. Although it is expressed as integration, the UAS-based measurements were used as a reference for estimation with Sentinel data, and only Sentinel-1 and -2 were used. Their SVR approach using both data sets shows slightly lower validation RMSE (RMSE = 2.35 Mg/ha) compared with using only Sentinel-1 (RMSE = 2.22 Mg/ha), but improved from using only Sentinel-2 data (RMSE = 3.74 Mg/ha). Similar to our results, the multiple regression of SVR is failing. The approach of optical and SAR is possible, but needs more investigation in higher density forests.
As far as we are aware, truly integrating multi-scale/multi-resolution data has not been seen until now. For comparison, two machine learning regression was applied. The results showed that the RFR method yielded improved estimates of the stem volume compared to SVR, depending on the variables used. Validation between the reference volume showed a good fit (fitting R 2 = 0.665 (∆R 2 = 0.146)), and the RMSE was much lower than that of the SVR approach (66.87 m 3 /ha (∆RMSE = 13.25 m 3 /ha)). The test was conducted in a small area with few sample plots (n = 33) and with a small spatial extent. It might not compensate with broad scale analysis (unless such fine resolution data is collected broadly), however the demands of quantifying at a local scale (and extreme site dependence) might be possible from applying the proposed method. Considering the cost-effectiveness, utilizing SAR information with the aid of UAS variables of tree height and diameter would be an improvement.

UAS Remote Sensing Variables
The variables used in this analysis were determined from the perspectives of the correlation with the volume information and the possibility of computation with a GIS platform. The canopy area has a relationship with stem DBH [34]; important information for the horizontal volume. Therefore, using this variable would have some prediction power. The same concept applies to CHM, which is a straightforward variable for tree height. Three metrics were determined for both variables: average, minimum, and maximum. Because the test samples were in a grid plot, there are variations in the canopy size and the height within the grid. Especially for canopy size, the segmentation result and the gridding in the later steps of the process removed some of the edge parts of some tree canopies, which might lead to errors in the neighboring grid where the edge of the canopy falls. The use of average, minimum, and maximum of canopy area should reduce errors, and it does show improved results when only using the average data ( Figure 10). FCC provides an estimate of the gaps within the tree stands. More canopy cover means that the density is high; moreover, there are more leaves; hence, the tree height and DBH trajectory differ [74]. Therefore, depending on the FCC, DBH, and tree height, the growth characteristics and the stem volume could differ. Such information could also be obtained indirectly by giving the FCC, and is expected to improve the model; it is in fact the variable ranked third in its influence ( Figure 12).

TLS Variables
The topographic factor, in this case, the RSP, can be considered to account for different wetness conditions in an area. Depending on the slope position, the soil water content changes, and the tree responses and processes, e.g., transpiration, differ [75]. Although the correlation matrix shows relation to volume, the addition of RSP did not result in substantial improvement in the model, but it could be used in other related works when site productivity is considered in the analysis, since wetness conditions are correlated with the growth of the cypress trees [75].

Radar Variables
For the radar images, studies have shown that the L-band wavelength has a stronger relationship with stems, whereas the C-band wavelength interacts more with the canopy [76,77]. The RFR and SVR model were developed by using only γ 0 HH , γ 0 HV , γ 0 VV , and γ 0 VH to estimate the volume, and the RFR resulting had moderate accuracy. The results indicate that it is possible to use the RFR approach for plot level analysis, even by using only SAR variables. Implementing SAR data is still a major candidate for global estimations for its advantages of weather and daylight independence. But it has been shown in this work that potentials can be seen for assessing local (plot-level) analysis by incorporating finer detailed information, such as from UASs. Incorporating the PALSAR and Sentinel-1 data can also produce a stronger prediction because of the incorporation of information from both the bottom and top layers of the forests [78]. Including the third variable for the forest structure (e.g., tree height) should improve its predictions. γ 0 HH was correlated with CHM min , γ 0 HV and γ 0 VH was correlated with CHM min and CHM avg . γ 0 VV was the only one correlating with all CHM variables. It could be noted that C-band co-polarization response is affected more by the vertical structure of objects, and thus has a similar contribution to CHM. The cross-polarized variable usually interacts with volume, however it was not correlated significantly, and only γ 0 VV did so. The stem volume of Chamaecyparis obtusa is influenced more by tree height, thus the γ 0 VV correlating to all CHM variables corresponded also to volume. The complex local topography could have affected the backscattering information, which could not be as expected from other works related to L-band estimation (especially at plot level), and moreover scale difference of data resolution might have affected characterization of the local backscattering trend (brief explanation in Section 5.3). Considering the fine spatial/temporal resolution of Sentinel-1 and its availability free to the public, it can be recommended for further investigations.

Model Errors
The general issue associated with machine learning methods is always a problem such as overfitting. Extreme tight fitting of the model can fail to apply at different sites, even though the model shows a good estimation [79]. This work does not consider the application of extending the estimation to other areas, however the methodology needs further improvements and verification if forested regions need to be quantified at a broader scale. Jin et al. [79] developed a model using the random forests machine learning method to predict the tree heights for different forest types across the US using optical satellite data, topographic data, and climatic variables. They noted that training the model at one specific site and applying it to another area was not possible, whereas to some extent, better results were obtained if the model was trained using all of the samples across different sites with different vegetation types. Such a universal model improved the estimation across the US and different vegetation types. Therefore, the model in this study may not produce reliable estimates for other areas unless more training data are collected throughout the nation for a comprehensive model. Nevertheless, the prediction power of RFR was as expected and proved that it results are better than those of SVR. A key future advance would be if the RFR or the SVR method were used to develop a universal model for estimation over larger areas. Our future work will examine this potential. Possibilities include utilizing UAS LiDAR for broader and precise collection of dense point clouds while utilizing TLS for independent ground truthing of forest attributes. The combination of these two tools should enhance the methodology and produce advancements in precision forestry.

Scale Difference of Multi-Sensor Approach
Scale issues of remote sensing approaches are already taking attention and are emphasized more with advances in technology [80]. The most challenging aspect of this work is integrating platforms with various observational scales (measurement unit) [80]. The millimeter scale of TLS, centimeter scale of UAS, and 10-25 m scale of SAR can be recognized with the Modifiable Areal Unit Problem (MAUP) [81]. TLS and UAS can sense the individual tree level, while SAR data corresponding to the plot size of this study (Sentinel-1), or larger units (PALSAR-2) could have resulted with discrimination of different patterns of forest properties [82,83]. The correlation matrix showed higher correlation with Sentinel-1 and lower for PALSAR-2, and the influencing variables of model development emphasized the finer resolution Sentinel-1 VH and VV ranking higher then PALSAR-2 HH and HV data. The observational scale of the SAR data was perhaps lacking to the modeling scale [80] (10 m plot), where information of such as the aggregated PALSAR-2 data did not ensure enough features for correctly delineating the forest properties. In other words, this challenging work of multi-sensor/multi-scaled approaches could be improved with finer resolution SAR data. Navarro et al. [40] already confirms this with their estimation of UAS-based AGB with Sentinel-1 and -2 data, but in a area of sparse vegetation. Further potential uses will be investigated for densely forested areas in Japan in the future.

Data Processing of Point Clouds
Two issues can be discussed regarding processing and data acquisition. One is where TLS data were utilized to collect ground truth information of the stem volume of each grid area. Future challenges will increase the need for automated processes in sampling methods to increase the efficiency and consistency of measurements. The manual sampling of individual trees in dense point clouds is time consuming and requires expertise in handling the point cloud to correctly allocate each tree. However, compared with in situ observations, utilizing 3D spatial data can improve the time efficiency [84]. The motivating idea behind the collection of data with TLS was the assumption that TLS could be used to collect forest attributes as an alternative to destructive methods [27]. In fact, the sampling was successful, as evidenced by the relationship curves between DBH and tree height. No obvious outliers were found, because this study focused on a subwatershed area and the trees were correctly interpreted as evenly aged stands. The second issue was the scanning distance and the angle of the TLS. Because the returns of the laser become coarser and decrease in number with increasing distance, the structure of the canopy lacks detail. Changing the settings of the TLS to higher resolution could increase the point density [85], although there is always a tradeoff between resolution and observation time. However, the lack of canopy information makes it difficult to estimate the tree structure (tree height); hence the point cloud generated by the SfM method was used to integrate with the TLS point clouds. Points can be measured from the ground with TLS, and UAS photogrammetry can provide additional information on the tops of the canopies [86]. Moreover, the TLS data were provided with geographical coordinates because the SfM model already had coordinate information that was collected by the GNSS embedded in the UAS platform. The georeferenced results can then be combined with other data, such as satellite images. It was possible to merge the data sets at this site because the flux tower acts as a reference object in the matching of the two models; however, further consideration would be needed for other sites that lack similar reference objects.

Beyond Precision Forestry
Advances in the method for quantifying forest resources are still in progress. Even with this work's satisfying results in improving the estimation, it will not conclude work towards this goal. In fact, goals are rather dependent on objectives. It is meaningless to follow the approach of a local scale method to analyze a global trend, or the approach of global estimation for local managements. Selecting the appropriate one is necessary, and this work contributes to advances in the approach to the local scale, possibly up to the landscape scale. Remote sensing methods can be cost effective, less time consuming and safer, compared to direct ground surveys [7]. In the case of Japan, concerns are rising concerned with reducing labor and the ageing of workers, resulting in the devastation of abandoned or unmanaged forest areas [87]. The proposed method can shed light on such issues, so that proper policies can be implemented to correctly and strategically manage the forest resources by sustainably surveying the forest with modern technologies.

Conclusions
This work experimented with the possibilities for estimating the plot base stem volume using random forest regression (RFR) and support vector regression (SVR), incorporating multi-sensor remote sensing data sets from TLS, UAS photogrammetry, and SAR. The RFR produced better results compared with SVR, resulting in a fitting R 2 = 0.665, RMSE = 66.87 m 3 /ha, and rRMSE =11.95%. The SVR approach showed a good correlation when a single variable was used (fitting R 2 = 0.519, RMSE = 80.12 m 3 /ha and rRMSE =12.67%), however its accuracy was reduced when multiple variables were combined (fitting R 2 = 0.126, RMSE = 108.01 m 3 /ha and rRMSE =19.92%). TLS data reveals the precise ground truth information of stem volume and terrain data. Ultra-high resolution UAS imagery has contributed to delineating forest attributes at the individual stand level, and SAR backscattering corresponded mainly with the vertical structure of the forests. The Gini index has presented the most influential variable for RFR modeling with UAS variables (canopy height, canopy size, and canopy cover), ranked by CHM min , CHM avg , FCC, CA avg , CA min , CA max , and CHM max . Although the case study was set in a small area, the experiment showed the potential for the incorporation of conventional remote sensing data to outperform traditional methods, utilizing only LiDAR, UAS photogrammetry, or satellite data with RFR. The proposed method could provide important information on upcoming trends in precision forestry, as various global activities need strategic actions in response to REDD+, the Paris Agreements, and the SDGs. Future challenges will be considered in the development of a comprehensive model for estimating forest parameters over broader regions and potentially at a landscape scale.