Estimation of Forest Aboveground Biomass of Two Major Conifers in Ibaraki Prefecture, Japan, from PALSAR-2 and Sentinel-2 Data

: Forest biomass is a crucial component of the global carbon budget in climate change studies. Therefore, it is essential to develop a credible way to estimate forest biomass as carbon stock. Our study used PALSAR-2 (ALOS-2) and Sentinel-2 images to drive the Random Forest regression model, which we trained with airborne lidar data. We used the model to estimate forest aboveground biomass (AGB) of two signiﬁcant coniferous trees, Japanese cedar and Japanese cypress, in Ibaraki Prefecture, Japan. We used 48 variables derived from the two remote sensing datasets to predict forest AGB under the Random Forest algorithm, and found that the model that combined the two datasets performed better than models based on only one dataset, with R 2 = 0.31, root-mean-square error ( RMSE ) = 54.38 Mg ha − 1 , mean absolute error ( MAE ) = 40.98 Mg ha − 1 , and relative RMSE ( rRMSE ) of 0.35 for Japanese cedar, and R 2 = 0.37, RMSE = 98.63 Mg ha − 1 , MAE = 76.97 Mg ha − 1 , and rRMSE of 0.33 for Japanese cypress, over the whole AGB range. In the satellite AGB map, the total AGB of Japanese cedar in 17 targeted cities in Ibaraki Prefecture was 5.27 Pg, with a mean of 146.50 Mg ha − 1 and a standard deviation of 44.37 Mg ha − 1 . The total AGB of Japanese cypress was 3.56 Pg, with a mean of 293.12 Mg ha − 1 and a standard deviation of 78.48 Mg ha − 1 . We also found a strong linear relationship with between the model estimates and Japanese government data, with R 2 = 0.99 for both species and found the government information underestimates the AGB for cypress but overestimates it for cedar. Our results reveal that combining information from multiple sensors can predict forest AGB with increased accuracy and robustness. the saturation point at 175 Mg ha − 1 , since the HV values leveled off at this point. Nevertheless, when the AGB reached 235 Mg ha − 1 , the slope continued to increase. The cause of this pattern is unclear; it may have resulted from a relatively small number of samples at high AGB, and the uncertainty of the AGB accuracy rise.


Introduction
Forests play a significant role in the global carbon budget, as they store a large share of terrestrial carbon in their biomass [1]. About 90% of the total carbon in the world's vegetation stock comprises forests, which cover 65% of the land area [2]. The forest aboveground biomass (AGB) is therefore considered one of the most important factors in evaluating forest carbon pools [3]. To better understand the amount of stored carbon in forest, spatially explicit and temporally consistent estimates of AGB are urgently needed [4]. Field biometric studies to quantify AGB, usually using the diameter at breast height (DBH) and tree height as inputs for allometries based on destructive sampling, have provided simple and useful models, but constructing reliable allometric relationships over large areas is difficult, time-consuming, and expensive [5].
Although Random Forest estimates AGB well, it tends to overestimate AGB at low values of AGB and underestimate it at high values [35]. Despite these limitations, many researchers have estimated forest AGB in different countries by using SAR and optical satellite datasets, including Mexico, China, Russia, the USA, and Cameroon [11,[36][37][38][39]. However, few areas have been studied using Random Forest in Japan [40]. To provide more data on this approach, we selected two species of forest tree as our targets. Japanese cedar (Cryptomeria japonica) and Japanese cypress (Chamaecyparis obtusa) play important roles in Japanese forest ecosystems, and cover 28% of the forested area in Japan, which is equivalent to 19% of Japan's land surface [41].
Our objectives here were to: (1) assess the potential of combining two types of satellite data (SAR and optical sensors) to improve AGB estimation performance; (2) estimate the spatial extent of forest AGB for two major forest types in northern Ibaraki Prefecture, Japan; and (3) benchmark the AGB estimates using forest register data collected by the Ibaraki Prefecture government.

Study Area
We focused on plantations of two forest tree species, both in the cypress family (Cupressaceae), growing in Japan's Ibaraki Prefecture, central Japan. The prefecture has an active forest industry, supported by C. japonica and C. obtusa. These are major plantation species throughout Japan, occupying 4.44 × 10 6 and 2.60 × 10 6 ha of forest (equivalent to 18% and 10% of the forest area in Japan), respectively [41]. Both are important timber resources, and are also associated with public functions such as conservation of natural land, prevention of global warming, and recharge of water sources.
Our study area was located in northern Ibaraki Prefecture, which is the prefecture's main forest area. The southern part of the prefecture is dominated by agricultural land with almost no forests ( Figure 1). The regional average elevation is 22 m above sea level and the highest point is 1021 m, with rugged terrain; the target forests are distributed mainly in mountainous topography.
Remote Sens. 2022, 14, x FOR PEER REVIEW 3 of 23 plausible strategy for combining variables from different datasets. This approach has been used successfully in many cases with different combinations of satellite datasets [32][33][34]. Although Random Forest estimates AGB well, it tends to overestimate AGB at low values of AGB and underestimate it at high values [35]. Despite these limitations, many researchers have estimated forest AGB in different countries by using SAR and optical satellite datasets, including Mexico, China, Russia, the USA, and Cameroon [11,[36][37][38][39]. However, few areas have been studied using Random Forest in Japan [40]. To provide more data on this approach, we selected two species of forest tree as our targets. Japanese cedar (Cryptomeria japonica) and Japanese cypress (Chamaecyparis obtusa) play important roles in Japanese forest ecosystems, and cover 28% of the forested area in Japan, which is equivalent to 19% of Japan's land surface [41].
Our objectives here were to: (1) assess the potential of combining two types of satellite data (SAR and optical sensors) to improve AGB estimation performance; (2) estimate the spatial extent of forest AGB for two major forest types in northern Ibaraki Prefecture, Japan; and (3) benchmark the AGB estimates using forest register data collected by the Ibaraki Prefecture government.

Study Area
We focused on plantations of two forest tree species, both in the cypress family (Cupressaceae), growing in Japan's Ibaraki Prefecture, central Japan. The prefecture has an active forest industry, supported by C. japonica and C. obtusa. These are major plantation species throughout Japan, occupying 4.44 × 10 6 and 2.60 × 10 6 ha of forest (equivalent to 18% and 10% of the forest area in Japan), respectively [41]. Both are important timber resources, and are also associated with public functions such as conservation of natural land, prevention of global warming, and recharge of water sources.
Our study area was located in northern Ibaraki Prefecture, which is the prefecture's main forest area. The southern part of the prefecture is dominated by agricultural land with almost no forests ( Figure 1). The regional average elevation is 22 m above sea level and the highest point is 1021 m, with rugged terrain; the target forests are distributed mainly in mountainous topography.   Figure 2 illustrates the work flow we used for the AGB estimation, which comprises: (1) satellite data collection and preprocessing (resampling, application of a unified coordinate system, filtering, and image clipping); (2) extraction of landscape textures from PALSAR-2 data; (3) computation of indices derived from the satellite images (e.g., the HV and HH polarization ratios; vegetation indices such as NDVI, EVI); (4) model development (selection of the optimal variables, tuning of the hyperparameters); and (5) mapping and estimation of the AGB of the two species in the targeted cities in Ibaraki Prefecture.  Figure 2 illustrates the work flow we used for the AGB estimation, which comprises: (1) satellite data collection and preprocessing (resampling, application of a unified coordinate system, filtering, and image clipping); (2) extraction of landscape textures from PALSAR-2 data; (3) computation of indices derived from the satellite images (e.g., the HV and HH polarization ratios; vegetation indices such as NDVI, EVI); (4) model development (selection of the optimal variables, tuning of the hyperparameters); and (5) mapping and estimation of the AGB of the two species in the targeted cities in Ibaraki Prefecture.

Forest AGB Observed by Airborne Lidar
Airborne lidar data obtained from the Ibaraki Prefecture government was utilized for training the Random Forest model. Ibaraki Prefectural Government preprocessed the airborne lidar product and generated the stem volume through the following procedures: (1) collecting and using 40 human measured points (20 for each forest species), each point covering 0.04 ha, for ground-based calibration to evaluate the accuracy of the airborne lidar data related to stem volume calculation, (2) collecting airborne lidar data in northern Ibaraki Prefecture on 31 July 2020,

Forest AGB Observed by Airborne Lidar
Airborne lidar data obtained from the Ibaraki Prefecture government was utilized for training the Random Forest model. Ibaraki Prefectural Government preprocessed the airborne lidar product and generated the stem volume through the following procedures: (1) collecting and using 40 human measured points (20 for each forest species), each point covering 0.04 ha, for ground-based calibration to evaluate the accuracy of the airborne lidar data related to stem volume calculation, (2) collecting airborne lidar data in northern Ibaraki Prefecture on 31 July 2020, (3) determining the values of parameters related to the stem biomass calculation calibrated by ground measured plots (i.e., tree species, tree height, and diameter at breast height [DBH]; Table 1 and (4) calculating the stem volume from the tree height and DBH using the conventional allometric equations for these species in Japan [42]. The accuracy of the forest parameters used in stem volume calculation are evaluated by the root mean square error (RMSE) with the 40 fields measured data mentioned above; the accuracy of the parameters is shown in Table 2.
where N is the number of validation plots collected in the field. Y i refers to the field measured parameters. Y i refers to lidar-based predicted parameters in the corresponding position i. The maps of volume for each of the two forest species were generated in the lidar covered area and then converted into AGB values at a 20-m mesh size using a biomassvolume equation with a biomass expansion factor and the tree volume and density by Equation (2) [43]. AGB values are presented as Mg ha −1 . It is important to note that only cedar and cypress were considered in the AGB calculation; this is acceptable because we focused on plantations, which are essentially single-species forests. We obtained 201,854 airborne lidar AGB samples for cedar and 69,374 for cypress and used these data as the modeling samples in our subsequent analysis.
where V is the volume, WD is wood density and BEF is biomass expansion factor [43]. The Advanced Land Observing Satellite-2 (ALOS-2) is a follow-on mission from ALOS. ALOS-2 has the Phased Array L-band Synthetic Aperture Radar-2 (PALSAR-2), a microwave sensor that can observe, day and night, under any weather conditions. Here, we obtained the 25-m PALSAR2 L-band global mosaic data from May 2019 from the Japan Aerospace Exploration Agency (JAXA; https://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/ data/index.html, accessed on 11 June 2021). JAXA preprocessed the PALSAR-2 data, including geometrical calibration against the AW3D30 digital elevation model after 2019. The original SAR data for Japan used the highly sensitive Beam Quad mode, which provides Remote Sens. 2022, 14, 468 6 of 21 full polarizations, including HH, HV, VV, and VH. The PALSAR-2 signal can be converted into gamma naught backscattering coefficients by using the following equation: where γ 0 is the backscattering coefficient (gamma naught), DN is the digital number value of each pixel, and CF is the calibration factor, −83 [44]. Moreover, we applied a LEE speckle filter with a kernel window size of 3 × 3 to smooth the images [45]. Before LEE filtering, the radar images were also averaged using a 3 × 3 pixel mean filter to reduce the effect of speckle and spatial heterogeneity of the forest stands and to alleviate the problem of noise from dark spots [24]. Because the plot boundaries of airborne lidar samples may overlie several pixels, using a 3 × 3 window improved performance compared with the single-pixel extraction method [46]. In addition to correcting for backscatter, we calculated the radar vegetation indices for different polarizations and calculated the texture information for HV and VH using a gray-level co-occurrence matrix (GLCM) with a 3 × 3 window size and with a relative displacement vector (d = 1, θ = 45 • ). The displacement vector explains the spatial distribution of the level pairs separated by d with direction θ [47]. In AGB estimation, the GLCM-derived texture is considered as a kind of predictor that can improve the accuracy of estimation [48]. The texture information can also enlarge the saturation range between AGB and satellite images [49]. We adopted eight popular texture parameters for the VH and HV polarization: mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, and correlation. Table 3 summarizes the SAR-derived variables used for modeling. Table 3. List of variables from the PALSAR-2 data. In the texture calculations, h represents high of row number, k represents column number of image window and m hk refers the value in the cell h, k of the image window.  [51] HV + HH I3 [52] (HH − HV)/(HV + HH) I4 [53] HV/HH I5 [50] VH − VV I6 [51] VH + VV I7 [52] (VH − VV)/(VH + VV) I8 [53] VH/VV I9 [54] 8 × HV/(HH + VV + 2 × HV) Texture (HV, VH) Sentinel2-MSI data collected from the European Space Agency (ESA: https://scihub. copernicus.eu/dhus/, accessed on 11 June 2021) were used as optical satellite variables to drive the Random Forest model. Because the optical satellite sensor is easily affected by clouds owing to the wavelengths it uses, we selected data from days with low cloud cover and as close as possible to the time of the AGB data sample collection. From the remaining data, we acquired L2-A level data that had been preprocessed by the ESA, including atmospheric correction and scene classification to L1-B data. The image was acquired on 8 May 2019. The Sentinel-2 MSI sensor provides multispectral data with a spatial resolution ranging from 10 to 60 m. We excluded the 60-m data in this study. We also averaged the Sentinel2 images using a 3 × 3 pixel mean filter to extract the values. We then computed the vegetation indices from the Sentinel-2 data (Table 4) and used those data for modeling.

Extraction of Satellite Images Values from Forest AGB Plots
The AGB plots and satellite images were first unified into the Universal Transverse Mercator (UTM) coordinate system (zone 54 N), with datum of WGS84. Then all of the satellite images were resampled in 20 m resolution using bilinear convolution to meet the resolution of airborne Lidar metric. The geometric center of every airborne Lidar plot was represented as the position of the AGB and extracted the corresponding values of all predictors from the satellite images. Finally, a total of 48 predictors were utilized in a regression model for our analysis: 10 Sentinel-2 MSI spectral bands, 9 Sentinel-2 MSI-derived vegetation indices, 4 ALOS-PALSAR-2 radar backscatter coefficient bands, 16 texture information variables (8 textures each for VH and HV respectively), and 9 radar backscatter coefficient-derived indices.

Random Forest Regression
Modeling datasets were randomly split into 80%, 10%, and 10% bins for training, validation, and testing samples, respectively, using the train_test_split function in the sklearn package for the Python language.
Random Forest predicts AGB from the remote-sensing predictors by growing many decision trees and averaging every result for each tree. We not only performed inversion modeling for each type of remote sensing data, but also identified (through filtering) the best performing variables for each tree species in this model according to the impuritybased feature importance of the variables, and then we combined the selected variables and used them to process the Random Forest model again. This filtering is necessary because the presence of many redundant variables contributes little to the model, it results in the inclusion of repetitive information, and increases the complexity of the model. For each experiment, we assessed the accuracy of the predictions using the testing data, and then tuned the hyperparameters using the validation data.
Four error statistics were selected to evaluate the model's performance: the root-meansquare error (RMSE) in Equation (2), the coefficient of determination (R 2 ) in Equation (3), the mean absolute error (MAE) in Equation (4), and the relative RMSE (rRMSE) in Equation (5) as RMSE divided by the mean of the observed AGB values. In the comparison between RMSE and MAE, RMSE is harder to interpret and is more sensitive to outliers than MAE. However, a detailed interpretation is not critical, because variations of the same model will have similar error distributions. Therefore, RMSE is more appropriate as a loss function to tune the hyperparameters for the model as in our case [64]. However, it is still necessary to use MAE together with RMSE to evaluate the variation of model errors [65]. Overall, lower values of RMSE, rRMSE, and MAE and higher R 2 indicate better performance of a model. In addition, the smaller the difference between RMSE and MAE, the smaller the variance between errors will be.
where N is the number of observed values, Y i is the observed AGB value for observation i, Y i is the predicted AGB value, and Y is the mean of the observed AGB values. Even though many variables have potential value for estimating AGB, not all are available to be used in the modeling owing to high inter-variable correlation or weak relationships with AGB [66].
Including such variables provides little improvement of accuracy, although it may increase model flexibility. To eliminate the least useful variables, we used the impurity-based feature importance for each variable: the higher the impurity, the more critical the feature. We computed the importance of a feature as the (normalized) total reduction of the criterion; here, we used the mean squared error (MSE) as the criterion brought by that feature, which is also known as the Gini importance.

Determination of the Saturation Level
The saturation level for an individual tree species is crucial for evaluating the estimation result. We defined the AGB saturation level as occurring where a clear pattern of AGB leveling was found in the logarithmic regression slope in a plot of the HV backscatter coefficient against AGB since longer wavelength L-bands with HV backscatter are identified as the most sensitive polarizations to AGB [67]. This approach has been used in previous studies to reveal the relationship between AGB and satellite images and the Remote Sens. 2022, 14, 468 9 of 21 model's performance [68][69][70]. Since we used a large sample size in our study, it is difficult to accurately determine the location of the saturation point. Therefore, we adopted an interval sampling method in which we counted the average value of the data points in bins equal to 5 Mg ha −1 in size as the AGB, and removed points with a value greater than 250 Mg ha −1 from the calculation range. Such a kind of approach was accessed in previous research with a huge number of AGB samples [49,71]. Finally, we estimated the saturation levels for each species by examining the slope within every interval (in units of 5 Mg ha −1 ) with the HV backscatter coefficient. Saturation points in the scatterplot were defined as the points where the slope of each AGB interval was less than 0.1 or where it starts to change in a very disorderly way.

Evaluation of Forest Resources
After running the models, the best performance model with the highest accuracy was utilized to map the AGB of Japanese cedar and cypress in several target cities with a large area of plantations of the two forest species. The identification of forest area is very crucial for AGB mapping, because mismatched forest distribution maps will cause large estimation errors derived from mismatched estimation models and wrong corresponding forest area. The forest distribution map from the Ibaraki Prefecture government was selected to classify the tree species distribution in our study so that an accurate AGB map could be generated that followed the same standard as the Ibaraki Prefecture AGB map [72]. Finally, we compared the satellite-based AGB map with the forest registered map from Ibaraki Prefecture to evaluate the AGB in the targeted cities. Figure 3 shows the relationship between the HV backscattering coefficient and AGB. AGB leveled off at a slope of 0.01 dB for the cedar, which represented an AGB of 105 Mg ha −1 . However, it was difficult to determine the saturation point for cypress by this method since the slope showed high variation. We defined the saturation point at 175 Mg ha −1 , since the HV values leveled off at this point. Nevertheless, when the AGB reached 235 Mg ha −1 , the slope continued to increase. The cause of this pattern is unclear; it may have resulted from a relatively small number of samples at high AGB, and the uncertainty of the AGB accuracy rise.

Determination of the AGB Saturation Level
Remote Sens. 2022, 14, x FOR PEER REVIEW Figure 3 shows the relationship between the HV backscattering coefficient a AGB leveled off at a slope of 0.01 dB for the cedar, which represented an AGB o ha −1 . However, it was difficult to determine the saturation point for cypress by thi since the slope showed high variation. We defined the saturation point at 175 since the HV values leveled off at this point. Nevertheless, when the AGB reache ha −1 , the slope continued to increase. The cause of this pattern is unclear; it m resulted from a relatively small number of samples at high AGB, and the unce the AGB accuracy rise.

Development of the Random Forest Model
In the Random Forest model, the importance measures for the variables are by the number of variable categories and the measurement scale of the predictor [73]. Determining the optimal feature space is an important step for model deve Increasing the number of variables might lead to a high time requiremen calculations despite a low increase of accuracy [74]. Consequently, we div variables into two parts (the PALSAR-2 group and the SENTINEL2-MSI gro

Development of the Random Forest Model
In the Random Forest model, the importance measures for the variables are affected by the number of variable categories and the measurement scale of the predictor variables [73]. Determining the optimal feature space is an important step for model development. Increasing the number of variables might lead to a high time requirement for the calculations despite a low increase of accuracy [74]. Consequently, we divided the variables into two parts (the PALSAR-2 group and the SENTINEL2-MSI group) and selected the optimal variables on the basis of their importance values. We selected the variables whose importance was greater than 0.05 to interact together and assessed the model again. Figure 4 shows the importance results. The most important variables for cypress (i.e., the variables with a Gini importance greater than 0.05) were VH mean, VH variance, HV variance, VH correlation, and HV correlation (Figure 4a), and Band 5, Band 9, Band 8a, Band 11, SR, NDVI, and Band 12 (Figure 4b). The most important variables for cedar were VH mean, VH variance, HV mean, and HV variance (Figure 4c) and Band 12, Band 4, Band 9, Band 11, Band 5, Band 8a, and Band 6 ( Figure 4d).  Random Forest algorithm was run repeatedly to obtain the optimal hyperparameters in each PALSAR-2-based model, each Sentinel2-MSI-based model, and the model that combined the two datasets. We chose four input hyperparameters to determine their optimal values in each model: the number of trees in the forest (EST), the maximum depth of the decision tree (MD), the minimum number of samples (MS) required to split in every internal node, and the minimum number of samples required to be at every leaf node (ML). We reserved 10% of the samples for use as the validation samples to determine their values from the RMSE score. Owing to the size of our dataset, we didn't perform crossvalidation using the validation datasets. We set the number of variables fed to each of the decision tree (MD), the minimum number of samples (MS) required to split in every internal node, and the minimum number of samples required to be at every leaf node (ML). We reserved 10% of the samples for use as the validation samples to determine their values from the RMSE score. Owing to the size of our dataset, we didn't perform cross-validation using the validation datasets. We set the number of variables fed to each predictor tree (named max_features in the sklearn package) to the square root of the number of input variables in every model [74], and when we optimized one hyperparameter, we set the others to their default value as MD = 10, ML = 1, MS = 2, and EST = 200. We optimized EST with the determined values of the other three hyperparameters in the last step. The lowest RMSE for the EST tuning indicated the best scores with the optimized hyperparameters. Figure 5 shows the results of the hyperparameter tuning. In every case, the combined model had the best performance. Therefore, we used it to estimate AGB in our subsequent analyses.
Remote Sens. 2022, 14, x FOR PEER REVIEW 12 of 23 hyperparameters in the last step. The lowest RMSE for the EST tuning indicated the best scores with the optimized hyperparameters. Figure 5 shows the results of the hyperparameter tuning. In every case, the combined model had the best performance. Therefore, we used it to estimate AGB in our subsequent analyses.

Model Accuracy Assessment
Testing data assessed the model's accuracy using our selected error statistical indicators. These testing data represented 10% of the overall sample, and excluded data used in model development to keep robustness. For cedar, the Random Forest algorithm was able to predict the AGB with Thus, it is necessary to retrieve the relationship between AGB and satellite images by separating for each tree species in the forest.

Model Accuracy Assessment
Testing data assessed the model's accuracy using our selected error statistical indicators. These testing data represented 10% of the overall sample, and excluded data used in model development to keep robustness. For cedar, the Random Forest algorithm was able to predict the AGB with R 2 = 0.31, RMSE = 54.38 Mg ha −1 , MAE = 40.98 Mg ha −1 , and rRMSE = 0.35 from the 20,186 test samples ( Figure 6). For cypress, it was able to predict the AGB with R 2 = 0.37, RMSE = 98.63 Mg ha −1 , MAE = 76.97 Mg ha −1 , and rRMSE = 0.33 from the 6938 test samples. For the two tree species, different variables from different remote sensing sensors were important in determining the model's performance.
Thus, it is necessary to retrieve the relationship between AGB and satellite images by separating for each tree species in the forest.

Mapping AGB
We generated AGB maps for Japanese cedar and Japanese cypress at 20 m resolutio using the Random Forest algorithm for the targeted cities in Ibaraki Prefecture. Appendi A compares the data from the Japanese forestry register record for these cities with th predictions of our model. The AGB ranged from 7.49 to 277.02 Mg ha −1 , for Japanese ceda and 85.35 to 492.28 Mg ha −1 for Japanese cypress in the targeted cities (Figure 7). Japanese cedar and cypress are the main tree species in Japan. Both are evergree coniferous trees native to Japan. In the overall statistics we analyzed, cypress had a highe AGB than cedar. In terms of the point where AGB saturation occurred according to th

Mapping AGB
We generated AGB maps for Japanese cedar and Japanese cypress at 20 m resolution using the Random Forest algorithm for the targeted cities in Ibaraki Prefecture. Appendix A compares the data from the Japanese forestry register record for these cities with the predictions of our model. The AGB ranged from 7.49 to 277.02 Mg ha −1 , for Japanese cedar and 85.35 to 492.28 Mg ha −1 for Japanese cypress in the targeted cities (Figure 7).

Mapping AGB
We generated AGB maps for Japanese cedar and Japanese cypress at 20 m resoluti using the Random Forest algorithm for the targeted cities in Ibaraki Prefecture. Append A compares the data from the Japanese forestry register record for these cities with t predictions of our model. The AGB ranged from 7.49 to 277.02 Mg ha −1 , for Japanese ced and 85.35 to 492.28 Mg ha −1 for Japanese cypress in the targeted cities (Figure 7). Japanese cedar and cypress are the main tree species in Japan. Both are evergre coniferous trees native to Japan. In the overall statistics we analyzed, cypress had a high AGB than cedar. In terms of the point where AGB saturation occurred according to t Japanese cedar and cypress are the main tree species in Japan. Both are evergreen coniferous trees native to Japan. In the overall statistics we analyzed, cypress had a higher AGB than cedar. In terms of the point where AGB saturation occurred according to the HV backscatter coefficient, cypress had a wider range of AGB values than cedar. We think this may be caused by differences in the physical structure of the two species: cypress has a higher biomass range and a larger DBH, which may have led to greater volume scattering, resulting in fluctuations in the relationship between the backscatter coefficient and AGB. Cypress had a much higher mean average AGB (293.12 Mg ha −1 ) than cedar (146.50 Mg ha −1 ). However, cedar had a lower standard deviation (44.37 Mg ha −1 ), with its AGB distributed mainly between 100 and 200 Mg ha −1 , whereas Japanese cypress had an AGB distribution with two peaks between 200 and 400 Mg ha −1 , with a higher standard deviation (78.48 Mg ha −1 ). In some previous research, AGB estimation based on the stratification of vegetation types greatly improved the performance [75,76]. We assessed the AGB estimation for two tree species, and found a significant difference in the AGB value distribution (Figure 8). However, the development of an AGB estimation model based on stratification of multiple tree species is more difficult because it requires additional data: (1) vegetation distribution maps for the targeted species, (2) ground-based AGB values classified by species, and (3) a sufficiently large sample size to build a robust model while still leaving data for testing and validation. HV backscatter coefficient, cypress had a wider range of AGB values than cedar. We think this may be caused by differences in the physical structure of the two species: cypress has a higher biomass range and a larger DBH, which may have led to greater volume scattering, resulting in fluctuations in the relationship between the backscatter coefficient and AGB. Cypress had a much higher mean average AGB (293.12 Mg ha −1 ) than cedar (146.50 Mg ha −1 ). However, cedar had a lower standard deviation (44.37 Mg ha −1 ), with its AGB distributed mainly between 100 and 200 Mg ha −1 , whereas Japanese cypress had an AGB distribution with two peaks between 200 and 400 Mg ha −1 , with a higher standard deviation (78.48 Mg ha −1 ). In some previous research, AGB estimation based on the stratification of vegetation types greatly improved the performance [75,76]. We assessed the AGB estimation for two tree species, and found a significant difference in the AGB value distribution (Figure 8). However, the development of an AGB estimation model based on stratification of multiple tree species is more difficult because it requires additional data: (1) vegetation distribution maps for the targeted species, (2) groundbased AGB values classified by species, and (3) a sufficiently large sample size to build a robust model while still leaving data for testing and validation.

Role and Limitation of Satellite-Derived Variables in Accurate Estimation of Japanese Cedar and Japanese Cypress AGB
SAR and optical remote sensing have different drawbacks and advantages for AGB estimation. Either dataset by itself is not enough to accurately estimate forest AGB [77]. SAR is relatively unaffected by weather, since it can penetrate clouds and work all day and night. It can also penetrate through the canopy, soil, and dry snow. However, even L-band SAR becomes saturated at an AGB of 100 Mg ha −1 in complex heterogeneous tropical forest structures. In forests with a simple structure and few dominant species, the saturation level could increase to about 250 Mg ha −1 [78]. We found that the optical data were more resistant than the SAR data to AGB saturation for Japanese cedar and cypress at high AGB values (Figure 9).

Role and Limitation of Satellite-Derived Variables in Accurate Estimation of Japanese Cedar and Japanese Cypress AGB
SAR and optical remote sensing have different drawbacks and advantages for AGB estimation. Either dataset by itself is not enough to accurately estimate forest AGB [77]. SAR is relatively unaffected by weather, since it can penetrate clouds and work all day and night. It can also penetrate through the canopy, soil, and dry snow. However, even L-band SAR becomes saturated at an AGB of 100 Mg ha −1 in complex heterogeneous tropical forest structures. In forests with a simple structure and few dominant species, the saturation level could increase to about 250 Mg ha −1 [78]. We found that the optical data were more resistant than the SAR data to AGB saturation for Japanese cedar and cypress at high AGB values (Figure 9).
To identify the saturation level for the two tree species, we used the HV backscatter from the SAR data. Cedar became saturated at 105 Mg ha −1 and cypress at 175 Mg ha −1 . Even though these species are similar in their structure and living conditions, they showed a clear difference in the saturation level with SAR at relatively low values. In contrast, the optical sensors are strongly affected by weather conditions, but also show AGB saturation. Because these different sensor data have different advantages and drawbacks, integration of radar data with optical-sensor data has the potential to improve AGB estimation because it may reduce the number of mixed pixels and data saturation problems [66].  Tables 3 and 4. To identify the saturation level for the two tree species, we used the HV backscatter from the SAR data. Cedar became saturated at 105 Mg ha −1 and cypress at 175 Mg ha −1 . Even though these species are similar in their structure and living conditions, they showed a clear difference in the saturation level with SAR at relatively low values. In contrast, the optical sensors are strongly affected by weather conditions, but also show AGB saturation. Because these different sensor data have different advantages and drawbacks, integration of radar data with optical-sensor data has the potential to improve AGB estimation because it may reduce the number of mixed pixels and data saturation problems [66].
Our aim was to develop models of cedar and cypress for estimating AGB from two types of satellite data: L-band microwave radar data from PALSAR-2 and multispectral optical remote sensing data from Sentinel2-MSI. For our study species, microwave remote sensing was more sensitive to saturation than optical remote sensing. Therefore, the estimation results of the PALSAR-2 model performed worse in both species (Figure 10). In contrast, the model that combined the two datasets performed best in every case. This demonstrates that combining different types of remote sensing data can improve the estimation accuracy and AGB range.  Tables 3 and 4. Our aim was to develop models of cedar and cypress for estimating AGB from two types of satellite data: L-band microwave radar data from PALSAR-2 and multispectral optical remote sensing data from Sentinel2-MSI. For our study species, microwave remote sensing was more sensitive to saturation than optical remote sensing. Therefore, the estimation results of the PALSAR-2 model performed worse in both species (Figure 10). In contrast, the model that combined the two datasets performed best in every case. This demonstrates that combining different types of remote sensing data can improve the estimation accuracy and AGB range.
However, underestimation at high AGB values remains large, since satellite information (especially microwave and optical remote sensing data) inevitably became saturated. Although this problem can be alleviated by adding texture information [49] or by combining multi-source remote sensing data, as we did in this study, it is still fundamentally difficult to solve the saturation problem. The airborne lidar data have a high range for estimation of AGB (i.e., high resistance to saturation) owing to their wavelength characteristics, but such data are expensive, which makes it impossible to cover large areas such as a whole country or continent, unlike the space satellite data that are used for large-area studies. Hence, airborne lidar data have mainly been used in small areas [79]. Establishing a model that would extend AGB estimation to large areas by combining data from field plots, airborne lidar, and space satellite data thus has considerable potential to enlarge the area that could be surveyed with airborne lidar data [80]. In such a method, lidar data and satellite remote sensing data could be combined to support large-area AGB predictions, but more tests are still needed, and the analytical framework must be improved to support this use of multisource data. This is because the use of different buffer sizes to combine data from different sources can decrease the accuracy of AGB estimation [80]. However, underestimation at high AGB values remains large, since satellite information (especially microwave and optical remote sensing data) inevitably became saturated. Although this problem can be alleviated by adding texture information [49] or by combining multi-source remote sensing data, as we did in this study, it is still fundamentally difficult to solve the saturation problem. The airborne lidar data have a high range for estimation of AGB (i.e., high resistance to saturation) owing to their wavelength characteristics, but such data are expensive, which makes it impossible to cover large areas such as a whole country or continent, unlike the space satellite data that are used for large-area studies. Hence, airborne lidar data have mainly been used in small areas [79]. Establishing a model that would extend AGB estimation to large areas by combining data from field plots, airborne lidar, and space satellite data thus has considerable potential to enlarge the area that could be surveyed with airborne lidar data [80]. In such a method, lidar data and satellite remote sensing data could be combined to support large-area AGB predictions, but more tests are still needed, and the analytical framework must be improved to support this use of multisource data. This is because the use of different buffer sizes to combine data from different sources can decrease the accuracy of AGB estimation [80]. The Random Forest model has an excellent predictive ability but has the characteristics of tree-regression. The algorithm operates by constructing many decision trees during training and outputs the predicted mean or mode of the individual decision trees. The AGB prediction averages all variables extracted from the satellite images. However, the algorithm cannot predict the value from the training samples. Using only the satellite-based data, the problem becomes more severe, since saturation occurred in all of the satellite data. Because our approach overlays the underestimation of high AGB values with AGB saturation in the satellite images, it is hard to obtain good performance in forests with high AGB.

Benchmark AGB Estimated in the Japanese Forest Inventory
The satellite-derived AGB map was compared with government statistics for the total AGB in every targeted city ( Figure 11). The forestry statistics in Japan are based on the forest register, which is used for forest management. Japanese forests are managed as land units called sub-compartments. The forest register records forest conditions for every subcompartment, such as its area, tree species, mean age, and stand volume. Japanese law requires that the forest register be updated every 5 years. The satellite-derived AGB was The Random Forest model has an excellent predictive ability but has the characteristics of tree-regression. The algorithm operates by constructing many decision trees during training and outputs the predicted mean or mode of the individual decision trees. The AGB prediction averages all variables extracted from the satellite images. However, the algorithm cannot predict the value from the training samples. Using only the satellite-based data, the problem becomes more severe, since saturation occurred in all of the satellite data. Because our approach overlays the underestimation of high AGB values with AGB saturation in the satellite images, it is hard to obtain good performance in forests with high AGB.

Benchmark AGB Estimated in the Japanese Forest Inventory
The satellite-derived AGB map was compared with government statistics for the total AGB in every targeted city ( Figure 11). The forestry statistics in Japan are based on the forest register, which is used for forest management. Japanese forests are managed as land units called sub-compartments. The forest register records forest conditions for every sub-compartment, such as its area, tree species, mean age, and stand volume. Japanese law requires that the forest register be updated every 5 years. The satellite-derived AGB was more significant than the value in the register in Ibaraki, and some previous studies also concluded that the forest register underestimated the forest volume [81]. Japan's Forestry and Forest Products Research Institute compared the register's data with a field measurement at 10,189 sub-compartments throughout Japan, and found that the fieldmeasured forest volume was 1.88 times the value in the forest register for Japan as a whole [82]. These results agree with the present results for cypress, since the estimated AGB was larger than the value in the register. A previous study analyzed airborne lidar data for all of Ehime Prefecture (a prefecture located in southwestern Japan), and found that the total forest volume in Ehime was 2.01 times the value in the register [83]. The authors mentioned that errors in both the lidar estimates and the register contributed to this difference, but suggested that the errors in the register were much larger. These previous studies suggest that our results are reasonable, and the satellite estimates are closer than the register to the actual values. The forest register may underestimate the forest volume for at least two reasons: (1) it records only an estimated value based on the tree species and forest age, not a field-measured value, and (2) the empirical yield tables (used for the estimation that produces the register values) were developed in the 1950s and 1960s and have not been updated since then, so their gap relative to the actual values may have increased [82,84]. One reason for this gap is that there were insufficient measurement data for old-growth forests to support empirical development of yield tables, so the tables underestimate the volume of old-growth forests. Accurate forest resource information is fundamental for forest management, and the forest register, which does not have a monitoring function for actual forests, cannot provide the necessary support. Our approach may solve this problem.
of Ehime Prefecture (a prefecture located in southwestern Japan), and found that the total forest volume in Ehime was 2.01 times the value in the register [83]. The authors mentioned that errors in both the lidar estimates and the register contributed to this difference, but suggested that the errors in the register were much larger. These previous studies suggest that our results are reasonable, and the satellite estimates are closer than the register to the actual values. The forest register may underestimate the forest volume for at least two reasons: (1) it records only an estimated value based on the tree species and forest age, not a field-measured value, and (2) the empirical yield tables (used for the estimation that produces the register values) were developed in the 1950s and 1960s and have not been updated since then, so their gap relative to the actual values may have increased [82,84]. One reason for this gap is that there were insufficient measurement data for old-growth forests to support empirical development of yield tables, so the tables underestimate the volume of old-growth forests. Accurate forest resource information is fundamental for forest management, and the forest register, which does not have a monitoring function for actual forests, cannot provide the necessary support. Our approach may solve this problem. Figure 11. Comparison of total aboveground biomass (AGB) between statistical data in the Japanese forest register and the remote-sensing data. Note that the scales differ greatly between the two species. Each data point represents the cumulative AGB at each of 17 targeted places (villages, towns, and cities) in Ibaraki Prefecture.

Uncertainty in AGB Estimation
The geographic location between the airborne lidar sample points and satellite pixels will bring significant uncertainty. We use the geometric center of an airborne Lidar sample point with a resolution of 20 m (0.04 hectares) to extract satellite pixel values. Nevertheless, an area of 0.04 hectares cannot cover a plurality of pixels well, leading to missing and biased parts of the data. Although we have alleviated the uncertainty Figure 11. Comparison of total aboveground biomass (AGB) between statistical data in the Japanese forest register and the remote-sensing data. Note that the scales differ greatly between the two species. Each data point represents the cumulative AGB at each of 17 targeted places (villages, towns, and cities) in Ibaraki Prefecture.

Uncertainty in AGB Estimation
The geographic location between the airborne lidar sample points and satellite pixels will bring significant uncertainty. We use the geometric center of an airborne Lidar sample point with a resolution of 20 m (0.04 hectares) to extract satellite pixel values. Nevertheless, an area of 0.04 hectares cannot cover a plurality of pixels well, leading to missing and biased parts of the data. Although we have alleviated the uncertainty mentioned above through the mean filter, using the mean filter will cause some irrelevant pixels to be calculated into the target pixels, especially when some different tree species are inlaid with each other or in the boundary of the forest area [85]. One method is to select only a cluster of airborne radar sample points gathered by a single tree species as the ground data points and calculate the average value of the pixel points covered by the sample points as the satellite data value corresponding to the airborne lidar plot, but this would significantly increase the complexity of the calculation.
The temporal difference between airborne Lidar and satellite images can also be uncertain. As cloud cover often occurs in the northern part of our research area, in order to avoid the impact of this situation on detection, it is challenging to select satellite data that is the same as or very close to the ground sample point collection time and dozens of days may lead to a difference of forest growth which may lead to temporal variability in satellite images [86]. The change of the period has led to errors in the agreement between the biomass and satellite data. One of the methods focuses on the growth period, ignoring the influence of the year, and selecting ground sample points in other years to collect the satellite data of the corresponding month, but this may lead to forest changes caused by excessive time. Therefore, it is essential to check the forest to see massive changes due to people's affection or natural hazards.
Finally, there is the error caused by the biomass equation because we used the airborne lidar data as the "real" sample data and satellite data for correction. This method provides more sample data than traditional human measurement sample points. A more extensive sample set will avoid the curse of dimensionality caused by too many satellite variables and has better robustness compared to small data set. However, the method of calculating the stock volume by obtaining the parameter values of the trees and converting it through the volume-biomass transferring equation will also have a significant deviation compared with the actual biomass calculation equation of a single tree called recording and grouping errors [79]. However, AGB estimations using only field data over large areas suffer an enormous error rate. Therefore, choosing a large amount of data or a small amount of data with higher accuracy needs to be traded off carefully.

Conclusions
We developed robust and effective models to estimate the AGB of Japanese cedar and cypress by a machine learning approach. As far as we know, no other studies have used remote sensing data to retrieve the AGB of those two types of forest at prefecture level. We hope to create a new approach to remedying the lack of forest biomass in Japan. By combining PALSAR-2 and Sentinel2-MSI data and using a large number of validation samples from lidar-based AGB plots, we increased the accuracy of AGB prediction for both species compared with using only one data source. The hyperparameter tuning in Random Forest also improved estimation accuracy, especially for the depth of the tree structure. Because the choice of modeling variables strongly affects the accuracy and simplicity of the models, our approach also helps to select the optimal variables for inclusion in the final model. The texture information for the PALSAR-2 images played an important role in estimating AGB, and this confirms the value of retaining SAR texture information.
Although our method provided a scientific basis for more accurately estimating AGB of the two tree species in our study, more work will be necessary to adapt the method to multi-species forests. The methodology could then be adopted for mapping and estimating forest biomass in Japan and updating the forest register. This use of remote sensing will provide a cost-efficient way to estimate forest conditions and their spatial and temporal variation. Acknowledgments: This study was supported by JAXA Research Announcement on Earth Observations (No. ER2A2N208). The airborne lidar data was provided by the Ibaraki Prefectural Government (permit No. RINSEI-264).

Conflicts of Interest:
The authors declare no conflict of interest. Figure A1 compares the data in the Japanese forestry register with predictions from the remote-sensing model for the targeted cities in Ibaraki Prefecture. Government (permit No. RINSEI-264).

Conflicts of Interest:
The authors declare no conflict of interest. Figure A1 compares the data in the Japanese forestry register with predictions fro the remote-sensing model for the targeted cities in Ibaraki Prefecture.