Predicting Growing Stock Volume of Scots Pine Stands Using Sentinel-2 Satellite Imagery and Airborne Image-Derived Point Clouds

Estimation of forest stand parameters using remotely sensed data has considerable significance for sustainable forest management. Wide and free access to the collection of medium-resolution optical multispectral Sentinel-2 satellite images is very important for the practical application of remote sensing technology in forestry. This study assessed the accuracy of Sentinel-2-based growing stock volume predictive models of single canopy layer Scots pine (Pinus sylvestris L.) stands. We also investigated whether the inclusion of Sentinel-2 data improved the accuracy of models based on airborne image-derived point cloud data (IPC). A multiple linear regression (LM) and random forest (RF) methods were tested for generating predictive models. The measurements from 94 circular field plots (400 m2) were used as reference data. In general, the LM method provided more accurate models than the RF method. Models created using only Sentinel-2A images had low prediction accuracy and were characterized by a high root mean square error (RMSE%) of 35.14% and a low coefficient of determination (R2) of 0.24. Fusion of IPC data with Sentinel-2 reflectance values provided the most accurate model: RMSE% = 16.95% and R2 = 0.82. However, comparable accuracy was obtained using the IPC-based model: RMSE% = 17.26% and R2 = 0.81. The results showed that for single canopy layer Scots pine dominated stands the incorporation of Sentinel-2 satellite images into IPC-based growing stock volume predictive models did not significantly improve the model accuracy. From an operational point of view, the additional utilization of Sentinel-2 data is not justified in this context.


Introduction
Remote sensing has many applications in the field of forestry.It supports activities related to forest inventory, planning, management and monitoring at a local, regional and up to a global scale.Many different applications are powered by remote sensing data.However, the widespread operational use of remote sensing-based methods for forest stand volume estimation has often been limited by the relatively high costs of Airborne Laser Scanning (ALS) data and commercial high-resolution satellite images.Despite this, measuring the volume of individual trees and growing stock volume of forest stands remains of great interest to researchers and practitioners [1,2].
The growing popularity of relatively low-cost airborne image-derived point clouds (IPC) and open access to medium-resolution multispectral Sentinel-2 satellite images have created an opportunity to make remote sensing-based inventory methods even more applicable in practice.The two twin optical satellites Sentinel-2A and Sentinel-2B developed by the European Space Agency (ESA) were launched in 2015 and 2017, respectively, as a part of Europe's Global Monitoring for the Environment and Security (GMES) programme.Sentinel-2 provides images with 13 spectral bands spanning from the visible (VIS) and the near infra-red (NIR) to the short wave infra-red (SWIR) with different ground sampling distances (GSD) from 10 m to 60 m [3].The spatial, spectral and radiometric characteristics of Sentinel-2 are expected to contribute to improved forest biophysical parameters estimates [4].However very few researchers have addressed the question of the potential usefulness of Sentinel-2 data in this context [4][5][6].Previous studies have shown that IPCs offer a cost-effective alternative to ALS data for estimating forest inventory attributes [7][8][9][10][11].
The topic of inclusion of satellite images into ALS-based predictive models of growing stock volume (V) and above ground biomass (AGB) is reported in the literature.Integration of Landsat ETM+ images and ALS data was used for timber volume estimation in the mixed forests of Baden-Württemberg [12].For a selected region of Baden-Württemberg, it was found that Landsat TM data improved the predictive model of V by decreasing the RMSE% from 18.4% to 16.8% [13].Adding the IRS 1C LISS III multispectral data to ALS-based predictive models of timber volume showed the potential for a slight increase in accuracy (an increase in adjusted-R 2 between 0.01 and 0.04) in complex forest areas of the Southern Alps [14].The inclusion of Landsat ETM+ vegetation indices in stem volume predictive models of Pinus radiata D. Don stands in Spain increased the R 2 from 0.60 to 0.82 [15].In contrast, little improvement in stem volume estimation was achieved through combining RapidEye (Planet) data with ALS metrics for Pinus radiata D. Don plantations in Australia [16].It was also found that the synergistic use of Landsat 8 OLI and airborne LiDAR data significantly improved the AGB estimation of primary tropical rainforest [17].On the other hand, in subtropical woody plant communities in Australia, incorporating Landsat ETM+ variables did not improve the AGB prediction models in terms of R 2 and the RMSE% was only slightly improved from 18.4% to 16.8%.There are also known examples of using medium-resolution satellite images for assessment of the characteristics of forest stands that influence the stock volume, such as stand age [18] and canopy cover [19].
Previous research shows that there is no clear answer to the question of whether the inclusion of medium-resolution satellite images improves ALS-based estimates of V or AGB.Often it seems to depend on the particular conditions of the evaluated forest stands.We supposed that this ambiguity would also apply to IPC-based predictive models, and therefore, in this study we chose to examine whether the inclusion of Seninel-2 data improves IPC-based predictive models of V in the case of Scots pine stands.To the best of our knowledge, and up until now, there are no other studies that deal with this topic.This study investigated the usefulness of Sentinel-2 satellite images and their inclusion in IPC-based predictive models for growing stock volume modeling of single canopy layer Scots pine (Pinus sylvestris L.) stands.The four main objectives of this research are: (1) to evaluate the accuracy of area-based approach (ABA) predictive models of growing stock volume of Scots pine stands based on Sentinel-2 reflectance values; (2) to evaluate whether Sentinel-2 data can improve the accuracy of IPC-based predictive models; (3) to compare the accuracy of predictive models created using multiple linear regression (LM) and random forest (RF) methods; and (4) to assess the robustness of predictor variables obtained from Sentinel-2 and IPC datasets in the context of growing stock volume modeling using different regression methods (LM and RF).

Materials and Methods
The study area is located in the South-Eastern Poland.The analyzed area of about 7800 ha of managed forest stands is owned by the Polish State Forest National Holding (Figure 1).The Scots pine (Pinus sylvestris L.) is the dominant tree species with a 95% share of the total stock.According to the forest inventory data provided by the Polish State Forest National Holding the average stand age is 67 years.The stands are generally even-aged and one layered.In order to capture the variability of the surveyed Scots pine stands, the field plot locations were selected via a pre-stratification method using three forest stand attributes: stand age, height and canopy cover.Stand age was obtained from an existing forest database.The ALS point clouds acquired on 19 October 2013 under leaf-on canopy conditions were used for stand height (95th percentile of normalized point heights) and canopy cover (percentage of first returns above 4 m) assessment.These two parameters were calculated with 20 m grid cells using FUSION (Version 3.42) software package [20].Then, for the analyzed stand parameters the 20 m resolution raster layers were created with the following classes: four classes for stand age using a 25-year step and beginning at year 20; five classes for stand height using a 5 m step from a range of 10-35 m; and four classes for canopy cover using a 25% step.Then, a spatial combination of the three raster layers (age, height, canopy cover) was created giving the raster layer a value for each unique combination of input layers.Forest stands younger than 20 years and small strata with a total area <1.0 ha were excluded from analysis.This led to 47 unique strata.Finally, two plots were selected for each of the 47 strata using a stratified random sampling scheme that resulted in 94 field plots with an area of 400 m 2 (11.28 m radius).Field measurements were performed in August 2015.Plots were located in the field with sub-meter precision using a MobileMapper 120 (Spectra Precision) GNSS receiver.The diameter at breast height (DBH) and tree species on the plot were recorded for all standing trees with DBH ≥ 7.0 cm, whereas the height was measured only for three trees from the seven trees closest to the plot center at each inventory plot.The selected seven trees were ordered according to DBH, and the third, fourth and fifth trees were then measured for height.Subsequently, the height and volume for all DBH recorded trees were calculated based on empirical formula created for Scots pine in Poland [21].Obtained measurements of Lorey's mean height (H [m]), basal area (G [m 2 /ha]) and growing stock volume (V [m 3 /ha]) calculated from all trees at plot level are shown in Figure 2. In order to capture the variability of the surveyed Scots pine stands, the field plot locations were selected via a pre-stratification method using three forest stand attributes: stand age, height and canopy cover.Stand age was obtained from an existing forest database.The ALS point clouds acquired on 19 October 2013 under leaf-on canopy conditions were used for stand height (95th percentile of normalized point heights) and canopy cover (percentage of first returns above 4 m) assessment.These two parameters were calculated with 20 m grid cells using FUSION (Version 3.42) software package [20].Then, for the analyzed stand parameters the 20 m resolution raster layers were created with the following classes: four classes for stand age using a 25-year step and beginning at year 20; five classes for stand height using a 5 m step from a range of 10-35 m; and four classes for canopy cover using a 25% step.Then, a spatial combination of the three raster layers (age, height, canopy cover) was created giving the raster layer a value for each unique combination of input layers.Forest stands younger than 20 years and small strata with a total area <1.0 ha were excluded from analysis.This led to 47 unique strata.Finally, two plots were selected for each of the 47 strata using a stratified random sampling scheme that resulted in 94 field plots with an area of 400 m 2 (11.28 m radius).Field measurements were performed in August 2015.Plots were located in the field with sub-meter precision using a MobileMapper 120 (Spectra Precision) GNSS receiver.The diameter at breast height (DBH) and tree species on the plot were recorded for all standing trees with DBH ≥ 7.0 cm, whereas the height was measured only for three trees from the seven trees closest to the plot center at each inventory plot.The selected seven trees were ordered according to DBH, and the third, fourth and fifth trees were then measured for height.Subsequently, the height and volume for all DBH recorded trees were calculated based on empirical formula created for Scots pine in Poland [21].Obtained measurements of Lorey's mean height (H [m]), basal area (G [m 2 /ha]) and growing stock volume (V [m 3 /ha]) calculated from all trees at plot level are shown in Figure 2. To achieve the intended research objectives three different types of remote sensing data were required: ALS point clouds, digital aerial images and Sentinel-2 satellite images.
The ALS point clouds acquired on 19 October 2013 under leaf-on canopy conditions were used for generating Digital Terrain Model (DTMALS) in the Polish coordinate system (PL-1992; EPSG-2180).The classified ALS point clouds (ver.1.2; ASPRS, Bethesdan, MD, USA) with average density of 6.3 points•m −2 were obtained from the Main Office of Geodesy and Cartography (GUGiK) in Warsaw (Poland).The 1.0 m spatial resolution DTMALS for the entire test area was generated using the FUSION (Version 3.42) software package [20].
Digital aerial images were collected with two separate flights performed on the 9 and 28 August 2015 using a UltraCamXp camera (Vexcel, Graz, Austria).The obtained Colour InfraRed (CIR) images had Ground Sampling Distance (GSD) of 0.25 m.In order to generate IPCs the PhotoScan Professional software package was used (version 1.3.4,Agisoft, Petersburg, Russia).The software exploits a computer vision Structure from Motion approach (SfM; [22]) for image matching and 3D point cloud generation [23,24].The software parameters set for image processing in this study are given in Table 1.The 11 Ground Control Points (GCPs) were used for calculation of external camera orientation (EO) and georeferencing.The coordinates of GCPs were obtained from an aerial orthophoto (2015) available through the Web Mapping Service: www.geoportal.gov.pl(XY coordinates; PL-1992 reference coordinate system) and from the DTMALS (Z coordinate; Kronsztadt 1986 elevation reference system).The achieved point cloud density amounted to 6.9 points/m 2 .Subsequently, the IPCs were normalized using the DTMALS values by subtracting the point elevation from the ground elevation.In the next step, the IPCs were limited to the borders of each of the 94 inventory plot.Then, the selected point cloud metrics (Table 2) were calculated using FUSION software [20].Only points with normalized height above the 1.0 m threshold were considered in the calculation in order to eliminate ground-related noise.Additionally, the 4.0 m height threshold was used for canopy density metrics calculation to select those points that represented the tree canopy heights only.To achieve the intended research objectives three different types of remote sensing data were required: ALS point clouds, digital aerial images and Sentinel-2 satellite images.
The ALS point clouds acquired on 19 October 2013 under leaf-on canopy conditions were used for generating Digital Terrain Model (DTM ALS ) in the Polish coordinate system (PL-1992; EPSG-2180).The classified ALS point clouds (ver.1.2; ASPRS, Bethesdan, MD, USA) with average density of 6.3 points•m −2 were obtained from the Main Office of Geodesy and Cartography (GUGiK) in Warsaw (Poland).The 1.0 m spatial resolution DTM ALS for the entire test area was generated using the FUSION (Version 3.42) software package [20].
Digital aerial images were collected with two separate flights performed on the 9 and 28 August 2015 using a UltraCamXp camera (Vexcel, Graz, Austria).The obtained Colour InfraRed (CIR) images had Ground Sampling Distance (GSD) of 0.25 m.In order to generate IPCs the PhotoScan Professional software package was used (version 1.3.4,Agisoft, Petersburg, Russia).The software exploits a computer vision Structure from Motion approach (SfM; [22]) for image matching and 3D point cloud generation [23,24].The software parameters set for image processing in this study are given in Table 1.The 11 Ground Control Points (GCPs) were used for calculation of external camera orientation (EO) and georeferencing.The coordinates of GCPs were obtained from an aerial orthophoto (2015) available through the Web Mapping Service: www.geoportal.gov.pl(XY coordinates; PL-1992 reference coordinate system) and from the DTM ALS (Z coordinate; Kronsztadt 1986 elevation reference system).The achieved point cloud density amounted to 6.9 points/m 2 .Subsequently, the IPCs were normalized using the DTM ALS values by subtracting the point elevation from the ground elevation.In the next step, the IPCs were limited to the borders of each of the 94 inventory plot.Then, the selected point cloud metrics (Table 2) were calculated using FUSION software [20].Only points with normalized height above the 1.0 m threshold were considered in the calculation in order to eliminate ground-related noise.Additionally, the 4.0 m height threshold was used for canopy density metrics calculation to select those points that represented the tree canopy heights only.One cloud free Sentinel-2 scene acquired on 31th August (S2) was used in this study.The data were obtained from the Copernicus Open Access Hub (https://scihub.copernicus.eu/dhus/#/home)as Level-1C data with Top of Atmosphere (TOA) reflectance.Characteristics of the spectral bands of Sentinel-2 MSI (Multi-Spectral Instrument) sensor are presented in Table 3.The atmospheric correction of Level-1C input data was performed using the Sen2Cor plug-in for Sentinel-2 Toolbox and SNAP software provided by ESA (version 6.0.0,Brockmann Consult, Geesthacht, Germany).The obtained Bottom of Atmosphere (BOA) surface reflectance images were resampled to 10 m spatial resolution.In the case of B5, B6, B7 and B8A spectral bands this resulted in four identical surface reflectance values at 10 m pixel size.All spectral bands except B1, B9 and B10, were used for building the predictive models.Finally, the mean surface reflectance values for BOA bands were calculated for each inventory plot.The calculated mean value was weighted by the fraction of each pixel that was covered by the field plot.These mean values were subsequently used as predictor variables in regression models.Multiple linear regression (LM) and random forest (RF; [25]) methods were used to create predictive models of growing stock volume (V).The models were created from three sets of predictor variables: reflectance values of Sentinel-2 images acquired in August 2015 (S2), IPC metrics (IPC), and a combination of selected IPC metrics and Sentinel-2 reflectance values (IPC.S2).To reduce the number of models for evaluation only those metrics selected in the final IPC.LM model were added to the IPC.S2 predictor set.First, the non-collinear variables were selected from each set of metrics separately.For a pair of analyzed predictors with a Spearman correlation higher than 0.9, the one with the lower correlation with V was dropped.Subsequently, a 2 k number of LM models were created for each predictor set, where k is the number of variables in a certain set.This enabled evaluation of LM models with all possible combinations of pre-selected predictor variables within one predictor set.The best model within each predictor set and regression method (LM, RF) was selected based on the lowest value of root mean squared error (RMSE).The RF approach has a tuning parameter called mtry which is the number of randomly selected predictors, to choose from at each split [25,26].For the purpose of tuning the mtry parameter the sequence of integers across the range from 2 to the number of predictors in each predictor set was used.For each RF model all non-correlated variables within the variable set were used.The number of trees for the forest was set to 1000 in all models.
Models accuracy was assessed using a bootstrap method with 500 repetitions [27].The observations not used in a certain repetition for model training were used for assessment of the model performance (hold-out set).Measures of the models' accuracy were obtained using the R 2 , RMSE and Bias.These were calculated using the following equations: where: Obtained performance parameters were subsequently averaged over the 500 bootstrap iterations.The averaged RMSE and Bias were then divided by the mean value of V and multiplied by 100% to create RMSE% and Bias%, respectively.Additionally, an inferential assessment of model performance was performed using the "diff" function from the caret R package [28].The ideas and method of this tool are based on benchmark experiment design [29,30].Benchmark studies are aimed at comparing the performance of several competing algorithms for a certain learning problem.The general idea is that using a proper resampling scheme (e.g., bootstrap) the resulting performance observations are independent and identically distributed.Thus, the performance of predictive models can be compared using standard statistical test procedures to test hypotheses about the distribution of the performance measures.In our study, the pair-wise differences between the models' RMSE and R 2 from 500 bootstrap runs were computed and tested with Paired Student's t-test to assess if the difference is equal to zero (α = 0.05).For a single bootstrap run the same hold-out observations were used for each evaluated model.
To assess the robustness of predictor variables we used different approaches for LM and RF models.In the case of LM, the relative variable importance (RVI) was assessed based on Akaike weights [31].The RVI was calculated as the sum of the Akaike weights in the subset of models including that predictor.The calculation was performed separately for each set of predictors (S2, IPC and IPC.S2) for all possible combinations of variables.Calculation of variable importance for LM was performed using the R "AICcmodavg" package [32].
For RF models, the RVI of variables was assessed using the approach implemented in the "randomForest" R package [33].The calculation of variable importance was performed in each predictor set for the optimal model in terms of the mtry parameter.The idea behind this method is that the Mean Squared Error (MSE) calculated for a single tree on the out-of-bag portion of the data is compared to the MSE of the same tree obtained after permutation of each predictor variable.The difference between these two MSE values are subsequently averaged over all trees, and normalized by the standard error.For easier comparison between RVI values obtained for LM and RF methods the variable importance values from RF models were scaled to have a maximum value of 1.

Predictor Variables Pre-Selection
The Spearman correlation coefficient values describing the correlation between V and evaluated predictor variables are showed in Tables 4 and 5.The variable Elev.mean was found to have the highest correlation with V (r S = 0.91).Also high correlations were found between V and elevation percentiles of point cloud.It is worth noting that the elevation percentiles were highly correlated with Elev.mean, therefore, they were not used in the next steps for creating predictive models.In the case of Sentinel-2 reflectance values, the highest correlations with V were observed for B12 (r S = −0.51)and B11 (r S = −0.46)bands while the lowest with B2 (r S = 0.05).The scatterplots of growing stock volume (V) at plot level versus Sentinel-2 reflectance values are presented in Figure 3.For most Sentinel-2 spectral bands the relationship between V and reflectance values was negative (Figure 3; Table 5).Removing highly correlated predictors lead to a reduction in the number of models for evaluation.The highest number of models were evaluated for the IPC.S2 predictor set (512).The optimal value of the mtry tuning parameter varied from 2 for S2.RF to 8 for the IPC.RF model (Table 6).

Performance of Predictive Models of Growing Stock Volume
The 6 growing stock volume (V) predictive models were created based on predictor variables obtained from spectral bands of Sentinel-2 images and IPC metrics, using two regression methods (LM and RF).The relationships between observed and predicted values of V obtained from bootstrap resampling are presented in Figure 4.

Performance of Predictive Models of Growing Stock Volume
The 6 growing stock volume (V) predictive models were created based on predictor variables obtained from spectral bands of Sentinel-2 images and IPC metrics, using two regression methods (LM and RF).The relationships between observed and predicted values of V obtained from bootstrap resampling are presented in Figure 4. Averaged R 2 , RMSE, RMSE%, Bias and Bias% values for the created predictive models are presented in Table 7.Additionally, the distribution of the selected performance metrics (R 2 , RMSE%, Bias%) from the 500 bootstrap runs are presented in Figure 5.The best model was achieved using LM based on the IPC combined with the Sentinel-2 image, with RMSE%IPC.S2.LM = 16.95% and R 2 IPC.S2.LM = 0.82.The LM regression method created more accurate models than the RF method in all predictor variable sets.The bias was close to 0 for all constructed models.The poorest model was obtained with the RF method based on S2 data, with RMSE%S2.RF = 36.46%and R 2 S2.RF = 0.17.The largest difference in R 2 was observed between the IPC.S2.LM and the S2.RF models (0.65).The difference in RMSE was also highest for this pair of analyzed models.Inferential assessment of model performance showed that the differences in accuracy in terms of R 2 and RMSE were statistically significant (α = 0.05) for all pairs of models excepting the IPC.S2.RF and IPC.RF pair.The results of the inferential assessment of the models' differences in terms of R 2 are presented in Table 8.Averaged R 2 , RMSE, RMSE%, Bias and Bias% values for the created predictive models are presented in Table 7.Additionally, the distribution of the selected performance metrics (R 2 , RMSE%, Bias%) from the 500 bootstrap runs are presented in Figure 5.The best model was achieved using LM based on the IPC combined with the Sentinel-2 image, with RMSE% IPC.S2.LM = 16.95% and R 2 IPC.S2.LM = 0.82.The LM regression method created more accurate models than the RF method in all predictor variable sets.The bias was close to 0 for all constructed models.The poorest model was obtained with the RF method based on S2 data, with RMSE% S2.RF = 36.46%and R 2 S2.RF = 0.17.The largest difference in R 2 was observed between the IPC.S2.LM and the S2.RF models (0.65).The difference in RMSE was also highest for this pair of analyzed models.Inferential assessment of model performance showed that the differences in accuracy in terms of R 2 and RMSE were statistically significant (α = 0.05) for all pairs of models excepting the IPC.S2.RF and IPC.RF pair.The results of the inferential assessment of the models' differences in terms of R 2 are presented in Table 8.

Relative Variable Importance
Analysis of relative variable importance showed that depending on the applied regression method, LM or RF, different predictor variables were found to be most robust (Figure 6).Only the Elev.meanvariable showed the highest RVI value for both regression methods (IPC and IPC.S2 variable sets).In case of IPC-based models the Canopy.relief.ratio was found to be a robust predictor variable for the LM but not for the RF approach.In Sentinel-2-based models, B4 was shown to be an important predictor for both, the LM and the RF methods.

Relative Variable Importance
Analysis of relative variable importance showed that depending on the applied regression method, LM or RF, different predictor variables were found to be most robust (Figure 6).Only the Elev.meanvariable showed the highest RVI value for both regression methods (IPC and IPC.S2 variable sets).In case of IPC-based models the Canopy.relief.ratio was found to be a robust predictor variable for the LM but not for the RF approach.In Sentinel-2-based models, B4 was shown to be an important predictor for both, the LM and the RF methods.

Discussion
The two main goals of this study were to evaluate the suitability of Sentinel-2 reflectance values for estimating the growing stock volume (V) of single layer Scots pine stands and to analyze whether the Sentinel-2 imagery can improve the accuracy of IPC-based predictive models.We also examined the accuracy of two different regression methods-multiple linear regression (LM) and random forest (RF) and analyzed the robustness of variables used in created predictive models.As far as we know, this is the first time that fusion of Sentinel-2 satellite images and IPC data has been examined in the context of growing stock volume estimation of Scots pine stands.
Our results indicate that Sentinel-2-based predictive models estimates the growing stock volume (V) of Scots pine stands with low accuracy.It is worth noting that although additional use of Sentinel-2 imagery in the IPC-based predictive model provided a statistically significant increase in accuracy, the increase was relatively low (0.01 increase in R 2 and 0.31% decrease in RMSE%).The obtained results show that a multiple linear (LM) regression approach provides a more accurate growing stock volume (V) estimation of single layer Scots pine stands than the random forest (RF) method when using IPC and Sentinel-2 datasets.The results show that different predictor variables may be optimal for constructing growing stock volume models of Scots pine stands depending on the applied regression method-LM or RF.
The results obtained for growing stock volume estimation using IPC data (RMSE% IPC.LM = 17.26%) are consistent with our previous findings for the same reference dataset but with different aerial images (RMSE% = 17.0%; [34]).The obtained accuracies of IPC-based predictive models also concur well with findings from previous experiments performed by other authors in similar forest stand conditions [8,35].Regarding the effectiveness of IPC metrics for V estimation using LM, the Elev.mean and Canopy.relief.ratiowere found to be the most robust predictor variables which is consistent with previous results obtained for this study area [34].
Our results regarding the suitability of Sentinel-2 spectral bands for growing stock volume (V) estimation are similar to findings from a previous study on AGB estimation based on Seninel-2 simulated bands [5].The relationships between all ratio vegetation indices (RVI) and normalized difference vegetation indices (NVI) and AGB were analyzed.It was also found that none of the band combinations showed strong relationships with AGB.The highest R 2 values were achieved using RVI band combination B6 (or B7, B8A, B8 or B9)/B11 (R 2 = 0.24) [5].
Sentinel-2-based predictive models of V had lower accuracy than those reported for Mediterranean forest ecosystems [4].In our study the coefficient of determination of the best Sentinel-2-based model amounted to R 2 S2.LM = 0.24, while authors studying Mediterranean forests achieved R 2 = 0.63 (RF method).In the above-mentioned study, bands B11 and B6 were found to be most effective in estimating V with R 2 = 0.46 and 0.38, respectively.In our study, in the case of two RF models, the SWIR bands (B11 and B12) were also found to be useful for V estimation.In contrast to previous studies [4] we did not use the B6 band, because it was correlated with the B8A band.Surprisingly, we found the B2 band to be a useful predictor of V in the case of the IPC.S2 variable set.This finding is in contrast with previous results reported in the literature [4,36].There are also known results of using Sentinel-2 imagery for estimation of V in broadleaf forests in Italy.It was found that using k-NN imputation algorithm based on an RF distance matrix the V can be estimated with a relative root mean squared difference ranging from 11.36% to 26.05% for two different study areas [6].These results are difficult to compare with those obtained in [4] and in this study since different modeling methods were used.There are also known results where different S2 bands were found to be most robust for two different study areas: B8, B11, B12 and B5, B6, B8 [6].In our study different sets of best S2 bands were also identified depending on the regression method (LM, RF).This suggests that the robustness of S2 spectral bands in the estimation of V should always be analyzed for each particular case.
Another study evaluating the suitability of Sentinel-2 imagery for growing stock volume estimation was performed in Norway spruce (Picea abies (L.) H. Karst.)dominated stands in Germany [37].In that study the high-resolution airborne hyperspectral HyMap dataset was used to simulate the Sentinel-2 satellite data.Using the K-nearest neighbor algorithm the authors created a V predictive model with an RMSE% = 21.58% at stand level.
Given that our study was performed on almost single species stands, the general conclusions about the relatively low usability of Sentinel-2 satellite imageries for growing stock volume (V) estimation of Scots pine stands are formulated with considerable caution.It is possible that in stands composed of more diverse tree species, the incorporation of Sentinel-2 reflectance values into IPC-based predictive models might lead to higher accuracy than was found in the present study.Our study might be limited because we did not report the predictive power of common Vegetation Indices (VI) for V estimation.However, there is evidence from previous studies [4,38] that VI may be less correlated to forest stand parameters compared to the original spectral bands.For that reason, and because we wanted to use a more straightforward approach, we decided not to analyze the usability of VI for growing stock volume estimation in this study.
Our findings suggest that multiple linear regression is a more powerful regression method than random forest for V estimation in Scots pine stands.However, it should be kept in mind that these findings apply only to the described study area and a specific set of predictor variables.Despite this, we believe that examining the performance of traditional statistical methods like linear regression before deciding to use more advanced methods such as random forest is important and worthwhile.We are aware that in more complex forest stands with a nonlinear relationship between growing stock volume and predictor variables, the RF method may surpass the LM approach.

Figure 1 .
Figure 1.Location of the study area with 94 field plots established using stratified random sampling.

Figure 1 .
Figure 1.Location of the study area with 94 field plots established using stratified random sampling.

Figure 2 .
Figure 2. Obtained measurements of H, G and V at plot level (minimum, first quartile, median, third quartile, and maximum; mean value is marked as the red point).

Figure 2 .
Figure 2. Obtained measurements of H, G and V at plot level (minimum, first quartile, median, third quartile, and maximum; mean value is marked as the red point).
observed growing stock volume for the ith sample plot in a hold-out set, ŷi = predicted growing stock volume for the ith sample plot in the hold-out set, y = mean of n observed values in the hold-out set, n = number of observations in the hold-out set.

Figure 3 .
Figure 3. Relationships between growing stock volume (V) at plot level and Sentinel-2 surface reflectance dimensionless (dl) values used as predictor variables (band center wavelength in nm is given in brackets).Adj.R 2 is the Adjusted R-squared.

Figure 3 .
Figure 3. Relationships between growing stock volume (V) at plot level and Sentinel-2 surface reflectance dimensionless (dl) values used as predictor variables (band center wavelength in nm is given in brackets).Adj.R 2 is the Adjusted R-squared.

Figure 4 .
Figure 4. Observed versus predicted growing stock volume for different predictor variable sets and regression methods.Predicted values are averaged from 500 bootstrap runs.

Figure 4 .
Figure 4. Observed versus predicted growing stock volume for different predictor variable sets and regression methods.Predicted values are averaged from 500 bootstrap runs.

Figure 5 .
Figure 5. Accuracies of the growing stock volume models created using RF and LM in terms of R 2 , RMSE% and Bias%.Violin-and box-plots display distribution of the results from the 500 bootstrap runs.

Figure 5 .
Figure 5. Accuracies of the growing stock volume models created using RF and LM in terms of R 2 , RMSE% and Bias%.Violin-and box-plots display distribution of the results from the 500 bootstrap runs.

Figure 6 .
Figure 6.Relative variable importance obtained for LM and RF regression methods within each predictor set.

Figure 6 .
Figure 6.Relative variable importance obtained for LM and RF regression methods within each predictor set.

Table 1 .
Parameters used for processing of airborne images in Agisoft PhotoScan Professional software.

Table 1 .
Parameters used for processing of airborne images in Agisoft PhotoScan Professional software.

Table 2 .
Predictor variables obtained from image-derived point clouds (IPC).

Table 3 .
Spectral bands and resolutions of Sentinel-2 MSI sensor.

Table 4 .
The Spearman correlation coefficient (r S ) values between V and evaluated IPC metrics (* correlation is significant at the 0.05 level).

Table 5 .
The Spearman correlation coefficient (rS) values between V and evaluated Sentinel-2 spectral bands (* correlation is significant at the 0.05 level).

Table 5 .
The Spearman correlation coefficient (r S ) values between V and evaluated Sentinel-2 spectral bands (* correlation is significant at the 0.05 level).

Table 6 .
Results of removing highly correlated predictors within variable sets.The number of evaluated models in each predictor set and final mtry values used in RF are also given (* predictors selected in the final LM models).

Table 6 .
Results of removing highly correlated predictors within variable sets.The number of evaluated models in each predictor set and final mtry values used in RF are also given (* predictors selected in the final LM models).

Table 7 .
Averaged R 2 , RMSE, RMSE%, bias and bias% values for the analyzed models and regression method.

Table 7 .
Averaged R 2 , RMSE, RMSE%, bias and bias% values for the analyzed models and regression method.

Table 8 .
Results of inferential assessment of model performance differences in terms of R 2 .The pair-wise differences between the models R 2 from 500 bootstrap runs were tested with Paired Student's t-test (α = 0.05).The upper diagonal shows the estimates of the difference.The lower diagonal shows the p-value for H0: difference = 0.