Sentinel-1 Time Series for Predicting Growing Stock Volume of Boreal Forest: Multitemporal Analysis and Feature Selection

: Copernicus Sentinel-1 images are widely used for forest mapping and predicting forest growing stock volume (GSV) due to their accessibility. However, certain important aspects related to the use of Sentinel-1 time series have not been thoroughly explored in the literature. These include the impact of image time series length on prediction accuracy, the optimal feature selection approaches, and the best prediction methods. In this study, we conduct an in-depth exploration of the potential of long time series of Sentinel-1 SAR data to predict forest GSV and evaluate the temporal dynamics of the predictions using extensive reference data. Our boreal coniferous forests study site is located near the Hyytiälä forest station in central Finland and covers an area of 2500 km 2 with nearly 17,000 stands. We considered several prediction approaches and ﬁne-tuned them to predict GSV in various evaluation scenarios. Our analyses used 96 Sentinel-1 images acquired over three years. Different approaches for aggregating SAR images and choosing feature (predictor) variables were evaluated. Our results demonstrate a considerable decrease in the root mean squared errors (RMSEs) of GSV predictions as the number of images increases. While prediction accuracy using individual Sentinel-1 images varied from 85 to 91 m 3 /ha RMSE, prediction accuracy with combined images decreased to 75.6 m 3 /ha. Feature extraction and dimension reduction techniques facilitated the achievement of near-optimal prediction accuracy using only 8–10 images. Examined methods included radiometric contrast, mutual information, improved k-Nearest Neighbors, random forests selection, Lasso, and Wrapper approaches. Lasso was the most optimal, with RMSE reaching 77.1 m 3 /ha. Finally, we found that using assemblages of eight consecutive images resulted in the greatest accuracy in predicting GSV when initial acquisitions started between September and January.


Introduction
Space-borne synthetic aperture radar (SAR) is a highly versatile information source for Earth observation that can provide terrain-related data with fine resolution regardless of weather conditions.This makes it particularly useful in areas with persistent cloud cover or near polar regions.SAR images have been extensively used in forest remote sensing, particularly for wide-area mapping [1][2][3], where models of relationships between forest structural parameters and measured backscatter signatures are essential [4].SAR signals have been demonstrated to be highly sensitive to various forest structure variables, including above-ground tree biomass (AGB) and growing stock volume (GSV) (m 3 /ha) [5].Depending on the wavelengths of the sensor, the relationship between SAR backscattering data and GSV can quickly become saturated.In addition, environmental and ionospheric conditions can affect this relationship.When signal saturation is reached, the radiometric data become less useful for predicting volume or biomass, especially when only one image is used.This means that data from commonly used space-borne SAR systems operating in the C-band and X-band are often considered suboptimal for wide-area forest inventory [5].
Cross-polarised (cross-pol) backscatter demonstrates greater sensitivity to forest biomass than co-polarised (co-pol) backscatter, although multiple polarizations are recommended for use in forest biomass mapping algorithms.C-band SAR is mostly useful for forests with very small biomass levels (30-50 t/ha) when a single SAR intensity image is used.The shorter wavelengths have larger extinction, and limited penetration within the forest layer compared to the L-and P-bands, although multi-temporal and texture analysis of fine resolution C-band data may provide useful inputs [6][7][8][9][10].On the other hand, C-band radar data acquired by multiple space-borne sensors, historically including ERS-1, ERS-2, and Envisat ASAR, presently RADARSAT-2 and ESA Sentinel-1, and only recently RADARSAT Constellation, represent attractive sources of information for timely assessment of forest cover due to their global coverage (in the cases of ASAR and Sentinel-1) and frequent revisits.This is particularly important for Sentinel-1 satellites [11] with data offered free of charge to users.
To date, methodological studies using long time series of C-band SAR intensity data for GSV mapping over boreal forests have been relatively rare [10, [12][13][14] compared to studies on multitemporal approaches for GSV estimation using L-band SAR imagery in boreal forests, particularly with JERS [12,15,16] and ALOS PALSAR [17][18][19].The common understanding is that C-band data are suboptimal compared to L-and P-band data [5]; however, due to the availability of C-band data (Envisat ASAR in Global Monitoring mode), wide-area maps of GSV have been produced in boreal forests [13].Parametric semiempirical and physics-based models have been used with historical C-band SAR data, such as ERS-1 and Envisat ASAR, to predict forest stem volume and GSV [13,14,20].Detailed historical studies of the temporal dynamics of C-band SAR backscatter in boreal forests are scarce, whereas in notable examples [14,20] the SAR data were represented by ERS-1 single-pol VV imagery at a 23 • nominal incidence angle.In [20], the auxiliary parameters of a semi-empirical model were estimated separately from each processed SAR image, then a vector of backscatter measurements for each areal unit was iteratively inverted to predict the forest stem volume.Correlation analysis with reference plot-level stem volume was used to select suitable images.In [14], the parameters of a semi-empirical model and the stem volume were estimated simultaneously for each SAR image.Separate stem volume estimates were later combined using a linear regression model calibrated using the training data set.In addition, use of multiple linear regression (MLR) for optimally combining predictions from individual SAR images has been reported in [12].A popular approach in forest biomass mapping is to initially predict the GSV using the Water Cloud Model (WCM) [21] for individual SAR images and then combining these predictions via optimal weighting of the predictions [13].Weights can be estimated using image-wise accuracy statistics in the presence of reference data or with auxiliary datasets [13,22].
Recently, nonparametric approaches such as random forests (RF) and support vector regression (SVR) have been tested to predict GSV using features calculated from multiple C-band SAR images, often in combination with L-band and optical images [23].Using features calculated from several SAR images as independent variables in GSV prediction has rarely been studied [10,24], and has typically included both SAR intensity and textural image features.Use of feature selection approaches has often been limited to RF-based rankings and simple correlation to forest biomass [24][25][26][27][28].In [29,30], Wrapper methods such as stepwise regression analysis, recursive feature elimination, and Boruta were applied to select feature variables from multi-source satellite data in forest height mapping.In [31], the performance of importance-based selection methods, including RF and XGBoost, was analyzed in the context of forest biomass estimation.A binary genetic algorithm was tested for selecting the features for forest GSV prediction to reduce the computational complexity of a SVR kernel [32].
The availability of freely available very long time series data from Sentinel-1, the presence of two polarizations (co-and cross-polarizations), and access to better reference data motivates more in-depth exploration of seasonal dynamics and their influence on GSV predictions compared to earlier studies with ERS-1 and ASAR.Studying various ways of combining C-band images via feature selection to predict GSV, as well as studying the temporal dynamics of GSV predicted using selected features (including analysis of long sets of consecutive SAR images) has now become possible.
Several recent studies using multiple Sentinel-1 acquisitions in the context of forest monitoring or classification [33][34][35][36] have concentrated on analysis of the temporal patterns of Sentinel-1 data.However, these did not elaborate on the influence of the length of the time series on classification accuracy, perform explicit feature selection trials, or study ways of combining consecutive observations for predicting continuous forest variables.
In the context of forest biomass prediction, research has been conducted on the fusion of Sentinel-1 data with SAR data from instruments operating at different wavelength, as well as with optical data.In particular, the combination of Sentinel-1, ALOS-2, and Sentinel-2 data has produced greater accuracy, although the role of Sentinel-1 data in this case was relatively limited [23].Another study compared single-date and multi-temporal Sentinel-1 data and demonstrated that multi-temporal Sentinel-1 substantially improved AGB estimation, though with a much smaller adjusted R 2 because of the limitations of the short wavelengths [37].When using WCM-based approaches for AGB prediction from ASAR hypertemporal data, a relative root mean square error (rRMSE) in the range of 34.2-48.1% was achieved for 1 km pixel predictions [13].Use of multiple stand-level features was instrumental in achieving an rRMSE of 30% for a Finnish test site [10].There, stand-level backscatter intensity and other stand-level features such as the standard deviation of the intensities and the averages of the ratios of the intensities of the images from different dates were used as predictor variables, resulting in good GSV prediction accuracies.Several sets of RADARSAT-2 images were studied using machine learning approaches in combination with other SAR and optical data using various features and machine learning approaches, achieving rRMSE values of 42-47% [24].
A general conclusion that can be reached is that the use of a single Sentinel-1 image does not lead to accurate forest biomass mapping.Therefore, the present research concentrates on the fusion of Sentinel-1 and L-band SAR or optical satellite data, while investigating the potential of multitemporal Sentinel-1 datasets.The performance of multitemporal Sentinel-1 approaches is not yet satisfactory on either the forest plot or stand level.Most critically, in-depth analysis of the potential of very long time series Sentinel-1 data, their seasonal fluctuations (dynamics), and the effects of these fluctuation on the accuracy of GSV prediction is lacking, although the data for multiple years are readily available.Furthermore, a statistically sound method for optimising image and season selections for purpose of increasing prediction accuracy when using dual-pol Sentinel-1 data has not been demonstrated over boreal forests.Reported studies of feature selection are relatively scarce, as well as disparate and non-exhaustive.This motivates us to address these knowledge gaps.In the present study, we use representative reference forest stand-level data to study various GSV prediction models and feature selection approaches, including several that have not been previously studied with SAR data.
Here, we should acknowledge the long history of the operational use of optical area space-borne data, as well as recent studies for forest and agricultural applications, e.g., [38][39][40][41][42][43] and the potential of upcoming sensors [44].However, it needs to be taken into account that optical area data are applicable in the boreal zone only during the growing season.Furthermore, the weather conditions often prevent data acquisition during that season.One purpose of the current article is to study and develop methods of utilizing remote sensing data that are applicable independently of weather and lightning conditions and to take advantage of the seasonal variation of the backscattering mechanism of SAR in forestry applications.

Goals and Contributions of This Study
In this study, we expand on a lineage of forest biomass mapping using multitemporal radar imagery by evaluating the use of large numbers of Sentinel-1 images for a lengthy time period for Finnish boreal forests.We focus on one key variable, GSV (m 3 /ha), defined here as the above-ground volume of living stems above stump to the stem top over a specified area.Included are the stem volumes of living trees standing or lying with a diameter at breast height (dbh) of more than 10 cm and height of more than 1.30 m, at which height dbh is measured.GSV is strongly correlated with other forest variables, such as tree basal area, AGB, and total tree carbon content.The latter is approximately (0.3-0.4 t/m 3 ) × GSV in boreal forests depending on the tree species [45,46].Thus, our results can be compared to the results from other relevant studies that use SAR to predict tree biomass for boreal forests.
The primary and overarching objective of this study was to develop a method for using Sentinel-1 imagery to maximize GSV prediction accuracy.Three supporting and subordinate objectives were addressed: (1) to analyse the effect of the acquisition date, weather conditions, and number of Sentinel-1 images on the GSV prediction accuracy; (2) to test and develop feature selection and transformation methods for backscattering coefficients and multitemporal features such as principal component analysis (PCA), radiometric contrast (RC), mutual information (MI), improved k-Nearest Neighbors (ik-NN), random forests selection (RF-Sel), Lasso and Wrapper approaches, and to evaluate the effects of the features and transformations on the prediction accuracies; and (3) to test a few commonly used optional forest variable prediction methods and their performance in GSV prediction.
These objectives were addressed using extensive analyses of a 3.5-year stack of Sentinel-1 images acquired over a large test site representative of boreal forestland.Both established and advanced statistical and machine learning approaches were used for feature selection and forest biomass prediction, in particular MLR, RF, and SVR.With management inventory needs in mind, we focused on stand-level GSV predictions and their errors, not necessarily on large area level estimates.The criterion was the mean standard error at the stand level when using separate validation data.Note that estimators with large standard errors at the stand level may produce consistent large area estimators, while precise stand-level estimators may be biased when used as large area estimators.

Study Site
The 50 km × 50 km study area in Southern Finland was chosen around the Hyytiälä Forestry Field station of the University of Helsinki, with the following centre coordinates: 61.8 • N, 24.3 • E, as shown in Figure 1.The study site includes forest, agricultural land, lakes, and several population centres.The forests are dominated by Norway spruce (Picea abies (L) H. Karst.),Scots pine (Pinus sylvestris L.), and birch (Betula pendula Ehrh.and Betula pubescens Roth).The soils are mainly glacial drift, with sandy and clayey soils common as well, especially in agricultural areas.Mires and open bogs are frequent.The area is relatively hilly, with terrain variation ranging between 95 and 230 m above sea level.

Sentinel-1 Data
Ninety-six Sentinel-1 images acquired between 9 October 2014 and 21 May 2018 were used in the study as Level-1 (Ground-Range Detected High-resolution, GRDH) products with VV and VH polarizations.The images were acquired in Interferometric Wide-swath (IW) mode on descending orbits.
The Sentinel-1 images were multilooked with factor 2 × 2 (range × azimuth) to obtain images with pixel dimensions corresponding to the 20 m grid spacing.The bilinear interpolation method was used for resampling in connection with orthorectification.A digital elevation model (DEM) from the National Land Survey of Finland was used and averaged to 20 m × 20 m pixels.Radiometric normalization with respect to the projected area of the scattering element was used to eliminate the topography-induced radiometric variation.In this way, a time series of co-registered images in the terrain-corrected "gamma-naught" coefficient was constructed [47] with a pixel size of 20 m × 20 m.A 3 × 3 median window was used to filter the speckle noise.

Ground Reference Data
Stand-level forest resource data collected, prepared, and intended for forest management purposes were used for both the training and validation data.Figure 2 presents the stand-level reference and weather conditions.The data for the test site corresponded to the year 2016 and were obtained from the Finnish Forest Centre [48,49].These forest resource data consist of predictions from statistical models using airborne laser scanning (ALS) data and data from field measurements.A single remote sensing site has a total area of about 300,000 hectares.From one inventory site, 700-800 circular sample plots or 150-200 new types of larger tree map sample plots are measured.Different plot types can be combined [48].Field checks on each stand were used to correct noticeable deviations in predictions, e.g., in tree species level predictions, following the practical procedure recommended by the Forest Centre.It is important to note that despite this procedure stand-level predictions always include some errors.The estimated average relative stand level root mean square error (RMSE) of the GSV predictions is around 9-11% [49], with the error depending on the actual GSV.Earlier, new data were produced on every tenth year, and are now measured on every fifth year.The data are updated annually based on growth models and the reported forestry regimes, such as harvests; thus, they are in principle up to date.Because the stand edges may influence prediction, one-pixel morphological erosion was applied to the stand mask, that is, only pixels separated from the closest stand boundary by at least one pixel were included in the analyses.Further, only stands with areas of at least 1.0 ha after the erosion and with a GSV of 2.0 m 3 /ha or more were used.The GSV threshold was used to remove stands with recent regeneration cuts from the lengthy time series datasets.In total, 17,762 stands were included in our approach, of which 8881 were randomly selected for training and 8881 remained for validation (Table 1).

Methodology
Our approach focused on predicting GSV at the stand level using optional regression techniques, including MLR (Section 3.1.1),SVR (Section 3.1.2),and RF regression (Section 3.1.3).
The general approach is shown schematically in Figure 3. First, a subset of images and optional features was selected, then the backscattering was averaged within each stand for each feature, spatially guided by the forest stand masks.Later, using the stand-level averaged features as the pedictors, the response variable GSV was predicted utilizing the different regression approaches.We used RMSE as the criterion for evaluating regression performance.To ensure that the evaluation was not affected by the size of the forest stands, stand size after a morphological erosion was used as a weighting factor to normalize the RMSE, which was calculated as follows: where v i denotes the GSV of stand i, vi is its prediction, s i the area of the stand i, and N is the total number of stands.Three supervised GSV prediction methods are described in Section 3.1.The approaches for selecting the optimal subsets of features are described in Section 3.2.The multi-temporal analysis method is described in Section 3.3.

Multiple Linear Regression
MLR was used as one of the parametric methods for predicting GSV.This is a basic regression approach often used for modeling the relationship between GSV or forest biomass and SAR backscattering.In certain cases, transformations can be applied to the predictor and response variables to further linearize the modeled relationship [15,18,26,27,[50][51][52][53][54][55][56].In this study, SAR image features were used as predictor variables and multiple optional transformations of the response variable GSV were tested, including exponential (e.g. with a power of 0.5) and logarithmic, in addition to a non-transformed GSV.The final model was of the form where v i is the GSV for stand i, f i,j denotes the SAR feature j for stand i, N f is the number of the SAR features, w j with j ∈ 0, . . ., N f are the coefficients to be estimated, a is a small positive number, and i is random error that is assumed to be independently distributed.
In this study, the SAR feature represents the logarithmic scaled stand-level intensity at the VV or VH polarizations, or both.A bias correction was made to GSV predictions due to the use of a logarithmic transformation in model estimation [57].The correction was based on Taylor series expansion, assuming that i is normally distributed and is calculated as half of the variance estimate of i .Note that the correction is additive, is made to the prediction on the log scale before back-transforming, and is half the residual variance on the log scale.
To evaluate the separate contributions to GSV prediction, the values of f n were selected from different scenes (timestamps) and polarizations (VV-only, VH-only, or VV and VH), as was the case for the other regression approaches as well.

Support Vector Regression
Support vector machine (SVM) classification was first proposed by [58] to solve traditional classification problems.It is based on statistical learning theory.Later, [59] extended SVM to regression problems, namely, SVR.This converts a regression problem to a convex quadratic programming problem.The introduction of the -insensitive loss, which ignores the error near true values, makes SVR less sensitive to overfitting.Using a nonlinear kernel function, SVR can be transformed from a nonlinear problem into a linear one in a higher-dimensional feature space [60,61].
The Gaussian Radial Basis Function (RBF) kernel was used in our study.This kernel has previously shown good performance in forest biomass estimation tasks [26,62].As SVR is sensitive to hyperparameters, it is necessary to find the optimal hyperparameters by cross-validation on the training data.

Random Forests
RF is a supervised ensemble learning algorithm that uses a bagging strategy to fit a large collection of classification and regression trees (CART), leading to improved overall modelling prediction ability [63].First proposed in 1995, it has been widely used for forest classification, change detection, and biomass estimation [64][65][66][67] Compared to SVR and linear regression, RF has multiple advantages: (i) it can avoid overfitting by introducing two layers of randomisation, i.e., bootstrapping when sampling and a random feature subset when splitting nodes [63]; (ii) parallel decision trees accelerate computation while making nonlinear modelling possible; (iii) high-dimensional data can benefit from its automated feature selection, which is especially needed in SAR long time series image processing; and (iv) by using out-of-bag (OOB) data in internal examination of the model, RF can substantially decrease the systematic prediction error without any need for more training data [68].
In the training stage, data subsets for each decision tree are randomly sampled from the original data using bootstrapping, then each decision tree is trained independently until each leaf belongs to one specific label.When splitting nodes, for regression tasks impurity criterion such as mean squared error (MSE) or mean absolute error (MAE) are used to select the splitting variable (feature) and splitting point.In the end, the predictions are averaged over trees to provide the final results.
In addition, the impact of each feature on the impurity of the decision trees can reflect the importance of that feature.This principle is used for feature ranking, namely, RF selection (RF-Sel), along with the other methods elaborated in Section 3.2.
After five-fold cross-validation on a small subset of training data, the number of trees within the model was set to 400 and MSE was chosen as the impurity criterion (variance reduction).To minimise the computational intensity and achieve greater accuracy, the maximum depth of trees was varied from 5 to 35 according to the number of input features.The minimum number of sample units required to split an internal node was set to 5.

Feature Transformation and Selection
Because the Sentinel-1 time series has 96 images (timestamps) covering more than three years and each image has two bands (VV/VH), considerable redundancy can be expected in the feature space.To circumvent the effects of this redundancy and to reduce the computational burden, feature extraction or reduction can be useful.Here, we describe several approaches for transforming or selecting an optimal subset of SAR features.We discuss both the optimal subset and insights into the choice of the optimal observation dates.By carefully choosing a few specific dates, we expect to achieve a prediction accuracy close to what can be achieved using the entire original time series.

Principal Component Analysis
Principal Component Analysis (PCA) is a common approach for reducing the dimensionality of the feature space [69].PCA can be used to convert the original features into a set of linearly uncorrelated components using an eigenvalue decomposition.The importance of each component is evaluated by its eigenvalue.The components corresponding to the largest eigenvalues are expected to retain most information about variation in the data.In this way, a subset of principal components can be used as a set of new features, thereby effectively reducing the original data features while preserving the majority of information.However, because PCA is essentially a feature transformation rather than a feature selection approach, it is not suitable for certain tasks, such as determining the optimal season for acquiring SAR imagery.Therefore, PCA analysis is primarily used as a standard for comparing feature selection approaches, as described below.

Feature Selection Using Radiometric Contrast
Radiometric contrast (RC) is used for choosing a set of images with feature variables that have the greatest radiometric variation between forested and non-forested terrain.This is similar to the approaches used for feature selection in earlier studies, and is preferred because of its simplicity and lack of need for reference data [13].The essential idea is that SAR image features with greater radiometric contrast between forested and non-forested areas are better able to discriminate different values of GSV due to a greater dynamic range of forest backscattering measurements.
Forested and non-forested areas are each represented by a specific number of stands that have the least or greatest GSV, respectively.First, we sorted the stands in descending order according to their GSV.Denoting the 10 stands with the greatest GSV as the subset P max and the 10 stands with the least GSV as P min for a specific polarization of a certain image, the radiometric contrast γ is evaluated as the ratio of the mean amplitude by using the formula below: where A i denotes the stand-averaged feature/backscattering amplitude of stand i ∈ P max .Further, all the images were ranked according to the calculated radiometric contrast.The selected images and their relative performance are analysed further in Section 4.2.

Feature Selection Using Mutual Information
Mutual information (MI) is a Shannon entropy-based measure of dependence between random variables.It uses the naive idea that more mutual information means more contributions to prediction.Unlike Pearson linear correlation, mutual information can capture nonlinear dependency [70,71].The mutual information is calculated as where f is the SAR feature variable, v is the GSV, p denotes the probability distribution, and φ is always non-negative and equals zero when and only when p(F , V ) = p(F )p(V ).
Accurate estimation of the probability distribution is not trivial, particularly from finite samples.Here, we use a k-Nearest Neighbors (k-NN) based approach to estimate the joint distribution p(F , V ) [72].The number of neighbours k was set to k = 7 for this task [70].

Feature Selection Using Lasso
Least Absolute Shrinkage and Selection Operator (Lasso) is another linear regression method [73].For totals of N s stands and N f features, the Lasso loss function is where f i,j denotes the SAR feature j for stand i, v i denotes the corresponding reference GSV, and w j is the coefficient of feature j.The loss is similar to least squares estimation (LSE) for conventional linear regression; the difference is that in Lasso an L1 regularization λ ∑ N f j=1 w j is applied on the coefficients w j .The regularization makes some of the weak feature weights equal to 0, thereby achieving selection of important features in high dimensional cases.λ is the weight to control the strength of regularization.Lasso has known shortcomings; when there is a set of highly relevant variables, Lasso tends to choose one of them while ignoring others, leading to instability of the regression/feature selection process.

Feature Selection Using ik-NN
The well-known k-NN prediction method is well-suited for cases in which multiple target parameters or variables are estimated simultaneously.In addition, during the estimation process k-NN can obtain the weights of different features, and as such can be an optional method for feature selection [10, [38][39][40].
In this study, we utilize an improved k-nearest neighbors (ik-NN) with a genetic algorithm in the context of feature selection.Consider a stand p and denote its k nearest feasible field stands as p 1 , . .., p k , where the distance is measured in the feature space.The weight β i,p of field stand i with respect to stand p is defined as where p r is a field stand with its index r ∈ {1, . . ., k}.We fixed the value of k to be 15 after tests on the validation subset, for which we used the RMSE and mean deviation as criteria.The distance weighting power t is a real number, usually t ∈ [0, 2]; here, we set t = 1.A small quantity greater than zero is added to d when d = 0.
The distance metric d employed was where f p,j is the j-th SAR feature variable for stand p, N f is the number of SAR feature variables, and w j is the weight of the j-th SAR feature variable.
The values of the elements w j of the weight vector w were selected using a genetic algorithm [38,39].The fitness function to be minimized when calculating the w vector for GSV estimation was where: δr (w) is the standard error of the prediction of the forest variable r at the stand level, êr (w) is the mean deviation of the forest variable r at the stand level, N e the number of the forest variables used in the algorithm (here, one variable), γ δ and γ e are fixed constant vectors.
After trial and error tests as well as heuristic trade-offs between the standard error and mean deviation, the values of γ δ = 10 and γ e = 1 were selected, resulting in RMSE having greater weight.The other parameters of the genetic algorithm itself were the same as in [38].
As an iterative procedure, genetic algorithms are computationally intensive and may produce a local rather than global optimum.To circumvent this possibility, multiple runs are usually needed to find a near-global optimum.

Feature Selection Using Wrapper Methods
Based on ranking mechanisms, feature selection approaches can be classified into three categories: Filter, Embedded, and Wrapper methods.Filter methods select features by statistically measuring their correlation with the target variable.These methods typically assume the independence of the features.Furthermore, the selection process is not directly related to subsequent tasks.Although filter methods are fast, they may not yield the best features.In the current study, RC and MI are examples of filter methods.On the other hand, embedded methods incorporate the feature selection process into model training.For instance, RF has a built-in feature selection mechanism.With a fully trained model, the importance of different features can be extracted in parallel.In our study, RF-Sel, Lasso, and ik-NN are typical representatives.In contrast to filter methods, embedded methods can take into account information redundancy among the features.
In addition to filter and embedded methods, we employed a typical wrapper method called sequential feature selection (SFS) in our study.SFS utilizes a forward search direction to select key features from a large number of features.It follows a greedy search approach, where at each iteration the SFS-based wrapper selects one promising feature from the remaining feature pool based on the performance of its proxy task model.If the performance criteria are met, the feature is incorporated into the locally-optimal subset of features for the next iteration.After a certain number of iterations, a collection of key features is obtained.The flowchart illustrating the method is shown in Figure 4. Compared to filter and embedded methods, wrapper methods are generally considered to be more precise; however, they require substantially more computational resources due to their iterative nature and the need to evaluate the performance of multiple subsets of features.For the choice of the proxy task model, we opted for RF regression in our study.The reason for this choice was its nonlinear modelling capability and relatively modest computation workload compared to other methods.To estimate the performance of the proxy model, we employed five-fold cross validation.In this case, the RMSE of GSV prediction was the evaluation criteria.We additionally tested MLR as the proxy model, with the selection performance being slightly worse.SVR was another option; however, the computational requirements appeared to be too heavy to accomplish within a reasonable time.

Multitemporal Analysis Using "Sliding Window"
The aim of multitemporal analysis of SAR imagery in this study was to assess the prediction accuracy gain achieved on multiple SAR images.These images were used as either sequences of consecutively collected images or as specific images chosen using the feature selection approaches (Section 3.2).For the latter, optimal SAR images could be spread over the three-year observation period.However, using all the data seems impractical, as strong saturation can be observed; indeed, any additional value when using more than 10 images is questionable.The use of such a long time series can be computationally demanding as well.Therefore, we attempted to use RMSE as the criterion in order to determine the best time to acquire SAR images, i.e., the optimal observation season, and to analyze the seasonal characteristics of the SAR image time series.
The size of the sliding window corresponds to the number of images that are aggregated and used to derive the feature variables.Several aggregation approaches are possible in the context of GSV estimation.The first, multitemporal filtering, facilitates reduction of speckle by calculating the combined estimate from several images via averaging [74].The filtered image is further used to calculate image features to predict GSV.Another approach performs image-by-image GSV estimation, then merges individual image-based estimates using a multiple regression approach [12].Individual GSV estimates can be obtained using methods such as inversion of physics-based models [18,75].However, for this study a set of feature variables derived from SAR images was used directly as input to a multiple regression approach, thereby constituting a possible third approach that seems to be the most practical.
An illustration of the sliding window is shown in Figure 5.The choice of the size of sliding window, i.e., the number of images used to select the feature variables for regression, required separate experimentation.On the one hand, it is expected that including more images will improve GSV prediction performance.On the other hand, the marginal value of additional images can be limited as the image quantity increases beyond a certain number, and there is increased computational intensity associated with larger sliding windows.Thus, several window sizes from 1 to 32 were tested for this study.

Results
We started our analysis by assessing the improvement in GSV prediction as the number of Sentinel-1 images increased when using the three GSV prediction approaches.The effect of the image acquisition time on the prediction accuracy and relative performance of the studied prediction methods is reported in Section 4.1.Images were aggregated in the order of their acquisition.Furthermore, the potential of the feature selection and transformation approaches and their effects on the accuracy of GSV prediction are analysed in Section 4.2.The effects of combining a small number of images within the "sliding window" for GSV prediction are reported in detail in Section 4.3.

Single SAR Images and Seasonal Variation of the Predictions
Figure 6 shows the accuracy of the GSV predictions when using feature variables derived from single SAR images acquired from November 2014 to July 2018.The accuracy metrics are shown in the order of image acquisition.RMSE varies within the range 85-94 m 3 /ha (rRMSE 50-55%) for various polarizations.Overall, the dynamics are quite variable, as expected for C-band imagery, which is strongly affected by environmental conditions.For MLR, the greatest accuracies were obtained using feature variables selected for images for the month of June for 2015, 2016, and 2017.For SVR and RF a similar observation can be made, with an optimum in early 2016.For all image and polarization combinations, the combined VV and VH polarizations provide smaller RMSEs than any single polarization alone, and provide VV-pol somewhat smaller than VH-pol.Scatterplots of the different GSV prediction methods for several selected sets of Sentinel-1 image features are shown in Figure 7.

Accumulated SAR Time Series
We further tested the accuracy of the GSV predictions by using feature variables for multiple images (SAR time series) in modelling with MLR, SVR, and RF, and with three polarization options, VH, VV, and VV plus VH.The variations in RMSE for the GSV predictions are illustrated in Figure 8.Here, the images were accumulated in the order of their acquisition.GSV prediction for any given date was performed using all images acquired until the date on x-axis.Altogether, twelve RMSE curves obtained as a function of the number of accumulated images used in GSV prediction are presented in Figure 8.A nearly monotonic decrease in RMSE for the GSV predictions with all the polarization combinations and all three prediction methods can be seen.The RMSE rapidly decreases initially, then levels off shortly after July 2015, by which time more than 15 images have accumulated in the SAR time series.RMSE decreases somewhat faster for VV-pol than for VH-pol, with an RMSE of approximately 80 m 3 /ha for VV and approximately 82 m 3 /ha for VH for July 2015.Combined VH and VV had a smaller RMSE for GSV prediction than either VV or VH alone, less than 79 m 3 /ha (rRMSE 47%) for the same date.
For the majority of our experiments, combinations of image sets, polarizations, and prediction methods SVR produced somewhat smaller RMSEs than MLR or RF, about 1.0 m 3 /ha smaller, whereas there was practically no difference between the RMSEs for MLR and RF except for a short period from 2016 to 2017 for VH-pol.

Feature Selection
Here, we evaluated the potential of Sentinel-1 for GSV prediction using selected sets of images, in contrast to using images in the order of their acquisition dates as in Section 4.1.2.The idea was to achieve the greatest GSV prediction accuracy using a minimum number of images.In further analyses, we evaluated several feature selection approaches.When particular images were selected, their features were used in stand-level GSV prediction.
The results of GSV prediction using different feature selections and transformation approaches to form multitemporal SAR stacks are presented in Figure 9.Both polarizations were used together; thus, the original feature pool consists of 192 features.The PCA results are shown in Figure 9 as a benchmark.In the figures, only the first 50 components are shown, as there is no accuracy improvement beyond this number.
The SVR method produced slightly smaller RMSEs than MLR or RF, similar to the results in Section 4.1.2(Figure 8).RF produced somewhat smaller RMSEs than MLR.In particular, MLR had larger RMSEs when using the initial 20-25 features and the features selected using radiometric contrast and mutual information.
All three studied regression methods demonstrated smaller RMSEs when using Lasso, RF-Sel, and Wrapper compared to RC, MI, or ik-NN.
Interestingly, GSV prediction accuracy with RF started to decrease when more than 20 PCA features were included in the model.A likely reason for this is that the PCA tail components consist largely of noise.
Other approaches, particularly MLR and SVR, appeared to be more robust in this respect, with consistent decreases even when incorporating noise-like PCA features.The achieved "plateau" of RMSE was approximately 75.1 m 3 /ha (rRMSE 44%).Among all examined feature selection methods, Lasso, RF-Sel, and Wrapper had similar levels of performance.The RMSEs for all regression methods reached less than 78 m 3 /ha when only the top 20 features were used.With only 20 features, we were able to obtain a prediction accuracy similar to the PCA upper bound benchmark.The performance of ik-NN was slightly worse, achieving 79 m 3 /ha RMSE for GSV prediction.The accuracy measure of MI exhibited an inconsistent "stepped" trend of decrease, suggesting that this feature selection approach is not ideal, with slow convergence to smaller RMSE values.Radiometric contrast appeared to be the least effective in selecting features, with RMSE decreasing at the slowest pace and not reaching the best possible values within the 50 top features.
This indicates that a blind feature selection approach with no reference data, such as RC, is generally suboptimal.However, in the absence of reference data it was the only approach with RMSEs that were better than the prediction results in Figure 8, where no feature selection was done at all.
To facilitate better comparison of the Lasso, RF-Sel, and Wrapper approaches, we show the respective RMSE dependencies in Figure 10.It is evident from the plot that wrapper has a smaller RMSE, particularly when fewer than 10 features are selected.This suggests that the SFS mechanism helps to improve the selection performance of the wrapper method even though it uses the same proxy model as RF-Sel.However, as the number of selected features increases, the RMSE curves of all studied selection approaches tend to converge.This implies two points: (1) wrapper can be considered the best feature selection in the context of our study, particularly when a small number of features is required; (2) when more than the top 10 features are allowed in a prediction, the superiority of wrapper is less apparent.Additionally, considering that the heavy computational workload associated with wrapper grows exponentially with the number of features selected, we recommend using Lasso instead in such cases.Lasso is significantly faster and demonstrates good feature selection performance.In Figure 7, we present the scatter plots using the top 10 Lasso features, which provides GSV prediction performance similar to the scenario in which all features are used.This further supports our conclusion on the efficiency of the Lasso approach for SAR feature selection.
Upon analyzing the feature selection performance using various polarizations, we found that the effect of polarizations was relatively small, especially for Lasso, RF-Sel, and Wrapper.These algorithms consistently selected appropriate features for GSV prediction regardless of whether VV-pol or VH-pol was utilized.Higher accuracy was achieved by incorporating both VH and VV polarizations.MI had the strongest sensitivity in feature selection with respect to different polarizations.Feature selected using VV-pol was much better than with VH-pol in terms of overall prediction performance.This resulted in the deteriorated selection performance of VH and VV in combination.This phenomenon can be clearly seen in Figure 11a.Regarding RC-based feature selection, it was difficult to achieve competitive performance regardless of the polarization used.In summary, the wrapper method demonstrates the most accurate feature selection, consistently yielding the best GSV accuracy for all regression models.Lasso proved a viable option when computational resources are limited.Features selected using VH polarization are slightly better than those selected using VV polarization.Lastly, SVR is confirmed to be the most accurate regression method in our study.
Table 2 shows the feature selection results for the three polarization combinations.For both VV and VH, the key features belonged mostly to images acquired under frozen conditions.However, several "dry summer" images acquired in May and June were included by the most accurate Lasso and ik-NN approaches.We performed additional experiments using Monte Carlo testing of random combinations of 10 images representing frozen ground conditions (VV and VH features combined) in GSV prediction using the SVR approach.After fifty trials, the RMSE was 79.3 ± 1.1 m 3 /ha and never exceeded the accuracy of the top 10 Lasso approach of 77.1 m 3 /ha when all images were used (Table 2).The corresponding accuracy for 10 ik-NN selected features using all possible images reached 78.5 m 3 /ha, exceeding the absolute majority of trials with ten exclusively frozen condition images.Similar dynamics were observed for other regression approaches, indicating that the idea of using dry summer images in addition to frozen condition images should be examined in further studies.Cross-pol performed slightly more accurately than co-pol.Several images, such as those acquired on March 2nd 2015 and January 8th 2016, had both polarizations among the selected features, indicating that near-optimal GSV prediction can be achieved while using only a few carefully selected images instead of the whole SAR image stack.This observation is in line with PCA feature selection, as discussed above.

Sliding Window Analysis
Our sliding window analyses started by selecting a suitable window size for further tests (Section 3.3).For this purpose, we used SVR as the prediction method and tested several window sizes from one (corresponding to only one image) to as many as 15 images.
The results of GSV prediction using sliding windows of different sizes are shown in Figure 12.The date of the first image within a sliding window is shown on the x-axis.The RMSE varied from 85.3 m 3 /ha to 93.5 m 3 /ha, with an average RMSE of ∼90 m 3 /ha for a sliding window size N w equal to one, that is, when only one image at a time was used to predict GSV.The greatest accuracy was achieved using the image acquired on 8 January 2016.No pronounced RMSE seasonal dynamics can be observed.Furthermore, as N w grew larger, the RMSE gradually decreased.The largest RMSE with N w = 4 was about 92 m 3 /ha.The largest RMSE was smaller than 88 m 3 /ha with N w = 15.The seasonal patterns indicated by the RMSE curves first become apparent and then more pronounced when expanding the sliding window size.
For example, in Figure 12b the seasonal pattern is not very clear, whereas in Figure 12c annual variations can be readily observed; the local minima of the RMSE curves are clearly visible in every winter, while the local maxima can be found every summer.When the window size is very large, approaching six months, the second peaks are almost eliminated and the three main annual peaks become clearly visible (Figure 12d).
Note that the x-axis tick marks denote acquisition dates for the first Sentinel-1 image within each sliding window, whereas all images within the window contribute to GSV prediction.The temporal resolution decreases when the window size increases, with the RMSE temporal variation becoming smooth.The width of the extreme is particularly affected.Consider the situation around January 2016 as an example.When the window size is small, i.e., 1 to 4, there is a strong temporal fluctuation in RMSE, and the "valleys" (e.g., 8 January 2016) are sharp, as shown in Figure 12a.When the window size increases to half a year in Figure 12d, the fluctuation becomes smoother, as do the major "peaks".The reason for this is that the image acquired on 8 January 2016 contributed strongly to GSV prediction with all sliding windows that include this image, exhibiting large prediction accuracies.This complicated the selection of an optimal season for GSV prediction.Thus, a balance between preserving seasonal patterns without sacrificing temporal resolution needs to be made.Based on our trade-off analysis, a window size of eight is most suitable.Using this window size, we proceeded to analyse GSV prediction using different polarizations and estimation methods.
The actual date of the RMSE minimum can be somewhat delayed compared to the date shown on x-axis due to larger window size.A window size of eight corresponds to approximately three months in the Sentinel-1 time series.The RMSE minimum around December means that the optimal timing for SAR images acquisition is in the winter around January and February.The peaks around May indicate that the acquisition times in the summer around June and July are not optimal for GSV prediction in boreal forests when using C-band SAR time series imagery.

Different Regressions over an Eight-Image Sliding Window
In Figure 12, a window size of eight is suggested as the best sliding window size.
To study the influence of different regression methods on the sliding window size, we used MLR and RF in addition to SVR in this subsection.As shown in Figure 13, the seasonal pattern is obvious with all regression methods.Seasonal RMSE extrema occurred every summer and winter.The overall RMSE levels and dynamic range were nearly the same for all regression approaches, with MLR underperforming in the autumn of 2016.The differences among polarizations were more apparent, favoring VV over VH and overall suggesting combined use of both polarization channels.In summary, our sliding window analysis provided a more detailed perspective on the temporal dynamics and facilitated our determination of the optimal timing for image acquisition.

On GSV Prediction Methods
Among the different prediction methods, SVM, log-MLR, and RF showed similar temporal dynamics, with no strong differences in GSV prediction.MLR with no log-transformation was less accurate, and seems to be the least suitable.RF and SVR produced somewhat greater accuracy than log-MLR in the majority of our experiments.MLR with log-transformation of the target variable had somewhat greater variance particularly for larger GSV values; however, it provided the most accurate explanatory model for greater GSV values (Figure 7).No earlier benchmarking studies with C-band data in boreal forests could be found in the literature, while studies with C-band data have been reported tropical forests [26].
When using only one polarization as the feature variable, no large differences in accuracy were found, with VV alone providing slightly more accurate GSV predictions than VH alone in non-frozen conditions and VH being more optimal in frozen conditions.When limiting analysis to stands with only smaller values of GSV, VH polarization starts to produce consistently more accurate results.In the small-biomass scenario VH is more sensitive to biomass, as cross-pol mostly contains volumetric scattering in forest environments.This discrepancy can be explained by somewhat smaller extinction at VV, contributing to its greater sensitivity to large-biomass forest when the ground is not visible.Under frozen conditions, when the ground is more visible, VV becomes suboptimal compared to VH even when all forest stands are considered, as VV-pol contains both scattering from the ground and volumetric scattering.For both scenarios of GSV values, the use of both polarizations in combination provided the most accurate predictions.The observation that VH is more suitable is supported by earlier studies; however, no analysis of different biomass classes was reported in those studies.

The Effects of Seasonality on GSV Prediction
The observation that C-band backscatter images acquired in frozen conditions are more suitable for forest prediction in boreal forests has been reported in earlier studies with ERS by [14,20] and ASAR [13].Our study confirms this observation using Sentinel-1 images, though the difference was relatively weak when single images were used.Moreover, our feature selection experiments (when the 10 most accurate features were selected using the Lasso and ik-NN approaches) indicate that adding dry summer images in addition to frozen condition images provides more accurate GSV predictions compared to using exclusively frozen condition images.Naturally, combining dry summer SAR images with frozen SAR images provides a more comprehensive dataset that captures seasonal variations and various levels of canopy penetration, facilitating data fusion.In our opinion, important contributing factors include deeper penetration into frozen canopy in winter (sensitivity to somewhat larger GSV values) and better sensitivity to relatively biomass within dry summer images.On the other hand, when only dry summer images are used the SAR signal becomes saturated very quickly, and separation of moderate and large GSV becomes impossible.Autumn and spring images can be most problematic in this context, as rainy conditions reduce radiometric contrast between tall vegetation and sparcely vegetated areas, a finding that is in line with previous studies [13].
Using the sliding window approach suggested in this study for combining features calculated from consecutive Sentinel-1 images as inputs to the regression approach was able to facilitate detection of strong seasonality in GSV predictions compared to using single images.This difference is clearly visible in Figure 12.It is important to note that seasonality is not studied here at the level of backscatter (input variable), but rather at the level of GSV prediction (output variable).Using assemblages of features calculated from consecutive images with varying widths of sliding windows suggests there are optimal timing periods when Sentinel-1 images should be acquired to produce optimal GSV predictions.The use of a longer sliding window permits compensation for the presence of features from suboptimal images and taking advantage of "multitemporal variability" within multivariable regression models, as features calculated from several consecutive images can be used.Our results suggest that even autumn and spring images can be used in principle.This is in contrast with single image retrieval followed by weighting estimates according to accuracy of predictions from individual images [13].In the latter approach, the most accurate strategy is to completely discard autumn and spring images, when the forest biomass to C-band SAR relationship can be most unpredictable.Our results with multivariable regression models indicate that late autumn and early spring datatakes can be included.For the window size of eight, which corresponds to three months of observations, the optimal start can be early as December and as late as February based on three years of observations.
The use of PCA and several feature selection approaches confirms that a subset of features calculated using a subset of Sentinel-1 images can be enough to achieve a near optimal GSV prediction.Interestingly, feature selection using MI and RC led to the least accurate GSV predictions.However, RC is the most feasible approach for image selection, particularly when no reference data are available, and is often used for image removal or weighting [13].Lasso was the most suitable option for selecting the features for GSV prediction, and approached PCA-based performance in terms of how quickly the nearoptimal prediction was achieved.Lasso was closely followed by the ik-NN method.Based on feature ranking with majority voting, the Sentinel-1 image #32 acquired on 8 January 2016 was the most popular.In addition to images acquired in winter, images acquired in the spring and autumn were sometimes selected.Among the most selected features, VH appeared somewhat more often than VV.No earlier benchmarking studies on feature selection with C-band data in boreal forests could be found in the literature.

Comparison with Previous Studies in Terms of GSV Accuracy and Unit Size
Presently, the number of relevant publications on C-band-based prediction of GSV is relatively limited compared to L-band and multisensor studies, primarily because of the relatively small prediction accuracy.The role of C-band data, particularly Sentinel-1, is quite minor in the multi-sensor studies where C-band images are used along with other SAR and optical datasets [23,24], although accurate results have been reported with a combination of sensors.The results are quite unsatisfactory when using C-band data alone.Our results compare favourably with other published research using C-band SAR data in terms of GSV prediction accuracy [4,5,[12][13][14]23,24], and are more representative in terms of reference data, feature selection approaches, and studied regression models, and compared to the majority of other studies use smaller measurement units.Below, we compare our experiments with several representative studies to highlight the relevance of our results.
Use of a parametric semiempirical model in a study by [14] with set of ERS-1 VV-pol images showed that rRMSEs of 25-30% can be obtained in boreal forests with a forest block size larger than 20 ha.Combining ERS-1 images acquired in November and August 1993 was considered optimal, which supports our observation that C-band data acquired in non-winter seasons can be used.The accuracy with smaller of forest stands was much lower.
In a study by [12] with ERS-1 data in boreal forests pursuing GSV prediction using inversion of a parametric physics-based model, limiting analysis to stands larger than 10 ha led to rRMSEs of 34-71%, with the most accurate estimates for mid-December and mid-March.This result is in agreement with our Sentinel-1 images selected using different feature selection approaches.However, incorporating stands with sizes as small as 1 ha increased the rRMSE to more than 120% for separate images.Combining predictions from individual SAR images using multiple regression facilitated a strong reduction in RMSE.However, when ERS-1 images were mixed with JERS (L-band SAR) it was difficult to judge specific contributions, especially because JERS data are much more suitable for GSV prediction.
The BIOMASAR algorithm [13], which has WCM model inversion at its core, was used with hypertemporal C-band ASAR data in ScanSAR mode (inferior to Sentinel-1 in terms of resolution, but exhibiting high radiometric quality due to a large number of looks) for boreal forest GSV prediction, resulting in rRMSEs between 34.2% and 48.1% at a 1 km pixel size.Larger errors were obtained at 100 m (47.7-60.4% over Swedish test sites and 70.9-96.2%over Central Siberian test sites).
A study of boreal forests in the Krasnoyarsk region of Russia [24] tested, among other experiments, time series of strip map RADARSAT-2 data acquired in regular and ultrafine modes for AGB prediction, achieving rRMSEs of 42% and 47% in AGB prediction with a machine learning approach.The mean AGB was approximately 86 t/ha.The time series dynamics were not explored.AGB was predicted using various cross-pol ratios and Haralick features in addition to the average backscatter.
In a recent comprehensive study with several satellite C-band sensors, the rRMSE was approximately 90% when predicting AGB [76].However, the error for branch biomass was smaller at around 80%.In the case of VV polarization, which included ERS-2 images acquired in all seasons, the error was greater than 100%.The prediction performance of C-band data depended strongly on the imaging conditions and varied in a large range.The smallest errors were associated with images acquired under stable frozen conditions in winter, which is similar to our findings.The greatest errors were generally associated with images acquired in periods of snow melt in the spring, which is consistent with our results when using separate SAR images.As ERS-2 has only VV-pol, no separate polarization analysis was possible.It is important to keep in mind that the average and peak values of GSV in Hyytiälä forest are somewhat smaller compared to Remningstorp forest [76].
When comparing our findings to previous studies, as well as to [10], where 30% RMSE was achieved using only twelve Sentinel-1 images, it is important to note that we used only stand-level backscatter as an independent variable.No ratios or textural features were used, as the aim was to analyse the seasonal dynamics and optimal timing of Sentinel-1 data backscatter.It is very likely that RMSEs would further decrease if other stand-level variables (features) were included in the analysis, which is in line with other previous studies ([10,77]).This issue needs extensive additional analysis.Furthermore, we did not pursue analysis of GSV prediction accuracy as a function of unit area size; however, we expect the prediction accuracy to improve as the unit area size becomes larger, similar to [12,14,76].

Conclusions
The utility of very long times series of Sentinel-1 SAR data for boreal forest GSV prediction was tested.We discovered seasonality in GSV predictions from multitemporal C-band time series, suggesting an optimal timing for image collection in early spring and autumn.Several prediction methods led to similar results.The minimum RMSE was 75.6 m 3 /ha (rRMSE 44%) when all 96 images were used.Temporal aggregation of images over one year greatly decreased RMSEs compared to a single image.Feature selection facilitated nearly optimal results when using only 10 images (45% relative RMSE) instead of the entire SAR stack, resulting in reduced computational complexity.Among the feature selection approaches, Lasso produced the greatest prediction accuracy, followed by ik-NN and RF.When an eight-image sliding window time series was used instead of feature selection, a relative RMSE of 47% was be achieved when the timing of image acquisition was optimal.
A comparison to prior works indicates that no one has yet studied the temporal dynamics of GSV prediction in boreal forests (or elsewhere) in the fashion presented here, i.e., using assemblages of consecutive Sentinel-1 images (a "sliding window" approach).Use of a sliding window resulted in much stronger annual periodicity in GSV prediction compared to single images at the 1 ha stand level.Feature selection approaches have been used very sporadically, and the methods used here (correlation, radiometric contrast) were suboptimal, as indicated by our experiments.The possibility of combining winter and dry summer images instead of using only winter images needs further investigation, as our feature selection experiments support the idea of combining corresponding features within a multivariate model where temporal variability is inherently captured.

Figure 1 .
Figure 1.Location of the study site in Southern Finland (left) along with representative Sentinel-1 image as RGB composite (Red: VV, Green: VH, Blue: VV/VH; Date: 9 October 2014).

Figure 2 .
Figure 2. Forest reference data and weather conditions: (a) masks of the forest stands; (b) locations of training and testing stands; (c) three-day accumulated precipitation amount; (d) minimum temperature; and (e) snow depth.Sentinel-1 acquisition times are highlighted with red dot marks, while blue dots indicate the weather history throughout the entire observation period.

Figure 3 .
Figure 3. General processing flowchart for growing stock prediction using selected Sentinel-1 images.

Figure 4 .
Figure 4. Flowchart of the wrapper method.

Figure 5 .
Figure 5. Illustration of the sliding window approach.

Figure 7 .
Figure 7. Scatterplots illustrating prediction performance for various methods and features.Rows correspond to the MLR, log-MLR, SVR, and RF methods.The first column shows predictions using only one image (acquired 8 January 2016), the second column shows predictions using eight consecutive scenes (21 November 2015-25 February 2016), the third column shows predictions using the 10 best Lasso features, and the last column shows predictions using all 192 features.

Figure 8 .
Figure 8. Accuracy of predicted GSV with image features in the order of image acquisition.

Figure 9 .
Figure 9. RMSE dependence for various feature sets.All 192 features (VH+VV) are ranked, with the top 50 shown in the figure.Features are ranked using RC, MI, ik-NN, RF-Sel, Lasso, and PCA, respectively.All four regression methods are used in the evaluation.

Figure 10 .
Figure 10.RMSE dependence on the number of input features at various polarizations with SVR as the regression method.

Figure 11 .
Figure 11.RMSE dependence on the number of input features at various polarizations with MLR/RF/SVR as regression methods: (a) Mutual Information feature selection and (b) Lasso feature selection.

Figure 12 .
Figure 12.RMSE of GSV prediction as a function of image acquisition date for various sliding windows when using the SVR approach.The window sizes are 1, 4, 8, and 16 (shown in (a-d), respectively).Solid curves show the RMSE for experiments with all stands, while dashed curves show the RMSE results when using stands with biomass of less than 150 m 3 /ha.

Figure 13 .
Figure 13.RMSEs of various prediction methods using an eight-image sliding window.