1. Introduction
The advent of national forest inventories (NFIs) has led to a comprehensive shift in forest growth measurement and analytical practices, supporting the implementation, monitoring, and promotion of sustainable forest management globally. Advanced inventory systems, such as NFIs, have been developed in many countries across Europe and beyond, evolving over decades and sometimes over more than a century [1,2]. Although the Food and Agriculture Organization of the United Nations (FAO)’s Voluntary Guidelines on National Forest Monitoring (VGNFM) describe an NFI as a technical process for compiling data and analyzing forest resources at the national level [3], its content, concept, and definition are continually refined to align with users’ needs [2]. A country’s inventory capacity development is customized to its unique needs, covering activities such as planning, data collection, analysis, integration with remotely sensed data, quality assurance and control, data archiving, documentation, dissemination, and reporting [4]. Initially, NFIs focused on assessing timber resources and supporting sustainable forestry. Over the years, they have expanded to encompass forest health monitoring, management practices, carbon sequestration, biodiversity, ecosystem services, and other variables, alongside greater diversity in sampling protocols and a more holistic approach [5].
Forest biophysical variables, such as diameter at breast height (dbh) and tree height (h), used to be the primary variables measured directly in the field. However, interest has grown in a wider range of variables that provide comprehensive insights into forest ecosystems. Metrics such as growing stock volume (GSV, m³/ha), aboveground biomass (AGB), and belowground biomass are classified as predicted variables. Among these, GSV and AGB are particularly essential in many NFIs, especially when focusing on forest productivity, policy development, carbon storage, sustainability, and enhanced decision-making processes [6,7]. To monitor forest sustainability, track management practices, and evaluate forests’ contribution to the carbon cycle, international bodies and agreements, such as the FAO, the United Nations Framework Convention on Climate Change (UNFCCC), and the Kyoto Protocol, rely on estimates of forest conditions and changes [1]. NFIs are instrumental in providing these estimates, which are critical for effective forest management planning. Accurate and current data are crucial [8]; therefore, member countries are required to report the status of their forests every five years for transparency and accountability [9,10].
The availability of remotely sensed auxiliary data has helped in the development of techniques that can be used to predict forest variables and enhance precision [10]. Remotely sensed data, such as optical imagery (e.g., Landsat, Sentinel-2 MSI), lidar data, radar data (e.g., SAR), and hyperspectral imagery, are widely used for predicting variables including GSV, forest structure, biomass, carbon stocks, tree species composition, forest health, and canopy cover [11,12,13]. For example, Lee and Lee [14] predicted forest height using discrete-return lidar data, SRTM, satellite L-band SAR data, and optical data, achieving improved results through the application of small baseline subset (SBAS) algorithms and linear regression. Condés and McRoberts [15] combined auxiliary site data, such as topographic slope, mean annual temperature, annual precipitation, and the Martonne aridity index, with remotely sensed data such as Landsat imagery, to update GSV estimates for the Spanish National Forest Inventory (SNFI) through hybrid inference. One of their findings was that predictions using field and satellite data could be leveraged to update NFI estimates in years for which field observations were absent but spectral data were available. White et al. [13] concluded that remotely sensed data enhance NFIs in four main ways: (1) enabling quicker and cheaper methods for estimating forest attributes; (2) improving the accuracy of large-area inventory estimates, often through stratification or weighted estimation; (3) offering inventory estimates with acceptable error and precision for small areas lacking sufficient field data; and (4) producing forest thematic maps that support timber production, procurement, and ecological studies.
Despite the issue of data saturation with optical data [7,16,17], such data have been extensively used for forest GSV estimation due to their diverse spatial, spectral, radiometric, and temporal resolutions, advanced processing technologies, abundant data sources, and extensive coverage [18,19,20,21,22,23]. Landsat has been a longstanding auxiliary data source in support of forest estimation, but the emergence of Sentinel-2 data has provided a compelling alternative. Zhou and Feng [24] evaluated the accuracy of GSV estimates based on Sentinel-2 and Landsat 8 OLI as auxiliary datasets, finding that models based on Sentinel-2 delivered greater precision. In 2020, Clark [25] compared multi-season Landsat 8, Sentinel-2, and hyperspectral imagery for classifying forest alliances in northern California and reported greater overall prediction accuracy for Sentinel-2 than for Landsat 8. Mura et al. [26] assessed the ability of Sentinel-2’s Multi-Spectral Instrument (S2-MSI) to predict GSV in Italy. They tested its performance against data from Landsat 8 OLI and the RapidEye scanner using a consistent experimental protocol and found that Sentinel-2 delivered more accurate outcomes.
Change estimation from remotely sensed data can be approached either indirectly or directly. Indirect estimation involves constructing models of the relationship between the response variable and remotely sensed auxiliary variables separately for two temporal points, with change estimated as the difference between the predictions. Direct estimation involves constructing models of change directly using observations of change in the response variable and the corresponding changes in remotely sensed auxiliary variables over the same two temporal points. Mean change is then estimated from the average change predictions, with the model’s prediction errors incorporated in the uncertainty estimation process [6,27]. While temporal alignment between remotely sensed data and field inventory measurements is recognized as optimal for enhancing forest inferences [27], auxiliary satellite data remain useful even when temporal gaps exist [15,27]. Where models are used to predict forest attributes because direct observations are lacking for sample plot locations at the time of interest, uncertainties in the predictions must be incorporated to maintain the unbiasedness of variance estimators [15]. Despite the active use of remotely sensed data and machine learning (ML) methods in forest mapping, the effects of prediction uncertainty are rarely estimated [28]. Therefore, this study aims to create a model for estimating changes in GSV using NFI data and for predicting GSV values for permanent sample plots before the next revisit. It also investigates Sentinel-2 optical data as a cost-effective, independent auxiliary data source for improving forest management, together with the associated uncertainties.
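The distinction between indirect and direct change estimation can be illustrated with a minimal sketch. It assumes a single auxiliary variable and ordinary least squares in place of the models used in the literature cited above; the function names and the plot values are hypothetical and serve only to show the mechanics.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def indirect_change(x1, y1, x2, y2):
    """One model per time point; change is the difference of predictions."""
    a1, b1 = fit_line(x1, y1)
    a2, b2 = fit_line(x2, y2)
    return [(a2 + b2 * v2) - (a1 + b1 * v1) for v1, v2 in zip(x1, x2)]

def direct_change(x1, y1, x2, y2):
    """One model of observed change on change in the auxiliary variable."""
    dx = [v2 - v1 for v1, v2 in zip(x1, x2)]
    dy = [w2 - w1 for w1, w2 in zip(y1, y2)]
    a, b = fit_line(dx, dy)
    return [a + b * d for d in dx]

# Hypothetical plots: an auxiliary metric (x) and GSV in m3/ha (y) at two times.
x1 = [0.2, 0.4, 0.5, 0.7, 0.9]
y1 = [80.0, 150.0, 170.0, 260.0, 310.0]
x2 = [0.3, 0.45, 0.6, 0.75, 0.95]
y2 = [110.0, 165.0, 215.0, 270.0, 335.0]

mean_indirect = mean(indirect_change(x1, y1, x2, y2))
mean_direct = mean(direct_change(x1, y1, x2, y2))
mean_observed = mean(w2 - w1 for w1, w2 in zip(y1, y2))
```

Because each least-squares fit includes an intercept, both approaches reproduce the observed mean change on the fitting sample; the differences between them appear in the plot-level predictions and in how prediction error propagates to the uncertainty of the estimate.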
4. Discussion
Traditional and RS approaches to predicting forest variables are complementary in most cases [7]. Field methods (e.g., plot sampling, harvest records) provide accurate values for forest variables through direct observation [46] but are labor-intensive and costly, with limited temporal continuity over large areas [47,48]. By contrast, RS provides spatially explicit, wall-to-wall assessments of forest variables [26]. This study explored the potential of readily available Sentinel-2-derived unitemporal and bitemporal remote sensing metrics, coupled with RF prediction and a MAAVC approach, to estimate plot- and population-level GSV (m³/ha) changes in Estonian forests over a 5-year period. The findings indicate that satellite-based predictors are promising for GSV change assessment and modeling, especially with multitemporal observations when field inventories are unavailable or difficult to implement. The use of non-coincident imagery was necessitated by the Sentinel-2 mission’s operational timeline, as systematic image acquisition began only in 2015, precluding direct correspondence with the 2013 NFI cycle. This approach aligns with established practices: Puliti et al. [27] used 2015 Sentinel-2 imagery as a surrogate for 2014 field data, while Condés and McRoberts [15] demonstrated the feasibility of using imagery acquired approximately 2–4 years before or after NFI cycles.
A notable result from the Boruta feature selection is the exclusion of NDVIre783, band2, NDVIre740, and band3 in the unitemporal dataset, and NDVIre783, band3, and band5 in the bitemporal dataset, as their importance values did not exceed the maximum importance of the corresponding randomized shadow features (Figure 3a,b). According to Zhu and Liu [49], the use of optical satellite data acquired during the peak growing season may lead to spectral saturation, resulting in reduced accuracy in forest variable estimation. Previous studies have shown that including red-edge bands and NDVIre-based vegetation indices strengthens the relationship between forest variables and optical remote sensing metrics, particularly Sentinel-2 metrics [24,50]. For example, Zhou and Feng [24] estimated forest stock volume (FSV) using Sentinel-2 and Landsat 8 OLI imagery in combination with forest inventory data, and found that models based on Sentinel-2 achieved higher accuracy than those using Landsat 8. They further reported that the red-edge bands of Sentinel-2 showed stronger correlations with FSV and had the potential to reduce model error. In contrast, the exclusion of several red-edge and spectral variables in the present study suggests that these predictors contributed limited or unstable information, particularly when temporal information was incorporated, indicating possible redundancy or saturation effects.
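The shadow-feature criterion behind these exclusions can be sketched as follows. This is a single-pass simplification and not the Boruta procedure itself: absolute Pearson correlation stands in for random forest importance, the repeated runs and statistical tests of Boruta are omitted, and the function names and data are hypothetical.

```python
import random
from statistics import mean

def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5

def shadow_screen(features, y, seed=0):
    """Keep a feature only if its score exceeds the best shadow score.

    A shadow feature is a shuffled copy of a real feature, so any
    association it shows with y arises by chance.
    """
    rng = random.Random(seed)
    shadow_max = 0.0
    for values in features.values():
        shuffled = values[:]
        rng.shuffle(shuffled)
        shadow_max = max(shadow_max, abs_corr(shuffled, y))
    keep = {name: abs_corr(values, y) > shadow_max
            for name, values in features.items()}
    return keep, shadow_max

# Hypothetical data: one informative predictor and one pure-noise predictor.
noise_rng = random.Random(1)
y = [float(i) for i in range(40)]
features = {
    "signal": [2.0 * v + 1.0 for v in y],
    "noise": [noise_rng.random() for _ in y],
}
keep, shadow_max = shadow_screen(features, y)
```

A predictor that carries no more information than a shuffled copy is dropped, which is the sense in which the excluded bands and indices failed to exceed the maximum shadow importance.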
In predicting average annual changes, the unitemporal RF model explained more variance than the bitemporal model (R² = 0.40 versus R² = 0.26), a relative increase of approximately 54% in explained variance in the Dataset A test data. This difference may be related to the timing of satellite acquisitions relative to the field measurement interval: remotely sensed and field data are rarely temporally coincident, and forests may change between acquisition dates, which can influence the correspondence between spectral change signals and field-observed growth [51,52]. Fayad et al. [34] noted that, although there were minor temporal and spatial mismatches between field plots and remote sensing data, such discrepancies are unlikely to bias large-scale regression estimates because field measurements capture broad environmental trends in AGB rather than short-term forest dynamics. Notably, several studies show that considerable accuracy can still be achieved despite temporal mismatches [15,27,34]. Consistent patterns emerged across the results for annual change prediction, interval change prediction, and GSV estimation at the target time point (Figure 4, Figure 5 and Figure 6). For both the annual and interval GSV change, the unitemporal RF model performed better than MAAVC and the bitemporal RF configuration, which showed comparable error levels. In contrast, for GSV estimation at the target time point, the two RF models performed similarly, and MAAVC achieved the strongest overall performance. In short, unitemporal Sentinel-2 metrics performed best for predicting GSV change, whereas the two RS-based approaches performed similarly for predicting GSV at the target time point. Puliti et al. [27] assessed aboveground biomass change using NFI data with Sentinel-2 and Landsat metrics in a boreal forest in south-eastern Norway. In their study, model-assisted estimation using bitemporal Sentinel-2 data produced the most precise estimate, with a standard error (SE) of 1.7 Gg, compared with an SE of 1.8 Gg for the unitemporal approach. Although bitemporal data yielded only a slight increase in precision over unitemporal data, they concluded that bitemporal data are the most precise overall and noted that ΔAGB can also be estimated when remotely sensed data are available only at the end of the monitoring period. Similar results were reported by McRoberts et al. [6]. Using indirect and direct estimation of forest biomass change based on forest inventory and airborne laser scanning (ALS) data, they found that the ALS auxiliary information greatly increased the precision of change estimates, regardless of whether indirect or direct methods were used.
For the plot-level GSV change between the two time points (Figure 5), the unitemporal RF model performed better (R² = 0.82, RMSE = 10.87) than both MAAVC (R² = 0.49, RMSE = 19.76) and the bitemporal RF configuration (R² = 0.40, RMSE = 19.65). The latter two approaches produced nearly identical RMSE values, although MAAVC explained modestly more variance. These results demonstrate that Sentinel-2 metrics are effective for predicting GSV change. A linear model (Equation (4)), which uses input from the MAAVC and the RS unitemporal and bitemporal approaches, was applied to estimate population- and plot-level GSV (m³/ha) at the target time point within the study window. This model combined the field-derived MAAVC of Equation (4) with field data from the earlier time point. The results were then compared with the predictions obtained using RS-based approaches (Figure 6). In the Dataset B analysis, the MAAVC model showed greater precision than the RS-based models, as indicated by its R² of 0.91 and RMSE of 45.11 m³/ha, compared with an R² of 0.73 and RMSE of 83.79 m³/ha for the RS unitemporal model and an R² of 0.72 and RMSE of 83.61 m³/ha for the bitemporal model. These results indicate that although MAAVC provides the most precise predictions, Sentinel-2 auxiliary metrics can still be used effectively to update GSV at the plot level using the temporal approach. At the population level, the temporal model produced comparable mean GSV estimates at the target time point; however, a linear mixed-effects model revealed a significant effect of estimation method on the GSV population mean (F(3, 654) = 8.66, p < 0.001). Post hoc comparisons showed that the MAAVC and GSV means were significantly lower than the uni- and bitemporal population-level estimates, with small effect sizes (Table A3).
Considerable research effort has been directed towards estimating forest variable changes using repeated or single-date data [27,53]. For example, Noordermeer et al. [53] showed that bitemporal data acquired as part of repeated ALS-based forest inventories can be used to classify various changes in forest structure reliably. However, because change is derived from the difference between two error-prone values, it is inherently difficult to predict, and predictions of changes in a response variable are often less precise than predictions of the variable itself [6]. According to FAO [3], integrating uncertainty estimation requires further emphasis in NFI workflows.
Adherence to the IPCC Good Practice Guidance requires that estimates minimize bias and reduce uncertainty to the degree possible [43]. Such uncertainty stems from the inherent variability associated with random sampling and limited sample sizes. These variations lead to sampling uncertainty, which can be estimated through repeated simulations, such as bootstrap resampling. However, McRoberts [44] noted that a key concern with bootstrapping is the number of replications (B) required for the bootstrap estimate to approximate the SE closely enough and, more importantly, the criterion used to determine B. That study therefore proposed a statistically robust stopping criterion for Monte Carlo iterative procedures based on the stabilization of the SE estimate. In this study, we adopted this criterion to ensure convergence of the uncertainty of the mean GSV estimate. We further strengthened the stopping rule by requiring that the convergence threshold be met in three consecutive checks before terminating the simulation (Equation (12)). Applying this rule increased the number of iterations required for convergence but provided more reliable variance estimates. Stabilization was achieved after 19,600 iterations (Figure 7a) for the unitemporal and 19,800 iterations (Figure 7b) for the bitemporal RF models, ensuring stable variance estimates (Table 5). The Monte Carlo estimated mean for the unitemporal model was 255.09 m³/ha (Total SE = 10.48 m³/ha), while the bitemporal model produced a very similar estimated mean of 255.04 m³/ha (Total SE = 10.40 m³/ha). As shown in Table 5, sampling variability accounts for a larger portion of the total standard error than model prediction uncertainty. This occurs even in cases where model prediction accuracy is not high, a trend that has been noted in previous studies [54]. Melo et al. [55] demonstrated, using a hybrid bootstrap framework, that sampling variability represents the main source of uncertainty in short-term predictions, exceeding the contribution of model-related variance.
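The logic of a stopping rule that requires three consecutive successful checks can be sketched as below. The form of the convergence test is an assumption, since Equation (12) is not reproduced in this section: here the relative change in the SE estimate between successive checks must fall below a tolerance. The function name, check interval, tolerance, and simulated replicates are all hypothetical.

```python
import random
from statistics import stdev

def run_until_stable(draw, check_every=100, tol=0.01, needed=3,
                     max_iter=100_000):
    """Draw replicate estimates until the SE estimate stabilizes.

    The SE is taken as the standard deviation of the replicate estimates.
    Stabilization means a relative change below `tol` between successive
    checks, observed on `needed` consecutive checks (assumed criterion).
    """
    estimates, previous_se, streak = [], None, 0
    while len(estimates) < max_iter:
        estimates.append(draw())
        if len(estimates) % check_every:
            continue  # only test convergence every `check_every` replicates
        se = stdev(estimates)
        if previous_se and abs(se - previous_se) / previous_se < tol:
            streak += 1
        else:
            streak = 0  # a failed check resets the consecutive count
        if streak >= needed:
            break
        previous_se = se
    return len(estimates), stdev(estimates)

# Hypothetical replicates: each draw mimics one bootstrap estimate of a mean.
rng = random.Random(42)
n_iter, se = run_until_stable(lambda: rng.gauss(100.0, 5.0))
```

Resetting the count after any failed check is what makes the rule stricter than a single-threshold criterion: one chance agreement between two successive SE estimates is not enough to end the simulation, at the cost of additional iterations.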
Overall, the enhanced Monte Carlo stopping rule provided stable uncertainty estimates and showed that, even with a slight difference in deviation at the plot level, the temporal RS approach still produces population-level GSV estimates that are fully consistent with both the design-based reference mean and the MAAVC model.