Reusing Remote Sensing-Based Validation Data: Comparing Direct and Indirect Approaches for Afforestation Monitoring

Francini, Saverio; Cavalli, Alice; D’Amico, Giovanni; McRoberts, Ronald E.; Maesano, Mauro; Munafò, Michele; Scarascia Mugnozza, Giuseppe; Chirici, Gherardo

doi:10.3390/rs15061638

Open Access Article

by Saverio Francini ^1,2,3, Alice Cavalli ^4,*, Giovanni D’Amico ^1,5, Ronald E. McRoberts ^6, Mauro Maesano ^7, Michele Munafò ^4, Giuseppe Scarascia Mugnozza ^7 and Gherardo Chirici ^1,2

^1 Department of Agricultural, Food and Forestry Systems, University of Florence, 50145 Firenze, Italy
^2 Fondazione per il Futuro delle Città, 50133 Firenze, Italy
^3 National Biodiversity Future Center (NBFC), 90133 Palermo, Italy
^4 Italian Institute for Environmental Protection and Research (ISPRA), Via Vitaliano Brancati 48, 00144 Rome, Italy
^5 CREA, Research Centre for Forestry and Wood, Viale Santa Margherita 80, 52100 Arezzo, Italy
^6 Department of Forest Resources, University of Minnesota, Saint Paul, MN 55108, USA
^7 Department of Innovation in Biology, Agri-Food and Forest Systems (DIBAF), University of Tuscia, Via San Camillo de Lellis SNC, 01100 Viterbo, Italy
^* Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(6), 1638; https://doi.org/10.3390/rs15061638

Submission received: 13 February 2023 / Revised: 13 March 2023 / Accepted: 15 March 2023 / Published: 17 March 2023

Abstract

Afforestation is one of the most effective processes for removing carbon dioxide from the atmosphere and combating global warming. Landsat data and machine learning approaches can be used to map afforestation (i) indirectly, by constructing two maps of the same area for different periods and then predicting changes, or (ii) directly, by constructing a single map from observations of change in both the response and the remotely sensed variables. However, no comprehensive comparison of direct and indirect approaches for afforestation monitoring is known to have been conducted to date. Afforestation maps estimated through the analysis of remotely sensed data may serve as intermediate products for guiding the selection of samples and the production of statistics. In this and similar studies, a huge effort is dedicated to collecting validation data. Those validation datasets, however, have different sampling intensities in different areas, which complicates their use for assessing the accuracies of new maps. As a result, the work done to collect the data is often not sufficiently exploited, and some validation datasets are used just once. In this study, we addressed two main aims. First, we implemented a methodology to reuse validation data acquired via stratified sampling with strata constructed from remote sensing maps. Second, we used this method to compare map accuracy estimates, and the precision of those estimates, for direct and indirect approaches to the country-wide mapping of afforestation that occurred in Italy between 1985 and 2019. To facilitate these comparisons, we used Landsat imagery, random forest classification, and Google Earth Engine. The proposed method produced accuracy estimates with 95% confidence intervals (CI) for each map class. Afforestation accuracies ranged from 53 ± 5.9% for the indirect map class inside the buffer (a stratum within 120 m of the forest/non-forest mask boundaries) to 26 ± 3.4% for the direct map outside the buffer. Accuracies for the non-afforestation map classes were much greater, ranging from 87 ± 1.9% for the indirect map inside the buffer to 99 ± 1.3% for the direct map outside the buffer. Additionally, overall accuracies were estimated with high precision for both the direct and indirect maps (87 ± 1.3% and 89 ± 1.6%, respectively, 95% CI), confirming (i) the effectiveness of the method we introduced for reusing samples and (ii) the relevance of remotely sensed data and machine learning for monitoring afforestation.

Keywords:

remote sensing; Landsat; Google Earth Engine; random forests; machine learning; cloud computing; land cover; change detection; estimates; precision

1. Introduction

Forests are a critical component of the global terrestrial carbon cycle. During the G20 Riyadh Summit (November 2020), forest restoration activities, together with emissions reduction, were stated to be among the most effective strategies for climate change mitigation. Afforestation [1], a process that entails the conversion of other land uses to forest [2], is one of the most effective means of removing carbon dioxide from the atmosphere and combating global warming.

Remote sensing technology and related applications play a crucial role in monitoring afforestation. Remote sensing is a powerful technique for land monitoring that facilitates global-scale analysis [3] using open-access imagery available at different spatial resolutions [4,5,6] and exploiting cloud computing platforms and big data processing [7]. In particular, the more than 40 years of surface reflectance data acquired by the Landsat missions represent a key source of information that is widely used across the globe to monitor Earth’s surface phenomena and dynamics [8,9]. In addition, the Google Earth Engine (GEE) cloud computing platform [3] provides unprecedented opportunities for processing this huge amount of data. GEE implements random forests (RF) [10], a currently popular supervised algorithm for classifying Landsat and other remotely sensed imagery. RF is an ensemble classification technique that combines many decision trees, each grown on a random resample of the training data. RF has powerful predictive performance [11], is largely resistant to overfitting [12,13], and can accommodate non-normal responses and nonlinear relationships [14].

Afforestation can be monitored with Landsat data and RF using either direct or indirect classification methods. The indirect method, sometimes characterized as post-classification, involves constructing two models of the relationship between the land cover response and the remotely sensed predictor variables, one for each of two time periods, and then calculating the map unit-by-map unit differences between the model predictions, which represent the change. The direct method involves constructing a single model of the relationship between observations of change in the response variable and in the remotely sensed predictor variables between the two dates [15]; it thus characterizes the spectral response to afforestation over the entire observation period. The direct approach is generally considered more reliable and accurate [16,17,18]. This assumption may be questionable, however: although the indirect method produces two sets of errors, those errors are expected to be smaller than those produced by the direct method, because variables representing land cover at a given time (in this case, forest/non-forest classes) can generally be predicted more accurately than variables representing change.

The direct method has been used with 500 m MODIS imagery to predict afforestation in central-eastern China [19] and Europe [20] and to monitor the effect of reforestation programs in Inner Mongolia [21]. Although these studies reported high accuracies, the coarse spatial resolution of MODIS imagery is not suitable for the fragmented and morphologically diverse forests typical of the Mediterranean Basin. This is particularly relevant in Italy, where, owing to centuries of land management, afforestation typically occurs as very small patches in abandoned agricultural fields [22], so that finer spatial resolution data are required to produce accurate afforestation maps. Accordingly, the direct method was recently implemented in Italy with imagery at a finer resolution than MODIS [23].

The indirect method has been used with Landsat and PALSAR imagery at the regional level in Guangdong Province, China, to construct annual forest masks that were then compared to predict afforestation areas [24], and at the global level to assess forest cover change [25]. However, these studies relied on annual land cover maps produced through photointerpretation of fine-resolution imagery, which requires great effort and cost. Consequently, such masks are not frequently updated and are usually not temporally aligned with the vegetation growth rate. In summary, afforestation assessment with direct and indirect approaches has generally not been sufficiently investigated, and although a comprehensive comparison is of crucial importance [15,26], none is known to have been conducted to date.

Afforestation maps, and more generally maps based on remotely sensed data, can be used as intermediate products for producing statistically rigorous estimates by assisting in the selection of sample units [27]. In particular, remote sensing-based maps can increase the precision of sample-based statistical inferences by serving as the source of strata for stratified sampling and estimation [28,29]: increasing the sample sizes in strata where error rates are greater produces more precise estimates of afforestation areas and map accuracy. Francini et al. (2022b) [30] constructed a forest disturbance map of Italy to guide the selection of 18,000 points that, following photointerpretation, facilitated statistically rigorous estimation of the areas of forest disturbance that occurred in Italy in 2018. Cavalli et al. (2023) [23] used an afforestation map to guide the selection of a sample of 4000 points that, following photointerpretation, supported stratified estimation of the areas of Italian landscapes that changed from non-forest to forest between 1985 and 2019. Because map classification errors cause mapped areas to differ from actual areas (Francini et al., 2021) [31], it is fundamental to perform an accuracy assessment before executing additional analyses [32]. Olofsson et al. (2013) [27] described methods for accounting for those errors and, using the stratified estimator, demonstrated how to avoid the potential bias associated with pixel counting.
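To make the contrast with pixel counting concrete, the stratified (error-adjusted) area estimator commonly used in this setting can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function name `stratified_area_proportions` and the two-class sample counts are hypothetical, strata are assumed to coincide with map classes, and W_h denotes the mapped area proportion of stratum h.

```python
import numpy as np


def stratified_area_proportions(weights, counts):
    """Stratified (error-adjusted) estimate of reference-class area proportions.

    weights: mapped area proportion W_h of each stratum (= map class h), summing to 1.
    counts:  sample counts n_hk, rows = map class h, columns = reference class k.
    """
    W = np.asarray(weights, dtype=float)
    n_hk = np.asarray(counts, dtype=float)
    n_h = n_hk.sum(axis=1, keepdims=True)  # sample size in each stratum
    p_hk = W[:, None] * n_hk / n_h         # estimated proportion of area in cell (h, k)
    return p_hk.sum(axis=0)                # column sums: area proportion of each reference class


# Hypothetical two-class example: index 0 = afforestation, 1 = non-afforestation.
W = [0.10, 0.90]     # mapped proportions, i.e., the pixel-counting estimate
n = [[60, 40],       # afforestation stratum: 60 of 100 points confirmed by photointerpretation
     [10, 290]]      # non-afforestation stratum: 10 of 300 points are actually afforestation
estimate = stratified_area_proportions(W, n)
print(estimate)  # approximately [0.09, 0.91]
```

With these made-up counts, pixel counting would report 10% afforestation, whereas the stratified estimate is about 9%, because commission errors in the afforestation stratum outweigh the omission errors found in the much larger non-afforestation stratum.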

In these and similar studies, huge efforts are often dedicated to collecting validation data for the selected sample units. However, those sample units are selected using strata based on specific maps, so the within-strata sampling intensities are tied to those maps. When the same samples are used with strata based on a different map, the sampling intensities within the new strata are no longer constant, and the samples therefore cannot be used directly to estimate the accuracy of the new map. As a result, the huge amount of work done to collect validation data is often not sufficiently exploited because the sample data cannot be reused.

This study has two primary aims: (i) to demonstrate a methodology for reusing samples acquired through stratification based on remote sensing-based products, and (ii) to use this approach to calculate and compare accuracy estimates, and the precision of those estimates, for afforestation maps obtained using the direct and indirect methods. We focused on country-wide mapping of afforestation that occurred in Italy between 1985 and 2019 using Landsat imagery, RF, and GEE.

2. Materials

2.1. Study Area and Forest Mask

The study area is the whole of Italy, covering 301,338 km² (Figure 1). Italy is characterized by great morphological diversity, with hills or mountains covering 70% of the national territory. Moreover, Italy's geographical position results in different climate types (alpine, continental, and Mediterranean), which contribute to the richness of its vegetation and forest biodiversity.

To predict afforestation areas and to analyze afforestation separately inside and outside predicted forest land, we used a fine-resolution forest mask of Italy [33]. More specifically, the forest mask was used to increase the sampling intensity near forest/non-forest map boundaries, where most forest/non-forest classification errors are expected to occur. The binary forest/non-forest mask was constructed using the FAO (2001) definition of forest, i.e., land with a canopy cover greater than 10% and a minimum area of 0.5 ha. The accuracy of this forest mask was estimated to be greater than 85% [13]. More details on this product are provided in D’Amico et al. (2021) [33].

2.2. Landsat Best Available Composite Imagery

In this study, Landsat images acquired for Italy and available in the GEE archive were used to analyze afforestation at the national level. GEE is a cloud platform for analyzing and processing geospatial data that, by exploiting the computational capacity of Google servers, facilitates the study and monitoring of environmental phenomena such as changes in land cover, including afforestation [3]. GEE enables the integration of numerous image collections and datasets that are ready to be processed [30].

In this study, the Landsat images had been pre-processed (orthorectified surface reflectance, with brightness temperature for the thermal infrared band), atmospherically corrected, and masked for clouds. To construct the annual composites for Italy, we used the six 30 m spatial resolution bands of Landsat 5, 7, and 8 (three in the visible spectrum and three in the infrared: NIR and the two SWIR bands) acquired between 1 June and 31 August of each year from 1985 to 2019. The composites were constructed using the Best Available Pixel (BAP) procedure [34] (Figure 1), which selects the pixel reflectance values that best satisfy four criteria: (i) sensor score; (ii) target day score; (iii) distance to cloud/cloud shadow score; and (iv) opacity score. The sensor and target day scores are applied to the whole image, while the other two scores are applied to each pixel. The BAP procedure was recently implemented in GEE, with open-access code (https://code.earthengine.google.com/?accept_repo=users/sfrancini/bap) (accessed on 14 March 2023), documentation on GitHub (https://github.com/saveriofrancini/bap) (accessed on 14 March 2023), and references to related scientific articles [35,36,37,38,39].
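The per-pixel selection logic of a BAP-style composite can be illustrated with NumPy. This sketch assumes that the four scores have already been computed and are combined by simple summation; the actual BAP scoring functions and weights are those of the cited implementation and are not reproduced here. The function name `best_available_pixel` and the array layout are hypothetical.

```python
import numpy as np


def best_available_pixel(reflectance, image_scores, pixel_scores):
    """Pick, for every pixel, the observation with the highest total score.

    reflectance:  (n_images, n_bands, rows, cols), NaN where cloud-masked.
    image_scores: (n_images,) image-level scores (e.g., sensor + target day).
    pixel_scores: (n_images, rows, cols) pixel-level scores
                  (e.g., distance to cloud/shadow + opacity).
    Assumption: scores are combined by simple summation. If every observation of a
    pixel is masked, image 0 is returned and the composite value stays NaN.
    """
    total = image_scores[:, None, None] + pixel_scores
    # Masked observations (NaN in any band) must never be selected.
    masked = np.isnan(reflectance).any(axis=1)
    total = np.where(masked, -np.inf, total)
    best = np.argmax(total, axis=0)  # index of the selected image for each pixel
    # Gather all bands of the selected image; result has shape (n_bands, rows, cols).
    composite = np.take_along_axis(reflectance, best[None, None, :, :], axis=0)[0]
    return composite, best


# Two hypothetical images, one band, two pixels; the second pixel is cloudy in image 0.
refl = np.array([[[[0.10, np.nan]]],
                 [[[0.20, 0.30]]]])
composite, best = best_available_pixel(refl, np.array([1.0, 0.5]), np.zeros((2, 1, 2)))
print(best)       # [[0 1]]
print(composite)  # [[[0.1 0.3]]]
```

The first pixel takes image 0 because its image-level score is higher; the second falls back to image 1 because image 0 is masked there.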

2.3. Training Dataset

The training dataset (Figure 1) was used to calibrate both the direct and indirect classification models using the RF algorithm (Section 3), with predictors derived from the BAP composites (Section 3.1). The training data were collected over the whole of Italy for 1578 polygons, distributed as follows:

526 polygons (A) that experienced a change from non-forest to forest in the period 1985–2020;
526 polygons (B) in non-forest areas that did not change between 1985 and 2020; and
526 polygons (C) in forest areas that did not change between 1985 and 2020.

The training dataset was constructed using data from the Land Use Inventory of Italy (IUTI; Sallustio et al., 2016) and photointerpretation of Landsat imagery (1985–1988, 30 m spatial resolution), aerial imagery (1988–2012, 50 cm), and very fine-resolution imagery (2012–2020, 30 cm). Through photointerpretation, each polygon was classified as either afforestation or non-afforestation. The afforestation class was assigned to polygons that satisfied the forest definition at the end of the study period but not at the beginning.

3. Methods

3.1. Direct and Indirect Afforestation Map Construction

The six bands of the Landsat BAP composites were augmented with seven vegetation indices (VIs) that served as additional predictor variables (Table 1): the Normalized Difference Vegetation Index (NDVI) [40], the Normalized Burn Ratio (NBR) [41], the Enhanced Vegetation Index (EVI) [42], Tasseled Cap Brightness (TCB), Wetness (TCW), and Greenness (TCG) [43], and Tasseled Cap Angle (TCA) [44]. Together, the six bands and seven VIs formed 13 time series of 34 years, one for each band or VI.
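The spectral indices listed above follow widely used formulas, sketched below for NDVI, NBR, EVI, and the Tasseled Cap Angle. This is an illustration under stated assumptions: reflectance is scaled to the range 0 to 1 (which the EVI constants presuppose), NBR uses SWIR2, and the Tasseled Cap brightness and greenness are taken as precomputed inputs because their sensor-specific coefficients are not given in the text. The function name `spectral_indices` is hypothetical.

```python
import numpy as np


def spectral_indices(blue, red, nir, swir2, tcb, tcg):
    """Illustrative index formulas; inputs are surface reflectance scaled to 0-1."""
    blue, red, nir, swir2 = (np.asarray(b, dtype=float) for b in (blue, red, nir, swir2))
    return {
        "NDVI": (nir - red) / (nir + red),
        "NBR": (nir - swir2) / (nir + swir2),
        # EVI with the usual coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1).
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
        # Tasseled Cap Angle from precomputed brightness (TCB) and greenness (TCG), in radians.
        "TCA": np.arctan(np.asarray(tcg, dtype=float) / np.asarray(tcb, dtype=float)),
    }


vi = spectral_indices(blue=0.05, red=0.05, nir=0.35, swir2=0.10, tcb=0.30, tcg=0.30)
print({k: round(float(v), 3) for k, v in vi.items()})
# {'NDVI': 0.75, 'NBR': 0.556, 'EVI': 0.588, 'TCA': 0.785}
```

The same expressions work element-wise on whole image arrays, which is how each index becomes one of the 13 annual time series.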

For the direct method, we further calculated 18 temporal statistics for each of the 13 time series: (1) mean, (2) standard deviation, (3) Kendall correlation coefficient (tau) [45], (4–14) deciles, (15) mean of the first five years of the study period, (16) mean of the last five years of the study period, (17) year in which the maximum value was recorded, and (18) year in which the minimum value was recorded [30,38,46]. The result was 13 time series × 18 temporal statistics = 234 direct predictors.
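The 18 temporal statistics can be computed for one time series as in the following sketch. It assumes that items (4–14) are the 0th to 100th percentiles in steps of 10 (eleven values, as the numbering implies), that tau is Kendall's correlation between year and value, and that the standard deviation is the population one. The function name is hypothetical, and the study computed these statistics in GEE rather than with NumPy/SciPy.

```python
import numpy as np
from scipy.stats import kendalltau


def temporal_statistics(years, values):
    """Return the 18 temporal statistics for one annual time series."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    tau, _ = kendalltau(years, values)                          # monotonic trend over time
    percentiles = np.percentile(values, np.arange(0, 101, 10))  # 0th, 10th, ..., 100th
    return np.array([
        values.mean(),             # (1) mean
        values.std(),              # (2) population standard deviation
        tau,                       # (3) Kendall tau
        *percentiles,              # (4-14) eleven percentile values
        values[:5].mean(),         # (15) mean of the first five years
        values[-5:].mean(),        # (16) mean of the last five years
        years[np.argmax(values)],  # (17) year of the maximum
        years[np.argmin(values)],  # (18) year of the minimum
    ], dtype=float)


# Hypothetical steadily greening pixel with 34 annual values.
years = np.arange(1985, 1985 + 34)
ndvi = np.linspace(0.2, 0.8, years.size)
stats = temporal_statistics(years, ndvi)
print(len(stats), 13 * len(stats))  # 18 234
```

Applying the function to each of the 13 band/index series of a pixel and concatenating the results gives the 234-element direct predictor vector.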

For the indirect method, we used the means of the 13 time series predictors over the first five years (1985–1989, initial condition) and the last five years (2015–2019, final condition). As a result, 26 indirect predictors were obtained.

For constructing both the direct and the indirect maps, we used RF, an ensemble decision tree algorithm introduced by Breiman (2001) [10] and often used for the spatial prediction of forest response variables from remotely sensed predictor variables [33,47,48,49]. RF has powerful predictive performance (Hawrylo et al., 2020) [11], is largely resistant to overfitting [12,33], and accommodates non-normal responses and nonlinear relationships [14]. RF also produces an estimate of each predictor variable’s importance in terms of the Mean Decrease in Gini index (MDG), which measures how much the variable contributes to the purity of the tree nodes and, hence, to classification accuracy [50]. For more details on RF, we refer to Breiman (2001) [10] and Belgiu and Drăguţ (2016) [12]. In this study, we used the GEE implementation of RF, with the number of trees set to 500 and default values for the remaining parameters, for which we refer to the GEE documentation (https://developers.google.com/earth-engine/apidocs/ee-classifier-smilerandomforest) (accessed on 14 March 2023).
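The study used GEE's `ee.Classifier.smileRandomForest` with 500 trees. As a locally runnable analogue, scikit-learn's `RandomForestClassifier` exposes the same two ingredients discussed here: an ensemble of 500 trees and impurity-based (Gini) variable importance via `feature_importances_`. This is a swapped-in library run on synthetic data, not the study's pipeline, and its default hyperparameters differ from GEE's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training table: 5 predictors, only the first is informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.2 * rng.normal(size=400) > 0).astype(int)

# 500 trees as in the study; other hyperparameters left at scikit-learn defaults.
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X, y)

# Impurity-based importance (mean decrease in Gini impurity), normalized to sum to 1.
importance = rf.feature_importances_
ranking = np.argsort(importance)[::-1]
print(ranking[0], importance.round(3))
```

Note that scikit-learn normalizes the importances to sum to 1, so only their relative ranking is comparable with MDG values reported by other implementations.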

For the direct method, the RF algorithm was calibrated using the training data from polygons A, B, and C and the direct predictors and then applied over the entirety of Italy.

For the indirect method, the training data for polygons B and C were used, while those for polygons A were excluded because they represent non-stable areas. In this case, the indirect predictors were used to construct two RF-based forest/non-forest maps, one depicting the initial condition and the other the final condition. Pixels classified as non-forest in the initial map and as forest in the final map were classified as afforestation in the indirect map.
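The post-classification rule used for the indirect map reduces to a per-pixel Boolean test. A minimal sketch, assuming both forest/non-forest maps are co-registered NumPy arrays coded 1 for forest and 0 for non-forest (a hypothetical encoding; the function name is also illustrative):

```python
import numpy as np

NON_FOREST, FOREST = 0, 1  # hypothetical class codes


def indirect_afforestation(initial_map, final_map):
    """True where a pixel is non-forest in the initial map and forest in the final map."""
    initial_map = np.asarray(initial_map)
    final_map = np.asarray(final_map)
    if initial_map.shape != final_map.shape:
        raise ValueError("maps must be co-registered and have the same shape")
    return (initial_map == NON_FOREST) & (final_map == FOREST)


initial = np.array([[0, 0], [1, 1]])
final = np.array([[1, 0], [1, 0]])
print(indirect_afforestation(initial, final))
# [[ True False]
#  [False False]]
```

Forest-to-non-forest transitions (bottom-right pixel) are deliberately not flagged, because only the non-forest to forest direction counts as afforestation.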

3.2. First and Second Phases of Validation Data Selection

In the first and second phases, stratified random validation data were acquired, with strata derived from the classes of the afforestation map constructed using the direct method. These sample data are independent of the training data used to construct the two afforestation maps (Section 2.3 and Section 3.1). The validation data were selected to increase the precision of the accuracy estimates and to assess map accuracy in the different map classes. In particular, the greatest gains in overall precision can be achieved by concentrating sample units in the strata for which classification error rates are expected to be greatest [28]. To this end, we used the forest mask (Section 2.1) to define a buffer of 120 m (4 Landsat pixels) on each side of the forest/non-forest boundary. Combining this buffer with the direct afforestation map yielded four map classes: (i) afforestation inside the forest buffer, (ii) non-afforestation inside the forest buffer, (iii) afforestation outside the forest buffer, and (iv) non-afforestation outside the forest buffer. These four map classes served as strata for a two-step stratified random sampling scheme in which the sampling intensity was increased in strata where greater classification error rates were expected, thereby increasing the precision of overall and class accuracy estimates. Cavalli et al. (2023) [23] used this two-step stratified sampling scheme to estimate afforestation areas. Although here we aim to estimate map accuracy, the statistical principles of the sampling scheme are the same as those detailed in Cavalli et al. (2023) [23] and summarized below.
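One way to derive the 120 m boundary buffer and the four strata from raster data is with a Euclidean distance transform. This is a sketch under assumptions, not the study's GEE implementation: the forest mask is a Boolean NumPy array with 30 m pixels, distances are measured between pixel centres (so that 120 m spans 4 pixels on each side of the boundary), and the letters A to D are assumed to follow the order (i) to (iv) above. Function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def boundary_buffer(forest_mask, pixel_size=30.0, width=120.0):
    """Pixels within `width` metres of the forest/non-forest boundary, on either side."""
    forest = np.asarray(forest_mask, dtype=bool)
    # For forest pixels: distance to the nearest non-forest pixel (and vice versa), in metres.
    to_non_forest = distance_transform_edt(forest, sampling=pixel_size)
    to_forest = distance_transform_edt(~forest, sampling=pixel_size)
    distance_to_other_class = np.where(forest, to_non_forest, to_forest)
    return distance_to_other_class <= width + 1e-6  # small tolerance for floating-point error


def strata(afforestation, in_buffer):
    """Four strata: A/B = (non-)afforestation inside the buffer, C/D = outside."""
    afforestation = np.asarray(afforestation, dtype=bool)
    in_buffer = np.asarray(in_buffer, dtype=bool)
    return np.select(
        [in_buffer & afforestation, in_buffer & ~afforestation,
         ~in_buffer & afforestation, ~in_buffer & ~afforestation],
        ["A", "B", "C", "D"],
        default="",
    )


# A 1-D transect: six non-forest pixels followed by six forest pixels.
buf = boundary_buffer(np.array([0] * 6 + [1] * 6))
print(buf.astype(int))  # [0 0 1 1 1 1 1 1 1 1 0 0]
```

The transect shows four buffered pixels on each side of the boundary, matching the 120 m / 4-pixel width described above.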

In the first phase, an initial random sample of 2000 points was selected, with within-strata sample sizes proportional to the expected within-strata classification error rates; as previously mentioned, this scheme serves to increase the precision of overall and class accuracy estimates. We expected the largest error rates in the map classes inside the forest buffer (map classes A and B, Table 2), where we therefore concentrated most of the sample. Specifically, 660 points (33% of the total) were randomly selected in each of the strata corresponding to map classes A and B, and 340 points (17%) in each of the strata corresponding to the afforestation and non-afforestation map classes outside the forest buffer (map classes C and D, Table 2). These points were photointerpreted as described in Section 3.3 and then used to construct a confusion matrix for the afforestation/non-afforestation classification.
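Allocating a fixed sample among strata in proportion to a set of weights, with integer rounding, can be sketched as follows. The expected error rates used in the study are not reported, so this example simply feeds in the reported shares (33, 33, 17, 17) to reproduce the 660/660/340/340 split; the function name `allocate` and the largest-remainder rounding rule are assumptions.

```python
import numpy as np


def allocate(n_total, weights):
    """Split n_total points among strata in proportion to weights (largest-remainder rounding)."""
    w = np.asarray(weights, dtype=float)
    exact = n_total * w / w.sum()
    n = np.floor(exact).astype(int)
    # Give the points lost to rounding down to the strata with the largest remainders.
    shortfall = int(n_total - n.sum())
    n[np.argsort(exact - n)[::-1][:shortfall]] += 1
    return n


print(allocate(2000, [33, 33, 17, 17]))  # [660 660 340 340]
```

Largest-remainder rounding guarantees that the stratum sizes always add up to the requested total, which simple rounding does not.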

In the second phase, a further sample was selected to increase the precision of the estimates obtained in the first phase. A second sample of 2000 points was selected with within-strata sample sizes proportional to the first-phase within-strata variances; strata (map classes) with greater first-phase variances were resampled with greater intensities, i.e., more sample points per unit area. The resulting second-phase distribution was 194 points in the stratum corresponding to map class A, 835 points in that of map class B, 75 points in that of map class C, and 896 points in that of map class D. The second-phase sample was photointerpreted and merged with the first-phase sample, for a total of 4000 sample points distributed as follows: 21% in the stratum corresponding to map class A, 37% in that of map class B, 10% in that of map class C, and 31% in that of map class D.
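The merged first- and second-phase counts and their shares of the 4000-point sample follow directly from the per-stratum numbers reported above; the short computation below makes the arithmetic explicit (plain Python, no assumptions beyond those counts).

```python
first_phase = {"A": 660, "B": 660, "C": 340, "D": 340}
second_phase = {"A": 194, "B": 835, "C": 75, "D": 896}

merged = {k: first_phase[k] + second_phase[k] for k in first_phase}
total = sum(merged.values())
shares = {k: round(100 * v / total) for k, v in merged.items()}

print(total)   # 4000
print(merged)  # {'A': 854, 'B': 1495, 'C': 415, 'D': 1236}
print(shares)  # {'A': 21, 'B': 37, 'C': 10, 'D': 31}
```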

3.3. Validation Data Adjustment Phase

The 4000-point sample resulting from the second phase cannot be used as it is to estimate the accuracies of the indirect map. Because the class boundaries of the direct afforestation map do not align with those of the indirect afforestation map, the probabilities of including the sample units in the indirect map classes are unknown. To overcome this issue, we augmented the second-phase sample through an additional sample adjustment phase. The procedure detailed below can be used to reuse stratified random validation samples acquired with a map-based stratification for the accuracy assessment of new maps.

In the validation data adjustment phase, the second-phase sample points were integrated as follows. The direct and indirect maps each included the same four map classes, although the classes did not coincide spatially because the two maps differed. Intersecting the two sets of four classes, one set for each map, produced eight classes, hereafter referred to as combination classes. There are 8 rather than 16 combination classes because the forest buffer was the same for both maps.

We analyzed the eight combination classes obtained by intersecting the two afforestation/non-afforestation maps and the forest buffer (Table 2, Figure 2) and augmented the sample in each combination class that contained fewer than 30 points. This required photointerpreting 21 additional points, for a total of 4021 points.
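The construction of the eight combination classes and the top-up to 30 points per class can be sketched as follows. The integer encoding (buffer, direct label, and indirect label as bits), the function names, and the example sample are hypothetical; classes with no mapped area are assumed to need no additional points, and any extra points are assumed to be drawn at random within their combination class.

```python
import numpy as np


def combination_class(in_buffer, direct_afforestation, indirect_afforestation):
    """Encode the 2 x 2 x 2 intersection as an integer code from 0 to 7."""
    return (4 * np.asarray(in_buffer, dtype=int)
            + 2 * np.asarray(direct_afforestation, dtype=int)
            + np.asarray(indirect_afforestation, dtype=int))


def points_to_add(point_classes, class_area, minimum=30):
    """Additional points needed so that every combination class with mapped area has >= minimum."""
    class_area = np.asarray(class_area, dtype=float)
    counts = np.bincount(np.asarray(point_classes), minlength=class_area.size)
    needed = np.maximum(minimum - counts, 0)
    needed[class_area == 0] = 0  # nothing can be sampled in an empty class
    return needed


# Hypothetical sample: combination-class codes of already photointerpreted points.
points = np.array([0] * 200 + [1] * 12 + [2] * 25 + [3] * 300
                  + [4] * 400 + [5] * 31 + [6] * 50 + [7] * 500)
area = [0.5, 0.01, 0.02, 0.3, 0.1, 0.01, 0.01, 0.05]  # hypothetical area proportions
print(points_to_add(points, area))  # [ 0 18  5  0  0  0  0  0]
```

Because the buffer is shared by both maps, the code space has exactly 2 × 2 × 2 = 8 values, which is why the intersection yields 8 rather than 16 classes.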

3.4. Accuracy Assessment

The final validation dataset of 4021 photointerpreted points was used to estimate and compare the accuracies of the direct and indirect afforestation maps and their confidence intervals. The per-class accuracies, overall accuracy (OA), and 95% confidence intervals of the direct map were estimated as shown in Table 3. The OA of the indirect map was estimated as shown in Table 4, considering the eight combination classes. The per-class accuracies of the indirect map were then calculated by aggregating the accuracies of the eight combination classes (Table 5). The precision of the OA and per-class accuracy estimates was assessed through their 95% confidence intervals (CI).
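For reference, the standard stratified estimator of a proportion, as commonly used in map accuracy assessment, is sketched below: OA is the area-weighted mean of the within-stratum proportions of correctly classified points, and its variance sums W_h^2 p_h (1 - p_h) / (n_h - 1) over strata, ignoring the finite population correction. This is not necessarily the exact formulation of Tables 3 to 5; the function names and example numbers are hypothetical. The per-class helper gives only the point estimate for a map class formed by aggregating strata (e.g., combination classes), whose variance requires a ratio-estimator formula not shown here.

```python
import numpy as np


def stratified_overall_accuracy(weights, n_correct, n_sampled, z=1.96):
    """Stratified OA estimate and the half-width of its confidence interval."""
    W = np.asarray(weights, dtype=float)        # area proportion of each stratum (sums to 1)
    n = np.asarray(n_sampled, dtype=float)      # sample points per stratum
    p = np.asarray(n_correct, dtype=float) / n  # within-stratum proportion correct
    oa = float(np.sum(W * p))
    variance = float(np.sum(W**2 * p * (1.0 - p) / (n - 1.0)))
    return oa, z * np.sqrt(variance)


def aggregated_class_accuracy(weights, n_correct, n_sampled, members):
    """Point estimate of accuracy for a map class made of the strata listed in `members`."""
    W = np.asarray(weights, dtype=float)[members]
    p = (np.asarray(n_correct, dtype=float) / np.asarray(n_sampled, dtype=float))[members]
    return float(np.sum(W * p) / np.sum(W))


oa, half_width = stratified_overall_accuracy([0.5, 0.5], [90, 80], [100, 100])
print(round(oa, 3), round(half_width, 3))
ua = aggregated_class_accuracy([0.2, 0.3, 0.5], [18, 24, 45], [20, 30, 50], members=[0, 1])
print(round(ua, 3))
```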

4. Results

4.1. Direct and Indirect Afforestation Maps

Figure 3 illustrates the results of the afforestation classifications obtained using the direct and indirect methods. Panels A1–D1 show aerial images from 1988, panels A2–D2 show aerial images from 2020, and panels A3–D3 show the predicted direct and indirect afforestation maps, which generally overlap for larger predicted afforestation areas. The direct method often predicted larger afforestation patches than the indirect method.

4.2. Direct and Indirect Map Accuracy Comparisons

Table A1, Table A2 and Table A3 show the results of the calculations presented in Table 3, Table 4 and Table 5, respectively. Specifically, Table 3 and Table A1 refer to the OA, per-class accuracy, and CI assessments for the direct map; Table 4 and Table A2 to the OA and CI assessment for the indirect map; and Table 5 and Table A3 to the per-class accuracy and CI assessments for the indirect map. All the accuracy and CI assessments are summarized in Figure 4.

For both maps, OA was greater than 85%, with OA = 87 ± 1.3% for the direct map and OA = 89 ± 1.6% for the indirect map (95% CI). Because the confidence intervals overlap, the two accuracy estimates are not statistically significantly different. Thus, the results of this study are contrary to the general opinion that the direct method produces greater accuracy. For the afforestation class inside the buffer, the estimated accuracy (with 95% CI) was 49 ± 3.2% for the direct map and 53 ± 5.9% for the indirect map, while for the afforestation class outside the buffer, it was 26 ± 3.4% for the direct map and 27 ± 6.6% for the indirect map. Non-afforestation class accuracy estimates were greater for the direct map than for the indirect map both outside the buffer (99 ± 1.3% versus 97 ± 0.9%) and inside the buffer (95 ± 0.6% versus 87 ± 1.9%).
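A quick way to gauge how close the two OA estimates come to a significant difference is an approximate z statistic reconstructed from the reported CI half-widths. This sketch assumes the two estimates are independent and approximately normal; because both use the same photointerpreted points, they are likely positively correlated, which would shrink the standard error of the difference, so a paired analysis on the shared points would be the more rigorous test. The helper name is hypothetical.

```python
import math


def z_for_difference(est_a, half_a, est_b, half_b, z_crit=1.96):
    """Approximate z statistic for est_b - est_a from two 95% CI half-widths, assuming independence."""
    se_a = half_a / z_crit
    se_b = half_b / z_crit
    return (est_b - est_a) / math.sqrt(se_a**2 + se_b**2)


z = z_for_difference(87.0, 1.3, 89.0, 1.6)  # direct vs indirect overall accuracy (%)
print(round(z, 2))  # 1.9
```

With the reported values, z is about 1.90, just below the 1.96 threshold of a two-sided test at the 5% level.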

Most of the direct (93%) and indirect (88%) map errors (Table 5) occurred in the afforestation classes, which cover minor proportions of the total area of both the direct (19%) and indirect (7%) maps. Thanks to the stratification procedure and multi-phase sampling presented here, we can thus provide more insight into map accuracies and inform potential users about which map area estimates are more accurate and which are less so.

4.3. RF Variable Importance Ranking in the Direct and Indirect Approaches

The importance rankings of the predictors used in the direct and indirect approaches are illustrated in Figure 5 and Figure 6, respectively. For the direct approach (Figure 5), there were too many predictors (234) to show in a figure, so we show only the mean MDG values for each band, index, and temporal statistic (Section 3.1). For the direct method, the band with the greatest importance was swir2, the index with the greatest importance was Tasseled Cap Wetness (TCW), and the temporal statistic with the greatest importance was tau, whose importance was more than twice that of the next most important temporal statistic (year min, i.e., the year in which the band/index reached its minimum value).
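Averaging MDG values by band or by temporal statistic, as done for Figure 5, is a simple group-by. A minimal sketch, assuming predictor names of the hypothetical form `<band>_<statistic>` (e.g., `swir2_tau`, `swir2_year_min`) with no underscore inside the band name; the function name and the numbers in the example are made up.

```python
from collections import defaultdict
from statistics import mean


def mean_importance(importances, by="band"):
    """Average MDG values grouped by band/index or by temporal statistic, highest first."""
    groups = defaultdict(list)
    for name, value in importances.items():
        band, statistic = name.split("_", 1)  # split at the first underscore only
        groups[band if by == "band" else statistic].append(value)
    return sorted(((key, mean(vals)) for key, vals in groups.items()),
                  key=lambda item: item[1], reverse=True)


mdg = {"swir2_tau": 4.0, "swir2_year_min": 2.0, "red_tau": 2.0, "red_year_min": 0.0}
print(mean_importance(mdg, by="band"))       # [('swir2', 3.0), ('red', 1.0)]
print(mean_importance(mdg, by="statistic"))  # [('tau', 3.0), ('year_min', 1.0)]
```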

For the indirect method, variable importance was assessed separately for the initial and final maps. In the initial map (1985–1989), the predictors with the greatest importance were the green band among the spectral bands and the Enhanced Vegetation Index (EVI) among the indices, while in the final map (2015–2019), the most important bands were the green and red bands, and the most important index was TCW.

5. Discussion

5.1. Contextualization of the Study

Monitoring afforestation and estimating its extent across large areas is essential for properly understanding the effects of climate change and for efficiently planning future actions globally and in EU member states [51].

Beyond international decisions, in Italy the national legislation on forests and the forestry supply chain (Legislative Decree 34/2018) supports the development of forest mapping tools for forest management activities. Many elements of Legislative Decree 34/2018 are also part of the National Forest Strategy for 2030, another instrument for guiding forest policies, which has the highest priority in forest planning activities in Italy.

5.2. Summary of the Issues We Addressed and How We Did So

In this study, we used Landsat imagery (USGS), RF [10], and GEE [3] to predict afforestation that occurred in Italy over the last four decades and to address two main aims. First, we demonstrated a methodology for reusing validation sample data acquired through remote sensing-based stratifications with different stratifications. Second, we applied that methodology to compare accuracy estimates, and the precision of those estimates, for the direct and indirect methods.

5.3. Validation Sample Adjustment Method

Validation samples are often selected using stratified sampling based on maps and other remote sensing-based products. Unlike simple random sampling, this procedure permits the sampling intensity to be increased in areas where classification error rates are expected to be greater [30] and, consequently, increases the precision of accuracy estimates [23] and, more generally, of many kinds of estimates obtained using remote sensing-based maps [52]. Consistent with this, in this study 93% of direct map errors were concentrated in 19% of the map area, and 88% of indirect map errors in 7% of the map area.

However, the inclusion probabilities of validation sample units selected using stratified random sampling are closely tied to the map used for stratification, which suggests that such samples cannot be used to assess the accuracies of other remote sensing-based products. Moreover, because the sampling intensity must be constant within strata, precise estimation of map accuracies usually requires a large validation dataset, which in turn increases the time needed for the analysis. As a direct consequence, the huge amount of work done to collect validation data is often not sufficiently exploited through reuse, even though the time and cost of acquiring ground-truth data motivate using them in multiple studies. Several data-sharing systems have been developed for this purpose, such as data paper journals (e.g., Data in Brief) and dedicated repositories (e.g., Zenodo). Although some research has investigated the reuse of forest field data [53], a methodology for reusing sample data acquired via stratified sampling with strata derived from remote sensing-based products is lacking. Moreover, although only two maps were compared in this study, keeping the number of additional sample points small can be crucial if the method is used to compare several maps.

In this study, we successfully addressed this issue. Using the validation sample adjustment phase presented here (Section 3.3), we were able to reuse the initial sample of 4000 points, which had been selected using stratified random sampling with strata based on the afforestation map constructed using the direct method. Photointerpretation required about 1 min per point, for a total of about 67 h for the 4000 points. By augmenting this sample with just 21 additional photointerpreted points (about 0.5% of the 4000 sample points), we were able to reuse the dataset. The adjusted validation dataset was then used to compare the estimated accuracies of the direct and indirect afforestation maps.

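One way to picture the adjustment is as a cross-tabulation. Each point of the original sample already carries its direct-map stratum; overlaying it on the indirect map gives its combination class (Table 2), and combination classes holding too few points would then be topped up with new points. The Python sketch below illustrates this bookkeeping under stated assumptions and is not the procedure of Section 3.3: the toy points are hypothetical, the minimum of 30 units per combination class is an assumption (the 30-unit combinations in Table 2 merely suggest such a floor), and the random placement of the new points is not shown.

```python
from collections import Counter

# Combination classes of Table 2 as (indirect map class, direct map class).
# A/B = afforestation/non-afforestation inside the forest buffer,
# C/D = afforestation/non-afforestation outside the forest buffer.
COMBINATIONS = [("A", "A"), ("A", "B"), ("C", "C"), ("C", "D"),
                ("B", "A"), ("B", "B"), ("D", "C"), ("D", "D")]


def combination_counts(points):
    """Count validation points per (indirect class, direct class) pair."""
    return Counter((p["indirect"], p["direct"]) for p in points)


def extra_points_needed(counts, combinations, n_min=30):
    """Points to add so that every combination class has at least n_min units.

    n_min = 30 is an assumed minimum, not a value taken from Section 3.3.
    """
    return {c: max(0, n_min - counts.get(c, 0)) for c in combinations}


# Hypothetical toy sample: each point already carries its direct-map stratum
# and has been overlaid on the indirect map to obtain its indirect class.
toy_points = ([{"indirect": "A", "direct": "A"}] * 40
              + [{"indirect": "A", "direct": "B"}] * 25
              + [{"indirect": "B", "direct": "B"}] * 50)

needed = extra_points_needed(combination_counts(toy_points), COMBINATIONS)
for combo, extra in needed.items():
    print(combo, extra)
```

With these toy inputs, the sketch asks for 5 extra points in combination (A, B) and 30 in each of the five combinations that received no points.
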
5.4. Map Accuracy

The per-class accuracy assessment is important for focusing future research on the classes with lower accuracies and for developing effective methods to improve them. Additionally, insights into map accuracy for specific map regions show where the map can be considered more reliable and where, instead, map unit predictions may need to be double-checked. For example, the qualitative accuracy assessment we performed showed that most commission errors were located in urban and agricultural areas, which were sometimes wrongly identified as afforestation, probably because the phenological cycle of croplands can be confused with afforestation. This means that remote sensing afforestation maps should be considered less reliable in agricultural and urban areas, while very few errors are expected in the other land cover classes.

The quantitative accuracy assessment confirmed the potential of Landsat data and of the method demonstrated here for mapping afforestation. The OAs (with 95% CIs) of 87 ± 1.3% and 89 ± 1.6% for the direct and indirect maps, respectively, run contrary to the general opinion that the direct method produces greater accuracy. Indeed, while the direct approach is sometimes described as more reliable and accurate because it produces only one set of errors [16,17,18], in this study the two sets of errors produced by the indirect approach tended to be smaller than the errors committed by the direct method. However, the per-class accuracy assessment showed that the differences between the accuracy estimates obtained using the direct and indirect methods were not statistically significant. Both maps had lower accuracies in the afforestation outside the forest buffer class, probably because outside the buffer both classification methods were more influenced by the spectral response of other land cover types, such as agricultural areas, whose spectral response can resemble that of afforestation. This confirms the qualitative assessment of the maps. Notably, these large proportions of errors were concentrated in very small proportions of the mapped area (8% for the direct map and 3% for the indirect map).

5.5. Variable Importance Ranking

The RF variable importance analysis is an important guide for selecting the optimal input variables, limiting their number, and reducing processing time. For the direct approach (Figure 5), the importance rankings showed Kendall's correlation coefficient (tau) to be the most important temporal statistic. The predictor variables with the greatest importance were SWIR2 tau (Mean Decrease Gini, MDG = 15.22), SWIR1 tau (MDG = 14.79), TCW, and green band tau (MDG = 13.60).

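The "tau" predictors summarise how monotonically a band's annual BAP values change over the time series. A minimal Python sketch of Kendall's tau-b for a single pixel is shown below, assuming tau is computed between the acquisition year and the annual value; it uses plain O(n²) pair counting rather than an O(n log n) algorithm such as Knight's [45], and the reflectance values are invented for illustration.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences.

    Simple O(n^2) pair counting: pairs tied in both x and y are ignored,
    pairs tied in only one variable enter that variable's tie term.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical annual SWIR2 reflectances of one pixel (illustrative values).
years = [2015, 2016, 2017, 2018, 2019]
swir2 = [0.18, 0.15, 0.16, 0.12, 0.10]
print(f"SWIR2 tau = {kendall_tau_b(years, swir2):.2f}")
```

A strongly negative value, as in this toy series, indicates reflectance that mostly decreases over the years; a value near zero indicates no consistent trend.
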
For the indirect approach (Figure 6), the green band and EVI had the greatest importance in the initial map (MDG > 100), whereas the green, red, and SWIR2 bands and TCW (MDG > 60) had the greatest importance in the final map.

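The MDG scores rank variables by how much they reduce Gini impurity when the trees split on them. The Python sketch below computes that decrease for a single binary split. Implementations typically accumulate such decreases, weighted by the number or share of samples reaching each node, over all nodes that split on a variable and then average them over trees. The labels are toy values, and scaling conventions differ between implementations (scikit-learn, for instance, normalises its impurity-based importances to sum to one), so the numbers are not comparable with the MDG values reported above.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease of splitting `parent` into `left` and `right`.

    Child impurities are weighted by their share of the parent's samples.
    """
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


# Toy node with 5 afforestation and 5 non-afforestation training pixels.
parent = ["aff"] * 5 + ["non"] * 5
perfect = gini_decrease(parent, ["aff"] * 5, ["non"] * 5)
partial = gini_decrease(parent, ["aff"] * 4 + ["non"], ["aff"] + ["non"] * 4)
print(perfect, round(partial, 2))
```

A split that perfectly separates the two classes removes all of the parent's impurity (0.5 here), while a split that leaves one misplaced pixel on each side removes less (0.18).
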
In addition, the inclusion of auxiliary variables (e.g., slope, aspect, pedology), other optical remotely sensed data (e.g., Sentinel-2, PlanetScope), and 3D data (e.g., lidar, GEDI) should be investigated in the future, since they may further improve accuracy.

6. Conclusions

In this study, we mapped afforestation that occurred in Italy between 1985 and 2019 with both direct and indirect methods, using Landsat BAP time series and machine learning implemented in GEE. Specifically, we demonstrated how validation data selected for one afforestation map by stratified random sampling can be efficiently reused to assess the accuracy of a different map. This is key to making the most of the time and cost needed to acquire validation data and to avoiding stratified validation datasets being used just once. Stratified sampling allowed the majority of errors to be concentrated in a small proportion of the map: 93% of direct map errors and 88% of indirect map errors (Table 5) occurred in the afforestation classes, which, as desired, cover small proportions of the total area in both the direct (19%) and indirect (7%) maps.

Direct and indirect methods were compared in terms of accuracy estimates and the precision of those estimates, revealing that the indirect method tends to be as accurate as the direct method, contrary to the widespread perception that the direct method is more reliable and accurate. Overall, the two methods gave very similar results, and both produced high accuracy estimates: the direct and indirect OA estimates were 87 ± 1.3% and 89 ± 1.6%, respectively. Such small CIs indicate that the method presented here provides precise estimates of map accuracy, including for individual map classes. Inside the buffer, the accuracy of the afforestation class was 49 ± 3.2% for the direct map and 53 ± 5.9% for the indirect map, while outside the buffer it was 26 ± 3.4% for the direct map and 27 ± 6.6% for the indirect map. The non-afforestation accuracy was 95 ± 0.6% for the direct map and 87 ± 1.9% for the indirect map inside the buffer, and 99 ± 1.3% for the direct map and 97 ± 0.9% for the indirect map outside the buffer.

The results presented here contribute to a better understanding of how afforestation can be mapped and how such maps can be assessed, which is fundamental for monitoring climate change. It will therefore be essential to keep improving the analysis to guarantee high-quality information in support of decision-making.

Author Contributions

Conceptualization, S.F., A.C. and R.E.M.; methodology, S.F., A.C. and R.E.M.; formal analysis, S.F., A.C. and G.D.; writing—original draft preparation, S.F., A.C., G.D. and R.E.M.; writing—review and editing, S.F., A.C., G.D., M.M. (Mauro Maesano) and R.E.M.; supervision, G.C., M.M. (Michele Munafò) and G.S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the following projects: MULTIFOR “Multi-scale observations to predict Forest response to pollution and climate change” PRIN 2020 Research Project of National Relevance funded by the Italian Ministry of University and Research (prot. 2020E52THS); SUPERB “Systemic solutions for upscaling of urgent ecosystem restoration for forest related biodiversity and ecosystem services” H2020 project funded by the European Commission, number 101036849 call LC-GD-7-1-2020; EFINET “European Forest Information Network” funded by the European Forest Institute, Network Fund G-01-2021; PNRR, funded by the Italian Ministry of University and Research, Missione 4 Componente 2, “Dalla ricerca all’impresa”, Investimento 1.4, Project CN00000033; FORWARDS: the forestward observatory to secure resilience of european forests (Project 101084481).

Acknowledgments

The authors acknowledge the support of NBFC to University of Florence, Department of Agricultural, Food and Forestry Systems, funded by the Italian Ministry of University and Research, PNRR, Missione 4 Componente 2, “Dalla ricerca all’impresa”, Investimento 1.4, Project CN00000033.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Overall and per-class accuracy assessment for the direct map. Results of Table 3 calculations.

Map Class	Validation: Afforestation	Validation: Non-Afforestation	Sum	$w_{j}$ ^#1	$\hat{p}_{j}$ ^#2	$w_{j} * \hat{p}_{j}$	$CI(\hat{p}_{j})$ (%)
Afforestation inside buffer	418	436	854	0.11	0.49	0.05	3.24
Afforestation outside buffer	191	544	735	0.08	0.26	0.02	3.42
Non-afforestation inside buffer	61	1130	1191	0.30	0.95	0.29	0.64
Non-afforestation outside buffer	16	1225	1241	0.52	0.99	0.51	1.28
Overall Accuracy						87%	1.33

^#1 $w_{j}$ is the proportion of the map in each class. ^#2 $\hat{p}_{j}$ is the class accuracy.

Table A2. Overall accuracy assessment for the indirect map. Results of Table 4 calculations.

Indirect Map Class	Direct Map Class	Validation: Afforestation	Validation: Non-Afforestation	Sum	$w_{j}$ ^#1	$\hat{p}_{j}$	$w_{j} * \hat{p}_{j}$	$CI(\hat{p}_{j})$ (%)
Afforestation inside buffer	Afforestation inside buffer	163	119	282	0.04	0.58	0.02	5.88
Afforestation inside buffer	Non-afforestation inside buffer	1	29	30	0.00	0.03	0.00	6.55
Afforestation outside buffer	Afforestation outside buffer	70	133	203	0.02	0.34	0.01	6.67
Afforestation outside buffer	Non-afforestation outside buffer	1	29	30	0.01	0.03	0.00	6.55
Non-afforestation inside buffer	Afforestation inside buffer	255	317	572	0.07	0.55	0.04	4.16
Non-afforestation inside buffer	Non-afforestation inside buffer	60	1101	1161	0.30	0.95	0.28	1.30
Non-afforestation outside buffer	Afforestation outside buffer	121	411	532	0.05	0.77	0.04	3.63
Non-afforestation outside buffer	Non-afforestation outside buffer	15	1196	1211	0.51	0.99	0.50	0.64
Overall accuracy							89%	1.62

^#1 $w_{j}$ is the proportion of the map in each class.

Table A3. Per-class accuracy assessment for the indirect map.

Indirect Map Class	Direct Map Class	Validation: Afforestation	Validation: Non-Afforestation	Sum	$a_{j}$ ^#1	$w_{j}$ ^#2	$\hat{p}_{j}$	$w_{j} * \hat{p}_{j}$	Class Accuracy (%)	$CI(\hat{p}_{j})$ (%)
Afforestation inside buffer	Afforestation inside buffer	163	119	282	13,909,582	0.92	0.58	0.53		5.88
Afforestation inside buffer	Non-afforestation inside buffer	1	29	30	1,195,672	0.08	0.03	0.00		6.55
Afforestation inside buffer					15,105,254				^#3 53	5.93 ^#4
Afforestation outside buffer	Afforestation outside buffer	70	133	203	9,245,508	0.75	0.34	0.26		6.65
Afforestation outside buffer	Non-afforestation outside buffer	1	29	30	3,053,527	0.25	0.03	0.01		6.55
Afforestation outside buffer					12,299,035				^#3 27	6.63 ^#4
Non-afforestation inside buffer	Afforestation inside buffer	255	317	572	28,589,586	0.20	0.55	0.11		4.16
Non-afforestation inside buffer	Non-afforestation inside buffer	60	1101	1161	117,906,305	0.80	0.95	0.76		1.30
Non-afforestation inside buffer					146,495,891				^#3 87	1.87 ^#4
Non-afforestation outside buffer	Afforestation outside buffer	121	411	532	21,365,513	0.10	0.77	0.07		3.63
Non-afforestation outside buffer	Non-afforestation outside buffer	15	1196	1211	201,081,401	0.90	0.99	0.89		0.64
Non-afforestation outside buffer					222,446,914				^#3 97	0.94 ^#4

^#1 $a_{j}$ is the number of pixels in each map class combination, ^#2 $w_{j}$ is the proportion of the indirect map class area occupied by each map class combination, ^#3 $\hat{p}$ is the class accuracy, and ^#4 $\sum_{j} w_{j} * CI(\hat{p}_{j})$ is the 95% confidence interval for each class.

References

Intergovernmental Panel on Climate Change (IPCC). Climate Change 2021: The Physical Science Basis; IPCC: Nykoping, Sweden, 2021.
FAO. Global Forest Resources Assessment 2020—Guidelines and Specifications. Forest Resources Assessment; FAO: Rome, Italy, 2020; Volume 42.
Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
Wulder, M.A.; Coops, N.C. Satellites: Make Earth observations open access. Nature 2014, 513, 30–31.
Francini, S.; Chirici, G. A Sentinel-2 derived dataset of forest disturbances occurred in Italy between 2017 and 2020. Data Brief 2022, 42, 108297.
Nabuurs, G.-J.; Harris, N.; Sheil, D.; Palahi, M.; Chirici, G.; Boissière, M.; Fay, C.; Reiche, J.; Valbuena, R. Glasgow forest declaration needs new modes of data ownership. Nat. Clim. Change 2022, 12, 415–417.
Gomes, V.C.F.; Queiroz, G.R.; Ferreira, K.R. An Overview of Platforms for Big Earth Observation Data Management and Analysis. Remote Sens. 2020, 12, 1253.
Wulder, M.A.; Loveland, T.R.; Roy, D.P.; Crawford, C.J.; Masek, J.G.; Woodcock, C.E.; Allen, R.G.; Anderson, M.C.; Belward, A.S.; Cohen, W.B.; et al. Current status of Landsat program, science, and applications. Remote Sens. Environ. 2019, 225, 127–147.
Wulder, M.A.; Masek, J.G.; Cohen, W.B.; Loveland, T.R.; Woodcock, C.E. Opening the archive: How free data has enabled the science and monitoring promise of Landsat. Remote Sens. Environ. 2012, 122, 2–10.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Hawryło, P.; Francini, S.; Chirici, G.; Giannetti, F.; Parkitna, K.; Krok, G.; Mitelsztedt, K.; Lisańczuk, M.; Stereńczak, K.; Ciesielski, M.; et al. The Use of Remotely Sensed Data and Polish NFI Plots for Prediction of Growing Stock Volume Using Different Predictive Methods. Remote Sens. 2020, 12, 3331.
Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
Vangi, E.; D’Amico, G.; Francini, S.; Giannetti, F.; Lasserre, B.; Marchetti, M.; McRoberts, R.E.; Chirici, G. The Effect of Forest Mask Quality in the Wall-to-Wall Estimation of Growing Stock Volume. Remote Sens. 2021, 13, 1038.
Hermosilla, T.; Bastyr, A.; Coops, N.C.; White, J.C.; Wulder, M.A. Mapping the presence and distribution of tree species in Canada’s forested ecosystems. Remote Sens. Environ. 2022, 282, 113276.
McRoberts, R.E.; Næsset, E.; Gobakken, T.; Bollandsås, O.M. Indirect and direct estimation of forest biomass change using forest inventory and airborne laser scanning data. Remote Sens. Environ. 2015, 164, 36–42.
Bollandsås, O.M.; Gregoire, T.G.; Næsset, E.; Øyen, B.-H. Detection of biomass change in a Norwegian mountain forest area using small footprint airborne laser scanner data. Stat. Methods Appl. 2013, 22, 113–129.
Fuller, R.M.; Smith, G.M.; Devereux, B.J. The characterisation and measurement of land cover change through remote sensing: Problems in operational applications? Int. J. Appl. Earth Obs. Geoinf. 2003, 4, 243–253.
Skowronski, N.S.; Clark, K.L.; Gallagher, M.; Birdsey, R.A.; Hom, J.L. Airborne laser scanner-assisted estimation of aboveground biomass change in a temperate oak–pine forest. Remote Sens. Environ. 2014, 151, 166–174.
Qiu, B.; Zou, F.; Chen, C.; Tang, Z.; Zhong, J.; Yan, X. Automatic mapping afforestation, cropland reclamation and variations in cropping intensity in central east China during 2001–2016. Ecol. Indic. 2018, 91, 490–502.
Ramírez-Cuesta, J.M.; Minacapilli, M.; Motisi, A.; Consoli, S.; Intrigliolo, D.S.; Vanella, D. Characterization of the main land processes occurring in Europe (2000–2018) through a MODIS NDVI seasonal parameter-based procedure. Sci. Total Environ. 2021, 799, 149346.
Yin, H.; Pflugmacher, D.; Li, A.; Li, Z.; Hostert, P. Land use and land cover change in Inner Mongolia—Understanding the effects of China’s re-vegetation programs. Remote Sens. Environ. 2018, 204, 918–930.
Haller, A.; Bender, O. Among rewilding mountains: Grassland conservation and abandoned settlements in the Northern Apennines. Landsc. Res. 2018, 43, 1068–1084.
Cavalli, A.; Francini, S.; McRoberts, R.E.; Falanga, V.; Congedo, L.; de Fioravante, P.; Maesano, M.; Munafò, M.; Chirici, G.; Mugnozza, G.S. Estimating Afforestation Area Using Landsat Time Series and Photointerpreted Datasets. Remote Sens. 2023, 15, 923.
Shen, W.; Li, M.; Huang, C.; Tao, X.; Li, S.; Wei, A. Mapping Annual Forest Change Due to Afforestation in Guangdong Province of China Using Active and Passive Remote Sensing Data. Remote Sens. 2019, 11, 490.
Townshend, J.R.; Masek, J.G.; Huang, C.; Vermote, E.F.; Gao, F.; Channan, S.; Sexton, J.O.; Feng, M.; Narasimhan, R.; Kim, D.; et al. Global characterization and monitoring of forest cover using Landsat data: Opportunities and challenges. Int. J. Digit. Earth 2012, 5, 373–397.
McRoberts, R.E.; Bollandsås, O.M.; Næsset, E. Modeling and Estimating Change. In Forestry Applications of Airborne Laser Scanning: Concepts and Case Studies; Maltamo, M., Næsset, E., Vauhkonen, J., Eds.; Springer Netherlands: Dordrecht, The Netherlands, 2014; pp. 293–313. ISBN 978-94-017-8663-8.
Olofsson, P.; Foody, G.M.; Stehman, S.V.; Woodcock, C.E. Making better use of accuracy data in land change studies: Estimating accuracy and area and quantifying uncertainty using stratified estimation. Remote Sens. Environ. 2013, 129, 122–131.
Fattorini, L. Design-based methodological advances to support national forest inventories: A review of recent proposals. iForest—Biogeosci. For. 2014, 8, 6–11.
Stehman, S.V. Practical Implications of Design-Based Sampling Inference for Thematic Map Accuracy Assessment. Remote Sens. Environ. 2000, 72, 35–45.
Francini, S.; McRoberts, R.E.; D’Amico, G.; Coops, N.C.; Hermosilla, T.; White, J.C.; Wulder, M.A.; Marchetti, M.; Mugnozza, G.S.; Chirici, G. An open science and open data approach for the statistically robust estimation of forest disturbance areas. Int. J. Appl. Earth Obs. Geoinf. 2022, 106, 102663.
Francini, S.; D’Amico, G.; Mencucci, M.; Seri, G.; Gravano, E.; Chirici, G. Remote sensing and automatic procedures: Useful tools to monitor forest harvesting. For.—Riv. Selvic. Ecol. For. 2021, 18, 27–34.
Stehman, S.V.; Czaplewski, R.L. Design and Analysis for Thematic Map Accuracy Assessment—An Application of Satellite Imagery. Remote Sens. Environ. 1998, 64, 331–344.
D’Amico, G.; Vangi, E.; Francini, S.; Giannetti, F.; Nicolaci, A.; Travaglini, D.; Massai, L.; Giambastiani, Y.; Terranova, C.; Chirici, G. Are we ready for a National Forest Information System? State of the art of forest maps and airborne laser scanning data availability in Italy. iForest—Biogeosci. For. 2021, 14, 144–154.
White, J.C.; Wulder, M.A.; Hobart, G.W.; Luther, J.E.; Hermosilla, T.; Griffiths, P.; Coops, N.C.; Hall, R.J.; Hostert, P.; Dyk, A.; et al. Pixel-Based Image Compositing for Large-Area Dense Time Series Applications and Science. Can. J. Remote Sens. 2014, 40, 192–212.
Griffiths, P.; van der Linden, S.; Kuemmerle, T.; Hostert, P. Erratum: A Pixel-Based Landsat Compositing Algorithm for Large Area Land Cover Mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 2088–2101.
Francini, S.; D’Amico, G.; Vangi, E.; Borghi, C.; Chirici, G. Integrating GEDI and Landsat: Spaceborne Lidar and Four Decades of Optical Imagery for the Analysis of Forest Disturbances and Biomass Changes in Italy. Sensors 2022, 22, 2015.
White, J.C.; Wulder, M.A.; Hermosilla, T.; Coops, N.C.; Hobart, G.W. A nationwide annual characterization of 25 years of forest disturbance and recovery for Canada using Landsat time series. Remote Sens. Environ. 2017, 194, 303–321.
Hermosilla, T.; Wulder, M.A.; White, J.C.; Coops, N.C.; Hobart, G.W. An integrated Landsat time series protocol for change detection and generation of annual gap-free surface reflectance composites. Remote Sens. Environ. 2015, 158, 220–234.
Hermosilla, T.; Wulder, M.A.; White, J.C.; Coops, N.C.; Hobart, G.W. Regional detection, characterization, and attribution of annual forest change from 1984 to 2012 using Landsat-derived time-series metrics. Remote Sens. Environ. 2015, 170, 121–132.
Jönsson, P.; Cai, Z.; Melaas, E.; Friedl, M.A.; Eklundh, L. A Method for Robust Estimation of Vegetation Seasonality from Landsat and Sentinel-2 Time Series Data. Remote Sens. 2018, 10, 635.
Roy, D.P.; Boschetti, L.; Trigg, S.N. Remote Sensing of Fire Severity: Assessing the Performance of the Normalized Burn Ratio. IEEE Geosci. Remote Sens. Lett. 2006, 3, 112–116.
Bajocco, S.; Ferrara, C.; Alivernini, A.; Bascietto, M.; Ricotta, C. Remotely-sensed phenology of Italian forests: Going beyond the species. Int. J. Appl. Earth Obs. Geoinf. 2019, 74, 314–321.
Zhang, X.; Schaaf, C.B.; Friedl, M.A.; Strahler, A.H.; Gao, F.; Hodges, J.C.F. MODIS tasseled cap transformation and its utility. Int. Geosci. Remote Sens. Symp. 2002, 2, 1063–1065.
Gómez, C.; White, J.C.; Wulder, M.A. Characterizing the state and processes of change in a dynamic forest environment using hierarchical spatio-temporal segmentation. Remote Sens. Environ. 2011, 115, 1665–1679.
Knight, W.R. A Computer Method for Calculating Kendall’s Tau with Ungrouped Data. J. Am. Stat. Assoc. 1966, 61, 436–439.
Parisi, F.; Francini, S.; Borghi, C.; Chirici, G. An open and georeferenced dataset of forest structural attributes and microhabitats in central and southern Apennines (Italy). Data Brief 2022, 43, 108445.
Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014, 148, 42–57.
Wagner, J.E.; Stehman, S.V. Optimizing sample size allocation to strata for estimating area and map accuracy. Remote Sens. Environ. 2015, 168, 126–133.
Marcelli, A.; Mattioli, W.; Puletti, N.; Chianucci, F.; Gianelle, D.; Grotti, M.; Chirici, G.; D’Amico, G.; Francini, S.; Travaglini, D.; et al. Large-scale two-phase estimation of wood production by poplar plantations exploiting Sentinel-2 data as auxiliary information. Silva Fenn. 2020, 54, 10247.
Nicodemus, K.K. Letter to the Editor: On the stability and ranking of predictors from random forest variable importance measures. Brief. Bioinform. 2011, 12, 369–373.
European Commission (EC). EU Forest Strategy for 2030 COM(2021) 572 Final; European Commission: Bruxelles, Belgium, 2021.
Francini, S.; McRoberts, R.E.; Giannetti, F.; Mencucci, M.; Marchetti, M.; Scarascia Mugnozza, G.; Chirici, G. Near-real time forest change detection using PlanetScope imagery. Eur. J. Remote Sens. 2020, 53, 233–244.
De Lera Garrido, A.; Gobakken, T.; Ørka, H.O.; Næsset, E.; Bollandsås, O.M. Reuse of field data in ALS-assisted forest inventory. Silva Fenn. 2020, 54, 10272.

Figure 1. Study area (Section 2.1), forest mask (Section 2.1), BAP (Section 2.2), and training data (Section 2.3).

Figure 2. Graphical representation of the eight combination classes shown in Table 2. “In” and “out” refer to “inside” and “outside” the forest buffer while “aff” and “non aff” refer, respectively, to “afforestation” and to “non-afforestation”.

Figure 3. Afforestation maps obtained with the direct and indirect methods, and mapped afforestation area in Italian provinces. (A1–D1) and (A2–D2) are aerial images from 1988 and 2020, respectively, while (A3–D3) are the 2020 aerial images overlaid with the predicted maps. Areas classified as afforestation by both maps are shown in orange, while areas classified as afforestation only by the direct map or only by the indirect map are shown in yellow and red, respectively. Top right: afforestation area per province (NUTS 3) according to the direct map.

Figure 4. Top: comparison of the overall and class accuracy estimates, with confidence intervals, for the direct and indirect maps. Bottom: detail of the confidence intervals. The pie charts show the class area distributions in the direct and indirect maps.

Figure 5. Importance ranking of predictors for the direct approach. The figure shows the mean values of the Mean Decrease Gini (MDG) index for each band, index, and temporal statistic.

Figure 6. Importance ranking of the temporal predictors for the indirect approach. The figure shows the Mean Decrease Gini (MDG) values for each band and index for the initial map (1985–1989, left) and for the final map (2015–2019, right).

Table 1. Predictor variables for the time series construction.

Index	Formula	Reference
Normalized Difference Vegetation Index	$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}$	[40]
Normalized Burn Ratio	$\mathrm{NBR} = \frac{\mathrm{NIR} - \mathrm{SWIR}}{\mathrm{NIR} + \mathrm{SWIR}}$	[41]
Enhanced Vegetation Index	$\mathrm{EVI} = 2.5 \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + 6\,\mathrm{RED} - 7.5\,\mathrm{BLUE} + 1}$	[42]
Brightness	0.3037 Blue + 0.2793 Green + 0.4743 Red + 0.5585 NIR + 0.5082 SWIR1 + 0.1863 SWIR2	[43]
Wetness	0.1509 Blue + 0.1973 Green + 0.3279 Red + 0.3406 NIR − 0.7112 SWIR1 − 0.4572 SWIR2	[43]
Greenness	−0.2848 Blue − 0.2435 Green − 0.5436 Red + 0.7243 NIR + 0.0840 SWIR1 − 0.1800 SWIR2	[43]
Angle	arctan(Greenness/Brightness)	[44]

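The predictors in Table 1 are simple per-pixel functions of the BAP band reflectances. The Python sketch below evaluates them for one pixel, assuming surface reflectance scaled to 0–1 and assuming that "SWIR" in the NBR formula means SWIR2 (Table 1 does not specify the band); the tasseled cap coefficients are copied from Table 1, and the pixel values are invented.

```python
import math

BANDS = ("blue", "green", "red", "nir", "swir1", "swir2")
# Tasseled cap coefficients from Table 1, in BANDS order.
BRIGHTNESS = (0.3037, 0.2793, 0.4743, 0.5585, 0.5082, 0.1863)
GREENNESS = (-0.2848, -0.2435, -0.5436, 0.7243, 0.0840, -0.1800)
WETNESS = (0.1509, 0.1973, 0.3279, 0.3406, -0.7112, -0.4572)


def _linear(coeffs, px):
    """Weighted sum of the six reflective bands."""
    return sum(c * px[b] for c, b in zip(coeffs, BANDS))


def spectral_indices(px):
    """Table 1 predictors for one pixel given band reflectances (0-1)."""
    nir, red, blue = px["nir"], px["red"], px["blue"]
    swir = px["swir2"]  # assumption: NBR uses SWIR2
    brightness = _linear(BRIGHTNESS, px)
    greenness = _linear(GREENNESS, px)
    return {
        "NDVI": (nir - red) / (nir + red),
        "NBR": (nir - swir) / (nir + swir),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "Brightness": brightness,
        "Greenness": greenness,
        "Wetness": _linear(WETNESS, px),
        # Angle in radians; assumes brightness > 0, as for typical reflectances.
        "Angle": math.atan(greenness / brightness),
    }


# Hypothetical reflectances of a vegetated pixel.
pixel = {"blue": 0.03, "green": 0.06, "red": 0.05,
         "nir": 0.30, "swir1": 0.15, "swir2": 0.10}
print(spectral_indices(pixel))
```

In a Google Earth Engine workflow, the same arithmetic would be applied band-wise to whole images rather than to a dictionary of scalars.
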
Table 2. Number of validation dataset sample units per combination class (highlighted in grey).

Combination Class	1	2	3	4	5	6	7	8
Indirect Map Class	A	A	C	C	B	B	D	D
Direct Map Class	A	B	C	D	A	B	C	D
Sample Units	282	30	203	30	572	1161	532	1211

A = Afforestation inside forest buffer; B = Non-afforestation inside forest buffer; C = Afforestation outside forest buffer; D = Non-afforestation outside forest buffer.

Table 3. Accuracy assessment for direct map.

Map Class	Validation: Afforestation	Validation: Non-Afforestation	Sum	$w_{j}$ ^#1	$\hat{p}_{j}$ ^#2	$w_{j} * \hat{p}_{j}$	$CI(\hat{p}_{j})$ ^#3
Afforestation inside buffer	$n_{11}$	$n_{12}$	$n_{1} = n_{11} + n_{12}$	$w_{1}$	${\hat{p}}_{1} = \frac{n_{11}}{n_{1}}$	$w_{1} * {\hat{p}}_{1}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{1})}$
Afforestation outside buffer	$n_{21}$	$n_{22}$	$n_{2} = n_{21} + n_{22}$	$w_{2}$	${\hat{p}}_{2} = \frac{n_{21}}{n_{2}}$	$w_{2} * {\hat{p}}_{2}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{2})}$
Non-afforestation inside buffer	$n_{31}$	$n_{32}$	$n_{3} = n_{31} + n_{32}$	$w_{3}$	${\hat{p}}_{3} = \frac{n_{31}}{n_{3}}$	$w_{3} * {\hat{p}}_{3}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{3})}$
Non-afforestation outside buffer	$n_{41}$	$n_{42}$	$n_{4} = n_{41} + n_{42}$	$w_{4}$	${\hat{p}}_{4} = \frac{n_{41}}{n_{4}}$	$w_{4} * {\hat{p}}_{4}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{4})}$
			$n = \sum_{j=1}^{4} n_{j}$	$\sum_{j=1}^{4} w_{j} = 1$		$OA = \sum_{j=1}^{4} w_{j} * \hat{p}_{j}$	^#4 $CI(OA) = \sum_{j=1}^{4} w_{j} * CI(\hat{p}_{j})$

^#1 $w_{j}$ is the proportion of the map in each class, ^#2 $\hat{p}_{j}$ is the class accuracy, ^#3 $CI(\hat{p}_{j})$ is the 95% confidence interval for the class accuracy estimate, and ^#4 $CI(OA)$ is the confidence interval of the overall accuracy.

Table 4. Accuracy assessment for indirect map.

Indirect Map Class	Direct Map Class	Validation: Afforestation	Validation: Non-Afforestation	Sum	$w_{j}$ ^#1	$\hat{p}_{j}$ ^#2	$w_{j} * \hat{p}_{j}$	$CI(\hat{p}_{j})$ ^#3
Afforestation inside buffer	Afforestation inside buffer	$n_{11}$	$n_{12}$	$n_{1} = n_{11} + n_{12}$	$w_{1}$	${\hat{p}}_{1} = \frac{n_{11}}{n_{1}}$	$w_{1} * {\hat{p}}_{1}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{1})}$
Afforestation inside buffer	Non-afforestation inside buffer	$n_{21}$	$n_{22}$	$n_{2} = n_{21} + n_{22}$	$w_{2}$	${\hat{p}}_{2} = \frac{n_{21}}{n_{2}}$	$w_{2} * {\hat{p}}_{2}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{2})}$
Afforestation outside buffer	Afforestation outside buffer	$n_{31}$	$n_{32}$	$n_{3} = n_{31} + n_{32}$	$w_{3}$	${\hat{p}}_{3} = \frac{n_{31}}{n_{3}}$	$w_{3} * {\hat{p}}_{3}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{3})}$
Afforestation outside buffer	Non-afforestation outside buffer	$n_{41}$	$n_{42}$	$n_{4} = n_{41} + n_{42}$	$w_{4}$	${\hat{p}}_{4} = \frac{n_{41}}{n_{4}}$	$w_{4} * {\hat{p}}_{4}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{4})}$
Non-afforestation inside buffer	Afforestation inside buffer	$n_{51}$	$n_{52}$	$n_{5} = n_{51} + n_{52}$	$w_{5}$	${\hat{p}}_{5} = \frac{n_{51}}{n_{5}}$	$w_{5} * {\hat{p}}_{5}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{5})}$
Non-afforestation inside buffer	Non-afforestation inside buffer	$n_{61}$	$n_{62}$	$n_{6} = n_{61} + n_{62}$	$w_{6}$	${\hat{p}}_{6} = \frac{n_{61}}{n_{6}}$	$w_{6} * {\hat{p}}_{6}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{6})}$
Non-afforestation outside buffer	Afforestation outside buffer	$n_{71}$	$n_{72}$	$n_{7} = n_{71} + n_{72}$	$w_{7}$	${\hat{p}}_{7} = \frac{n_{71}}{n_{7}}$	$w_{7} * {\hat{p}}_{7}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{7})}$
Non-afforestation outside buffer	Non-afforestation outside buffer	$n_{81}$	$n_{82}$	$n_{8} = n_{81} + n_{82}$	$w_{8}$	${\hat{p}}_{8} = \frac{n_{81}}{n_{8}}$	$w_{8} * {\hat{p}}_{8}$	$2 * \sqrt{\hat{V a r} ({\hat{p}}_{8})}$
				$n = \sum_{j=1}^{8} n_{j}$	$\sum_{j=1}^{8} w_{j} = 1$		$OA = \sum_{j=1}^{8} w_{j} * \hat{p}_{j}$	^#4 $CI(OA) = \sum_{j=1}^{8} w_{j} * CI(\hat{p}_{j})$

^#1 $w_{j}$ is the proportion of the map in each class, ^#2 $\hat{p}_{j}$ is the combination class accuracy, ^#3 $CI(\hat{p}_{j})$ is the 95% confidence interval for the class accuracy estimate, and ^#4 $CI(OA)$ is the confidence interval of the overall accuracy.

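The Table 4 formulas can be checked against the counts reported in Tables A2 and A3. The Python sketch below, an illustration rather than the authors' code, computes the combination class accuracies, their 95% half-widths as $2\sqrt{\hat{p}_j(1-\hat{p}_j)/n_j}$ (a variance estimator that reproduces the half-widths listed in Table A2), and the area-weighted OA and CI(OA), with the weights taken from the pixel counts in Table A3.

```python
import math

# Eight combination classes of the indirect map, in Table A2 order.
# N_CORRECT: sample units whose photointerpreted label agrees with the
# indirect map class; N_TOTAL: sample units; AREA_PX: pixels (Table A3).
N_CORRECT = [163, 1, 70, 1, 317, 1101, 411, 1196]
N_TOTAL = [282, 30, 203, 30, 572, 1161, 532, 1211]
AREA_PX = [13_909_582, 1_195_672, 9_245_508, 3_053_527,
           28_589_586, 117_906_305, 21_365_513, 201_081_401]


def class_estimates(n_correct, n_total):
    """Per-stratum accuracy p_j and 95% half-width 2*sqrt(p(1-p)/n) in %."""
    out = []
    for k, n in zip(n_correct, n_total):
        p = k / n
        out.append((p, 2 * math.sqrt(p * (1 - p) / n) * 100))
    return out


def overall_accuracy(n_correct, n_total, area):
    """OA = sum w_j p_j and CI(OA) = sum w_j CI(p_j), as written in Table 4."""
    total = sum(area)
    weights = [a / total for a in area]
    est = class_estimates(n_correct, n_total)
    oa = sum(w * p for w, (p, _) in zip(weights, est))
    ci = sum(w * c for w, (_, c) in zip(weights, est))
    return oa, ci


oa, ci_oa = overall_accuracy(N_CORRECT, N_TOTAL, AREA_PX)
print(f"OA = {oa:.1%} ± {ci_oa:.2f} percentage points")
```

With exact pixel-count weights the sketch gives an OA of about 89% and a CI(OA) of about 1.6 percentage points; Table A2 reports 1.62 using weights rounded to two decimals. Summing weighted half-widths, as Table 4 specifies, gives an interval at least as wide as the stratified form $2\sqrt{\sum_j w_j^2 \widehat{Var}(\hat{p}_j)}$ of the kind used in [27,47], because the square root of a sum of squares never exceeds the sum of the terms.
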
Table 5. Per class accuracy assessment for indirect map.

Indirect Map Class	Direct Map Class	Validation: Afforestation	Validation: Non-Afforestation	Sum	$a_{j}$ ^#1	$w_{j}$ ^#2	$\hat{p}_{j}$	$w_{j} * \hat{p}_{j}$	Class Accuracy	$CI(\hat{p}_{j})$
Afforestation inside buffer	Afforestation inside buffer	$n_{11}$	$n_{12}$	$n_{1} = n_{11} + n_{12}$	$a_{1}$	$w_{1} = \frac{a_{1}}{a}$	${\hat{p}}_{1} = \frac{n_{11}}{n_{1}}$	$w_{1} * {\hat{p}}_{1}$		$2 * \sqrt{\hat{V a r} ({\hat{p}}_{1})}$
Afforestation inside buffer	Non-afforestation inside buffer	$n_{21}$	$n_{22}$	$n_{2} = n_{21} + n_{22}$	$a_{2}$	$w_{2} = \frac{a_{2}}{a}$	${\hat{p}}_{2} = \frac{n_{21}}{n_{2}}$	$w_{2} * {\hat{p}}_{2}$		$2 * \sqrt{\hat{V a r} ({\hat{p}}_{2})}$
Afforestation inside buffer					$a = a_{1} + a_{2}$				^#3 $\hat{p} = \sum_{j=1}^{2} w_{j} * \hat{p}_{j}$	^#4 $\sum_{j=1}^{2} w_{j} * CI(\hat{p}_{j})$
Afforestation outside buffer	Afforestation outside buffer	$n_{31}$	$n_{32}$	$n_{3} = n_{31} + n_{32}$	$a_{3}$	$w_{3} = \frac{a_{3}}{b}$	${\hat{p}}_{3} = \frac{n_{31}}{n_{3}}$	$w_{3} * {\hat{p}}_{3}$		$2 * \sqrt{\hat{V a r} ({\hat{p}}_{3})}$
Afforestation outside buffer	Non-afforestation outside buffer	$n_{41}$	$n_{42}$	$n_{4} = n_{41} + n_{42}$	$a_{4}$	$w_{4} = \frac{a_{4}}{b}$	${\hat{p}}_{4} = \frac{n_{41}}{n_{4}}$	$w_{4} * {\hat{p}}_{4}$		$2 * \sqrt{\hat{V a r} ({\hat{p}}_{4})}$
Afforestation outside buffer					$b = a_{3} + a_{4}$				^#3 $\hat{p} = \sum_{j=3}^{4} w_{j} * \hat{p}_{j}$	^#4 $\sum_{j=3}^{4} w_{j} * CI(\hat{p}_{j})$
Non-afforestation inside buffer	Afforestation inside buffer	$n_{51}$	$n_{52}$	$n_{5} = n_{51} + n_{52}$	$a_{5}$	$w_{5} = \frac{a_{5}}{c}$	${\hat{p}}_{5} = \frac{n_{51}}{n_{5}}$	$w_{5} * {\hat{p}}_{5}$		$2 * \sqrt{\hat{V a r} ({\hat{p}}_{5})}$
Non-afforestation inside buffer	Non-afforestation inside buffer	$n_{61}$	$n_{62}$	$n_{6} = n_{61} + n_{62}$	$a_{6}$	$w_{6} = \frac{a_{6}}{c}$	${\hat{p}}_{6} = \frac{n_{61}}{n_{6}}$	$w_{6} * {\hat{p}}_{6}$		$2 * \sqrt{\hat{V a r} ({\hat{p}}_{6})}$
Non-afforestation inside buffer					$c = a_{5} + a_{6}$				^#3 $\hat{p} = \sum_{5}^{6} w_{j} * {\hat{p}}_{j}$	$\sum_{5}^{6} w_{j} * CI {({\hat{p}}_{j})}^{# 3}$
Non-afforestation outside buffer	Afforestation outside buffer	$n_{71}$	$n_{72}$	$n_{7} = n_{71} + n_{72}$	$a_{7}$	$w_{7} = \frac{a_{7}}{d}$	${\hat{p}}_{7} = \frac{n_{71}}{n_{7}}$	$w_{7} * {\hat{p}}_{7}$		$2 * \sqrt{\hat{V a r} ({\hat{p}}_{7})}$
Non-afforestation outside buffer	Non-afforestation outside buffer	$n_{81}$	$n_{82}$	$n_{8} = n_{81} + n_{82}$	$a_{8}$	$w_{8} = \frac{a_{8}}{d}$	${\hat{p}}_{8} = \frac{n_{81}}{n_{8}}$	$w_{8} * {\hat{p}}_{8}$		$2 * \sqrt{\hat{V a r} ({\hat{p}}_{8})}$
Non-afforestation outside buffer					$d = a_{7} + a_{8}$				^#3 $\hat{p} = \sum_{7}^{8} w_{j} * {\hat{p}}_{j}$	$\sum_{8}^{7} w_{j} * CI {({\hat{p}}_{j})}^{# 4}$

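^#1 $a_{j}$ is the number of mapped pixels in map class combination $j$; the totals $a$, $b$, $c$ and $d$ are the pixel counts of the corresponding indirect-map class.

^#2 $w_{j}$ is the proportion of the indirect-map class occupied by map class combination $j$.

^#3 $\hat{p}$ is the class accuracy.

^#4 $\sum_{j} w_{j}\, CI(\hat{p}_{j})$ is the 95% confidence interval for each class.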
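The per-class computation in the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names `Combination` and `class_accuracy` and the sample counts are hypothetical, and the table only states $CI(\hat{p}_{j}) = 2\sqrt{\widehat{\mathrm{Var}}(\hat{p}_{j})}$, so the sketch assumes the binomial variance estimator $\hat{p}_{j}(1-\hat{p}_{j})/(n_{j}-1)$.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Combination:
    """One map class combination j (one row of the table). Hypothetical helper."""
    n_j1: int  # validation samples labelled afforestation
    n_j2: int  # validation samples labelled non-afforestation
    a_j: int   # mapped pixels in this combination

    @property
    def n_j(self) -> int:
        return self.n_j1 + self.n_j2

    @property
    def p_hat(self) -> float:
        # p_j = n_j1 / n_j
        return self.n_j1 / self.n_j

    def ci(self) -> float:
        # CI(p_j) = 2 * sqrt(Var(p_j)).
        # ASSUMPTION: binomial estimator Var(p_j) = p_j * (1 - p_j) / (n_j - 1).
        if self.n_j < 2:
            raise ValueError("need at least two validation samples")
        p = self.p_hat
        return 2 * math.sqrt(p * (1 - p) / (self.n_j - 1))


def class_accuracy(rows: Sequence[Combination]) -> Tuple[float, float]:
    """Weighted accuracy and CI for one indirect-map class."""
    total = sum(r.a_j for r in rows)                     # a, b, c or d in the table
    weights = [r.a_j / total for r in rows]              # w_j = a_j / total
    p = sum(w * r.p_hat for w, r in zip(weights, rows))  # sum_j w_j * p_j
    ci = sum(w * r.ci() for w, r in zip(weights, rows))  # sum_j w_j * CI(p_j)
    return p, ci


if __name__ == "__main__":
    # Hypothetical counts for "afforestation inside buffer" (rows j = 1, 2).
    rows = [Combination(45, 5, 8000), Combination(30, 20, 2000)]
    p, ci = class_accuracy(rows)
    print(f"class accuracy = {p:.3f} +/- {ci:.3f}")  # class accuracy = 0.840 +/- 0.097
```

With these illustrative counts the weights are 0.8 and 0.2. Summing the weighted half-widths, as in the table, never yields a narrower interval than the pooled form $2\sqrt{\sum_{j} w_{j}^{2}\,\widehat{\mathrm{Var}}(\hat{p}_{j})}$, so it can be read as a conservative choice.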

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Francini, S.; Cavalli, A.; D’Amico, G.; McRoberts, R.E.; Maesano, M.; Munafò, M.; Scarascia Mugnozza, G.; Chirici, G. Reusing Remote Sensing-Based Validation Data: Comparing Direct and Indirect Approaches for Afforestation Monitoring. Remote Sens. 2023, 15, 1638. https://doi.org/10.3390/rs15061638


